Stability AI, a world leader in generative AI models like Stable Diffusion and Stable Cascade, has announced Stable Video 3D (SV3D), a new image-to-video model for 3D generation. It enables high-quality novel view synthesis and 3D generation directly from a single image. Let’s get into the details!
Table of contents
- What is Stable Video 3D (SV3D)?
- Motivation Behind Stable Video 3D
- Example 3D Videos Generated by SV3D
- Stable Video 3D (SV3D) Models
- Performance Evaluation: SV3D vs Prior Methods
- 3D Object Reconstruction Using SV3D
- Download: Models Available on Hugging Face
- Stable Video 3D Usage Instructions
- Last But Not Least
What is Stable Video 3D (SV3D)?
SV3D is a generative image-to-video model based on Stable Video Diffusion. It takes a single image as input and generates a sequence of novel views that together form an orbital video around the object in the input image.
It leverages the generalization capabilities of latent diffusion models for image-to-video generation along with explicit camera control, allowing for photo-realistic novel view synthesis.
SV3D was trained on the large-scale Objaverse 3D dataset to generate 21 frames at a resolution of 576×576. It has demonstrated state-of-the-art performance on novel view synthesis as well as on 3D object reconstruction from single images.
Motivation Behind Stable Video 3D
Single-image 3D reconstruction is a long-standing challenge in computer vision with wide applications. Although recent AI techniques have made progress, existing novel view synthesis methods still suffer from inconsistent views, a limited range of generated viewpoints, and artifacts in their 3D outputs. SV3D addresses these gaps by adapting a pretrained video diffusion model, Stable Video Diffusion, to novel view synthesis with explicit camera control and by training it on the Objaverse 3D dataset.
Example 3D Videos Generated by SV3D
Stable Video 3D (SV3D) Models
Stability AI trained three image-to-video variants of SV3D:
1. SV3D_u
This is the pose-unconditioned model: it generates a static orbit video from a single input image, without any camera pose information. It shows how well SV3D performs without explicit pose conditioning.
2. SV3D_c
This is the pose-conditioned model: it generates dynamic orbit videos conditioned on both the input image and a sequence of camera poses that defines the orbital trajectory. This adds explicit control over the camera.
3. SV3D_p
This is the progressively trained model: Stability AI first fine-tuned Stable Video Diffusion to generate static orbit videos without pose conditioning, then fine-tuned it further on dynamic orbits with camera pose conditioning. According to the reported experiments, this gradual increase in task complexity yields the best-performing model.
As of now, only two variants have been released by Stability AI: SV3D_u and SV3D_p.
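To make the difference between static and dynamic orbits concrete, here is a minimal Python sketch of both kinds of camera trajectory, expressed as (elevation, azimuth) pairs in degrees for 21 frames. It assumes the definitions as we read them in the technical report: a static orbit uses evenly spaced azimuths at one fixed elevation, while a dynamic orbit may vary the elevation and space the azimuths irregularly. The helper names, the sine-shaped dynamic orbit, the orbit radius, and the z-up coordinate convention in `camera_position` are all illustrative assumptions, not Stability AI's code or camera parameterization.

```python
import math
from typing import List, Tuple

NUM_FRAMES = 21  # SV3D generates 21 views per orbit


def static_orbit(elevation_deg: float = 0.0,
                 num_frames: int = NUM_FRAMES) -> List[Tuple[float, float]]:
    """Static orbit: evenly spaced azimuths at a single fixed elevation."""
    step = 360.0 / num_frames
    return [(elevation_deg, i * step) for i in range(num_frames)]


def dynamic_orbit(base_elevation_deg: float = 10.0,
                  amplitude_deg: float = 10.0,
                  num_frames: int = NUM_FRAMES) -> List[Tuple[float, float]]:
    """Illustrative dynamic orbit: elevation varies and azimuth spacing is uneven.

    The sine-shaped path is a hypothetical example, not SV3D's training distribution.
    """
    poses = []
    for i in range(num_frames):
        t = i / num_frames  # fraction of one full turn
        # Warp the azimuth so steps are irregular but still strictly increasing
        azimuth = 360.0 * (t - 0.05 * math.sin(2.0 * math.pi * t))
        elevation = base_elevation_deg + amplitude_deg * math.sin(2.0 * math.pi * t)
        poses.append((elevation, azimuth % 360.0))  # keep azimuth within [0, 360)
    return poses


def camera_position(elevation_deg: float, azimuth_deg: float,
                    radius: float = 2.0) -> Tuple[float, float, float]:
    """Camera center on a sphere around the object (assumed z-up convention)."""
    el = math.radians(elevation_deg)
    az = math.radians(azimuth_deg)
    return (radius * math.cos(el) * math.cos(az),
            radius * math.cos(el) * math.sin(az),
            radius * math.sin(el))
```

For example, `[camera_position(e, a) for e, a in static_orbit(10.0)]` yields 21 camera centers circling the object at a 10° elevation, spaced 360/21 ≈ 17.1° apart in azimuth.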
Performance Evaluation: SV3D vs Prior Methods
Stable Video 3D was compared against state-of-the-art methods such as SyncDreamer, EscherNet, and Free3D, which also rely on image or video conditioning. On real datasets containing objects and scenes, SV3D outperforms these existing methods on novel view synthesis metrics such as LPIPS, PSNR, and CLIP score.
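Of the metrics named above, PSNR is simple enough to compute by hand, whereas LPIPS and CLIP score depend on pretrained networks. The Python sketch below uses the standard definition, PSNR = 10·log10(MAX² / MSE), on flattened lists of pixel values; it illustrates the metric only and is not the evaluation code used in the SV3D comparisons. For context, higher PSNR and CLIP score are better, while lower LPIPS is better.

```python
import math
from typing import Sequence


def psnr(reference: Sequence[float], generated: Sequence[float],
         max_value: float = 1.0) -> float:
    """Peak signal-to-noise ratio in dB between two flattened images.

    Higher is better; identical images give infinity.
    """
    if len(reference) != len(generated) or len(reference) == 0:
        raise ValueError("images must be non-empty and the same size")
    # Mean squared error over all pixel values
    mse = sum((r - g) ** 2 for r, g in zip(reference, generated)) / len(reference)
    if mse == 0:
        return math.inf
    return 10.0 * math.log10(max_value ** 2 / mse)
```

With pixel values in [0, 1], a uniform error of 0.1 gives an MSE of 0.01 and therefore a PSNR of 20 dB; for 8-bit images, pass `max_value=255`.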
SV3D’s meshes captured the most geometric and texture detail while remaining faithful to the input images and consistent across views.
User studies also showed a strong preference for SV3D outputs over the baselines on real images.
Quantitative comparisons also show that all three SV3D models score better than previous methods on both 2D and 3D metrics. Moreover, prior methods generated views at 256×256, while SV3D operates at a higher 576×576, which lets it capture finer details. SV3D_p performs best among the SV3D models.
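To put the resolution gap in perspective, a 576×576 view contains just over five times as many pixels as a 256×256 view:

```latex
\frac{576 \times 576}{256 \times 256} = \left(\frac{576}{256}\right)^{2} = 2.25^{2} = \frac{331{,}776}{65{,}536} = 5.0625
```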
3D Object Reconstruction Using SV3D
SV3D’s generated multi-view images also serve as strong guidance for 3D optimization, which proceeds coarse-to-fine using NeRF and DMTet representations. A masked Score Distillation Sampling (SDS) loss provides additional constraints. This approach produces high-quality 3D meshes directly from single 2D images, surpassing prior work.
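For readers unfamiliar with Score Distillation Sampling, the standard SDS gradient (introduced in DreamFusion) is shown below, followed by one plausible way to write a masked variant. Here θ are the parameters of the 3D representation, z is a rendered view (or its latent), z_t is its noised version at timestep t, ε is the added noise, ε̂_φ is the diffusion model's noise prediction given conditioning y, and w(t) is a weighting function. The mask M is our assumption for illustration (a binary mask restricting the loss to regions not visible in the generated orbit views); the exact masking used for SV3D may differ, so consult the technical report.

```latex
\nabla_\theta \mathcal{L}_{\mathrm{SDS}}
  = \mathbb{E}_{t,\epsilon}\!\left[ w(t)\,\big(\hat{\epsilon}_\phi(z_t; y, t) - \epsilon\big)\,\frac{\partial z}{\partial \theta} \right]

\nabla_\theta \mathcal{L}_{\mathrm{masked\text{-}SDS}}
  = \mathbb{E}_{t,\epsilon}\!\left[ w(t)\, M \odot \big(\hat{\epsilon}_\phi(z_t; y, t) - \epsilon\big)\,\frac{\partial z}{\partial \theta} \right]
```

With M set to all ones, the masked form reduces to the standard SDS gradient.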
Download: Models Available on Hugging Face
Stable Video 3D is publicly available on Hugging Face. However, the weights are gated: you must accept the license conditions before you can access the sv3d_u and sv3d_p models. Once you have, you can download the model weights for non-commercial use. Commercial use requires a Stability AI Membership.
Stable Video 3D Usage Instructions
For instructions to use SV3D, please visit the Stability AI generative models GitHub repository at https://github.com/Stability-AI/generative-models. It contains information on how to run both the SV3D_u and SV3D_p models on single image inputs to generate orbital videos.
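As a rough sketch of that workflow, the Python helpers below fetch a checkpoint and assemble the sampling command. They assume the Hugging Face repository id `stabilityai/sv3d`, the checkpoint filenames `sv3d_u.safetensors` and `sv3d_p.safetensors`, a `checkpoints/` directory, and the script path and flags (`--input_path`, `--version`, `--elevations_deg`) as we read them in the repository README at the time of writing. `build_sv3d_command` and `download_checkpoint` are hypothetical helpers, so check the current README before relying on any of these names.

```python
from typing import List, Optional

# Assumed script path inside a checkout of Stability-AI/generative-models
SCRIPT = "scripts/sampling/simple_video_sample.py"


def build_sv3d_command(input_path: str, version: str,
                       elevation_deg: Optional[float] = None) -> List[str]:
    """Hypothetical helper: build the argument list for the sampling script."""
    if version not in ("sv3d_u", "sv3d_p"):
        raise ValueError("version must be 'sv3d_u' or 'sv3d_p'")
    cmd = ["python", SCRIPT, "--input_path", input_path, "--version", version]
    if version == "sv3d_p":
        # SV3D_p is pose-conditioned, so pass an orbit elevation (example value 10.0)
        cmd += ["--elevations_deg", str(10.0 if elevation_deg is None else elevation_deg)]
    return cmd


def download_checkpoint(version: str, local_dir: str = "checkpoints") -> str:
    """Hypothetical helper: download gated weights after accepting the license.

    Needs `pip install huggingface_hub` and a logged-in Hugging Face account.
    """
    from huggingface_hub import hf_hub_download  # imported lazily: optional dependency
    return hf_hub_download(repo_id="stabilityai/sv3d",
                           filename=f"{version}.safetensors",
                           local_dir=local_dir)
```

From the root of the repository checkout, `subprocess.run(build_sv3d_command("my_image.png", "sv3d_u"), check=True)` would then launch sampling. The 10.0 degree default elevation for SV3D_p is only an example value, not a recommended setting.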
Last But Not Least
Stable Video 3D opens up many possibilities, such as virtual/augmented reality and content creation from single images. Building on video diffusion models, SV3D sets a new state of the art in novel view synthesis and 3D object reconstruction from single images. Stability AI plans to continue enhancing SV3D and to apply it to other generative tasks such as image animation and manipulation. For more details on this model, please visit the project page and technical report.