
NVIDIA Launches Cosmos, a Family of Open-Source Video Generation Models Trained on 20 Million Hours of Video Data

NVIDIA has introduced Cosmos, a platform aimed at advancing physical artificial intelligence (AI): AI systems, such as robots and autonomous vehicles, that perceive and act in the physical world. NVIDIA Cosmos combines generative world foundation models (WFMs), video tokenizers, and accelerated data processing pipelines. Its models were trained on 20 million hours of driving and robotics video data, and developers can use them as a starting point for building their own physical AI systems. The platform's physics-aware video models were trained on 9,000 trillion tokens and can generate high-quality videos from multimodal inputs such as text prompts and video.

NVIDIA Cosmos World Foundation Models

The first release of NVIDIA Cosmos includes an array of pre-trained models designed to generate physics-aware videos and world states. These models are openly available to developers to support physical AI development. NVIDIA Cosmos also includes guardrails that filter out harmful prompts and unsafe content in generated outputs. These safety measures include blurring human faces, post-generation guards that remove questionable scenarios, and digital watermarks on synthetic videos generated through NVIDIA NIM™ microservices. Together, they are intended to make the generated content safer to use.
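The layered safety design described above (check the prompt, generate, check the output, then post-process) can be sketched as a small pipeline. This is a minimal illustration under stated assumptions, not NVIDIA's guardrail implementation: `BLOCKED_TERMS`, `pre_guard`, `post_guard`, `blur_faces`, and `guarded_generate` are hypothetical names, simple keyword matching stands in for the trained classifiers a production system would use, each generated "frame" is reduced to a text label, and watermarking is omitted.

```python
import re
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

# Hypothetical blocklist; a real guardrail would rely on trained safety classifiers.
BLOCKED_TERMS = {"weapon", "gore"}


@dataclass
class GuardResult:
    allowed: bool
    reason: str = ""


def pre_guard(prompt: str) -> GuardResult:
    """Reject a prompt before generation if it contains a blocked term."""
    words = set(re.findall(r"[a-z]+", prompt.lower()))
    hits = sorted(words & BLOCKED_TERMS)
    return GuardResult(False, f"blocked terms: {hits}") if hits else GuardResult(True)


def post_guard(frames: List[str]) -> GuardResult:
    """Reject generated output if any frame (here just a text label) is flagged."""
    flagged = [f for f in frames if f in BLOCKED_TERMS]
    return GuardResult(False, f"flagged frames: {len(flagged)}") if flagged else GuardResult(True)


def blur_faces(frames: List[str]) -> List[str]:
    """Placeholder for face blurring: relabel frames that contain a face."""
    return ["blurred_face" if f == "face" else f for f in frames]


def guarded_generate(
    prompt: str, generate: Callable[[str], List[str]]
) -> Tuple[Optional[List[str]], str]:
    """Run pre-guard -> generation -> post-guard -> face blurring."""
    pre = pre_guard(prompt)
    if not pre.allowed:
        return None, pre.reason  # the generator is never called
    frames = generate(prompt)
    post = post_guard(frames)
    if not post.allowed:
        return None, post.reason
    return blur_faces(frames), "ok"
```

For example, `guarded_generate("a car on a road", lambda p: ["road", "face", "car"])` returns `(["road", "blurred_face", "car"], "ok")`, while a prompt that mentions a weapon is rejected before any generation happens.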

Types of NVIDIA Cosmos Models

1. Autoregressive Models

These models predict future frames of a video one step at a time, conditioning each new frame on the frames before it so that the generated motion stays coherent and realistic.
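The frame-by-frame loop behind autoregressive generation can be sketched in a few lines: each predicted frame is appended to the history and fed back in to predict the next one. In a real model the predictor is a large learned network; here a hypothetical `linear_extrapolator`, which simply continues the motion between the last two frames, stands in for it, and each "frame" is a short list of numbers.

```python
from typing import Callable, List

Frame = List[float]  # toy "frame": a short vector of pixel intensities


def linear_extrapolator(history: List[Frame]) -> Frame:
    """Hypothetical next-frame predictor: continue the change seen between the last two frames."""
    prev, last = history[-2], history[-1]
    return [cur + (cur - old) for old, cur in zip(prev, last)]


def autoregressive_rollout(
    context: List[Frame], predict: Callable[[List[Frame]], Frame], steps: int
) -> List[Frame]:
    """Generate `steps` new frames, feeding every prediction back in as input."""
    frames = list(context)
    for _ in range(steps):
        frames.append(predict(frames))  # the new frame becomes part of the history
    return frames[len(context):]
```

Starting from the frames `[0, 10]` and `[1, 9]`, three rollout steps yield `[[2, 8], [3, 7], [4, 6]]`. Because every prediction is conditioned on earlier predictions, errors can compound over long rollouts.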

2. Diffusion Models

These models generate videos by starting from random noise and refining it into coherent frames over many denoising steps, guided by temporal and spatial patterns learned during training.
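The noise-to-frames refinement described above can be illustrated with a deterministic DDIM-style sampler, a standard diffusion sampling scheme chosen here for simplicity; the source does not say which sampler Cosmos uses. The cosine noise schedule is likewise an assumption, and `oracle_denoiser` is a hypothetical stand-in that cheats by returning the known clean signal, whereas a trained model would have to predict it from the noisy input alone.

```python
import math
import random
from typing import List

Vector = List[float]


def alpha_bar(t: int, T: int, s: float = 0.008) -> float:
    """Cosine schedule: fraction of signal variance left at step t (1.0 at t=0, ~0 at t=T)."""
    f = lambda u: math.cos((u / T + s) / (1 + s) * math.pi / 2) ** 2
    return f(t) / f(0)


def oracle_denoiser(x_t: Vector, t: int, target: Vector) -> Vector:
    """Hypothetical stand-in for a trained network that predicts the clean frame.
    This toy version simply returns the known target."""
    return list(target)


def ddim_sample(target: Vector, T: int = 50, seed: int = 0) -> Vector:
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in target]  # start from pure Gaussian noise
    for t in range(T, 0, -1):
        ab_t, ab_prev = alpha_bar(t, T), alpha_bar(t - 1, T)
        x0_hat = oracle_denoiser(x, t, target)
        # Noise implied by the current sample and the clean-frame estimate.
        eps_hat = [(xi - math.sqrt(ab_t) * c) / math.sqrt(1 - ab_t) for xi, c in zip(x, x0_hat)]
        # Deterministic DDIM step: re-mix the estimate with less noise for step t-1.
        x = [math.sqrt(ab_prev) * c + math.sqrt(1 - ab_prev) * e for c, e in zip(x0_hat, eps_hat)]
    return x
```

With the oracle denoiser, `ddim_sample([0.5, -1.0, 2.0])` ends at the target up to floating-point rounding, and each intermediate sample is a blend of the target and the initial noise that shifts toward the target as the noise level shrinks.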

3. Workflow Enablers

These supporting models simplify the development and deployment of world models in physical AI applications.

All the above models are available for download from NGC or Hugging Face.

Fine-Tuned Samples

The lineup also includes fine-tuned samples such as Cosmos-1.0-Diffusion-7B-Text2World-Sample-MultiviewDriving, which is fine-tuned specifically for multi-sensor, multi-view driving footage from autonomous vehicles (AVs) and is slated to become available soon.

Use Cases for NVIDIA Cosmos

Developers across industries can use NVIDIA Cosmos to enhance their projects and extend the capabilities of their physical AI systems.

1. Video Search and Dataset Creation

Cosmos helps developers create bespoke datasets for AI model training. Because Cosmos models capture spatial and temporal patterns in video, developers can use them to tag and search for relevant footage efficiently. This capability is particularly valuable for self-driving cars and robotics, where high-quality training data is critical.
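Tagging and searching footage usually comes down to comparing vector embeddings of clips. The sketch below assumes each clip has already been embedded by some video encoder; the clip IDs, the three-dimensional vectors, and the `search` and `tag` helpers are hypothetical, and the source does not describe a Cosmos search API.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search(query: Sequence[float], index: Dict[str, List[float]], k: int = 2) -> List[Tuple[str, float]]:
    """Return the k clips whose embeddings are most similar to the query."""
    scored = [(clip_id, cosine(query, emb)) for clip_id, emb in index.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


def tag(embedding: Sequence[float], prototypes: Dict[str, List[float]], threshold: float = 0.8) -> List[str]:
    """Attach every tag whose prototype embedding is similar enough to the clip."""
    return [name for name, proto in prototypes.items() if cosine(embedding, proto) >= threshold]


# Hypothetical clip embeddings; real ones would come from a video encoder.
CLIPS = {
    "rainy_highway": [0.9, 0.1, 0.0],
    "night_intersection": [0.2, 0.9, 0.1],
    "warehouse_robot_arm": [0.0, 0.1, 0.95],
}
```

A query vector pointing along the first axis, such as `[1.0, 0.0, 0.0]`, ranks `rainy_highway` first; in practice the query would itself be an embedding of a text description or an example clip.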

2. Synthetic Data Generation

By pairing Cosmos with NVIDIA Omniverse, developers can turn 3D simulation data into photorealistic synthetic videos. This makes it possible to build highly tailored datasets, so that AI models can be trained on scenarios that closely resemble real-world conditions. Because the output is conditioned on 3D scenes, developers control what appears in the data, which improves its relevance and accuracy.
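Part of what makes simulation-driven data tailored is systematic coverage: enumerating combinations of scene conditions instead of waiting for them to occur on the road. The sketch below builds such a grid of scenario specs and matching text prompts. The parameter names and values, `SCENE_AXES`, `scenario_grid`, and `to_prompt` are all hypothetical; in an actual Omniverse and Cosmos workflow each spec would correspond to a 3D scene that conditions video generation, which is not shown here.

```python
from itertools import product
from typing import Dict, List

# Hypothetical scene parameters for a driving dataset.
SCENE_AXES: Dict[str, list] = {
    "weather": ["clear", "rain", "fog"],
    "time_of_day": ["noon", "dusk", "night"],
    "pedestrians": [0, 5],
}


def scenario_grid(axes: Dict[str, list]) -> List[dict]:
    """Every combination of axis values (the Cartesian product), one dict per scenario."""
    keys = list(axes)
    return [dict(zip(keys, values)) for values in product(*(axes[k] for k in keys))]


def to_prompt(scenario: dict) -> str:
    """Turn a scenario spec into a text description for a generator."""
    return (
        f"{scenario['weather']} street at {scenario['time_of_day']} "
        f"with {scenario['pedestrians']} pedestrians"
    )
```

`scenario_grid(SCENE_AXES)` yields 3 × 3 × 2 = 18 scenarios; the first, `{"weather": "clear", "time_of_day": "noon", "pedestrians": 0}`, becomes the prompt "clear street at noon with 0 pedestrians". A full grid grows multiplicatively with each new axis, so large sweeps are often sampled rather than enumerated.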

3. Policy Model Training and Evaluation

NVIDIA Cosmos offers models fine-tuned for action-conditioned video prediction, which predict how a scene evolves in response to a given action. This enables scalable training and evaluation of policy models, the models that decide which actions a physical AI system takes. Testing policies against generated video reduces reliance on risky real-world tests, helping developers build safer and more reliable AI systems.
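Evaluating a policy inside a predictive model, instead of on real hardware, can be sketched as follows. `toy_world_model` is a hypothetical stand-in for an action-conditioned video model: it predicts the next state from the current state and action, but the state is a single 1-D position rather than a video frame. `greedy_policy`, the goal position, and the success tolerance are likewise illustrative assumptions.

```python
import random
from typing import Callable

Policy = Callable[[float], float]
WorldModel = Callable[[float, float, random.Random], float]


def toy_world_model(state: float, action: float, rng: random.Random) -> float:
    """Hypothetical action-conditioned model: next position = position + action + noise."""
    return state + action + rng.gauss(0.0, 0.1)


def greedy_policy(state: float, goal: float = 10.0, max_step: float = 1.0) -> float:
    """Move toward the goal, limited to max_step per step."""
    return max(-max_step, min(max_step, goal - state))


def evaluate(policy: Policy, world_model: WorldModel, episodes: int = 100, horizon: int = 20,
             goal: float = 10.0, tol: float = 0.5, seed: int = 0) -> float:
    """Fraction of simulated episodes in which the policy reaches the goal within the horizon."""
    rng = random.Random(seed)
    successes = 0
    for _ in range(episodes):
        state = rng.uniform(0.0, 5.0)  # random start position
        for _ in range(horizon):
            state = world_model(state, policy(state), rng)
            if abs(state - goal) <= tol:
                successes += 1
                break
    return successes / episodes
```

Comparing `evaluate(greedy_policy, toy_world_model)` with `evaluate(lambda s: 0.0, toy_world_model)` ranks a goal-seeking policy far above one that never moves, without a single physical trial. The usefulness of such a ranking depends entirely on how faithfully the world model predicts real outcomes.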

4. Advanced Predictive Intelligence

The predictive capabilities of NVIDIA Cosmos help physical AI systems anticipate future scenarios and make informed decisions. By generating videos that predict what happens next from past footage and text prompts, developers can make their AI applications more adaptable and safer in dynamic environments.

5. Multiverse Simulation

Combined with NVIDIA Omniverse, Cosmos lets developers explore multiple possible outcomes in real time, improving decision-making for robots and autonomous vehicles. Evaluating many scenarios side by side helps AI models select the best course of action in complex situations.
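At its simplest, exploring multiple outcomes to pick an action is Monte Carlo action selection: sample several possible futures for each candidate action and choose the action with the best average outcome. In this sketch, `simulate` is a hypothetical stand-in for a simulator or world model, and its per-action scores are invented for illustration.

```python
import random
from statistics import mean
from typing import Dict, List, Tuple

# Invented mean outcome scores per action, for illustration only.
BASE_SCORE = {"brake": 0.7, "swerve_left": 0.5, "accelerate": 0.2}


def simulate(action: str, rng: random.Random) -> float:
    """Hypothetical simulator: score of one sampled future after taking `action`."""
    return BASE_SCORE[action] + rng.gauss(0.0, 0.1)


def choose_action(actions: List[str], n_branches: int = 32, seed: int = 0) -> Tuple[str, Dict[str, float]]:
    """Roll out n_branches futures per action and return the best action by mean score."""
    rng = random.Random(seed)
    expected = {a: mean(simulate(a, rng) for _ in range(n_branches)) for a in actions}
    best = max(expected, key=expected.get)
    return best, expected
```

More branches per action give more reliable estimates at a higher compute cost, which is the trade-off any real-time version of this loop has to manage.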

Performance Evaluation

Cosmos benchmarks are designed to assess the next generation of world models, with an emphasis on geometric accuracy and temporal stability. NVIDIA compared Cosmos world foundation models (WFMs) against baseline generative models such as VideoLDM (VLDM) and reports that Cosmos WFMs consistently outperform VLDM on visual consistency, achieving up to 14x higher pose estimation success rates. Between the two Cosmos model families, the diffusion models deliver higher fidelity out of the box, while the autoregressive models perform well as a base for custom models.

Getting Started with NVIDIA Cosmos

Developers interested in NVIDIA Cosmos can start by exploring the world foundation models available on the NVIDIA API catalog and Hugging Face. The platform provides an end-to-end pipeline for fine-tuning models, including the NVIDIA NeMo tokenizer for efficient data processing. The world foundation models in NVIDIA Cosmos are released under the NVIDIA Open Model License, which allows extensive customization and adaptation.
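Tokenizers make large-scale video processing tractable by compressing raw frames into a much smaller grid of latent tokens. The arithmetic below assumes a causal video tokenizer that encodes the first frame on its own and then compresses every further block of 8 frames into one latent frame, with 8x downsampling in height and width. These factors, the `VideoShape` type, and both helper functions are illustrative assumptions, not a documented Cosmos API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VideoShape:
    frames: int
    height: int
    width: int


def latent_shape(video: VideoShape, t_factor: int = 8, s_factor: int = 8) -> VideoShape:
    """Latent grid for an assumed causal tokenizer: the first frame is encoded alone,
    each further block of t_factor frames becomes one latent frame, and height and
    width are each divided by s_factor."""
    if (video.frames - 1) % t_factor or video.height % s_factor or video.width % s_factor:
        raise ValueError("video size is not compatible with the compression factors")
    return VideoShape(
        frames=1 + (video.frames - 1) // t_factor,
        height=video.height // s_factor,
        width=video.width // s_factor,
    )


def compression_ratio(video: VideoShape, t_factor: int = 8, s_factor: int = 8) -> float:
    """How many input pixel positions map to each latent position."""
    latent = latent_shape(video, t_factor, s_factor)
    return (video.frames * video.height * video.width) / (latent.frames * latent.height * latent.width)
```

For a 121-frame, 720 × 1280 clip, `latent_shape` gives 16 × 90 × 160 latent positions and `compression_ratio` gives 484.0. That ratio counts grid positions only and ignores how many channels each position holds.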

Developers can fine-tune the models using techniques such as LoRA (Low-Rank Adaptation) and reinforcement learning from human feedback (RLHF). Developers can also build custom world models from scratch with the tools provided by NVIDIA Cosmos, using NeMo Curator for video data preprocessing and the Cosmos tokenizer for data compression. NIM microservices then simplify deploying physical AI models across environments, including the cloud and data centres.
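LoRA, one of the fine-tuning techniques named above, freezes a pretrained weight matrix W and learns only a low-rank update, so a layer computes y = Wx + (alpha / r) · B(Ax), where A is r × d_in, B is d_out × r, and B starts at zero so training begins from the original model. The following generic, pure-Python sketch illustrates the technique itself; it is not the NeMo or Cosmos fine-tuning code, and the class and method names are hypothetical.

```python
import random
from typing import List

Matrix = List[List[float]]
Vector = List[float]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Matrix-vector product."""
    return [sum(mij * vj for mij, vj in zip(row, v)) for row in m]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Matrix-matrix product."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))] for i in range(len(a))]


class LoRALinear:
    """Linear layer with a frozen weight W plus a trainable low-rank update B @ A."""

    def __init__(self, weight: Matrix, rank: int, alpha: float = 1.0, seed: int = 0):
        rng = random.Random(seed)
        d_out, d_in = len(weight), len(weight[0])
        self.weight = weight  # frozen pretrained weights, d_out x d_in
        self.a = [[rng.gauss(0.0, 0.01) for _ in range(d_in)] for _ in range(rank)]  # rank x d_in, small random init
        self.b = [[0.0] * rank for _ in range(d_out)]  # d_out x rank, zero init
        self.scale = alpha / rank

    def forward(self, x: Vector) -> Vector:
        """y = W x + scale * B (A x)."""
        base = matvec(self.weight, x)
        delta = matvec(self.b, matvec(self.a, x))
        return [y + self.scale * d for y, d in zip(base, delta)]

    def trainable_params(self) -> int:
        """Only A and B are trained: rank * (d_in + d_out) values."""
        return len(self.a) * len(self.a[0]) + len(self.b) * len(self.b[0])

    def merged_weight(self) -> Matrix:
        """Fold the low-rank update into W for deployment: W + scale * B @ A."""
        delta = matmul(self.b, self.a)
        return [[w + self.scale * d for w, d in zip(w_row, d_row)] for w_row, d_row in zip(self.weight, delta)]
```

The payoff is parameter count: for a single 4096 × 4096 layer, full fine-tuning updates 16,777,216 weights, while rank-8 LoRA trains 8 × (4096 + 4096) = 65,536, about 0.39% as many. After training, `merged_weight()` folds the update back into W, so inference needs no extra computation.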

The Future of Physical AI with NVIDIA Cosmos

NVIDIA Cosmos is positioned to accelerate physical AI development. By providing open access to advanced models, accelerated data processing, and extensive customization options, NVIDIA aims to lower the barrier to building physical AI systems. As more developers adopt the platform, the potential for breakthroughs in robotics and autonomous vehicles grows. With Cosmos, NVIDIA is helping shape the future of physical AI and giving developers around the globe an environment in which to experiment and innovate.

Faizan Ali Naqvi

