NVIDIA Cosmos 3: the open world model for robotics and physical AI

Subscribe to the Newsletter

Join 10k+ people to get notified about new posts, news and tips.

Do not worry we don't spam!

By pressing the Subscribe button, you confirm that you have read and are agreeing to our Privacy Policy and Terms of Use

Log in

Have no account yet? Sign up

Create an account

Already have an account? Log in

Reset password

Remember your password? Log in

Terms of use

SingularityByte.com values the privacy of our users. Therefore, this privacy policy explains in detail how we use and protect the information we collect when you visit our website.. Read this privacy policy completely. Please refrain from visiting the site if the terms outlined below are not satisfactory to you. We reserve the right to change this policy at any time and will list these changes in the updates section of the policy. By reading this notice and visiting the site, you agree that you understand that customers will not be personally notified when this policy changes. Therefore, we advise our customers to frequently review our privacy policy so that they remain aware of its updates. By using the site, you accept that the posted policy and all its changes apply to your interaction with SingularityByte.com.

Information Collected by SingularityByte.com

Personal information may be collected by this site in many ways. This information includes:

Personal identifying information like your name, address, email, phone number, age, gender, and other personal data
Server data related to the IP address you used to visit our website, which includes your address, browser, OS, access time, and site activity.
Financial information related to your orders including your payment method and identifying payment information. We rarely store financial information collected on our site for transaction purposes. That information gets sent directly to our payment processor.
Social network data including Facebook permissions and user information from other networks, provided you log onto our site using one of these media sites.
Mobile device information such as your device ID, model, and location, if you use our site by accessing trough our website.

How We Use This Information

Our website uses information collected to:
• Manage your account information
• Customize ads
• Deliver promotions
• Email your account confirmation
• Manage purchases and payments
• Increase site efficiency
• Notify you of updates
• Offer new products
• Monitor and prevent theft
• Request your customer feedback
• Resolve account disputes
• Respond to your service requests

Information Disclosure

Normally, your information stays on our site. However, below we have listed the situations that may
require us to share the information we collect from you:
• When required by law, such as for fraud protection
• With our third-party providers for payment processing and hosting
• With your consent for marketing purposes
• When you post comments on the site
• To our advertisers, affiliates, and partners
• If this site goes bankrupt and data must be transferred

Cookies, Trackers, and Online Ads

We may use cookies, trackers, web beacons, and other technology to customize our website to improve your experience. We may customize the site using this information. These trackers do not have access to your personal information and can be removed from your browser options. In addition, third-party software provides ads for our site for marketing campaigns. These programs have access to tracking technology to optimize your ad experience. For more information about these
ads, visit [link to the privacy policies of affiliate advertisers]. Website analytics such as through Google Analytics may also be used to track users
and remarket our website. We do not give these vendors access to your personal information.

Other Sites

Our website may contain links to third-party websites in the form of policies, ads, and other non-affiliated links. Once you leave our site, we are no longer responsible for how your information is collected and disclosed. Please refer to the privacy policies of those third-party sites for more information.

Information Security

We take technical and administrative precautions to protect your data, but we cannot guarantee its safety against all types of fraud or misuse. If you provide personal information, we cannot verify its total security against all types of interception.

Do-Not-Track

Some browsers offer Do-Not-Track settings to prevent any information from being distributed. Since these settings have not been legally established as standard practice, we do acknowledge these settings.

Additional Options

At any time, you may opt to review or change your account settings, including contact information. If you wish to delete your account, you may do so to remove most of your information, however, some identifying information will be retained to prevent fraud.
You may also opt-out of emails and other correspondences from our site at any time.

Microsoft Clarity

We partner with Microsoft Clarity and Microsoft Advertising to capture how you use and interact with our website through behavioral metrics, heatmaps, and session replay to improve and market our products/services. Website usage data is captured using first and third-party cookies and other tracking technologies to determine the popularity of products/services and online activity. Additionally, we use this information for site optimization, fraud/security purposes, and advertising. For more information about how Microsoft collects and uses your data, visit the Microsoft Privacy Statement.

Contact Us

If you have questions or concerns about this privacy policy, please feel free to contact us at: desk@SingularityByte.com

Do you agree to our terms? Sign up

License OpenMDW 1.1

TL;DR

NVIDIA Cosmos 3: an open omnimodal world foundation model for physical AI, generating video, audio, and robot actions.
Two-tower reasoner-plus-diffusion design; Super (64B) and Nano (16B) are out under OpenMDW-1.1, Edge (4B) is coming.
Ships weights, datasets, and recipes for commercial use; needs an NVIDIA Ampere, Hopper, or Blackwell GPU on Linux.

☍ Announcement ⬇ Download Model

System Requirements

RAM	workstation / data-center
GPU	RTX PRO 6000 (Nano), datacenter (Super)
VRAM	BF16, NVIDIA only

Table of Contents

Most "open" AI releases this year were chatbots. NVIDIA's Cosmos 3, announced at GTC Taipei on June 1, 2026, is something different: an open world model that understands and generates not just text and video but ambient audio and robot actions, built for physical AI. The point is not to chat. It is to generate training data for robots, predict what happens next in the physical world, and output the action a robot should take. NVIDIA shipped the weights, the training data, and the recipes under a permissive license. Here is what it is, what is actually open, and what you can run.

What a world foundation model is

A language model predicts the next token. A world foundation model predicts the next state of the world: given a scene and an intent, what the camera sees next, what it sounds like, and what action moves a robot toward its goal. Cosmos 3 unifies three things earlier systems kept separate: physical reasoning (understanding a scene), world generation (producing realistic video and audio of what happens next), and action generation (the trajectory a robot should follow). That combination is why it is aimed at robotics, autonomous vehicles, and simulation rather than at your chat window.

How it is built

Cosmos 3 uses a two-tower "Mixture-of-Transformers" design. One tower is an autoregressive reasoner, a vision-language model initialized from Qwen3-VL (8B in Nano, 32B in Super). The other is a diffusion generator that produces the video, audio, and action output. They share one architecture and pass information from reasoner to generator, aligned across video, audio, and action tokens with 3D rotary position embeddings. So it is a hybrid: autoregressive where it reasons, diffusion where it generates. It was trained on roughly 20 trillion multimodal tokens, including about a billion images, 400 million videos, plus audio and human and robot action data.

It comes in three sizes. Cosmos 3 Super (64B) is the frontier-accuracy model, available now. Cosmos 3 Nano (16B) targets a single workstation GPU for near-real-time use, available now. Cosmos 3 Edge (4B), the on-device low-latency variant, is announced but not yet downloadable, so treat it as a roadmap item.

What is actually open

This is genuine open-weights, and more. The shipped models (Nano and Super) are downloadable on Hugging Face under OpenMDW-1.1, a permissive Linux Foundation license that explicitly allows commercial and non-commercial use and also covers the architecture, code, datasets, benchmarks, and training recipes. NVIDIA released six synthetic datasets and the training scripts alongside the weights, plus a separate DROID policy model for robot manipulation. Like NVIDIA's Nemotron 3 Ultra, the story is full openness, not just a weights drop.

Benchmarks and the honest caveats

NVIDIA claims Cosmos 3 ranks first among open models on a stack of physical-AI benchmarks (Physics-IQ, PAI-Bench, RoboLab, RoboArena, and a traffic-anomaly reasoning leaderboard), and tops Artificial Analysis open leaderboards for text-to-image and image-to-video. These are NVIDIA-reported; there is no independent reproduction yet, and NVIDIA did not publish head-to-head tables against its own Cosmos 2 or against Google's Genie line, so cross-family comparisons are not available.

Limitations and gotchas

It is NVIDIA-only and Linux-only: Ampere, Hopper, or Blackwell GPUs, CUDA 13, BF16. There is no Apple Silicon or AMD path.
Super needs a data-center GPU; Nano targets a workstation-class card (NVIDIA cites the RTX PRO 6000) for real-time robotics. This is not a laptop model.
The on-device Edge (4B) model, the one most edge builders want, is not released yet.
Benchmarks are NVIDIA-reported and physical-AI specific; if you are not doing robotics, AV, or simulation, this is probably not your model.

Who should use it

Roboticists training manipulation policies, autonomous-vehicle teams generating rare scenarios, and anyone who needs physics-accurate synthetic data instead of expensive real-world capture. The open weights plus the open datasets and recipes make it a genuine base to build on, not just an endpoint to call. If you build chat or coding products, skip it; this is a different branch of the tree.

Run it in about 10 minutes

The fastest taste is the hosted playground on build.nvidia.com or a NIM container; the local path is Hugging Face plus Diffusers.

# Hosted: a NIM container for the Nano reasoner (needs an NGC API key)
docker run --gpus all -e NGC_API_KEY=$NGC_API_KEY \
  -p 8000:8000 nvcr.io/nim/nvidia/cosmos3-reasoner:latest

# Or pull the open weights from Hugging Face
huggingface-cli download nvidia/Cosmos3-Nano

With the weights local, the omni pipeline rolls the world forward from a single image and a prompt:

# Generate a short world rollout from an image
from diffusers import Cosmos3OmniPipeline

pipe = Cosmos3OmniPipeline.from_pretrained("nvidia/Cosmos3-Nano").to("cuda")
out = pipe(image="kitchen.png", prompt="the robot arm picks up the red mug")
out.save("rollout.mp4")

If you do not have an NVIDIA workstation, the ten-minute move is the build.nvidia.com demo and the technical report; the two-tower reasoner-plus-generator design is the part worth studying even if you cannot run the weights yet.

Sources and further reading

Tested on: not independently tested. Cosmos 3 needs an NVIDIA Ampere, Hopper, or Blackwell GPU on Linux (Super is data-center class), beyond our bench. Benchmarks are NVIDIA-reported with no independent reproduction; the Edge (4B) variant is announced but not yet downloadable.
Date checked: 2026-06-26

Subscribe to the Newsletter

Search

GDPR Compliance