Nvidia has unveiled a new artificial intelligence (AI) model designed for training robots in simulated environments. Named Cosmos-Transfer1, the model is aimed at improving AI capabilities in robotics hardware, a field often referred to as physical AI. The company has made the model available as open source under a permissive license, allowing interested users to download it from online repositories. Nvidia emphasized that a key feature of the model is the enhanced control it offers over generating simulations.
Nvidia Releases AI Model to Train Robots
Simulation-based robotics training has been gaining traction recently, fueled by advances in generative AI. This niche within robotics focuses on hardware that relies on AI to operate in the real world. Training in simulation aims to equip the machine's 'brain' to handle a broader range of real-world situations, a significant improvement over traditional robots, particularly those in manufacturing, which are typically limited to narrowly defined tasks.
Nvidia’s Cosmos-Transfer1 is part of the company’s broader Cosmos suite of world foundation models (WFMs). It processes structured video inputs such as segmentation maps, depth maps, and lidar scans to generate photorealistic video outputs, which can then be used to train physical AI systems.
According to a technical paper posted to the arXiv preprint server, the new model offers significantly improved customization options compared to earlier versions. It allows users to adjust the weight of different conditional inputs based on their spatial location, enabling highly controllable world generation. The model also supports real-time world generation, which allows for faster and more varied training runs.
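To make the idea of spatially varying control weights concrete, here is a minimal sketch in Python of blending two control signals with per-pixel weight maps. The array shapes, the left/right split, and all variable names are illustrative assumptions, not Nvidia's implementation.

```python
import numpy as np

# Illustrative sketch: blend multiple control signals with per-pixel weights.
# Shapes and names here are hypothetical, not taken from Nvidia's codebase.
H, W = 64, 64
controls = {
    "depth": np.random.rand(H, W),         # stand-in for a depth-map control signal
    "segmentation": np.random.rand(H, W),  # stand-in for a segmentation control signal
}
weights = {
    "depth": np.zeros((H, W)),
    "segmentation": np.zeros((H, W)),
}
# Example: rely on depth in the left half of the frame, segmentation in the right half.
weights["depth"][:, : W // 2] = 1.0
weights["segmentation"][:, W // 2 :] = 1.0

# Normalize weights per pixel so contributions sum to one, then combine.
total = sum(weights.values()) + 1e-8
combined = sum(weights[k] / total * controls[k] for k in controls)
print(combined.shape)  # (64, 64)
```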
On a technical level, Cosmos-Transfer1 is a diffusion-based model with seven billion parameters. It performs video denoising in latent space and can be controlled through a dedicated modulation branch. The model takes both text and video inputs to create photorealistic output videos, and it supports four types of input video controls: Canny edge, blurred RGB, segmentation mask, and depth map.
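As a rough illustration of how a modulation branch can steer a diffusion denoiser, the sketch below adds features from a small control network to a backbone's activations. It is a conceptual, ControlNet-style toy in PyTorch; the layer shapes, class names, and single-convolution networks are assumptions and do not reflect Cosmos-Transfer1's actual architecture.

```python
import torch
import torch.nn as nn

# Conceptual sketch: a denoising backbone plus a control branch whose features
# modulate the backbone. Sizes and names are illustrative only.
class ControlBranch(nn.Module):
    def __init__(self, channels=8):
        super().__init__()
        self.encode = nn.Conv3d(channels, channels, kernel_size=3, padding=1)

    def forward(self, control_video):
        return self.encode(control_video)

class Denoiser(nn.Module):
    def __init__(self, channels=8):
        super().__init__()
        self.backbone = nn.Conv3d(channels, channels, kernel_size=3, padding=1)

    def forward(self, noisy_latent, control_features):
        # Control features are added to the backbone activations, steering
        # denoising toward the structure present in the control video.
        return self.backbone(noisy_latent) + control_features

latent = torch.randn(1, 8, 4, 32, 32)   # (batch, channels, frames, H, W) in latent space
control = torch.randn(1, 8, 4, 32, 32)  # e.g. an encoded depth or edge video
denoised = Denoiser()(latent, ControlBranch()(control))
print(denoised.shape)
```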
The AI model has been tested on Nvidia’s Blackwell and Hopper series GPUs, with inference performed on Linux. Nvidia has released the model under the Nvidia Open Model License Agreement, which permits both academic and commercial use.
Users can download the Cosmos-Transfer1 AI model from Nvidia’s GitHub repository as well as from its Hugging Face listing. A larger model with 14 billion parameters is expected to be released in the near future.
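For those fetching the weights programmatically, the huggingface_hub library's snapshot_download helper can pull a model listing to a local directory, as in the short sketch below. The repo_id shown is an assumption about the listing's exact name, so verify it against Nvidia's Hugging Face page before running.

```python
from huggingface_hub import snapshot_download

# Download the model files from Hugging Face. The repo_id below is an assumed
# listing name; check Nvidia's Hugging Face page for the exact identifier.
local_dir = snapshot_download(repo_id="nvidia/Cosmos-Transfer1-7B")
print(f"Model files downloaded to: {local_dir}")
```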