Nvidia has unveiled a new artificial intelligence (AI) model geared towards training robots through simulation. Named Cosmos-Transfer1, this diffusion-based model is specifically designed for AI-powered robotics, commonly referred to as physical AI. The Santa Clara-based company has made the model open source, allowing users to download it from well-known online repositories under a permissive license. The primary feature highlighted by Nvidia is the enhanced control users have over the generated simulations.
Nvidia Releases AI Model to Train Robots
The use of simulation-based robotics training has gained momentum recently, bolstered by advances in generative AI. Physical AI refers to hardware that uses AI as its core intelligence. Simulation training aims to prepare a machine's AI to navigate varied real-world scenarios, expanding the range of tasks it can perform. This marks a significant evolution from traditional factory robots, which are typically limited to executing a single function.
Nvidia’s Cosmos-Transfer1 is part of a broader suite of world foundation models (WFMs) developed by the company, which process structured video inputs, including segmentation maps, depth maps, and lidar scans, to produce high-fidelity video outputs. These outputs serve as a simulation platform for training physical AI systems.
A recent study released on arXiv details how the model surpasses its predecessors in customisability: developers can vary the weight given to each conditional input according to spatial location, enabling highly tailored training environments. The model also supports real-time world generation, which makes training sessions faster and more varied.
In terms of technical specifications, Cosmos-Transfer1 is a diffusion-based model comprising seven billion parameters. Its design focuses on video denoising within latent space and can be controlled through a modulation branch. The model accepts both text and video inputs, utilizing these to produce photorealistic output videos. It supports four distinct types of control input videos: canny edge, blurred RGB, segmentation mask, and depth map.
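To make the idea of spatially weighted control inputs concrete, here is a minimal conceptual sketch, not Nvidia's implementation: given feature maps from several control branches (for example depth, edge, segmentation, blurred RGB), one simple way to weight them by spatial location is a per-pixel normalized weighted sum. All function and variable names below are hypothetical illustrations.

```python
import numpy as np

def fuse_control_features(features: dict, weights: dict) -> np.ndarray:
    """Blend per-modality feature maps (H, W, C) using per-pixel weight maps (H, W).

    Weights are normalized at each pixel, so the blended features stay on the
    same scale no matter how many modalities are active at that location.
    """
    names = sorted(features)
    stacked = np.stack([features[n] for n in names])          # (M, H, W, C)
    w = np.stack([weights[n] for n in names]).astype(float)   # (M, H, W)
    w /= w.sum(axis=0, keepdims=True) + 1e-8                  # normalize per pixel
    return (stacked * w[..., None]).sum(axis=0)               # (H, W, C)

# Example: let the depth branch dominate the left half of the frame
# and the edge branch dominate the right half.
H, W, C = 4, 4, 2
feats = {"depth": np.ones((H, W, C)), "edge": np.zeros((H, W, C))}
wmaps = {"depth": np.zeros((H, W)), "edge": np.zeros((H, W))}
wmaps["depth"][:, : W // 2] = 1.0
wmaps["edge"][:, W // 2 :] = 1.0
fused = fuse_control_features(feats, wmaps)
```

In a diffusion model, such a blend would operate on latent feature maps inside the denoiser rather than raw arrays, but the spatial-weighting principle is the same.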
The model has been successfully tested on Nvidia’s Blackwell and Hopper series chipsets, with inference conducted on a Linux operating system. The company has released Cosmos-Transfer1 under the Nvidia Open Model License Agreement, allowing for both academic and commercial applications.
Interested users can access Nvidia’s Cosmos-Transfer1 AI model through the company’s GitHub repository and Hugging Face platform. A forthcoming AI model featuring 14 billion parameters is anticipated to be released shortly.