Jim Fan, Senior Research Manager and Head of the Generalist Embodied Agent Research (GEAR) Lab at Nvidia, discussed the future training and capabilities of artificial intelligence (AI) robots. He emphasized that forthcoming embodied AI agents will first be nurtured in virtual environments where they can master specific tasks. Fan envisions a future where entire urban areas, homes, and industrial sites will be replicated in simulations, paving the way for AI training that mirrors real-world situations. He further suggested that these AI robots might operate collectively through a shared cognitive system, often referred to as a hive mind.
Nvidia’s Jim Fan Shares Insights on AI Robot Training
Fan shared his ideas in a post on X (formerly Twitter), pointing to Tokyo's recent release of a high-resolution digital twin of the entire city, which is available for public download. He said the trend of digitizing physical environments for AI training is set to expand significantly in the coming years.
“It’s an inevitable trend that more and more cities, houses, and factories will be transported into simulations,” remarked Fan.
He also noted that in the near future, robots will undergo training in more integrated environments rather than in isolation. Currently, these machines are trained in controlled settings designed to teach them basic functions like movement, object recognition, and task execution. However, experts argue that this approach falls short of preparing robots for the diverse challenges present in real-world applications.
To address this concern, Fan proposed that future embodied AI agents will be trained together as part of a cohesive system, which he described as an iron fleet. Embodied AI agents are AI systems that have physical or simulated bodies, allowing them to interact with their surroundings much as humans or animals do.
According to Fan, upcoming AI robots will be simulated in real-time graphics engines and trained across large compute clusters. Training at this scale will generate vast numbers of high-quality training tokens for the agents to learn from. He elaborated on his vision, stating, “The majority of embodied agents will be born in sim, and transferred zero-shot to our real world when they are ready.”
Once effectively deployed in settings such as manufacturing plants, offices, or personal homes, these AI robots will be connected through a hive mind, enabling them to learn from a multitude of scenarios and collaboratively tackle complex tasks.
While these concepts may seem like elements from science fiction, Fan, who leads Nvidia’s embodied AI group, believes this approach is poised to transform the training of highly capable robots in the near future. He noted that Nvidia is already moving in this direction: the company’s headquarters in Santa Clara was designed and rendered on the GPU-accelerated Omniverse platform before the physical structures were built.