Google DeepMind has unveiled enhanced AI models that grant robots the ability to perform intricate tasks and seek assistance from the internet. In a recent press conference, Carolina Parada, the head of robotics at Google DeepMind, explained that the new AI models empower robots to “think multiple steps ahead” prior to taking action in real-world scenarios.
The upgrade introduces Gemini Robotics 1.5 and the embodied reasoning model Gemini Robotics-ER 1.5, enhancements to models initially released in March. With these improvements, robots can go beyond simple tasks such as folding a piece of paper or unzipping a bag. They can now accomplish complex activities like sorting laundry by color, packing a suitcase according to London’s weather, and assisting in waste sorting based on local regulations found through web searches.
Parada remarked, “Previous models were highly effective at executing single instructions in a general manner. With this update, we are transitioning from performing isolated tasks to achieving genuine understanding and problem-solving capabilities for physical activities.”
The new capabilities allow robots to leverage the enhanced Gemini Robotics-ER 1.5 model to comprehend their environment and utilize digital resources like Google Search for additional information. The findings are then converted into natural language commands for the Gemini Robotics 1.5 model, enabling robots to implement each step effectively using visual and linguistic comprehension.
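The two-model handoff described above can be sketched as a simple planner/executor loop. This is an illustrative stand-in, not Google DeepMind's actual API: the function names, the hard-coded plan, and the simulated actions are all hypothetical, meant only to show how a reasoning model's natural-language steps could be fed to a separate action model.

```python
# Hypothetical sketch of the planner/executor split described above.
# Neither function reflects Google DeepMind's real interfaces.

def plan_with_er_model(goal: str, web_context: str) -> list[str]:
    """Stand-in for Gemini Robotics-ER 1.5: reason about the goal
    (plus any web-searched context) and emit natural-language steps."""
    if goal == "sort this recycling":
        return [
            f"Check the rule: {web_context}",
            "Pick up the glass bottle",
            "Place it in the green bin",
        ]
    return [f"Perform: {goal}"]


def execute_with_action_model(step: str) -> str:
    """Stand-in for Gemini Robotics 1.5: turn one natural-language
    step into a (here, simulated) motor action."""
    return f"executed: {step}"


def run_task(goal: str, web_context: str = "") -> list[str]:
    # The reasoning model plans; each step is handed off as plain language.
    steps = plan_with_er_model(goal, web_context)
    return [execute_with_action_model(step) for step in steps]


log = run_task("sort this recycling",
               web_context="glass goes in the green bin")
```

The point of the split is that the executor never needs to see the web search results or the overall goal; it only receives one plain-language step at a time.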
In addition, the new Gemini Robotics 1.5 model allows robots to “learn” from one another, even when they differ in design. Google DeepMind’s findings showed that tasks given to the ALOHA2 robot, equipped with dual mechanical arms, transfer readily to the bi-arm Franka robot and to Apptronik’s humanoid robot, Apollo. “This opens two possibilities for us: controlling diverse robots, including humanoid types, with a single model and enabling skill transfer from one robot to another,” explained Kanishka Rao, a software engineer at Google DeepMind, during the briefing.
As part of this rollout, Gemini Robotics-ER 1.5 is being made accessible to developers via the Gemini API in Google AI Studio, with select partners receiving access to Gemini Robotics 1.5.