Google DeepMind is introducing an on-device version of its Gemini Robotics AI model that can run without an internet connection. The vision-language-action (VLA) model has capabilities similar to those of its predecessor released in March, but it is compact and efficient enough to run directly on a robot.
The flagship Gemini Robotics model aims to assist robots in completing various physical tasks, even in scenarios it has not been explicitly trained for. It enables robots to adapt to new environments, comprehend commands, and carry out tasks that demand dexterity.
Carolina Parada, who leads the robotics division at Google DeepMind, told Technology News that the original Gemini Robotics model uses a hybrid approach, running both on-device and in the cloud. The new on-device model, by contrast, offers offline capabilities nearly equivalent to those of the flagship version.
According to Parada, the new on-device model can handle a variety of tasks out of the box and can adapt to new situations with as few as 50 to 100 examples. Google initially trained the model on its ALOHA robot but has since adapted it to other types of robots, including Apptronik’s humanoid Apollo and the bi-arm Franka FR3.
Parada noted, “The Gemini Robotics hybrid model is still more powerful, but we are genuinely impressed by the performance of this on-device version.” She described it as a foundational model suitable for applications in environments with limited connectivity and beneficial for companies with stringent security measures.
Alongside the launch, Google is releasing a software development kit (SDK) for the on-device model that lets developers evaluate and optimize it, a first for Google DeepMind’s vision-language-action offerings.
The on-device Gemini Robotics model and its accompanying SDK will initially be accessible to a select group of trusted testers, as Google focuses on addressing potential safety concerns.