On Tuesday, Google DeepMind unveiled its latest artificial intelligence model, Gemini Robotics On-Device, designed to run entirely on local hardware. The vision-language-action (VLA) model enables robots to execute a variety of tasks in real-world settings. According to the Mountain View-based tech giant, the model's ability to function without a data network makes it well suited to latency-sensitive applications. Currently, access to the model is limited to participants in its trusted tester program.
Google’s Innovative Robotics Model Operates Completely On-Device
Carolina Parada, Senior Director and Head of Robotics at Google DeepMind, announced the launch of Gemini Robotics On-Device in a blog post. Once enrolled in the trusted tester program, users can access the new VLA model through the Gemini Robotics software development kit (SDK) and evaluate it in the company's MuJoCo physics simulator.
Although specifics regarding the architecture and training techniques of this proprietary model remain undisclosed, Google has emphasized its functionality. Tailored for bi-arm robots, Gemini Robotics On-Device demands minimal computational power. The model also supports fine-tuning for experimentation, adapting to new tasks with as few as 50 to 100 demonstrations.
The VLA model can follow natural language instructions and execute sophisticated tasks such as unzipping bags or folding clothes. Based on internal evaluations, Google asserts that the AI showcases exceptional generalization performance while operating entirely offline. Additionally, it reportedly surpasses other on-device models when faced with challenging out-of-distribution tasks and intricate multi-step instructions.
Google noted that, while the AI model was initially trained for ALOHA robots, it was also successfully adapted for the Franka FR3 and Apptronik's Apollo humanoid robot. All of these, however, are bi-arm configurations, the only form factor the model currently supports.
The AI demonstrated its ability to follow instructions and carry out general tasks across various robotic models. According to the company, it effectively managed previously unseen objects and scenarios, including executing industrial belt assembly tasks that necessitate a high level of precision and dexterity.