
Google DeepMind Launches Groundbreaking Robot AI Models


On Thursday, Google DeepMind announced two new artificial intelligence (AI) models designed to expand what robots can do in real-world settings. The systems, named Gemini Robotics and Gemini Robotics-ER (embodied reasoning), are advanced vision-language models that exhibit spatial intelligence and can drive a wide array of physical actions. The company is also partnering with Apptronik to develop humanoid robots powered by Gemini 2.0, and it is continuing to evaluate both models to refine their performance before any wider release.

Google DeepMind Unveils Gemini Robotics AI Models

In a blog post, DeepMind elaborated on the new AI models. Carolina Parada, Senior Director and Head of Robotics at Google DeepMind, emphasized that for AI to be useful in physical environments, it must demonstrate “embodied” reasoning: the ability to comprehend and interact with the physical world in order to carry out tasks.

The first model, Gemini Robotics, is an advanced vision-language-action (VLA) model developed from the Gemini 2.0 framework, boasting an innovative output modality of “physical actions” that enables the model to control robots directly.
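DeepMind has not published an interface for Gemini Robotics, but the core idea of a vision-language-action model can be sketched briefly: the model takes camera images plus a plain-language instruction as input, and its output modality is a robot action rather than text. The Python below is a minimal, hypothetical illustration of that contract; every class and function name is an assumption made for this sketch, not DeepMind’s API.

```python
from dataclasses import dataclass

import numpy as np


@dataclass
class Observation:
    """One timestep of robot sensor input."""
    camera_rgb: np.ndarray    # (H, W, 3) image from the robot's camera
    joint_angles: np.ndarray  # current joint positions, radians


@dataclass
class Action:
    """The 'physical actions' output modality: a low-level command."""
    joint_deltas: np.ndarray  # target change per joint, radians
    gripper_open: bool        # desired gripper state


class VLAPolicy:
    """Hypothetical stand-in for a vision-language-action model.

    A real VLA model would run a large vision-language backbone here;
    this stub only shows the input/output contract.
    """

    def __init__(self, instruction: str):
        self.instruction = instruction  # e.g. "pack the snack into the bag"

    def step(self, obs: Observation) -> Action:
        # Placeholder inference: hold the arm still, gripper open.
        return Action(joint_deltas=np.zeros_like(obs.joint_angles),
                      gripper_open=True)


# The control loop queries the policy afresh at every timestep, which
# is what lets such a model react when the scene changes.
policy = VLAPolicy("pack the snack into the bag")
obs = Observation(camera_rgb=np.zeros((480, 640, 3), dtype=np.uint8),
                  joint_angles=np.zeros(7))
action = policy.step(obs)
```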

DeepMind identified three capabilities that AI models for robotics must have: generality, interactivity, and dexterity. Generality refers to a model’s ability to adapt to new situations. The company claims that Gemini Robotics handles novel objects, diverse instructions, and unfamiliar environments well, and that in internal testing it more than doubled the performance of previous state-of-the-art vision-language-action models on a comprehensive generalization benchmark.

The interactivity aspect, rooted in Gemini 2.0, allows the model to understand and respond to questions in everyday conversational language, accommodating multiple languages. Google stated that the model is consistently aware of its environment, can identify changes, and adjusts its actions based on incoming information.

Moreover, DeepMind indicated that Gemini Robotics is capable of executing intricate, multi-step tasks that demand precise manipulation of physical objects. According to researchers, the model can effectively direct robots to perform activities such as folding paper or packing a snack into a bag.

The second model, Gemini Robotics-ER, also operates as a vision-language model but emphasizes spatial reasoning. Leveraging capabilities from Gemini 2.0 related to coding and three-dimensional detection, this AI model is designed to understand how to manipulate objects in the physical realm. Parada noted an example where the model was able to generate a command for a two-finger grasp to pick up a coffee mug by the handle along a secure trajectory.
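DeepMind has not documented the output format behind that coffee-mug example, but a grasp proposal of the kind described can be represented with a few fields: a grasp point, a gripper orientation, a finger opening, and a short approach trajectory. The sketch below is purely illustrative; the types, coordinates, and the plan_mug_pickup helper are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class GraspWaypoint:
    """One step of a two-finger (parallel-jaw) grasp. Hypothetical type."""
    position_m: Tuple[float, float, float]  # gripper position in the robot frame, metres
    yaw_rad: float                          # gripper rotation about the vertical axis
    width_m: float                          # finger opening


def plan_mug_pickup(handle_xyz: Tuple[float, float, float]) -> List[GraspWaypoint]:
    """Sketch a 'safe trajectory': hover above the handle, descend,
    then close the fingers at the grasp point."""
    x, y, z = handle_xyz
    return [
        GraspWaypoint((x, y, z + 0.10), yaw_rad=1.57, width_m=0.06),  # hover above
        GraspWaypoint((x, y, z + 0.02), yaw_rad=1.57, width_m=0.06),  # descend
        GraspWaypoint((x, y, z + 0.02), yaw_rad=1.57, width_m=0.01),  # close on handle
    ]


trajectory = plan_mug_pickup((0.45, -0.10, 0.03))
```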

This model performs various steps required for robot operation in real-world settings, including perception, state estimation, spatial comprehension, planning, and code generation. Importantly, neither AI model is available for public use at this stage. DeepMind plans to first integrate the AI technology into a humanoid robot and assess its performance before potentially releasing it to the broader market.
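Those stages compose into a loop: perception feeds state estimation, which feeds spatial reasoning and planning, which can end in generated code for the robot to execute. The toy pipeline below only illustrates how such stages would chain together; none of the function names or data shapes come from DeepMind.

```python
# Illustrative stage names mirroring the list above; the data shapes
# and functions are invented for this sketch.

def perceive(frame: dict) -> dict:
    """Perception: turn raw sensor data into object detections."""
    return {"objects": frame["objects"]}


def estimate_state(detections: dict) -> dict:
    """State estimation: track where each detected object is."""
    return {obj["name"]: obj["position"] for obj in detections["objects"]}


def plan(state: dict, goal: str) -> list:
    """Spatial understanding + planning: order steps toward the goal."""
    return [f"move_to({name!r})" for name in state] + [goal]


def generate_code(steps: list) -> str:
    """Code generation: emit an executable robot script from the plan."""
    return "\n".join(steps)


frame = {"objects": [{"name": "mug", "position": (0.4, 0.1, 0.02)}]}
script = generate_code(plan(estimate_state(perceive(frame)), "grasp('mug')"))
print(script)
# move_to('mug')
# grasp('mug')
```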
