Google DeepMind, the artificial intelligence research division of the tech powerhouse, first introduced Project Astra at its I/O event earlier this year. More than six months later, the company has revealed enhancements and new functionality for the AI agent. Powered by the Gemini 2.0 AI models, Project Astra can now communicate in several languages, access various Google platforms, and retain information for longer. While it remains in the testing stage, the Mountain View-based company is eager to integrate Project Astra into products such as the Gemini app and the Gemini AI assistant, as well as explore options for wearable technology like smart glasses.
Google Expands Capabilities of Project Astra
Project Astra functions as a general-purpose AI agent, similar in concept to OpenAI's vision mode or Meta's Ray-Ban smart glasses. The AI can use a device's camera to observe its surroundings and analyze visual data, enabling it to answer questions about what it "sees." Furthermore, the agent's memory allows it to retain visual information even after an object has left the camera's view.
In a recent blog post, Google DeepMind said the team has been working to enhance the AI agent since its initial reveal in May. With the introduction of Gemini 2.0, Project Astra has received several significant upgrades. It can now converse in multiple languages, and in mixed languages, with improved recognition of accents and less common words.
Additionally, tool use has been incorporated into Project Astra. The agent can now call on Google Search, Lens, Maps, and Gemini to answer more complex questions. For example, a user can point the camera at a landmark and ask for directions home; the AI can recognize the landmark and provide verbal guidance.
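Google has not published how Project Astra routes queries to these tools, but the general pattern is straightforward to sketch. The snippet below is a minimal, hypothetical illustration of the landmark-to-directions example; the `lens_identify` and `maps_directions` helpers are invented stand-ins, not real APIs.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Tool:
    name: str
    description: str
    run: Callable[[str], str]

def lens_identify(camera_frame: str) -> str:
    # Hypothetical stand-in for a visual-recognition call
    # (e.g., identifying a landmark in the camera frame).
    return f"landmark identified from: {camera_frame}"

def maps_directions(route_query: str) -> str:
    # Hypothetical stand-in for a directions lookup.
    return f"directions computed for: {route_query}"

TOOLS = {
    "lens": Tool("lens", "identify objects or landmarks in view", lens_identify),
    "maps": Tool("maps", "get directions between places", maps_directions),
}

def answer(query: str, camera_frame: str) -> str:
    """Naive routing for illustration: identify what the camera sees,
    then fetch directions if the user asked for them. A real agent
    would let the model itself decide which tool to invoke."""
    landmark = TOOLS["lens"].run(camera_frame)
    if "directions" in query.lower() or "home" in query.lower():
        return TOOLS["maps"].run(f"from {landmark} to home")
    return landmark

print(answer("How do I get home from here?", "frame showing a landmark"))
```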
The agent's memory has also been improved. Previously, Project Astra could retain visual information for only 45 seconds; that window has now been extended to 10 minutes of in-session memory. It can also recall more past conversations, allowing for more personalized interactions. Google says the agent additionally processes language at roughly the speed of human conversation, improving the overall user experience.
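Google has not described the mechanism behind this 10-minute window, but one simple way to picture in-session memory is a rolling buffer that discards observations older than the window. The sketch below is purely illustrative, assuming a timestamp-based eviction scheme.

```python
import time
from collections import deque

class SessionMemory:
    """Hypothetical rolling in-session memory: observations expire
    once they fall outside the retention window (10 minutes here,
    matching the figure Google cites for Project Astra)."""

    def __init__(self, window_seconds: float = 600.0):
        self.window = window_seconds
        self.entries: deque[tuple[float, str]] = deque()

    def remember(self, observation: str) -> None:
        # Record the observation with the time it was made.
        self.entries.append((time.monotonic(), observation))

    def recall(self) -> list[str]:
        # Evict anything older than the window, then return the rest.
        cutoff = time.monotonic() - self.window
        while self.entries and self.entries[0][0] < cutoff:
            self.entries.popleft()
        return [obs for _, obs in self.entries]

memory = SessionMemory()
memory.remember("saw the user's glasses on the kitchen table")
print(memory.recall())  # still recallable within the 10-minute window
```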