Google has announced a new AI model, Gemini 2.5 Computer Use, designed to let AI agents operate inside web browsers using the same human-designed interfaces people do. The model leverages Gemini's "visual understanding and reasoning capabilities" to interpret user requests and complete tasks such as filling out and submitting forms.
The model is intended for tasks like user interface testing and for navigating sites that don't expose a direct API. Versions of it already power agentic features such as AI Mode and Project Mariner, Google's project that uses AI agents to autonomously carry out browser tasks, such as shopping from a list of ingredients.
The announcement comes on the heels of OpenAI's annual Dev Day, where the company showed off new apps that run inside ChatGPT and continued to promote its ChatGPT Agent feature, which handles complex tasks on users' behalf. Notably, Anthropic launched a computer use version of its Claude AI model last year.
Google shared demonstration videos of the Gemini model at work, with the footage sped up to three times actual speed.
According to Google, the model "outperforms leading alternatives on multiple web and mobile benchmarks." Unlike competing tools such as ChatGPT Agent and Anthropic's offering, Google's model operates solely within a browser rather than having full access to a user's computing environment. The company acknowledges that the model is not yet optimized for desktop OS-level control, and it currently supports 13 actions, including opening a browser, typing text, and dragging elements.
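To illustrate what that action set means in practice, here is a minimal, hypothetical sketch of the client-side loop such a tool implies: the model proposes a UI action, the client executes it in a real browser (Playwright is used here), and a fresh screenshot goes back to the model. The propose_next_action helper and the action names are illustrative assumptions, not Google's actual API or schema.

```python
# Hypothetical computer-use agent loop: the model proposes an action, the
# client performs it in a browser, then a new screenshot is captured.
from playwright.sync_api import sync_playwright

def propose_next_action(screenshot_bytes: bytes, goal: str) -> dict:
    """Stand-in for the Gemini 2.5 Computer Use API call (assumption)."""
    raise NotImplementedError("Call the model here and parse its proposed action")

def run_agent(goal: str, start_url: str, max_steps: int = 20) -> None:
    with sync_playwright() as p:
        browser = p.chromium.launch()
        page = browser.new_page()
        page.goto(start_url)
        for _ in range(max_steps):
            # Send the current view of the page to the model.
            action = propose_next_action(page.screenshot(), goal)
            # Execute whichever action the model proposed.
            if action["type"] == "click":
                page.mouse.click(action["x"], action["y"])
            elif action["type"] == "type_text":
                page.keyboard.type(action["text"])
            elif action["type"] == "drag":
                page.mouse.move(action["from_x"], action["from_y"])
                page.mouse.down()
                page.mouse.move(action["to_x"], action["to_y"])
                page.mouse.up()
            elif action["type"] == "done":
                break
        browser.close()
```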
Developers can access Gemini 2.5 Computer Use through Google AI Studio and Vertex AI. A demo is also available on Browserbase, showcasing capabilities such as “Playing a game of 2048” or “Browsing trending discussions on Hacker News.”
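For developers curious what access looks like, the snippet below is a rough sketch using the google-genai Python SDK, which serves both Google AI Studio (API key) and Vertex AI (project-based) access. The model identifier and any computer-use-specific tool configuration are assumptions and should be checked against Google's documentation.

```python
# Minimal sketch, assuming the google-genai Python SDK; the model ID below is
# an assumption -- consult Google's docs for the exact preview identifier and
# the required computer-use tool configuration.
from google import genai

# AI Studio access uses an API key; Vertex AI access would instead pass
# vertexai=True along with a project and location.
client = genai.Client(api_key="YOUR_API_KEY")

response = client.models.generate_content(
    model="gemini-2.5-computer-use-preview",  # assumed identifier
    contents="Open the checkout form and fill in the shipping address.",
)
print(response.text)
```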