Hugging Face has released a demo of an artificial intelligence (AI) agent, dubbed the Open Computer Agent, that can carry out a range of web-based tasks. The tool is free and available to anyone through its dedicated website. Operating inside a web browser, it can autonomously navigate sites such as Google Search, Google Maps, and ticket booking platforms to complete tasks.
Open Computer Agent Now Open to All Users
Aymeric Roucher, the Project Lead for Agents at Hugging Face, announced the release of the Open Computer Agent in a post on X (formerly known as Twitter). The open-source agent is designed to handle a broad range of tasks autonomously. It runs in a Linux virtual machine that comes with several applications, including the Mozilla Firefox web browser.
According to Roucher, the AI agent uses Qwen2-VL-72B, a vision-language AI model that lets it locate elements on the screen by their coordinates. With this capability, the agent can interpret what is on the screen, perform the required action, and move on to the next step. The agentic features are powered by Hugging Face’s smolagents library.
We’re launching Computer Use in smolagents! 🥳
-> As vision models become more capable, they become able to power complex agentic workflows. Especially Qwen-VL models, that support built-in grounding, i.e. ability to locate any element in an image by its coordinates, thus to… pic.twitter.com/mI8MuWZkIS
— m_ric (@AymericRoucher) May 6, 2025
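The loop Roucher describes (look at the screen, locate an element by its coordinates, act, then repeat) can be illustrated with a minimal Python sketch. This is not the Open Computer Agent's code and does not use the smolagents API: `FakeDesktop`, `scripted_model`, and `run_agent` are hypothetical names, the desktop only records actions, and a fixed script with made-up coordinates stands in for the vision-language model.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple


@dataclass
class Action:
    kind: str                              # "click", "type", or "done"
    xy: Optional[Tuple[int, int]] = None   # screen coordinates for clicks
    text: str = ""                         # text to enter for "type"


@dataclass
class FakeDesktop:
    """Hypothetical stand-in for the virtual machine: it only records actions."""
    log: List[Action] = field(default_factory=list)

    def screenshot(self) -> str:
        # A real agent would capture pixels; this returns a text placeholder.
        return f"screen after {len(self.log)} actions"

    def perform(self, action: Action) -> None:
        self.log.append(action)


def scripted_model(task: str, screenshot: str, step: int) -> Action:
    """Stand-in for the vision-language model (assumption: a fixed script).

    A real model would read the screenshot and return the coordinates of the
    element to interact with; the coordinates here are invented.
    """
    plan = [
        Action("click", xy=(640, 80)),   # e.g. a search box
        Action("type", text=task),
        Action("done"),
    ]
    return plan[min(step, len(plan) - 1)]


def run_agent(
    task: str,
    desktop: FakeDesktop,
    model: Callable[[str, str, int], Action],
    max_steps: int = 10,
) -> List[Action]:
    """Observe, decide, act; stop when the model says "done" or steps run out."""
    for step in range(max_steps):
        shot = desktop.screenshot()
        action = model(task, shot, step)
        if action.kind == "done":
            break
        desktop.perform(action)
    return desktop.log


if __name__ == "__main__":
    for act in run_agent("directions to Paris", FakeDesktop(), scripted_model):
        print(act)
```

In a real system, the scripted function would be replaced by calls to a vision-language model, and the fake desktop by something that actually moves the mouse and types inside the virtual machine. The `max_steps` cap is a design choice of this sketch, so that a confused model cannot loop forever.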
The AI agent is free to use, and interested users can try the Open Computer Agent on its dedicated website. For example, a user can ask the agent for directions to a specific location. The agent will then open Google Maps, enter the starting point and destination, and pull up the route.
Gadgets 360 staff members tested the AI agent. While it works as intended, they found it slow to complete tasks. The agent also sometimes struggles with complex prompts, making mistakes or returning incomplete results. Furthermore, because it is a cloud-based tool, users may have to wait in a queue, which can delay the start of a task.