On Wednesday, OpenAI unveiled two new artificial intelligence models, o3 and o4-mini. The latest offerings from the San Francisco-based company are reasoning-focused and feature a visible chain of thought (CoT). The addition of visual reasoning allows the models to analyze images, enhancing their ability to respond to more intricate user queries. The new models succeed the older o1 and o3-mini versions and are currently accessible to ChatGPT’s paid subscribers. Earlier this week, OpenAI also introduced the GPT-4.1 series of AI models.
OpenAI’s New Reasoning Models Arrive With Improved Performance
In a recent announcement on X (formerly Twitter), OpenAI highlighted the release of these new large language models (LLMs), describing them as the “smartest and most capable” within the company’s portfolio. The models are particularly notable for their enhanced visual reasoning capabilities.
This visual reasoning feature enables the AI models to analyze images more effectively, extracting contextual and implicit information. OpenAI’s website states that these are the first models capable of agentically using and combining all of the tools available within ChatGPT: web search, Python programming, image analysis, file interpretation, and image generation.
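This agentic tool use is also surfaced to developers through OpenAI’s API. The snippet below is a minimal sketch, assuming the official OpenAI Python SDK and its Responses API, of asking a reasoning model to combine web search with a text query; the tool type string and the model’s tool support are drawn from OpenAI’s public documentation at the time of writing and should be treated as illustrative rather than definitive.

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# Ask o3 a question and let it decide whether to invoke the
# built-in web search tool before answering.
response = client.responses.create(
    model="o3",
    tools=[{"type": "web_search_preview"}],  # built-in web search tool
    input="Summarise today's most significant AI research announcement.",
)

print(response.output_text)  # convenience accessor for the text output
```

In practice, the model plans its own tool calls, so the same request may trigger zero, one, or several searches depending on the query.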
The o3 and o4-mini models can perform a range of tasks, such as searching for images online, manipulating them by zooming, cropping, and enhancing them, and executing Python code to gather information. This functionality allows the models to retrieve details even from imperfect images.
Some of the specific tasks these new models can accomplish include deciphering handwriting from an upside-down notebook, reading distant signs with faint text, identifying particular questions from extensive lists, locating bus schedules from images, solving puzzles, and more.
In terms of performance, OpenAI asserts that the o3 and o4-mini models exceed the capabilities of the older GPT-4o and o1 models on benchmarks such as MMMU, MathVista, VLMs Are Blind, and CharXiv. However, the company has not released comparative performance data against third-party AI models.
OpenAI has also pointed out certain limitations of these models. For example, they may perform unnecessary image manipulation steps or make tool calls that lead to excessively long chains of thought. The o3 and o4-mini models can also make perception errors, misinterpreting visual information and arriving at incorrect answers. Reliability has been cited as a concern as well.
Both the o3 and o4-mini AI models are being offered to ChatGPT Plus, Pro, and Team users, replacing the previous o1, o3-mini, and o3-mini-high models in the model selector. Enterprise and educational users will gain access next week, while developers can utilize the models through the Chat Completions and Responses application programming interfaces (APIs).
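As a rough illustration of the developer-facing side, the sketch below assumes the official OpenAI Python SDK and shows a Chat Completions call against o4-mini; the prompt is placeholder text, and whether a given account can use the o4-mini model identifier depends on its API access tier.

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# Send a single-turn request to o4-mini via the Chat Completions API.
completion = client.chat.completions.create(
    model="o4-mini",
    messages=[
        {"role": "user", "content": "Explain chain-of-thought reasoning in one paragraph."},
    ],
)

print(completion.choices[0].message.content)
```

The same request can also be made through the newer Responses API, which OpenAI positions as the surface for its agentic tool features.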