OpenAI introduced two new artificial intelligence models, o3 and o4-mini, on Wednesday. The models focus on enhanced reasoning capabilities and feature a visible chain of thought (CoT). The San Francisco-based company said the models also have visual reasoning abilities, allowing them to analyze images and answer more complex user queries. Serving as successors to the earlier o1 and o3-mini models, the new offerings are currently available to paid ChatGPT subscribers. The company also launched the GPT-4.1 series of AI models earlier in the week.
OpenAI’s New Reasoning Models Arrive With Improved Performance
In a post on X (formerly Twitter), OpenAI announced the release of its new large language models (LLMs). Described as the company’s “smartest and most capable models,” the latest iterations add visual reasoning functionality.
Visual reasoning improves the models’ ability to analyze images, letting them extract contextual and implicit information. According to OpenAI’s website, these are the company’s first models that can use and combine all of the tools available within ChatGPT, including web search, Python, image analysis, file interpretation, and image generation.
The o3 and o4-mini models can run web searches related to images, manipulate them by zooming, cropping, flipping, and enhancing them, and execute Python code to extract data. OpenAI stated that this functionality enables the models to retrieve information even from less-than-perfect images.
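To illustrate the kind of image manipulation described above, here is a minimal Python sketch of the sort of operations (rotating, cropping, enhancing, flipping) the models can perform through their Python tool. The file name and crop coordinates are hypothetical placeholders, not details from OpenAI’s announcement.

```python
from PIL import Image, ImageEnhance

# Hypothetical input: a blurry photo of an upside-down notebook page.
img = Image.open("notebook_photo.jpg")

# Rotate 180 degrees so the handwriting is upright.
upright = img.rotate(180)

# Crop (zoom in on) the region containing the text; coordinates are placeholders.
region = upright.crop((100, 200, 900, 700))

# Boost contrast to make faint handwriting easier to read.
enhanced = ImageEnhance.Contrast(region).enhance(1.8)

# Flip horizontally, e.g. if the photo was taken through a mirror.
flipped = enhanced.transpose(Image.Transpose.FLIP_LEFT_RIGHT)

flipped.save("notebook_processed.jpg")
```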
Tasks these models can handle include deciphering handwriting in a notebook photographed upside down, reading distant signs with hard-to-make-out text, picking out a specific question from a long list, locating bus schedules in images, and solving puzzles.
Regarding performance, OpenAI claims that the o3 and o4-mini models outperform the GPT-4o and o1 models on benchmarks such as MMMU, MathVista, VLMs are Blind, and CharXiv. However, the company did not provide performance comparisons with competing third-party AI models.
OpenAI acknowledged several limitations in these models. For instance, they may perform redundant image manipulation steps and tool calls, producing unnecessarily long chains of thought. The o3 and o4-mini models also remain vulnerable to perception errors, potentially misinterpreting visual data and returning incorrect answers. The company additionally flagged reliability concerns with the models.
Both the o3 and o4-mini models are accessible to users on the ChatGPT Plus, Pro, and Team tiers, replacing the previous o1, o3-mini, and o3-mini-high models in the model picker. Enterprise and education users can expect access next week. Developers can use the models through the Chat Completions and Responses application programming interfaces (APIs), as sketched below.
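For developers, here is a minimal sketch of how the new models might be called through both APIs using the official OpenAI Python SDK. The model identifiers ("o3", "o4-mini") and the example prompts are assumptions based on this announcement; exact names and availability should be confirmed against OpenAI’s API documentation.

```python
from openai import OpenAI

client = OpenAI()  # Reads the API key from the OPENAI_API_KEY environment variable.

# Chat Completions API, assuming the model is exposed as "o4-mini".
chat = client.chat.completions.create(
    model="o4-mini",
    messages=[{"role": "user", "content": "Summarize the key points of this bus timetable."}],
)
print(chat.choices[0].message.content)

# Responses API, assuming the model is exposed as "o3".
resp = client.responses.create(
    model="o3",
    input="Explain the reasoning steps needed to solve a 4x4 Sudoku puzzle.",
)
print(resp.output_text)
```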