On Tuesday, OpenAI unveiled two new open-weight artificial intelligence (AI) models, the San Francisco-based company's first major release to the open community since GPT-2 in 2019. The new models, gpt-oss-120b and gpt-oss-20b, are said to deliver performance comparable to the company's o3 and o3-mini models. Both use a mixture-of-experts (MoE) architecture and have undergone extensive safety training and evaluation. Their open weights can be downloaded from Hugging Face.
OpenAI’s Open-Weight AI Models Feature Native Reasoning
OpenAI’s CEO, Sam Altman, made the announcement on X (formerly Twitter), noting that “gpt-oss-120b performs about as well as o3 on challenging health issues.” Both models are available in OpenAI’s Hugging Face collection, where users can download the open weights and run them locally.
According to OpenAI’s website, the models are designed to work with the company’s Responses API and can be integrated into agentic workflows, with support for tools such as web search and Python code execution. Both models feature native reasoning with a transparent chain of thought (CoT), and the reasoning effort can be tuned to favour either higher-quality responses or faster outputs.
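To make the integration concrete, a tool-enabled request in the style of the Responses API might be assembled as below. This is an illustrative sketch only: the field names, tool identifiers, and the `reasoning` effort setting are assumptions for demonstration, not confirmed API parameters.

```python
import json

# Illustrative only: a Responses-API-style payload enabling built-in tools
# and a chain-of-thought effort setting. All field names and tool type
# strings here are assumptions, not confirmed parts of the API schema.
payload = {
    "model": "gpt-oss-120b",
    "input": "Find the latest MoE scaling results and plot the trend.",
    "tools": [
        {"type": "web_search"},        # assumed identifier for web search
        {"type": "code_interpreter"},  # assumed identifier for Python execution
    ],
    # Hypothetical knob for the speed/quality trade-off described above.
    "reasoning": {"effort": "high"},
}

print(json.dumps(payload, indent=2))
```

The `effort` value is where a caller would trade response quality for latency, mirroring the CoT customization the article describes.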
Architecturally, the MoE design improves processing efficiency by activating only a subset of each model’s parameters for any given token. The gpt-oss-120b model activates 5.1 billion of its 117 billion total parameters per token, while gpt-oss-20b activates 3.6 billion of its 21 billion. Both models support context lengths of up to 128,000 tokens.
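A quick back-of-the-envelope calculation, using the figures above, shows how sparse the activation is (variable names are illustrative):

```python
# Active vs. total parameters per token, in billions, as reported by OpenAI.
models = {
    "gpt-oss-120b": {"total_b": 117.0, "active_b": 5.1},
    "gpt-oss-20b": {"total_b": 21.0, "active_b": 3.6},
}

for name, p in models.items():
    frac = p["active_b"] / p["total_b"]
    print(f"{name}: {frac:.1%} of parameters active per token")
```

Roughly 4 percent of gpt-oss-120b and 17 percent of gpt-oss-20b is engaged per token, which is what keeps inference cost well below that of a dense model of the same size.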
The models were trained primarily on a dataset of English text, with a strong emphasis on science, technology, engineering, and mathematics (STEM), along with coding and general knowledge. After this initial training, OpenAI applied reinforcement learning (RL)-based fine-tuning.
Benchmark performance of the open-weight OpenAI models
Photo Credit: OpenAI
Based on internal evaluations, gpt-oss-120b outperforms o3-mini in competitive coding (Codeforces), general problem solving (MMLU and Humanity’s Last Exam), and tool calling (TauBench). However, on some other benchmarks, such as GPQA Diamond, both models score slightly below o3 and o3-mini.
OpenAI emphasized that the models underwent intensive safety training. During pre-training, the company filtered out harmful data related to chemical, biological, radiological, and nuclear (CBRN) threats, and additional techniques were applied so that the models refuse unsafe prompts and resist prompt injections.
Even though the weights are openly available, OpenAI says the models have been trained with safeguards intended to prevent misuse by bad actors seeking to generate harmful outputs.