Researchers at Stanford University and the University of Washington have released an open-source artificial intelligence (AI) model that matches the performance of OpenAI’s o1 model. Their primary aim was to understand the techniques the San Francisco-based company uses to enhance its o1 series models through test-time scaling. Notably, the study demonstrated that this methodology could be executed at a significantly reduced cost and with minimal computing resources.
Researchers Develop s1-32B AI Model
The methodology and development details of the new model are outlined in a study available on the pre-print server arXiv. The researchers constructed a synthetic dataset derived from an existing AI model and applied supervised fine-tuning (SFT), validating their design choices through ablation experiments. For those interested, the model can be accessed through a GitHub repository.
It is important to note that the AI model was not built entirely from scratch. The team started from the Qwen2.5-32B-Instruct model, released in September 2024, and distilled it into the s1-32B large language model (LLM). The base model is capable in its own right, but its limited reasoning abilities leave it short of rivaling OpenAI’s o1.
In their methodology, the researchers used the Gemini Flash Thinking application programming interface (API) to capture reasoning traces and responses. From it they collected a set of 59,000 triplets, each consisting of a question, a reasoning trace (the chain of thought, or CoT), and the corresponding answer. They then curated a dataset named s1K, comprising 1,000 high-quality, diverse, and challenging questions along with their reasoning traces and answers.
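To make the curation step concrete, here is a minimal Python sketch of a three-stage filter in the spirit of the study's stated criteria (quality, difficulty, diversity). The record fields and the `solved_by_baseline` helper are hypothetical placeholders for illustration, not the authors' actual code.

```python
# Hedged sketch of a quality -> difficulty -> diversity curation pipeline.
import random
from collections import defaultdict

def solved_by_baseline(triplet):
    # Placeholder: stands in for checking whether a weaker reference model
    # already solves the question; here we assume a precomputed flag.
    return triplet.get("solved_by_baseline", False)

def curate_s1k(triplets, target_size=1000):
    """Each triplet is a dict: {question, reasoning_trace, answer, domain}."""
    # 1. Quality: drop malformed samples (missing traces or answers).
    pool = [t for t in triplets if t.get("reasoning_trace") and t.get("answer")]

    # 2. Difficulty: keep only questions the baseline model fails on.
    pool = [t for t in pool if not solved_by_baseline(t)]

    # 3. Diversity: bucket by domain, then sample across buckets,
    #    favouring longer reasoning traces as a rough difficulty proxy.
    by_domain = defaultdict(list)
    for t in pool:
        by_domain[t["domain"]].append(t)
    for bucket in by_domain.values():
        bucket.sort(key=lambda t: len(t["reasoning_trace"]), reverse=True)

    curated = []
    while len(curated) < target_size and by_domain:
        domain = random.choice(list(by_domain))
        curated.append(by_domain[domain].pop(0))
        if not by_domain[domain]:
            del by_domain[domain]
    return curated
```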
Once the s1K dataset was established, the researchers performed supervised fine-tuning of the Qwen2.5-32B-Instruct model using basic fine-tuning hyperparameters. The distillation run completed in just 26 minutes on 16 Nvidia H100 GPUs.
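For illustration, a distillation run of this kind could be set up with the Hugging Face TRL library roughly as follows. The hyperparameters, file name, and dataset layout are assumptions for the sketch, not the values reported in the study.

```python
# Hedged sketch of plain supervised fine-tuning on the 1,000 s1K examples.
from datasets import load_dataset
from trl import SFTConfig, SFTTrainer

# Assumed local dump of s1K with a "text" column holding the formatted
# question, reasoning trace, and answer for each example.
dataset = load_dataset("json", data_files="s1k.jsonl", split="train")

trainer = SFTTrainer(
    model="Qwen/Qwen2.5-32B-Instruct",
    train_dataset=dataset,
    args=SFTConfig(
        output_dir="s1-32b",
        num_train_epochs=5,             # illustrative: several passes over a tiny dataset
        per_device_train_batch_size=1,  # a 32B model leaves little room per GPU
        gradient_accumulation_steps=4,
        learning_rate=1e-5,
        bf16=True,                      # the reported run used 16 H100 GPUs
    ),
)
trainer.train()
```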
At this stage, the researchers still did not know how OpenAI trained its models to think effectively, or how it prevented them from prolonging the thinking process unnecessarily. Understanding this is crucial, as a model left unchecked risks overthinking, leading to inefficient use of processing power.
During the fine-tuning process, the researchers made an intriguing discovery: by introducing XML tags to demarcate thinking, they could control the inference time. Upon generating the end tag, the model was instructed to adopt an authoritative tone for its final response. Inference time refers to the window in which a model generates its output; standard AI models respond in near real time, and extending that window requires deliberate manipulation of the decoding process.
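As a rough illustration of how such delimiters might work, the sketch below splits a model's raw output into a reasoning trace and a final answer. The tag strings are placeholders, since the article does not specify the exact tokens used.

```python
# Hedged sketch of XML-style thinking delimiters; tag names are assumed.
THINK_OPEN, THINK_CLOSE = "<think>", "</think>"

def split_reasoning(output: str):
    """Separate a raw completion into (reasoning trace, final answer)."""
    if THINK_CLOSE in output:
        trace, answer = output.split(THINK_CLOSE, 1)
        return trace.replace(THINK_OPEN, "").strip(), answer.strip()
    # No closing tag yet: the model is still "thinking", so no answer exists.
    return output.replace(THINK_OPEN, "").strip(), None
```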
Using the s1-32B model, the researchers introduced a “wait” command to extend its thinking period beyond the normal inference window. This addition prompted the model to second-guess and verify its outputs, and the thinking phase could then be shortened or lengthened by adjusting the tag.
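This trick can be sketched as a simple decoding loop: whenever the model tries to close its thinking section before a chosen budget is met, the closing tag is withheld and “Wait” is appended instead. The `generate` callable and the round-based budget below are assumptions for illustration, not the authors' implementation.

```python
# Hedged sketch of extending thinking with "Wait". `generate` stands in for
# any LLM completion call (prompt + stop strings -> text) and is assumed.
THINK_CLOSE = "</think>"  # same placeholder closing tag as in the earlier sketch

def think_with_budget(prompt, generate, min_rounds=2, max_rounds=6):
    trace = ""
    for round_no in range(1, max_rounds + 1):
        # Generate until the model tries to end its thinking section.
        trace += generate(prompt + trace, stop=[THINK_CLOSE])
        if round_no >= min_rounds:
            break          # budget met: let the model move on to answering
        trace += "\nWait"  # withhold the closing tag; force a re-check
    # Close the thinking section and request the final answer.
    return generate(prompt + trace + THINK_CLOSE)
```

Raising `min_rounds` lengthens the thinking phase, while lowering it (or `max_rounds`) shortens it, matching the adjustable behaviour described above.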
The team also experimented with other phrases, such as “alternatively” and “hmm”, but found that the best performance metrics were achieved with the “wait” tag. Because this approach brought the model’s capabilities close to those of o1, the researchers suggest it may mirror the techniques OpenAI employed to refine its reasoning models.
A report by TechCrunch indicates that developing the s1-32B AI model cost less than $50 (roughly Rs. 4,380) in compute, underscoring that the post-training structure for a reasoning model can be built at remarkably low cost.