
Hugging Face Launches Open-R1 to Rival DeepSeek-R1


On Tuesday, Hugging Face unveiled Open-R1, a project aimed at building a fully open reproduction of the DeepSeek-R1 model. The announcement follows the recent release of DeepSeek-R1 by DeepSeek, a hedge-fund-backed Chinese AI company, which shook the tech landscape in Silicon Valley and rattled the NASDAQ. Much of the excitement stems from the fact that an AI model of this size and capability, one able to rival OpenAI’s o1, had never before been released openly. The release was not fully open source, however, and that gap prompted researchers at Hugging Face to set out to rebuild the missing components.

Reasons Behind Hugging Face’s Open-R1 Initiative

In a blog post, Hugging Face researchers explained their motivation for replicating the acclaimed DeepSeek AI model. DeepSeek-R1 is an open-weight release: the model weights and some assets are publicly accessible, but the datasets and training code required for complete replication are not. Users can download and run the model locally, yet they lack the information needed to reproduce it from scratch.

The key missing pieces include the reasoning-specific datasets used to train the base model, the training code and hyperparameter settings used for handling complex queries, and the trade-offs between computing power and data that were made during training.

The researchers aim to create a fully open-source reproduction of DeepSeek-R1 to bring transparency to its reinforcement learning results and to offer reproducible insights to the wider community.

Overview of Hugging Face’s Open-R1 Project

Since DeepSeek-R1’s weights are now publicly available, researchers have begun to probe elements of the AI model. They found, for instance, that DeepSeek-R1-Zero, an intermediate model built on the DeepSeek-V3 base model, was trained with pure reinforcement learning, without any supervised fine-tuning on human-labeled data. The reasoning-focused R1 model then added several refinement stages to weed out low-quality outputs, yielding more polished and consistent responses.

To achieve their objectives, Hugging Face researchers have outlined a three-step strategy. First, they will reproduce the distilled versions of R1 by rebuilding its reasoning dataset. Next, they plan to replicate the pure reinforcement learning pipeline, sketched below, before finally combining supervised fine-tuning with further reinforcement learning to align their model’s responses with those of R1.
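To make the pure reinforcement learning step concrete: DeepSeek-R1-Zero was trained against rule-based, verifiable rewards, such as whether the final answer is correct and whether the output follows the expected reasoning format. Below is a minimal Python sketch of such reward functions; the \boxed{...} answer convention and the <think> tags reflect how R1-style traces are commonly formatted, but the exact scoring weights here are illustrative assumptions.

```python
import re

def accuracy_reward(completion: str, ground_truth: str) -> float:
    """Rule-based reward: 1.0 if the final boxed answer matches the reference.

    Assumes the completion ends with an answer wrapped in \\boxed{...},
    as is common for math reasoning traces.
    """
    match = re.search(r"\\boxed\{([^}]*)\}", completion)
    if match is None:
        return 0.0  # no parseable answer, no reward
    return 1.0 if match.group(1).strip() == ground_truth.strip() else 0.0

def format_reward(completion: str) -> float:
    """Small bonus for keeping the reasoning inside <think>...</think> tags."""
    return 0.5 if re.search(r"<think>.*?</think>", completion, re.DOTALL) else 0.0

# Example: score a candidate completion against a reference answer.
candidate = "<think>17 * 24 = 408</think> The answer is \\boxed{408}."
print(accuracy_reward(candidate, "408") + format_reward(candidate))  # 1.5
```

During RL training, the policy model is updated to maximize these rewards over many sampled completions, which is what lets reasoning behavior emerge without human-labeled demonstrations.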

The synthetic dataset generated from the distilled R1 model, along with the training methodologies, will subsequently be shared with the open-source community. Developers will then have the opportunity to enhance existing large language models (LLMs) into reasoning models with targeted fine-tuning.
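As a concrete illustration of that last step, here is a minimal sketch of how a developer might fine-tune a small existing LLM on such a synthetic reasoning dataset with Hugging Face’s TRL library. The dataset name is a hypothetical placeholder, the model choice is arbitrary, and SFTTrainer argument names vary somewhat across TRL versions.

```python
from datasets import load_dataset
from trl import SFTConfig, SFTTrainer

# Hypothetical dataset of reasoning traces distilled from R1;
# "open-r1/reasoning-traces" is a placeholder name, not a real repo.
dataset = load_dataset("open-r1/reasoning-traces", split="train")

trainer = SFTTrainer(
    model="Qwen/Qwen2.5-1.5B-Instruct",  # any small open model could be used
    train_dataset=dataset,
    args=SFTConfig(
        output_dir="distilled-reasoner",
        max_seq_length=4096,  # reasoning traces tend to be long
        num_train_epochs=1,
    ),
)
trainer.train()
```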

Hugging Face has previously applied a related recipe to the small Llama 3B model, demonstrating that scaling test-time compute, also referred to as inference-time compute, can significantly elevate the performance of smaller language models.
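The intuition behind test-time compute scaling is that a small model can trade extra inference work for accuracy, for example by sampling several candidate solutions and keeping the best one. Hugging Face’s experiments used a process reward model to pick the winner; the sketch below substitutes a trivial keyword check as the scorer, and the model name and prompt are illustrative assumptions.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "meta-llama/Llama-3.2-3B-Instruct"  # assumption: any small instruct model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name, torch_dtype=torch.bfloat16, device_map="auto"
)

prompt = "What is 17 * 24? Reason step by step, then write 'Answer: <result>'."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# Best-of-N: spend more compute at inference time by sampling N candidates.
outputs = model.generate(
    **inputs,
    do_sample=True,
    temperature=0.8,
    max_new_tokens=256,
    num_return_sequences=8,
)
completions = [
    tokenizer.decode(o[inputs["input_ids"].shape[1]:], skip_special_tokens=True)
    for o in outputs
]

# Toy scorer standing in for a real reward model: prefer completions
# that contain the correct final answer marker.
def score(completion: str) -> float:
    return 1.0 if "Answer: 408" in completion else 0.0

print(max(completions, key=score))
```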
