On Thursday, DeepSeek, a Chinese artificial intelligence company, unveiled its latest creation, the DeepSeek-V3 AI model. This new open-source large language model (LLM) boasts an impressive 671 billion parameters, eclipsing the Meta Llama 3.1 model, which has 405 billion parameters. The research team emphasizes the model’s efficiency, achieved through a mixture-of-experts (MoE) architecture that activates only the parameters pertinent to a given task, improving both speed and accuracy. It is important to note that the model is text-based and does not support multimodal features.
DeepSeek-V3 AI Model Released
The DeepSeek-V3 AI model is currently available on Hugging Face, where it is hosted as open-source software. The listing indicates that this LLM is designed for efficient inference and economical training, utilizing Multi-head Latent Attention (MLA) and DeepSeekMoE techniques.
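Since the weights are openly hosted, they can in principle be loaded with standard open-source tooling. The sketch below is a minimal, hypothetical example using the Hugging Face transformers library; the repository identifier "deepseek-ai/DeepSeek-V3", the prompt, and the generation settings are assumptions for illustration, and the full 671-billion-parameter checkpoint requires far more memory than a single consumer GPU.

```python
# Minimal sketch: loading an open-source checkpoint from Hugging Face.
# The repo id "deepseek-ai/DeepSeek-V3" is assumed for illustration.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "deepseek-ai/DeepSeek-V3"  # assumed repository id
tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",      # use the dtype stored in the checkpoint
    device_map="auto",       # shard across available devices
    trust_remote_code=True,  # the model ships custom modeling code
)

prompt = "Explain mixture-of-experts in one sentence."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```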
This architecture allows the AI model to activate only the parameters relevant to each input, which contributes to quicker processing times and improved accuracy compared with dense models of comparable size. DeepSeek-V3 was pre-trained on an extensive dataset of 14.8 trillion tokens and uses techniques such as supervised fine-tuning and reinforcement learning to produce high-quality outputs.
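To make the "activate only relevant parameters" idea concrete, here is a minimal sketch of top-k mixture-of-experts routing in PyTorch. The layer sizes, expert count, and top_k value are illustrative assumptions and do not reflect DeepSeek-V3's actual DeepSeekMoE configuration; the point is only that each token is processed by a small subset of the experts rather than the full parameter set.

```python
# Minimal sketch of top-k mixture-of-experts routing (illustrative sizes only).
import torch
import torch.nn as nn
import torch.nn.functional as F

class SimpleMoELayer(nn.Module):
    def __init__(self, d_model=64, d_ff=256, n_experts=8, top_k=2):
        super().__init__()
        # Each expert is a small feed-forward network.
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        )
        self.gate = nn.Linear(d_model, n_experts)  # router: scores each expert per token
        self.top_k = top_k

    def forward(self, x):  # x: (num_tokens, d_model)
        scores = F.softmax(self.gate(x), dim=-1)
        weights, idx = scores.topk(self.top_k, dim=-1)      # keep only the top-k experts per token
        weights = weights / weights.sum(dim=-1, keepdim=True)
        out = torch.zeros_like(x)
        for slot in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, slot] == e                     # tokens routed to expert e in this slot
                if mask.any():
                    out[mask] += weights[mask, slot].unsqueeze(-1) * expert(x[mask])
        return out

tokens = torch.randn(4, 64)
print(SimpleMoELayer()(tokens).shape)  # torch.Size([4, 64]); only 2 of 8 experts run per token
```

In this toy version, only two of eight experts are evaluated for each token, which is why an MoE model can carry a very large total parameter count while keeping per-token compute closer to that of a much smaller dense model.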
According to DeepSeek, training the AI model required 2.788 million Nvidia H800 GPU hours. The model’s architecture also incorporates a load-balancing strategy to prevent performance degradation, an approach first employed on its predecessor.
In terms of performance, internal evaluations suggest that DeepSeek-V3 surpasses the Meta Llama 3.1 and Qwen 2.5 models across various benchmarks, including Big-Bench Hard (BBH), Massive Multitask Language Understanding (MMLU), HumanEval, and MATH. However, these claims have yet to be validated by independent researchers.
A significant feature of DeepSeek-V3 is its substantial size of 671 billion parameters. While there are larger models on the market, such as Gemini 1.5 Pro, which is reported to contain around one trillion parameters, such vast sizes are uncommon within the open-source domain. Prior to this release, the largest open-source AI model was Meta’s Llama 3.1, with 405 billion parameters.