
DeepSeek Unveils Groundbreaking 671B Parameter AI Model!


On Thursday, DeepSeek, a Chinese artificial intelligence firm, launched its latest innovation: the DeepSeek-V3 AI model. This open-source large language model (LLM) boasts an impressive 671 billion parameters, outpacing Meta Llama 3.1 and its 405 billion parameters. The researchers say the model is designed for efficiency: its mixture-of-experts (MoE) architecture activates only the parameters relevant to each task, preserving both speed and precision. It is worth noting, however, that DeepSeek-V3 is a text-only model and lacks multimodal capabilities.

DeepSeek-V3 AI Model Released

Currently, the open-source DeepSeek-V3 AI model is accessible on Hugging Face. The listing highlights the LLM’s focus on efficient inference and cost-effective training methodologies. To achieve these goals, the researchers implemented Multi-head Latent Attention (MLA) and DeepSeekMoE architectures.
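For developers who want to try the weights directly, the listing can be loaded with the Hugging Face transformers library. The snippet below is a minimal sketch: the repository id deepseek-ai/DeepSeek-V3 matches the listing, but the trust_remote_code flag and the hardware assumptions (multi-GPU sharding via device_map) are ours to verify against the model card, and a 671-billion-parameter checkpoint requires a serious multi-GPU setup in practice.

```python
# Minimal sketch: loading DeepSeek-V3 from its Hugging Face listing.
# Assumptions: repo id "deepseek-ai/DeepSeek-V3"; trust_remote_code=True
# for the custom MLA/MoE modules; enough GPU memory to shard the model
# (the full checkpoint is far too large for a single consumer GPU).
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "deepseek-ai/DeepSeek-V3"

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    trust_remote_code=True,
    device_map="auto",   # shard layers across all available GPUs
    torch_dtype="auto",  # keep the checkpoint's native precision
)

prompt = "Explain mixture-of-experts in one sentence."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```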

The AI model operates by activating only the parameters relevant to the prompt, resulting in quicker processing and better accuracy than conventional dense models of a similar size; the sketch below illustrates the routing idea. Pre-trained on 14.8 trillion tokens, DeepSeek-V3 was then refined with supervised fine-tuning and reinforcement learning to produce high-quality outputs.
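To make the "activate only what is needed" idea concrete, here is a minimal sketch of top-k expert routing, the basic mechanism behind MoE layers. It is illustrative only: the expert count, dimensions, and gating are hypothetical and far simpler than DeepSeekMoE's actual design, which adds shared experts and fine-grained expert segmentation.

```python
# Illustrative top-k mixture-of-experts routing (not DeepSeek's actual code).
# Each token is routed to only k of n experts, so only a small fraction of
# the layer's parameters is active for any given token.
import torch
import torch.nn as nn

class TinyMoE(nn.Module):
    def __init__(self, dim=64, n_experts=8, k=2):
        super().__init__()
        self.k = k
        self.gate = nn.Linear(dim, n_experts)  # router: scores each expert per token
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))
            for _ in range(n_experts)
        )

    def forward(self, x):                            # x: (tokens, dim)
        scores = self.gate(x)                        # (tokens, n_experts)
        weights, idx = scores.topk(self.k, dim=-1)   # keep the k best experts per token
        weights = weights.softmax(dim=-1)            # renormalize over the chosen k
        out = torch.zeros_like(x)
        for slot in range(self.k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, slot] == e             # tokens whose slot-th choice is expert e
                if mask.any():
                    out[mask] += weights[mask, slot, None] * expert(x[mask])
        return out

moe = TinyMoE()
tokens = torch.randn(16, 64)
print(moe(tokens).shape)  # torch.Size([16, 64]) — same shape, sparse compute
```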

According to DeepSeek, the model was trained using 2.788 million Nvidia H800 GPU hours. The architecture also incorporates a load-balancing technique aimed at minimizing performance degradation, a method first introduced in its predecessor.
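To put the 2.788 million GPU hours in perspective, a back-of-the-envelope conversion to wall-clock time follows. The 2,048-GPU cluster size comes from DeepSeek's technical report; treat the result as a rough estimate, not an official schedule.

```python
# Rough conversion of total GPU hours into wall-clock training time.
# The 2,048 H800 cluster size is taken from DeepSeek's technical report;
# the result is an estimate, not an official training schedule.
gpu_hours = 2_788_000
num_gpus = 2_048

wall_clock_hours = gpu_hours / num_gpus
wall_clock_days = wall_clock_hours / 24

print(f"{wall_clock_hours:,.0f} hours ≈ {wall_clock_days:.0f} days")
# ~1,361 hours ≈ 57 days, i.e. just under two months of continuous training
```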

In terms of performance, DeepSeek's internal evaluations claim that DeepSeek-V3 surpasses models such as Meta Llama 3.1 and Qwen 2.5 on benchmarks including Big-Bench Hard (BBH), Massive Multitask Language Understanding (MMLU), HumanEval, and MATH. These results, however, have yet to be validated by independent third-party researchers.

The standout feature of DeepSeek-V3 is undoubtedly its sheer size of 671 billion parameters. Larger models are believed to exist (Gemini 1.5 Pro, for instance, is reported to have around one trillion parameters, though Google has not confirmed a figure), but a model of this scale remains rare in the open-source realm. Previously, the largest open-source AI model was Meta's Llama 3.1, with 405 billion parameters.

At this time, developers can access DeepSeek-V3's code via its Hugging Face listing, released under an MIT license that permits both personal and commercial use. Users who want to experiment with the AI can do so through the company's online chatbot platform, and an API is available for those looking to incorporate the model into their projects; a minimal example follows.
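As an illustration of the API route, the sketch below uses an OpenAI-compatible client, the integration style DeepSeek's platform documents. The base URL, the model name deepseek-chat, and the API key placeholder are assumptions to verify against the official documentation.

```python
# Minimal sketch of calling the DeepSeek API via the openai client library.
# Assumptions to verify against DeepSeek's docs: the OpenAI-compatible
# base_url "https://api.deepseek.com", the model name "deepseek-chat",
# and an API key issued by the DeepSeek platform.
from openai import OpenAI

client = OpenAI(
    api_key="YOUR_DEEPSEEK_API_KEY",  # hypothetical placeholder
    base_url="https://api.deepseek.com",
)

response = client.chat.completions.create(
    model="deepseek-chat",  # the chat model backed by DeepSeek-V3
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Summarize DeepSeek-V3 in two sentences."},
    ],
)
print(response.choices[0].message.content)
```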
