On Saturday, Meta announced the launch of the first artificial intelligence models in the Llama 4 series. The tech company, headquartered in Menlo Park, introduced two variants — Llama 4 Scout and Llama 4 Maverick — both featuring native multimodal capabilities and made available to the open community. These models are distinguished as the first open-source versions designed with Mixture-of-Experts (MoE) architecture, offering improved context windows and enhanced power efficiency compared to their predecessors. Furthermore, Meta provided a preview of Llama 4 Behemoth, the largest model in the lineup to date.
In a blog post, Meta elaborated on the specifications and functionalities of its new AI offerings. Like their earlier counterparts, Llama 4 Scout and Llama 4 Maverick are open-source models, available for download from both the Hugging Face listing and the dedicated Llama website. Starting today, users can interact with the Llama 4 AI models across platforms such as WhatsApp, Messenger, Instagram Direct, and the Meta.AI website.
The Llama 4 Scout is equipped with 17 billion active parameters and utilizes 16 experts, allowing it to operate on a single Nvidia H100 GPU. The Maverick model, on the other hand, also features 17 billion active parameters but employs 128 experts. Meta asserts that the forthcoming Llama 4 Behemoth will outperform competitors like GPT-4.5, Claude Sonnet 3.7, and Gemini 2.0 Pro across various benchmarks, boasting 288 billion active parameters and 16 experts, although it remains in training and is not yet publicly available.
The MoE architecture in Llama 4 AI models
Photo Credit: Meta
The foundation of the Llama 4 models is MoE architecture, which selectively activates a portion of the overall parameters based on the specific needs of the input prompt. This approach promotes computational efficiency during both training and inference phases. During pre-training, Meta utilized innovative strategies such as early fusion to simultaneously process text and vision tokens, alongside MetaP to refine vital model hyper-parameters and initialisation scales.
For post-training refinement, Meta began with lightweight supervised fine-tuning (SFT), followed by online reinforcement learning (RL) and lightweight direct preference optimization (DPO), strategically avoiding over-constraining the model. The SFT process was conducted on only half of the “harder” dataset to ensure effective training.
Based on internal evaluations, the Maverick model has shown superior performance relative to Gemini 2.0 Flash, DeepSeek v3.1, and GPT-4o across multiple benchmarks, including MMMU (image reasoning), ChartQA (image understanding), GPQA Diamond (reasoning and knowledge), and MTOB (long context). Similarly, the Scout model has demonstrated stronger capabilities than Gemma 3, Mistral 3.1, and Gemini 2.0 on benchmarks like MMMU, ChartQA, MMLU (reasoning and knowledge), GPQA Diamond, and MTOB.
Meta has also implemented measures to enhance safety throughout the pre-training and post-training stages of model development. During pre-training, the company employed data filtering processes to exclude harmful information from the models’ knowledge base. For post-training safety, open-source tools such as Llama Guard and Prompt Guard have been integrated to mitigate the risk of external attacks. Moreover, internal stress tests and red-teaming activities have been conducted for both the Llama 4 Scout and Maverick models.
Importantly, the models are accessible to the open community under a permissive Llama 4 license, which permits both academic and commercial utilization. However, Meta has restricted access to its AI models for companies that exceed 700 million monthly active users.