Alibaba’s Qwen research team has unveiled a new open-source artificial intelligence (AI) model, named QVQ-72B, which focuses on vision-based reasoning. The model is designed to analyse visual information from images and comprehend its underlying context. Alongside the release, Alibaba published benchmark scores showing that QVQ-72B edged past OpenAI’s o1 model on one evaluation. The release adds to Alibaba’s growing portfolio of open-source AI models, which includes QwQ-32B and Marco-o1, both of which prioritise reasoning capabilities.
Launch of Alibaba’s Vision-Based QVQ-72B AI Model
In a listing on Hugging Face, the Qwen team describes QVQ-72B as an experimental research model and highlights its advanced visual reasoning. The model brings together two capabilities that are usually found in separate classes of models, combining them to strengthen its analytical capacity.
Numerous vision-based AI models already exist; they typically incorporate an image encoder that interprets visual data and its context. In contrast, reasoning-oriented models such as o1 and QwQ-32B rely on test-time compute scaling, spending extra processing time on each query. This lets a model break a problem down step by step, evaluate its intermediate outputs, and make corrections with the help of a verifying system.
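The generate-verify-correct pattern described above can be sketched as a simple loop. This is an illustrative toy, not Qwen's or OpenAI's actual implementation: the `propose` and `verify` functions below are hypothetical stand-ins for a model's answer generator and its verifier.

```python
def propose(x, attempt):
    # Stand-in "model": deliberately wrong on the first attempt
    # (adds instead of squares), corrected on later attempts.
    return x * x if attempt > 0 else x + x

def verify(x, answer):
    # Stand-in "verifier": checks a candidate answer against a known rule.
    return answer == x * x

def reason(x, max_attempts=3):
    # Test-time compute scaling in miniature: spend extra attempts
    # proposing, checking, and revising before committing to an answer.
    for attempt in range(max_attempts):
        answer = propose(x, attempt)
        if verify(x, answer):
            return answer  # verified output
    return None  # every attempt failed verification

print(reason(3))  # first attempt (3 + 3 = 6) fails; the retry yields 9
```

In a real reasoning model, both the proposer and the verifier are learned components, and "more attempts" translates into longer chains of thought rather than a fixed retry count.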
QVQ-72B combines both approaches, enabling it to interpret visual information while answering intricate questions through step-by-step reasoning. The Qwen team reports that this pairing yields marked improvements on benchmarks.
In internal evaluations, the Qwen team reported that QVQ-72B achieved a score of 71.4 percent on the MathVista (mini) benchmark, surpassing the o1 model’s score of 71.0 percent. The model also recorded a score of 70.3 percent on the Massive Multi-discipline Multimodal Understanding (MMMU) benchmark.
Despite these advancements, the model is not without limitations. The Qwen team acknowledged that QVQ-72B occasionally exhibits code-switching, transitioning between languages in unexpected ways. The model can also become trapped in recursive reasoning loops, which may degrade the accuracy of its outputs.