Alibaba Unveils Groundbreaking Qwen 2.5 Omni AI

Alibaba’s Qwen team unveiled its latest artificial intelligence (AI) model, Qwen 2.5 Omni, on Wednesday. The flagship multimodal model accepts text, images, audio, and video as input and responds in real time with both text and natural-sounding speech. The company says the model, built on a new “Thinker-Talker” architecture, makes it easier to create and deploy cost-efficient AI agents.

Qwen 2.5 Omni AI Model Released

The Qwen team described the new model in a blog post, noting that it has seven billion parameters. A standout feature of Qwen 2.5 Omni is its ability to generate speech in real time and hold video chats, letting the large language model (LLM) converse with users in a human-like way. Google and OpenAI offer models with similar capabilities, but those are closed-source, whereas Alibaba has released its model as open source.

In terms of functionality, the model accepts text, image, audio, and video inputs and produces text and speech outputs. It supports real-time voice and video chat, and the Qwen team highlights its ability to stream natural speech. The team also reports stronger end-to-end speech instruction following, meaning better handling of instructions given by voice.

The Omni model is built on a distinctive “Thinker-Talker” framework. The Thinker acts like a brain: it processes and understands inputs from the different modalities and generates text. It is a Transformer decoder paired with dedicated audio and image encoders that help extract information from those inputs.

Qwen 2.5 Omni benchmark
Photo Credit: Alibaba

The Talker, by contrast, works like a human mouth. According to the researchers, it takes in the representations and text produced by the Thinker as they stream out and turns them into fluid speech. It is built as a dual-track autoregressive Transformer decoder. Because the Talker receives the Thinker's output directly, the whole architecture operates as a single model that generates text and speech in real time and can be trained and run end to end.
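To make the Thinker-Talker data flow more concrete, here is a toy Python sketch. It is not Alibaba's implementation: `ThinkerStep`, `encode_inputs`, `thinker`, and `talker` are hypothetical names, and simple placeholder arithmetic stands in for the real neural encoders and Transformer decoders. It only illustrates the streaming idea, with the Talker consuming the Thinker's output step by step so that speech can begin before the full reply has been generated.

```python
from dataclasses import dataclass
from typing import Iterator, List, Tuple


@dataclass
class ThinkerStep:
    """One step of (hypothetical) Thinker output: a text token plus a hidden representation."""
    text_token: str
    hidden: List[float]


def encode_inputs(text: str, audio: List[float], image: List[float]) -> List[float]:
    # Placeholder "encoders": reduce each modality to a single number.
    # Real encoders would be neural networks producing embedding sequences.
    return [float(len(text)), sum(audio), sum(image)]


def thinker(features: List[float], reply: str) -> Iterator[ThinkerStep]:
    # Placeholder "brain": streams a fixed reply word by word, attaching a
    # hidden vector derived from the encoded features and the step index.
    for i, word in enumerate(reply.split()):
        yield ThinkerStep(word, [f + i for f in features])


def talker(steps: Iterator[ThinkerStep]) -> Iterator[Tuple[str, int]]:
    # Placeholder "mouth": consumes Thinker steps as they arrive (no waiting
    # for the full reply) and emits a (text, speech token) pair per step,
    # loosely mirroring a dual-track text/speech output.
    context: List[str] = []  # all Thinker text seen so far (shared context)
    for step in steps:
        context.append(step.text_token)
        # Arbitrary arithmetic standing in for a neural speech decoder.
        speech_token = int(sum(step.hidden)) % 1024 + len(context)
        yield step.text_token, speech_token


if __name__ == "__main__":
    features = encode_inputs("hi", [0.5, 0.5], [1.0])
    for text_tok, speech_tok in talker(thinker(features, "Hello there")):
        print(text_tok, speech_tok)  # prints "Hello 5", then "there 9"
```

Python generators are used here because they make the streaming explicit: each word flows from `thinker` into `talker` as soon as it is produced, rather than after the whole reply is finished.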

According to Alibaba's internal evaluations, Qwen 2.5 Omni outperforms Google's Gemini 1.5 Pro on OmniBench, a multimodal benchmark, and beats Qwen2.5-VL-7B and Qwen2-Audio on single-modality tasks.

The model is available through Alibaba's listings on Hugging Face and GitHub. Users can also try it via Qwen Chat and ModelScope, the company's community platform.
