Alibaba Qwen 2.5 Omni AI Model With Real-Time Speech Generation Released

The Qwen 2.5 Omni is an end-to-end multimodal model that can process text, images, audio, and video.

Written by Akash Dutta, Edited by Siddharth Suvarna | Updated: 27 March 2025 15:47 IST
Highlights
  • Alibaba’s latest AI model is capable of real-time voice and video chat
  • Qwen2.5-Omni outperforms the Qwen2-Audio in audio capabilities
  • Alibaba said the AI model uses the Thinker-Talker architecture

The open-source Qwen 2.5 Omni AI model is available via Hugging Face and GitHub

Photo Credit: Alibaba

Alibaba's Qwen team released a new artificial intelligence (AI) model in the Qwen 2.5 family on Wednesday. Dubbed Qwen 2.5 Omni, it is a flagship-tier end-to-end multimodal model. The company claims it can process a wide range of inputs, including text, images, audio, and videos, while generating real-time text and natural speech responses. It is said to enable the building and deployment of cost-effective AI agents due to its diverse skill set. Alibaba has also employed a new “Thinker-Talker” architecture for the Qwen 2.5 Omni AI model.

Qwen 2.5 Omni AI Model Released

In a blog post, the Qwen team detailed the new Qwen 2.5 Omni AI model, which is a seven-billion-parameter system. Its most notable capability is real-time speech generation and video chat, which allows the large language model (LLM) to answer queries and interact with users verbally in a humanlike manner. So far, this capability has only been available with Google's and OpenAI's models, which are closed-source. Alibaba, on the other hand, has open-sourced the technology.

In terms of features, the model accepts text, images, audio, and video as input, and generates text and speech as output. It is also capable of real-time voice interactions and video chats. The Qwen team highlights that the model streams speech in real time in a natural-sounding manner. Additionally, it is claimed to offer enhanced performance in end-to-end speech instruction following.

The Qwen team highlighted that the Omni model is built on a novel “Thinker-Talker” architecture. The Thinker component functions like a brain and is responsible for processing and understanding input across modalities, and for generating text output. It is essentially a Transformer decoder, accompanied by audio and image encoders that assist with information extraction.

Qwen 2.5 Omni benchmark
Photo Credit: Alibaba

The Talker component, on the other hand, operates like a human mouth, the researchers said. It receives the information produced by the Thinker component in a streaming manner and generates speech output fluidly. It is designed as a dual-track autoregressive Transformer decoder. The entire architecture operates as a single model, which enables end-to-end training and inference as well as real-time text and speech generation.
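
To make the division of labour concrete, here is a minimal Python sketch of a Thinker feeding a Talker in a streaming fashion. It is purely illustrative and is not Alibaba's implementation: the function names (`thinker`, `talker`, `respond`), the upper-casing that stands in for text generation, and the one-number "hidden state" are all hypothetical placeholders. The only idea it borrows from the description above is that the Talker starts producing speech chunks while the Thinker is still producing text.

```python
from typing import Iterator, List, Tuple

# Hypothetical stand-in for the Thinker: it "understands" the prompt and
# yields one (text_token, hidden_representation) pair at a time.
def thinker(prompt: str) -> Iterator[Tuple[str, List[float]]]:
    for word in prompt.split():
        token = word.upper()         # placeholder for a generated text token
        hidden = [float(len(word))]  # placeholder for a high-level representation
        yield token, hidden

# Hypothetical stand-in for the Talker: it consumes the Thinker's stream
# and emits one speech "chunk" per item, without waiting for the full reply.
def talker(stream: Iterator[Tuple[str, List[float]]]) -> Iterator[str]:
    for token, hidden in stream:
        yield f"<speech:{token}:{int(hidden[0])}>"

def respond(prompt: str) -> Tuple[List[str], List[str], List[str]]:
    """Run both stages and record the order in which outputs appear."""
    text_out: List[str] = []
    speech_out: List[str] = []
    events: List[str] = []

    def tapped() -> Iterator[Tuple[str, List[float]]]:
        # Pass the Thinker's output through unchanged while logging the text.
        for token, hidden in thinker(prompt):
            text_out.append(token)
            events.append(f"text:{token}")
            yield token, hidden

    for chunk in talker(tapped()):
        speech_out.append(chunk)
        events.append(f"speech:{chunk}")

    return text_out, speech_out, events
```

Calling `respond("hello there")` in this toy setup interleaves text and speech events rather than finishing all text first, which is the behaviour that a streaming design of this kind is meant to allow.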

Based on internal testing, the Qwen 2.5 Omni AI model is said to outperform the Gemini 1.5 Pro model on the OmniBench benchmark. It is also said to outperform Qwen 2.5-VL-7B and Qwen2-Audio on single-modality tasks.

The AI model is now available via Alibaba's Hugging Face and GitHub listings. Additionally, users can test the new model via Qwen Chat as well as ModelScope, the company's model community platform.

 
