Alibaba Qwen 2.5 Omni AI Model With Real-Time Speech Generation Released

The Qwen 2.5 Omni is an end-to-end multimodal model that can process text, images, audio, and video.

Highlights
  • Alibaba’s latest AI model is capable of real-time voice and video chat
  • Qwen2.5-Omni outperforms the Qwen2-Audio in audio capabilities
  • Alibaba said the AI model uses the Thinker-Talker architecture
The open-source Qwen 2.5 Omni AI model is available via Hugging Face and GitHub

Photo Credit: Alibaba

Alibaba's Qwen team released a new artificial intelligence (AI) model in the Qwen 2.5 family on Wednesday. Dubbed Qwen 2.5 Omni, it is a flagship-tier end-to-end multimodal model. The company claims it can process a wide range of inputs, including text, images, audio, and videos, while generating real-time text and natural speech responses. It is said to enable the building and deployment of cost-effective AI agents due to its diverse skill set. Alibaba has also employed a new “Thinker-Talker” architecture for the Qwen 2.5 Omni AI model.

Qwen 2.5 Omni AI Model Released

In a blog post, the Qwen team detailed the new Qwen 2.5 Omni AI model, which is a seven-billion-parameter system. The most notable feature of this omnimodal model is real-time speech generation and video chat, which allows the large language model (LLM) to answer queries and interact with users verbally in a humanlike manner. Until now, this capability was only available in models from Google and OpenAI, which are closed-source. Alibaba, on the other hand, has open-sourced the technology.

Coming to the features, the model accepts text, images, audio, and video as input, and generates text and speech as output. It is capable of real-time voice interactions and video chats, and the Qwen team highlights that it streams speech in a natural manner. Additionally, it is claimed to offer improved performance in end-to-end speech instruction following.

The Qwen team highlighted that the Omni model is built on a novel “Thinker-Talker” architecture. The Thinker component functions like a brain: it processes and understands input across modalities and generates text output. It is essentially a Transformer decoder, accompanied by audio and image encoders that assist with information extraction.

Qwen 2.5 Omni benchmark
Photo Credit: Alibaba

On the other hand, the Talker component operates like a human mouth, the researchers said. It receives the Thinker's output in a streaming manner and generates speech as a continuous stream for fluidity. It is designed as a dual-track autoregressive Transformer decoder. The entire architecture operates as a single model, which enables end-to-end training and inference as well as real-time text and speech generation.
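The division of labour between the two components can be illustrated with a toy sketch in Python. This is not Alibaba's implementation: the names `ThinkerStep`, `thinker`, `talker`, and `respond` are hypothetical, the "hidden state" is a single number, and the "speech tokens" are arbitrary integers. It only shows the streaming hand-off, in which the Talker consumes each Thinker step as soon as it is produced and conditions on its own previous output.

```python
from dataclasses import dataclass
from typing import Iterator, List, Tuple


@dataclass
class ThinkerStep:
    """One decoding step of the toy Thinker (hypothetical structure)."""
    text_token: str
    hidden: float  # stand-in for a high-dimensional representation


def thinker(prompt: str) -> Iterator[ThinkerStep]:
    """Toy Thinker: yields one text token and one hidden value per input word."""
    for i, word in enumerate(prompt.split()):
        yield ThinkerStep(text_token=word.upper(), hidden=float(len(word) + i))


def talker(steps: Iterator[ThinkerStep]) -> Iterator[int]:
    """Toy Talker: turns each Thinker step into a speech-token id as it arrives."""
    prev = 0  # previous speech token, giving the loop an autoregressive flavour
    for step in steps:
        token = (prev + int(step.hidden) + len(step.text_token)) % 16
        prev = token
        yield token


def respond(prompt: str) -> Tuple[List[str], List[int]]:
    """Run both stages as one pipeline, collecting text and speech tokens."""
    text_out: List[str] = []

    def tap(steps: Iterator[ThinkerStep]) -> Iterator[ThinkerStep]:
        # Record the text token, then pass the step on to the Talker unchanged.
        for step in steps:
            text_out.append(step.text_token)
            yield step

    speech_out = list(talker(tap(thinker(prompt))))
    return text_out, speech_out


if __name__ == "__main__":
    text, speech = respond("hello omni model")
    print(text, speech)
```

Because both stages are generators, the first speech token is available before the Thinker has finished the whole reply, which is the property that makes real-time output possible in a design of this kind.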

Based on internal testing, the Qwen 2.5 Omni AI model is said to outperform the Gemini 1.5 Pro model on OmniBench. It is also said to outperform Qwen 2.5-VL-7B and Qwen2-Audio on single-modality tasks.

The AI model is now available via Alibaba's Hugging Face and GitHub listings. Additionally, users can test the new model via Qwen Chat as well as the company's ModelScope community.
