Alibaba’s Qwen Team Releases QVQ-72B Open Source Vision AI Model in Preview

The QVQ-72B AI model outperformed OpenAI o1 on the MathVista benchmark.

Written by Akash Dutta, Edited by Siddharth Suvarna | Updated: 26 December 2024 14:48 IST
Highlights
  • The QVQ-72B AI model combines vision and reasoning-based capabilities
  • Alibaba’s latest model is built on the Qwen2-VL-72B
  • It also scored 70.3 percent on the MMMU benchmark

The QVQ-72B AI model can be accessed from Hugging Face


Alibaba's Qwen research team has released another open-source artificial intelligence (AI) model in preview. Dubbed QVQ-72B, it is a vision-based reasoning model that can analyse visual information from images and understand the context behind them. The tech giant has also shared benchmark scores of the AI model and highlighted that on one specific test, it was able to outperform OpenAI's o1 model. Notably, Alibaba has released several open-source AI models recently, including the QwQ-32B and Marco-o1 reasoning-focused large language models (LLMs).

Alibaba's Vision-Based QVQ-72B AI Model Launched

In a Hugging Face listing, Alibaba's Qwen team detailed the new open-source AI model. Calling it an experimental research model, the researchers highlighted that the QVQ-72B comes with enhanced visual reasoning capabilities. Interestingly, vision and reasoning are two separate branches of capability that the researchers have combined in this model.


Vision-based AI models are plentiful. They include an image encoder and can analyse the visual information in an image and the context behind it. Separately, reasoning-focused models such as o1 and QwQ-32B come with test-time compute scaling capabilities, which let the model spend more processing time on a query at inference. This enables the model to break down the problem, solve it in a step-by-step manner, assess the output and correct it against a verifier.
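The general idea of spending more compute at inference and checking candidate answers against a verifier can be shown with a toy Python sketch. This is only an illustration under simplifying assumptions, not how QVQ-72B or o1 work internally; the names solve_with_verifier and is_solution are hypothetical, and the "problem" is a simple equation rather than a visual query.

```python
from typing import Callable, Iterable, Optional


def solve_with_verifier(
    candidates: Iterable[int],
    verifier: Callable[[int], bool],
    max_steps: int,
) -> Optional[int]:
    """Check candidates one at a time and return the first one the verifier accepts.

    max_steps stands in for the compute budget: a larger budget lets the
    solver examine more candidates before giving up.
    """
    for step, candidate in enumerate(candidates):
        if step >= max_steps:
            break  # budget exhausted without a verified answer
        if verifier(candidate):
            return candidate
    return None


def is_solution(x: int) -> bool:
    # Toy verifier (an assumption for illustration): does x satisfy x*x + x == 42?
    return x * x + x == 42


# The same solver with two different budgets.
small_budget = solve_with_verifier(range(100), is_solution, max_steps=3)
large_budget = solve_with_verifier(range(100), is_solution, max_steps=10)
print(small_budget, large_budget)
```

With a budget of three steps the solver gives up before reaching a valid answer, while a budget of ten steps is enough to find and verify one. Real reasoning models generate and check far richer intermediate steps, but the trade-off between extra inference time and answer quality is the same in spirit.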

With QVQ-72B's preview model, Alibaba has combined these two functionalities. It can now analyse information from images and answer complex queries by using reasoning-focused structures. The team highlights that this combination has significantly improved the performance of the model.


Sharing evals from internal testing, the researchers claimed that the QVQ-72B was able to score 71.4 percent in the MathVista (mini) benchmark, outperforming the o1 model (71.0). It is also said to score 70.3 percent on the Massive Multi-discipline Multimodal Understanding (MMMU) benchmark.

Despite the improved performance, the model has several limitations, as is the case with most experimental models. The Qwen team stated that the AI model occasionally mixes different languages or unexpectedly switches between them, an issue known as code-switching. Additionally, the model is prone to getting caught in recursive reasoning loops, which can affect the final output.

 
