Alibaba Qwen 2.5 Vision Language Model Released in a Smaller Size, Packs Agentic Capabilities

The new AI model is dubbed Qwen 2.5-VL-32B Instruct, and it joins the 3B, 7B and 72B sizes.

Written by Akash Dutta, Edited by Siddharth Suvarna | Updated: 26 March 2025 17:54 IST
Highlights
  • The AI model is available to the open community under the Apache 2.0 licence
  • Alibaba says its responses are more aligned with human preferences
  • Qwen 2.5-VL-32B outperforms AI models of comparable size

Besides visual capabilities, the latest Qwen model also comes with improvements in text functions

Photo Credit: Reuters

Alibaba's Qwen team added another artificial intelligence (AI) model to the Qwen 2.5 family on Monday. Dubbed Qwen 2.5-VL-32B Instruct, the AI model comes with improved performance and optimisations. It is a vision language model with 32 billion parameters, and it joins the three billion, seven billion, and 72 billion parameter models in the Qwen 2.5 family. Like the team's previous models, it is an open-source AI model available under a permissive licence.

Alibaba Releases Qwen 2.5-VL-32B AI Model

In a blog post, the Qwen team detailed the company's latest vision language model (VLM). It is more capable than the Qwen 2.5 3B and 7B models, and smaller than the 72B model. Older versions of the large language model (LLM) outperformed DeepSeek-V3, and the 32B model is said to outperform similarly sized systems from Google and Mistral.

As for its features, Qwen 2.5-VL-32B-Instruct has an adjusted output style that provides more detailed and better-formatted responses. The researchers claimed that the responses are more closely aligned with human preferences. Mathematical reasoning capability has also been improved, and the AI model can solve more complex problems.

The accuracy of image understanding and reasoning-focused analysis, including image parsing, content recognition, and visual logic deduction, has also been improved.

Qwen 2.5-VL-32B-Instruct
Photo Credit: Qwen

 

Based on internal testing, Qwen 2.5-VL-32B is claimed to have outperformed comparable models, such as Mistral-Small-3.1-24B and Google's Gemma-3-27B, on the MMMU, MMMU-Pro, and MathVista benchmarks. Interestingly, the LLM is also claimed to have outperformed the much larger Qwen 2-VL-72B model on MM-MT-Bench.

The Qwen team highlights that the latest model can act directly as a visual agent that can reason and direct tools. It is inherently capable of computer use and phone use. It accepts text, images, and videos more than an hour long as input. It also supports JSON and structured outputs.
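
To illustrate what structured output support can mean in practice, here is a minimal sketch of how a caller might parse a JSON reply from a visual agent. The reply text, the field names `label` and `bbox_2d`, and the helper `parse_detections` are assumptions made for illustration; they are not taken from the Qwen team's blog post.

```python
import json
from typing import Any


def parse_detections(reply: str) -> list[dict[str, Any]]:
    """Extract a JSON list of detections from a model reply.

    Assumes (hypothetically) that each item has a "label" string and a
    "bbox_2d" list of four pixel coordinates: [x1, y1, x2, y2].
    """
    # Replies may wrap the JSON in extra prose, so keep only the
    # outermost [...] span before handing it to the JSON parser.
    start, end = reply.find("["), reply.rfind("]")
    if start == -1 or end < start:
        raise ValueError("no JSON list found in reply")
    data = json.loads(reply[start:end + 1])
    if not isinstance(data, list):
        raise ValueError("expected a JSON list of detections")

    detections = []
    for item in data:
        box = item["bbox_2d"]
        if len(box) != 4:
            raise ValueError("bbox_2d must hold exactly four coordinates")
        detections.append(
            {"label": str(item["label"]), "bbox_2d": [int(v) for v in box]}
        )
    return detections


# Hypothetical reply, hard-coded so the example runs without a model.
SAMPLE_REPLY = (
    "Here is the element you asked for:\n"
    '[{"label": "search button", "bbox_2d": [812, 40, 868, 72]}]'
)

if __name__ == "__main__":
    for detection in parse_detections(SAMPLE_REPLY):
        print(detection["label"], detection["bbox_2d"])
```

Validating the reply before acting on it matters for agent use, since a malformed coordinate could otherwise send a click to the wrong place on screen.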

The baseline architecture and training remain the same as in the older Qwen 2.5 models. However, the researchers implemented dynamic FPS (frames per second) sampling to enable the model to comprehend videos at varying sampling rates. Another enhancement lets it pinpoint specific moments in a video by gaining an understanding of temporal sequence and speed.
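
One way to picture sampling a video at varying rates is a function that picks frame indices for a requested rate and lowers that rate when a long video would exceed a frame budget. The sketch below is illustrative only: the function name `sample_frame_indices`, the `max_frames` budget, and the rate-lowering rule are assumptions, not the Qwen team's implementation.

```python
def sample_frame_indices(
    total_frames: int,
    native_fps: float,
    target_fps: float,
    max_frames: int,
) -> tuple[list[int], float]:
    """Pick evenly spaced frame indices and report the effective rate.

    Hypothetical rule: sample at target_fps, but never faster than the
    video's own rate and never more than max_frames frames in total.
    """
    if total_frames <= 0 or native_fps <= 0 or target_fps <= 0 or max_frames <= 0:
        raise ValueError("all arguments must be positive")

    duration = total_frames / native_fps  # length of the video in seconds
    wanted = int(duration * min(target_fps, native_fps))  # floor of frames wanted
    count = max(1, min(max_frames, wanted))  # clamp to the frame budget
    effective_fps = count / duration  # rate actually achieved

    # Spread the chosen frames evenly across the whole video.
    step = total_frames / count
    indices = [min(total_frames - 1, int(i * step)) for i in range(count)]
    return indices, effective_fps


if __name__ == "__main__":
    # A 10-second clip at 30 fps fits the budget, so 2 fps is kept.
    short_idx, short_fps = sample_frame_indices(300, 30.0, 2.0, 768)
    print(len(short_idx), short_fps)

    # A one-hour video at 30 fps exceeds the budget, so the rate drops.
    long_idx, long_fps = sample_frame_indices(108_000, 30.0, 2.0, 768)
    print(len(long_idx), round(long_fps, 3))
```

With these assumed numbers, the short clip keeps the requested 2 frames per second, whereas the hour-long video falls back to roughly 0.21 frames per second to stay within the 768-frame budget.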

Qwen 2.5-VL-32B-Instruct is available to download on GitHub and via its Hugging Face listing. The model comes with an Apache 2.0 licence, which allows both academic and commercial use.

 

© Copyright Red Pixels Ventures Limited 2026. All rights reserved.