Hugging Face Introduces Compact Versions of SmolVLM Vision Language Model That Can Run on Consumer Laptops

Hugging Face claimed that the SmolVLM-256M is the world’s smallest vision language model.

Advertisement
Written by Akash Dutta, Edited by Siddharth Suvarna | Updated: 27 January 2025 17:14 IST
Highlights
  • The new SmolVLM models are available in 256M and 500M parameter sizes
  • SmolVLM can analyse images and process visual information at high speeds
  • The open-source models are available with an Apache 2.0 licence

Hugging Face introduced the SmolVLM 2B model in December 2024

Photo Credit: Hugging Face

Hugging Face introduced two new variants to its SmolVLM vision language models last week. The new artificial intelligence (AI) models are available in 256 million and 500 million parameter sizes, with the former being claimed as the world's smallest vision model by the company. The new variants focus on retaining the efficiency of the older two-billion parameter model while reducing the size significantly. The company highlighted that the new models can be locally run on constrained devices, consumer laptops, or even potentially browser-based inference.

Hugging Face Introduces Smaller SmolVLM AI Models

In a blog post, the company announced the SmolVLM-256M and SmolVLM-500M vision language models, in addition to the existing 2 billion parameter model. The release brings two base models and two instruction fine-tuned models in the abovementioned parameter sizes.

Hugging Face said that these models can be loaded directly to transformers, Machine Learning Exchange (MLX), and Open Neural Network Exchange (ONNX) platforms and developers can build on top of the base models. Notably, these are open-source models available with an Apache 2.0 licence for both personal and commercial usage.

Advertisement

With the new AI models, Hugging Face aims to bring multimodal models focused on computer vision to portable devices. The 256 million parameter model, for instance, can be run on less than one GB of GPU memory and 15GB of RAM to process 16 images per second (with a batch size of 64).

Advertisement

Andrés Marafioti, a machine learning research engineer at Hugging Face told VentureBeat, “For a mid-sized company processing 1 million images monthly, this translates to substantial annual savings in compute costs.”

To reduce the size of the AI models, the researchers switched the vision encoder from the previous SigLIP 400M to a 93M-parameter SigLIP base patch. Additionally, the tokenisation was also optimised. The new vision models encode images at a rate of 4096 pixels per token, compared to 1820 pixels per token in the 2B model.

Advertisement

Notably, the smaller models are also marginally behind the 2B model in terms of performance, but the company said this trade-off has been kept at a minimum. As per Hugging Face, the 256M variant can be used for captioning images or short videos, answering questions about documents, and basic visual reasoning tasks.

Developers can use transformers and MLX for inference and fine-tuning the AI model as they work with the old SmolVLM code out-of-the-box. These models are also listed on Hugging Face.

 

Get your daily dose of tech news, reviews, and insights, in under 80 characters on Gadgets 360 Turbo. Connect with fellow tech lovers on our Forum. Follow us on X, Facebook, WhatsApp, Threads and Google News for instant updates. Catch all the action on our YouTube channel.

Advertisement

Related Stories

Popular Mobile Brands
  1. Xiaomi 17 Ultra Finally Arrives in India at This Price
  2. Poco X8 Pro Series Confirmed to Launch in India With This Battery
  3. Vivo Y51 Pro 5G Launched With 7,200mAh Battery at This Price in India
  4. This 'Digital Lutera' Android Malware Can Hijack Your UPI Account
  5. Samsung Galaxy A57 Renders Leak Online Again; Launch Expected Soon
  6. Xiaomi 17 Launched in India With Snapdragon 8 Elite Gen 5, Leica Cameras
  7. DxOMark Ranks iPhone 17 Pro Above Galaxy S26 Ultra in Camera Performance
  8. Canva's AI-Powered Magic Layers Turns Images Into Editable Designs
  9. Samsung Galaxy S26 Series Goes on Sale in India: See Price, Features
  10. WhatsApp Adds Support for Parent-Managed Accounts for Children Under 13
  1. James Webb Telescope Captures Rare Infrared Footprints of Io and Ganymede Inside Jupiter’s Auroras
  2. WhatsApp Adds Support for Parent-Managed Accounts With Stricter Controls for Children Under 13
  3. Crimson Desert PC and Console Specs Revealed: Here's How the Game Will Run on PS5 and Xbox Series S/X
  4. Perplexity Ordered to Stop Deploying Shopping AI Agents on Amazon: Report
  5. Sonos Play and Sonos Era 100 SL Launched With Wi-Fi 6 Connectivity, AirPlay 2 Support: Price, Features
  6. Oppo Find N6 Colourways, Storage Variants Revealed as Company Teases Crease-Free Display's Components
  7. Canva’s New AI-Powered Magic Layers Feature Turns Images Into Editable Designs
  8. Tokenised Real-World Assets See 66 Percent Jump in 2026, DeFiLlama Data Shows
  9. The Society Season 2 OTT Release: Where to Watch Munawar Faruqui and Shreya Kalra’s Reality Survival Series?
  10. YouTube’s Likeness Detection Tool Expanded to Government Officials and Journalists
Download Our Apps
Available in Hindi
© Copyright Red Pixels Ventures Limited 2026. All rights reserved.