Hugging Face Introduces Open-Source SmolVLM Vision Language Model Focused on Efficiency

SmolVLM is Hugging Face’s family of small vision language models, each with two billion parameters.

Written by Akash Dutta, Edited by Manas Mitul | Updated: 2 December 2024 17:40 IST
Highlights
  • Hugging Face used an architecture similar to Idefics3
  • The vision model is based on the SmolLM2 1.7B language model
  • SmolVLM requires 5.02GB of GPU RAM to operate

SmolVLM uses 81 visual tokens to encode image patches of size 384 x 384

Photo Credit: Hugging Face

Hugging Face, the artificial intelligence (AI) and machine learning (ML) platform, introduced a new vision-focused AI model last week. Dubbed SmolVLM (where VLM stands for vision language model), it is a compact model focused on efficiency. The company claims that its smaller size and high efficiency make it useful for enterprises and AI enthusiasts who want AI capabilities without investing heavily in infrastructure. Hugging Face has also open-sourced SmolVLM under the Apache 2.0 license for both personal and commercial use.

Hugging Face Introduces SmolVLM

In a blog post, Hugging Face detailed the new open-source vision model. The company called the AI model “state-of-the-art” for its efficient usage of memory and fast inference. Highlighting the usefulness of a small vision model, the company noted the recent trend of AI firms scaling down models to make them more efficient and cost-effective.

Small vision model ecosystem
Photo Credit: Hugging Face

The SmolVLM family has three AI model variants, each with two billion parameters. The first is SmolVLM-Base, the standard model. SmolVLM-Synthetic is a fine-tuned variant trained on synthetic data (data generated by AI or a computer), and SmolVLM-Instruct is the instruction-tuned variant that can be used to build end-user-centric applications.


On the technical side, the vision model can operate with just 5.02GB of GPU RAM, significantly less than the 13.7GB required by Qwen2-VL 2B and the 10.52GB required by InternVL2 2B. Because of this, Hugging Face claims that the AI model can run on-device on a laptop.
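
To put the memory figures in proportion, here is a minimal Python sketch that computes how much more GPU RAM each of the other two models needs relative to SmolVLM. The numbers are the ones quoted above; the names `GPU_RAM_GB` and `ram_ratio` are illustrative and not part of any Hugging Face API.

```python
# GPU RAM requirements in GB, as quoted in the article.
GPU_RAM_GB = {
    "SmolVLM": 5.02,
    "InternVL2 2B": 10.52,
    "Qwen2-VL 2B": 13.7,
}


def ram_ratio(model: str, baseline: str = "SmolVLM") -> float:
    """Return how many times more GPU RAM `model` needs than `baseline`."""
    return GPU_RAM_GB[model] / GPU_RAM_GB[baseline]


if __name__ == "__main__":
    for name, gigabytes in GPU_RAM_GB.items():
        print(f"{name}: {gigabytes:.2f} GB ({ram_ratio(name):.2f}x SmolVLM)")
```

By these figures, Qwen2-VL 2B needs roughly 2.7 times and InternVL2 2B roughly 2.1 times the GPU RAM of SmolVLM.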

SmolVLM can accept a sequence of text and images in any order and analyse them to generate responses to user queries. It encodes each 384 x 384 pixel image patch into 81 visual tokens. The company claimed that this lets the model encode a test prompt and a single image in 1,200 tokens, as opposed to the 16,000 tokens required by Qwen2-VL.
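
The token arithmetic behind that claim can be sketched in a few lines of Python. This is only an illustration built from the figures reported above: the function names are hypothetical, and the number of patches an image is split into is left as an input because the article does not state it.

```python
PATCH_SIDE_PX = 384          # patch width and height, as reported
TOKENS_PER_PATCH = 81        # visual tokens per patch, as reported
SMOLVLM_PROMPT_TOKENS = 1_200    # test prompt plus one image, as reported
QWEN2_VL_PROMPT_TOKENS = 16_000  # same comparison for Qwen2-VL, as reported


def visual_tokens(num_patches: int) -> int:
    """Visual tokens for an image split into `num_patches` patches.

    `num_patches` is an assumption supplied by the caller; the article
    does not say how many patches a given image produces.
    """
    return num_patches * TOKENS_PER_PATCH


def pixels_per_token() -> float:
    """Average number of pixels represented by one visual token."""
    return (PATCH_SIDE_PX * PATCH_SIDE_PX) / TOKENS_PER_PATCH


def token_ratio() -> float:
    """How many times more tokens Qwen2-VL used in the reported comparison."""
    return QWEN2_VL_PROMPT_TOKENS / SMOLVLM_PROMPT_TOKENS


if __name__ == "__main__":
    print(f"Tokens for 4 patches: {visual_tokens(4)}")
    print(f"Pixels per visual token: {pixels_per_token():.1f}")
    print(f"Qwen2-VL / SmolVLM token ratio: {token_ratio():.1f}")
```

On the reported numbers, each visual token stands in for roughly 1,820 pixels, and the Qwen2-VL encoding of the same test input is about 13 times longer.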


Given these specifications, Hugging Face says that SmolVLM can be readily adopted by smaller enterprises and AI enthusiasts and deployed on local systems without a major upgrade to the tech stack. Enterprises will also be able to run the AI model for text- and image-based inference without incurring significant costs.

 

