Google Introduces PaliGemma 2 Family of Open Source AI Vision-Language Models

PaliGemma 2 AI models can see, understand, and interact with visual input.

Advertisement
Written by Akash Dutta, Edited by Siddharth Suvarna | Updated: 6 December 2024 16:29 IST
Highlights
  • PaliGemma 2 is available in 3B, 10B, and 28B parameter sizes
  • The new vision models are built on Google’s Gemma 2 AI models
  • Google says PaliGemma 2 can describe actions and emotions in an image

PaliGemma 2 is the successor to Google’s PaliGemma which was released in May

Photo Credit: Google

Google introduced the successor to its PaliGemma artificial intelligence (AI) vision-language model on Thursday. Dubbed PaliGemma 2, the family of AI models improve upon the capabilities of the older generation. The Mountain View-based tech giant said the vision-language model can see, understand, and interact with visual input such as images and other visual assets. It is built using the Gemma 2 small language models (SLM) which were released in August. Interestingly, the tech giant claimed that the model can analyse emotions in the uploaded images.

Google PaliGemma AI Model

In a blog post, the tech giant detailed the new PaliGemma 2 AI model. While Google has several vision-language models, PaliGemma was the first such model in the Gemma family. Vision models are different from typical large language models (LLMs) in that they have additional encoders that can analyse visual content and convert it into familiar data form. This way, vision models can technically “see” and understand the external world.

One benefit of a smaller vision model is that it can be used for a large number of applications as smaller models are optimised for speed and accuracy. With PaliGemma 2 being open-sourced, developers can use its capabilities to build into apps.

Advertisement

The PaliGemma 2 comes in three different parameter sizes of 3 billion, 10 billion, and 28 billion. It is also available in 224p, 448p, 896p resolutions. Due to this, the tech giant claims that it is easy to optimise the AI model's performance for a wide range of tasks. Google says it generates detailed, contextually relevant captions for images. It can not only identify objects but also describe actions, emotions, and overall narrative of the scene.

Advertisement

Google highlighted that the tool can be used for chemical formula recognition, music score recognition, spatial reasoning, and chest X-ray report generation. The company has also published a paper in the online pre-print journal arXiv.

Developers and AI enthusiasts can download the PaliGemma 2 model and its code on Hugging Face and Kaggle here and here. The AI model supports frameworks such as Hugging Face Transformers, Keras, PyTorch, JAX, and Gemma.cpp.

 

For the latest tech news and reviews, follow Gadgets 360 on X, Facebook, WhatsApp, Threads and Google News. For the latest videos on gadgets and tech, subscribe to our YouTube channel. If you want to know everything about top influencers, follow our in-house Who'sThat360 on Instagram and YouTube.

Advertisement

Related Stories

Popular Mobile Brands
  1. Samsung Galaxy S25 FE Tipped to Go On Sale At This Price in India
  2. You Can Now Sign Up to Test Xiaomi's HyperOS 3 Update
  3. OTT Releases This Week: Coolie, Saiyaara, a Tamannaah Bhatia Web Series
  4. Oppo F31 Series Specifications Confirmed Ahead of India Launch
  5. Samsung Galaxy F17 5G With 5,000mAh Battery Launched in India
  6. Vivo X300 Pro Will Launch With These Upgrades Over the Vivo X200 Pro
  7. Acer Nitro V15 (2025) Launched in India With This Nvidia RTX 50-Series GPU
  8. Experts Warn Against Charlie Kirk Tokens Amidst Backlash, Volatility
  9. Realme P3 Lite 5G Price in India Revealed Ahead of Launch
  1. Love Is Blind Season 9: Release Date, Cast, Trailer, and What to Expect
  2. Beauty in Black Season 2 Is Now Streaming on Netflix: This Is What You Need to Know
  3. Experts Warn Against Crypto Tokens Linked to Charlie Kirk Amidst Backlash, Volatility
  4. Vivo X300 Series Key Specifications, Performance Upgrades Revealed Ahead of Anticipated Launch
  5. Xiaomi Moaan InkPalm Mini Plus 2 E-Reader Launched With 5.84-Inch Display, 512GB Storage
  6. iPhone 17 Series Still Behind Samsung Smartphones in Battery Longevity: Report
  7. Police Police OTT Release: Know When, Where to Watch the Tamil Crime Drama Series
  8. Union Minister Jayant Chaudhary, Wife Disclose Over Rs. 43 Lakh in Crypto Assets
  9. The Naked Gun OTT Release Date Revealed: Know When and Where to Watch the Liam Neeson Starrer Online
  10. Samsung Galaxy S25 FE Price in India Leaked; Might Be Similar to Price of Galaxy S24 FE at Launch
Gadgets 360 is available in
Download Our Apps
Available in Hindi
© Copyright Red Pixels Ventures Limited 2025. All rights reserved.