Google Introduces PaliGemma 2 Family of Open Source AI Vision-Language Models

PaliGemma 2 AI models can see, understand, and interact with visual input.

Written by Akash Dutta, Edited by Siddharth Suvarna | Updated: 6 December 2024 16:29 IST
Highlights
  • PaliGemma 2 is available in 3B, 10B, and 28B parameter sizes
  • The new vision models are built on Google’s Gemma 2 AI models
  • Google says PaliGemma 2 can describe actions and emotions in an image

PaliGemma 2 is the successor to Google’s PaliGemma, which was released in May

Photo Credit: Google

Google introduced the successor to its PaliGemma artificial intelligence (AI) vision-language model on Thursday. Dubbed PaliGemma 2, the family of AI models improves upon the capabilities of the older generation. The Mountain View-based tech giant said the vision-language model can see, understand, and interact with visual input such as images and other visual assets. It is built on the Gemma 2 small language models (SLMs), which were released in August. Interestingly, the tech giant claimed that the model can analyse emotions in uploaded images.

Google PaliGemma AI Model

In a blog post, the tech giant detailed the new PaliGemma 2 AI model. While Google has several vision-language models, PaliGemma was the first such model in the Gemma family. Vision models differ from typical large language models (LLMs) in that they have an additional encoder that analyses visual content and converts it into a representation the language model can process. This way, vision models can technically “see” and understand the external world.

One benefit of a smaller vision model is that it can be used across a large number of applications, as smaller models are optimised for speed and accuracy. With PaliGemma 2 being open source, developers can build its capabilities into their apps.

PaliGemma 2 comes in three parameter sizes: 3 billion, 10 billion, and 28 billion. It is also available in three input resolutions: 224px, 448px, and 896px. Because of this range, the tech giant claims that it is easy to optimise the AI model's performance for a wide range of tasks. Google says it generates detailed, contextually relevant captions for images. It can not only identify objects but also describe actions, emotions, and the overall narrative of the scene.
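
The three parameter sizes and three resolutions combine into nine variants. A minimal sketch of enumerating them follows. The `google/paligemma2-{size}-pt-{resolution}` naming pattern and the `checkpoint_names` helper are assumptions made for illustration and are not stated in Google's announcement, so the exact identifiers should be checked on the model hub.

```python
from itertools import product

# Parameter sizes and input resolutions as reported in the announcement.
PARAM_SIZES = ["3b", "10b", "28b"]
RESOLUTIONS = [224, 448, 896]


def checkpoint_names(prefix="google/paligemma2"):
    """Return one hypothetical checkpoint name per size/resolution pair."""
    # ASSUMPTION: the "-pt-" infix and the overall pattern are illustrative;
    # verify real identifiers on Hugging Face or Kaggle before using them.
    return [
        f"{prefix}-{size}-pt-{res}"
        for size, res in product(PARAM_SIZES, RESOLUTIONS)
    ]


if __name__ == "__main__":
    for name in checkpoint_names():
        print(name)
```

In practice, a developer would likely start with the smallest size and lowest resolution, then scale up only if the task needs finer visual detail.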

Google highlighted that the tool can be used for chemical formula recognition, music score recognition, spatial reasoning, and chest X-ray report generation. The company has also published a paper on the pre-print server arXiv.

Developers and AI enthusiasts can download the PaliGemma 2 model and its code from Hugging Face and Kaggle. The AI model supports frameworks such as Hugging Face Transformers, Keras, PyTorch, JAX, and Gemma.cpp.
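
For developers using Hugging Face Transformers, captioning an image might look roughly like the sketch below. It assumes that the `transformers`, `torch`, and `Pillow` packages are installed, that access to the model has been granted on Hugging Face, and that the checkpoint name `google/paligemma2-3b-pt-224` and the `caption en` task prompt apply. None of these details come from Google's announcement, and the `caption_image` helper is hypothetical.

```python
def caption_image(image_path, model_id="google/paligemma2-3b-pt-224",
                  max_new_tokens=30):
    """Generate a short English caption for one image (illustrative sketch)."""
    # Heavy dependencies are imported lazily so this module loads without them.
    import torch
    from PIL import Image
    from transformers import AutoProcessor, PaliGemmaForConditionalGeneration

    # ASSUMPTION: model_id is an illustrative checkpoint name; verify it first.
    processor = AutoProcessor.from_pretrained(model_id)
    model = PaliGemmaForConditionalGeneration.from_pretrained(model_id)
    model.eval()

    image = Image.open(image_path).convert("RGB")
    # ASSUMPTION: "caption en" is the task prefix for English captioning.
    inputs = processor(text="<image>caption en", images=image,
                       return_tensors="pt")
    prompt_len = inputs["input_ids"].shape[-1]

    with torch.no_grad():
        output = model.generate(**inputs, max_new_tokens=max_new_tokens,
                                do_sample=False)

    # Drop the prompt tokens and decode only the newly generated text.
    return processor.decode(output[0][prompt_len:], skip_special_tokens=True)
```

Calling `caption_image("photo.jpg")` downloads the model weights on first use, so the larger variants need considerably more memory and disk space.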

 
