Google Unveils Gemini Embedding 2, Its First AI Model to Map Text, Images and Video Together

Gemini Embedding 2 understands text, images, and videos in the same language for easier retrieval.

Written by Akash Dutta, Edited by Ketan Pratap | Updated: 11 March 2026 11:14 IST
Highlights
  • Gemini Embedding 2 is available in public preview via API and Vertex AI
  • It also captures semantic intent across over 100 languages
  • The model can process up to six images per request

Gemini Embedding 2 can also understand interleaved input across multiple modalities

Photo Credit: Google

Google released its first fully multimodal embedding model on Tuesday. Dubbed Gemini Embedding 2, the artificial intelligence (AI) model maps text, images, audio, and videos into a single, unified embedding space. This means one architecture places a concept at nearby points in that space, whether it is written as words, spoken aloud, or shown in an image or a video. The Mountain View-based tech giant says this new system will simplify the way a large language model (LLM) understands information and will allow it to perform more complex actions.

Google's First Multimodal Embedding Model Is Here

In a blog post, the tech giant detailed the new AI model. It is the successor to the text-only embedding model that was released last year, and it captures semantic intent across more than 100 languages. Gemini Embedding 2 is currently available in public preview via the Gemini application programming interface (API) and Vertex AI.


AI models typically use separate digital file cabinets to store text, photos, videos, and audio files. Whenever a user requests information in a specific format, the model looks only in that specific cabinet. Usually, an LLM treats a "cat" in a text document and a "cat" in a video as two completely different things. To make matters more complex, the method for retrieving information differs for each format.

Gemini Embedding 2 solves this problem with a new architecture that uses a single cabinet for all kinds of information. This allows it to process a document containing both text and images at the same time, much as humans do. Google says this new system simplifies “complex pipelines and enhances a wide variety of multimodal downstream tasks.” These include Retrieval-Augmented Generation (RAG), semantic search, sentiment analysis, and data clustering.
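To illustrate why a single shared space helps with semantic search and RAG, here is a minimal sketch. It assumes every item, whatever its format, has already been turned into a vector in the same space; the file names and the tiny three-number vectors below are invented for illustration and are not real Gemini Embedding 2 outputs. With one space, a single similarity function ranks photos, videos, articles, and audio against the same text query.

```python
import math

# Hypothetical, hand-made vectors standing in for embeddings.
# Real embeddings have many more dimensions; these only show the idea
# that items about the same concept sit close together regardless of format.
INDEX = {
    "cat_photo.jpg": [0.9, 0.1, 0.0],
    "dog_video.mp4": [0.1, 0.9, 0.1],
    "cat_article.txt": [0.8, 0.2, 0.1],
    "car_audio.mp3": [0.0, 0.1, 0.95],
}


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query: list[float], index: dict[str, list[float]], k: int = 2) -> list[str]:
    """Rank stored items of any modality by similarity to the query vector."""
    ranked = sorted(
        index,
        key=lambda name: cosine_similarity(query, index[name]),
        reverse=True,
    )
    return ranked[:k]


# Pretend this is the embedding of the text query "a cat".
query_vector = [0.85, 0.15, 0.05]
print(top_k(query_vector, INDEX))  # the photo and the article about cats
```

In a RAG setup, the items returned by a search like this would then be handed to a language model as context. Cosine similarity is a common choice for comparing embeddings, but it is an assumption here rather than something the article specifies.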


As for its capabilities, the model has a text context window of up to 8,192 input tokens. It can process up to six images per request in PNG and JPEG formats, and supports up to 120 seconds of video input in MP4 and MOV formats. It can also natively process and map audio data without needing text transcriptions, and it can embed PDFs up to six pages long.
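A developer might want to check inputs against these limits before sending a request. The sketch below encodes the figures reported above; the `EmbeddingRequest` class and `check_limits` function are hypothetical helpers, not part of Google's API, and the token count is assumed to come from a separate tokenizer. No audio limit is stated in the article, so audio is not checked. The limits may change while the model is in preview.

```python
from dataclasses import dataclass, field
from typing import Optional

# Limits as reported for the Gemini Embedding 2 public preview.
MAX_TEXT_TOKENS = 8_192
MAX_IMAGES = 6
IMAGE_FORMATS = {"png", "jpeg", "jpg"}  # .jpg is the common JPEG extension
MAX_VIDEO_SECONDS = 120
VIDEO_FORMATS = {"mp4", "mov"}
MAX_PDF_PAGES = 6


@dataclass
class EmbeddingRequest:
    """Hypothetical summary of one request; not the real API payload."""
    text_tokens: int = 0  # assumed to be counted beforehand by a tokenizer
    image_files: list[str] = field(default_factory=list)
    video_file: Optional[str] = None
    video_seconds: float = 0
    pdf_pages: int = 0


def extension(filename: str) -> str:
    """Lower-cased file extension, e.g. 'clip.MP4' -> 'mp4'."""
    return filename.rsplit(".", 1)[-1].lower()


def check_limits(req: EmbeddingRequest) -> list[str]:
    """Return a list of problems; an empty list means the request fits."""
    problems = []
    if req.text_tokens > MAX_TEXT_TOKENS:
        problems.append(f"text has {req.text_tokens} tokens (max {MAX_TEXT_TOKENS})")
    if len(req.image_files) > MAX_IMAGES:
        problems.append(f"{len(req.image_files)} images (max {MAX_IMAGES})")
    for name in req.image_files:
        if extension(name) not in IMAGE_FORMATS:
            problems.append(f"unsupported image format: {name}")
    if req.video_file is not None:
        if extension(req.video_file) not in VIDEO_FORMATS:
            problems.append(f"unsupported video format: {req.video_file}")
        if req.video_seconds > MAX_VIDEO_SECONDS:
            problems.append(f"video is {req.video_seconds}s long (max {MAX_VIDEO_SECONDS}s)")
    if req.pdf_pages > MAX_PDF_PAGES:
        problems.append(f"PDF has {req.pdf_pages} pages (max {MAX_PDF_PAGES})")
    return problems


req = EmbeddingRequest(
    text_tokens=500,
    image_files=["a.png", "b.gif"],
    video_file="clip.mp4",
    video_seconds=150,
)
for problem in check_limits(req):
    print(problem)
```

Here the GIF and the 150-second clip would both be flagged, while the text and the PNG pass.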

Gemini Embedding 2 can also understand interleaved input, so users can send multiple modalities (such as text and images) in the same request. Google says this capability allows the model to gain a more accurate understanding of complex, real-world data.
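Interleaved input means the text and images of a document can be sent together in their original order, rather than as separate requests. As a rough sketch of how such a document might be prepared, the code below splits text containing Markdown-style image references into an ordered list of parts. The `Part` class, the Markdown syntax, and the example product text are assumptions for illustration; how the Gemini API actually represents mixed parts is not covered here.

```python
import re
from dataclasses import dataclass


@dataclass
class Part:
    """Hypothetical request part: either a run of text or an image path."""
    kind: str   # "text" or "image"
    value: str  # the text itself, or the image file path


# Matches Markdown-style image references such as ![alt](photo.png)
# and captures the file path inside the parentheses.
IMAGE_REF = re.compile(r"!\[[^\]]*\]\(([^)]+)\)")


def to_interleaved_parts(document: str) -> list[Part]:
    """Split a mixed document into ordered text and image parts.

    Keeping the original order is the point of interleaved input: text that
    sits next to an image stays next to it in the request.
    """
    parts = []
    position = 0
    for match in IMAGE_REF.finditer(document):
        text = document[position:match.start()].strip()
        if text:
            parts.append(Part("text", text))
        parts.append(Part("image", match.group(1)))
        position = match.end()
    tail = document[position:].strip()
    if tail:
        parts.append(Part("text", tail))
    return parts


doc = "Our new mug. ![mug](mug.jpeg) Holds 350 ml and is dishwasher safe."
for part in to_interleaved_parts(doc):
    print(part.kind, part.value)
```

The result is a text part, an image part, and another text part, in that order, which could then be sent as one request and turned into one embedding.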

© Copyright Red Pixels Ventures Limited 2026. All rights reserved.