Google Unveils Gemini 1.5, Meta Introduces Predictive Visual Machine Learning Model V-JEPA

Google says Gemini 1.5 will have a limited version with a context window of up to 1 million tokens.

Written by Akash Dutta, Edited by David Delima | Updated: 16 February 2024 13:59 IST
Highlights
  • Google’s Gemini 1.5 model is built on Transformer and MoE architecture
  • It can process 1 hour of video or over 7,00,000 words in one go
  • Meta’s V-JEPA model helps machines learn by watching videos

Meta’s V-JEPA is a non-generative model that learns by predicting missing or masked parts of a video

Photo Credit: Google

Google and Meta made notable artificial intelligence (AI) announcements on Thursday, unveiling new models with significant advancements. The search giant unveiled Gemini 1.5, an updated AI model that comes with long-context understanding across different modalities. Meanwhile, Meta announced the release of its Video Joint Embedding Predictive Architecture (V-JEPA) model, a non-generative teaching method for advanced machine learning (ML) through visual media. Both products offer newer ways of exploring AI capabilities. Notably, OpenAI also introduced its first text-to-video generation model Sora on Thursday.

Google Gemini 1.5 model details

Demis Hassabis, CEO of Google DeepMind, announced the release of Gemini 1.5 via a blog post. The newer model is built on the Transformer and Mixture of Experts (MoE) architecture. While it is expected to come in different versions, currently only the Gemini 1.5 Pro model has been released for early testing. Hassabis said that the mid-size multimodal model can perform tasks at a similar level to Gemini 1.0 Ultra, which is the company's largest generative model and is available via the Gemini Advanced subscription with the Google One AI Premium plan.

The biggest improvement with Gemini 1.5 is its capability to process long-context information. The standard Pro version comes with a 1,28,000-token context window. In comparison, Gemini 1.0 had a context window of 32,000 tokens. Tokens can be understood as whole units or smaller fragments of words, images, videos, audio or code, which act as the building blocks a foundation model uses to process information. “The bigger a model's context window, the more information it can take in and process in a given prompt — making its output more consistent, relevant and useful,” Hassabis explained.
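To illustrate the idea of a context window, here is a minimal sketch (not Google's tokenizer, and the words-to-tokens ratio is a rough assumption) that estimates whether a prompt would fit within a given token budget:

```python
# Illustrative sketch only: real models use their own tokenizers, so the
# 1.4 tokens-per-word ratio here is an assumed rule of thumb, not Gemini's.

def fits_in_context(text: str, context_window: int, tokens_per_word: float = 1.4) -> bool:
    """Estimate the token count from the word count and compare it to the window."""
    estimated_tokens = int(len(text.split()) * tokens_per_word)
    return estimated_tokens <= context_window

short_prompt = "Summarise the attached quarterly report in three bullet points."
print(fits_in_context(short_prompt, 128_000))        # a short prompt easily fits
print(fits_in_context("word " * 900_000, 128_000))   # ~1.26 million tokens would not
```

Under this rough estimate, a book-length input of around 7,00,000 words would only fit inside the 1-million-token window of the limited preview version, not the standard Pro model.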


Alongside the standard Pro version, Google is also releasing a special model with a context window of up to 1 million tokens. This is being offered to a limited group of developers and enterprise clients in a private preview. While there is no dedicated platform for it, it can be tried out via Google's AI Studio, a cloud console tool for testing generative AI models, and Vertex AI. Google says this version can process one hour of video, 11 hours of audio, codebases with over 30,000 lines of code, or over 7,00,000 words in one go.


Meta V-JEPA details

In a post on X (formerly known as Twitter), Meta publicly released V-JEPA. It is not a generative AI model, but a teaching method that enables ML systems to understand and model the physical world by watching videos. The company called it an important step towards advanced machine intelligence (AMI), a vision of one of the three 'Godfathers of AI', Yann LeCun.

In essence, it is a predictive model that learns entirely from visual media. It can not only understand what is going on in a video but also predict what comes next. To train it, the company says it used a new masking technique in which parts of the video were masked in both time and space. This means that some frames in a video were removed entirely, while other frames had blacked-out fragments, forcing the model to predict both the hidden portions of the current frame and the frames that follow. As per the company, the model was able to do both efficiently. Notably, the model can predict and analyse videos of up to 10 seconds in length.
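The masking described above can be sketched in a few lines of NumPy. This is an illustrative toy, not Meta's implementation: the clip shape, masking ratios and patch size are all assumptions chosen to show the idea of masking in both time (dropping whole frames) and space (blacking out regions):

```python
import numpy as np

# Toy spatio-temporal masking sketch (hypothetical shapes, not V-JEPA's code).
# A clip is a (frames, height, width) array.
rng = np.random.default_rng(0)
clip = rng.random((16, 64, 64))           # 16-frame greyscale clip

# Temporal masking: remove 4 of the 16 frames entirely.
dropped = rng.choice(16, size=4, replace=False)
temporal_mask = np.ones(16, dtype=bool)
temporal_mask[dropped] = False

masked = clip.copy()
masked[~temporal_mask] = 0.0              # dropped frames are fully blacked out

# Spatial masking: black out a 16x16 patch in each remaining frame.
y, x = rng.integers(0, 48, size=2)        # top-left corner of the patch
masked[temporal_mask, y:y + 16, x:x + 16] = 0.0

# A model in this setup would be trained to predict the hidden content
# (the dropped frames and patches) from the visible parts of `masked`.
print(masked.shape, int(temporal_mask.sum()))
```

The prediction target is the original `clip`; the model only ever sees `masked`, which is what forces it to learn how scenes evolve over time rather than merely memorising pixels.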


“For example, if the model needs to be able to distinguish between someone putting down a pen, picking up a pen, and pretending to put down a pen but not actually doing it, V-JEPA is quite good compared to previous methods for that high-grade action recognition task,” Meta said in a blog post.

At present, the V-JEPA model uses only visual data, meaning its training videos contain no audio input. Meta now plans to incorporate audio alongside video in the ML model. Another goal for the company is to improve the model's capabilities on longer videos.
