Google Unveils Gemini 1.5, Meta Introduces Predictive Visual Machine Learning Model V-JEPA

Google says Gemini 1.5 will have a limited version with a context window of up to 1 million tokens.

Written by Akash Dutta, Edited by David Delima | Updated: 16 February 2024 13:59 IST
Highlights
  • Google’s Gemini 1.5 model is built on Transformer and MoE architecture
  • It can process 1 hour of video or over 700,000 words in one go
  • Meta’s V-JEPA model helps machines learn by watching videos

Meta’s V-JEPA is a non-generative model that learns by predicting missing or masked parts of a video


Google and Meta both made notable artificial intelligence (AI) announcements on Thursday, unveiling new models with significant advancements. The search giant unveiled Gemini 1.5, an updated AI model with long-context understanding across different modalities. Meanwhile, Meta announced the release of its Video Joint Embedding Predictive Architecture (V-JEPA) model, a non-generative method for teaching advanced machine learning (ML) systems through visual media. Both offer new ways of exploring AI capabilities. Notably, OpenAI also introduced its first text-to-video generation model, Sora, on Thursday.

Google Gemini 1.5 model details

Demis Hassabis, CEO of Google DeepMind, announced the release of Gemini 1.5 via a blog post. The new model is built on the Transformer and Mixture of Experts (MoE) architectures. While it is expected to come in different versions, only the Gemini 1.5 Pro model has currently been released for early testing. Hassabis said that the mid-size multimodal model can perform tasks at a level similar to Gemini 1.0 Ultra, the company's largest generative model, which is available via the Gemini Advanced subscription under the Google One AI Premium plan.

The biggest improvement in Gemini 1.5 is its capability to process long-context information. The standard Pro version comes with a 128,000-token context window. In comparison, Gemini 1.0 had a context window of 32,000 tokens. Tokens can be understood as entire words or subsections of words, images, videos, audio or code, which act as the building blocks a foundation model uses to process information. “The bigger a model's context window, the more information it can take in and process in a given prompt — making its output more consistent, relevant and useful,” Hassabis explained.
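To make the context-window idea concrete, here is a minimal sketch. It uses a hypothetical rule of thumb of roughly four characters per token for English text (real tokenizers such as Gemini's vary); `estimate_tokens` and `fits_in_context` are illustrative names, not part of any Google API.

```python
# Hypothetical heuristic: ~4 characters per token for English text.
# Real tokenizers produce different counts depending on the vocabulary.
def estimate_tokens(text: str) -> int:
    """Roughly approximate how many tokens a model would see."""
    return max(1, len(text) // 4)

def fits_in_context(text: str, window: int = 128_000) -> bool:
    """Check whether `text` would fit in a given context window."""
    return estimate_tokens(text) <= window

sample = "word " * 50_000                # a toy document of ~50,000 short words
print(estimate_tokens(sample))            # 62500 by this heuristic
print(fits_in_context(sample))            # True for a 128,000-token window
```

By this rough measure, a document that overflows the standard 128,000-token window could still fit comfortably in the 1-million-token preview version.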


Alongside the standard Pro version, Google is also releasing a special model with a context window of up to 1 million tokens. This is being offered to a limited group of developers and enterprise clients in a private preview. While there is no dedicated platform for it, it can be tried out via Google's AI Studio, a cloud console tool for testing generative AI models, and Vertex AI. Google says this version can process one hour of video, 11 hours of audio, codebases with over 30,000 lines of code, or over 700,000 words in one go.
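Google's figures line up as a back-of-envelope calculation. Assuming, purely for illustration, an average of about 0.7 English words per token:

```python
# Back-of-envelope check of the reported 1-million-token capacity.
# The 0.7 words-per-token ratio is an illustrative assumption, not a
# figure published by Google.
context_window = 1_000_000            # tokens, per Google's stated limit
words_per_token = 0.7                 # hypothetical average for English
approx_words = int(context_window * words_per_token)
print(approx_words)                   # 700000
```

That lands on the order of the "over 700,000 words" figure the company quotes for the preview model.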


Meta V-JEPA details

Meta announced the public release of V-JEPA in a post on X (formerly known as Twitter). It is not a generative AI model but a teaching method that enables ML systems to understand and model the physical world by watching videos. The company called it an important step towards advanced machine intelligence (AMI), a vision of Yann LeCun, one of the three 'Godfathers of AI'.

In essence, it is a predictive model that learns entirely from visual media. It can not only understand what is going on in a video but also predict what comes next. To train it, the company says it used a new masking technique, where parts of the video were masked in both time and space. This means some frames in a video were removed entirely, while other frames had blacked-out fragments, forcing the model to predict both the masked regions of the current frames and the missing frames that follow. As per the company, the model was able to do both efficiently. Notably, the model can predict and analyse videos of up to 10 seconds in length.
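The masking idea can be sketched in a few lines. The following is an illustrative toy (not Meta's actual code): given a video as a `(frames, height, width)` array, it zeroes out a random subset of whole frames (temporal masking) and random square patches in the remaining frames (spatial masking); `mask_video` and its parameters are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def mask_video(video: np.ndarray, frame_drop: float = 0.25,
               patch: int = 4, patches_per_frame: int = 3) -> np.ndarray:
    """Return a copy of `video` with masked regions set to 0."""
    masked = video.copy()
    t, h, w = video.shape
    # Temporal masking: zero out a random subset of entire frames.
    dropped = rng.choice(t, size=int(t * frame_drop), replace=False)
    masked[dropped] = 0
    # Spatial masking: black out random square patches in the rest.
    for f in range(t):
        if f in dropped:
            continue
        for _ in range(patches_per_frame):
            y = rng.integers(0, h - patch + 1)
            x = rng.integers(0, w - patch + 1)
            masked[f, y:y + patch, x:x + patch] = 0
    return masked

video = rng.random((16, 32, 32))      # toy 16-frame grayscale clip
masked = mask_video(video)
# A predictive model is then trained to fill in `video` from `masked` --
# in V-JEPA's case, in a learned feature space rather than pixel space.
```

The key difference from pixel-level generative approaches is that V-JEPA predicts the masked content in an abstract representation space, which Meta says makes training more efficient.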


“For example, if the model needs to be able to distinguish between someone putting down a pen, picking up a pen, and pretending to put down a pen but not actually doing it, V-JEPA is quite good compared to previous methods for that high-grade action recognition task,” Meta said in a blog post.

At present, the V-JEPA model only uses visual data, which means the videos do not contain any audio input. Meta is now planning to incorporate audio alongside video in the ML model. Another goal for the company is to extend its capabilities to longer videos.
