Google Unveils Gemini 1.5, Meta Introduces Predictive Visual Machine Learning Model V-JEPA

Google says Gemini 1.5 will have a limited version with a context window of up to 1 million tokens.

Written by Akash Dutta, Edited by David Delima | Updated: 16 February 2024 13:59 IST
Highlights
  • Google’s Gemini 1.5 model is built on Transformer and MoE architecture
  • It can process 1 hour of video or over 7,00,000 words in one go
  • Meta’s V-JEPA model helps machines learn by watching videos

Meta’s V-JEPA is a non-generative model that learns by predicting missing or masked parts of a video

Photo Credit: Google

Google and Meta made notable artificial intelligence (AI) announcements on Thursday, unveiling new models with significant advancements. The search giant unveiled Gemini 1.5, an updated AI model that comes with long-context understanding across different modalities. Meanwhile, Meta announced the release of its Video Joint Embedding Predictive Architecture (V-JEPA) model, a non-generative teaching method for advanced machine learning (ML) through visual media. Both offer new ways of exploring AI capabilities. Notably, OpenAI also introduced its first text-to-video generation model, Sora, on Thursday.

Google Gemini 1.5 model details

Demis Hassabis, CEO of Google DeepMind, announced the release of Gemini 1.5 via a blog post. The newer model is built on the Transformer and Mixture of Experts (MoE) architecture. While it is expected to have different versions, only the Gemini 1.5 Pro model has currently been released for early testing. Hassabis said that the mid-size multimodal model can perform tasks at a similar level to Gemini 1.0 Ultra, which is the company's largest generative model and is available through the Gemini Advanced subscription with the Google One AI Premium plan.

The biggest improvement with Gemini 1.5 is its capability to process long-context information. The standard Pro version comes with a 1,28,000 token context window. In comparison, Gemini 1.0 had a context window of 32,000 tokens. Tokens can be understood as entire parts or subsections of words, images, videos, audio or code, which act as building blocks for processing information by a foundation model. “The bigger a model's context window, the more information it can take in and process in a given prompt — making its output more consistent, relevant and useful,” Hassabis explained.

Alongside the standard Pro version, Google is also releasing a special model with a context window of up to 1 million tokens. This is being offered to a limited group of developers and enterprise clients in a private preview. While there is no dedicated platform for it, it can be tried out via Google's AI Studio, a cloud console tool for testing generative AI models, and via Vertex AI. Google says this version can process one hour of video, 11 hours of audio, codebases with over 30,000 lines of code, or over 7,00,000 words in one go.
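
To put the quoted figures in perspective, the short Python sketch below works through the arithmetic. The token and word counts are the ones reported above; the words-per-token ratio it derives is only a rough average implied by those figures, and the helper names (`context_ratio`, `words_per_token`, `fits_in_window`) are hypothetical and not part of any Google API.

```python
# Figures quoted in the article (as reported by Google).
GEMINI_1_0_CONTEXT = 32_000            # tokens
GEMINI_1_5_PRO_CONTEXT = 128_000       # tokens, standard Pro version
GEMINI_1_5_PREVIEW_CONTEXT = 1_000_000 # tokens, private preview
WORDS_IN_PREVIEW_WINDOW = 700_000      # "over 7,00,000 words"


def context_ratio(new_tokens: int, old_tokens: int) -> float:
    """How many times larger one context window is than another."""
    return new_tokens / old_tokens


def words_per_token(words: int, tokens: int) -> float:
    """Average words per token implied by a word count and a token count."""
    return words / tokens


def fits_in_window(word_count: int, window_tokens: int, ratio: float) -> bool:
    """Rough estimate only: real tokenisers vary by language and content."""
    estimated_tokens = word_count / ratio
    return estimated_tokens <= window_tokens


if __name__ == "__main__":
    ratio = words_per_token(WORDS_IN_PREVIEW_WINDOW, GEMINI_1_5_PREVIEW_CONTEXT)
    print("1.5 Pro vs 1.0:", context_ratio(GEMINI_1_5_PRO_CONTEXT, GEMINI_1_0_CONTEXT))
    print("Preview vs 1.5 Pro:", context_ratio(GEMINI_1_5_PREVIEW_CONTEXT, GEMINI_1_5_PRO_CONTEXT))
    print("Implied words per token:", ratio)
    print("80,000-word text fits in 128,000 tokens:",
          fits_in_window(80_000, GEMINI_1_5_PRO_CONTEXT, ratio))
```

By this estimate, the standard 1.5 Pro window is four times the size of the Gemini 1.0 window, and the private preview window is roughly eight times larger again.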

Meta V-JEPA details

In a post on X (formerly known as Twitter), Meta publicly released V-JEPA. It is not a generative AI model, but a teaching method that enables ML systems to understand and model the physical world by watching videos. The company called it an important step towards advanced machine intelligence (AMI), a vision of Yann LeCun, one of the three 'Godfathers of AI'.

In essence, it is a predictive analysis model that learns entirely from visual media. It can not only understand what's going on in a video but also predict what comes next. To train it, the company claims to have used a new masking technology, where parts of the video were masked in both time and space. This means that some frames in a video were entirely removed, while some other frames had blacked-out fragments, which forced the model to predict both the current frame and the next frame. As per the company, the model was able to do both efficiently. Notably, the model can predict and analyse videos of up to 10 seconds in length.
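
The idea of masking in both time and space can be illustrated with a small Python sketch. This is not Meta's implementation: it assumes a video split into a grid of patches per frame, hides a random subset of frames entirely, and blacks out one rectangular block of patches in every remaining frame. The function names and parameters (`make_mask`, `drop_ratio`, `block_h`, `block_w`) are hypothetical, and the sketch covers only the mask, not the model that predicts the hidden content.

```python
import random


def make_mask(num_frames, grid_h, grid_w, drop_ratio=0.25,
              block_h=2, block_w=2, seed=0):
    """Return mask[t][y][x], True where a patch is hidden from the model.

    Assumed scheme (for illustration only): a fraction of frames is hidden
    completely (masking in time), and one rectangular block of patches is
    hidden in all other frames (masking in space).
    """
    rng = random.Random(seed)

    # Masking in time: pick whole frames to remove.
    num_dropped = int(num_frames * drop_ratio)
    dropped = set(rng.sample(range(num_frames), num_dropped))

    # Masking in space: pick the top-left corner of the blacked-out block.
    top = rng.randrange(grid_h - block_h + 1)
    left = rng.randrange(grid_w - block_w + 1)

    mask = []
    for t in range(num_frames):
        frame = []
        for y in range(grid_h):
            row = []
            for x in range(grid_w):
                in_block = (top <= y < top + block_h
                            and left <= x < left + block_w)
                row.append(t in dropped or in_block)
            frame.append(row)
        mask.append(frame)
    return mask


def masked_fraction(mask):
    """Share of all patches in the clip that are hidden."""
    cells = [cell for frame in mask for row in frame for cell in row]
    return sum(cells) / len(cells)


if __name__ == "__main__":
    mask = make_mask(num_frames=8, grid_h=4, grid_w=4)
    fully_hidden = sum(all(all(row) for row in frame) for frame in mask)
    print("Fully hidden frames:", fully_hidden)
    print("Hidden share of patches:", masked_fraction(mask))
```

A training loop would then show the model only the visible patches and score its predictions for the hidden ones.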

“For example, if the model needs to be able to distinguish between someone putting down a pen, picking up a pen, and pretending to put down a pen but not actually doing it, V-JEPA is quite good compared to previous methods for that high-grade action recognition task,” Meta said in a blog post.

At present, the V-JEPA model only uses visual data, which means the videos do not contain any audio input. Meta is now planning to incorporate audio alongside video in the ML model. Another goal for the company is to improve the model's capabilities with longer videos.

© Copyright Red Pixels Ventures Limited 2026. All rights reserved.