Google Unveils Gemini 1.5, Meta Introduces Predictive Visual Machine Learning Model V-JEPA

Google says Gemini 1.5 will have a limited version with a context window of up to 1 million tokens.

Written by Akash Dutta, Edited by David Delima | Updated: 16 February 2024 13:59 IST
Highlights
  • Google’s Gemini 1.5 model is built on Transformer and MoE architecture
  • It can process 1 hour of video or over 7,00,000 words in one go
  • Meta’s V-JEPA model helps machines learn by watching videos

Meta’s V-JEPA is a non-generative model that learns by predicting missing or masked parts of a video

Photo Credit: Google

Google and Meta made notable artificial intelligence (AI) announcements on Thursday, unveiling new models with significant advancements. The search giant unveiled Gemini 1.5, an updated AI model that comes with long-context understanding across different modalities. Meanwhile, Meta announced the release of its Video Joint Embedding Predictive Architecture (V-JEPA) model, a non-generative method for teaching machine learning (ML) systems through visual media. Both models offer new ways of exploring AI capabilities. Notably, OpenAI also introduced its first text-to-video generation model, Sora, on Thursday.

Google Gemini 1.5 model details

Demis Hassabis, CEO of Google DeepMind, announced the release of Gemini 1.5 via a blog post. The newer model is built on the Transformer and Mixture of Experts (MoE) architecture. While it is expected to have different versions, currently, only the Gemini 1.5 Pro model has been released for early testing. Hassabis said that the mid-size multimodal model can perform tasks at a similar level to Gemini 1.0 Ultra, which is the company's largest generative model and is available via the Gemini Advanced subscription with the Google One AI Premium plan.
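The Mixture of Experts design mentioned above splits a model into several smaller "expert" subnetworks and routes each input through only the most relevant ones, so the full model's capacity grows without every parameter being active on every input. The toy sketch below (not Google's implementation; all names and shapes are illustrative) shows the core routing idea: a gate scores the experts, the top few are selected, and their outputs are blended by softmax weight.

```python
import numpy as np

def moe_layer(x, experts, gate_weights, top_k=2):
    """Illustrative MoE routing: score experts, keep the top-k, blend outputs."""
    scores = x @ gate_weights                        # one gate score per expert
    top = np.argsort(scores)[-top_k:]                # indices of the top-k experts
    weights = np.exp(scores[top])
    weights = weights / weights.sum()                # softmax over the selected experts
    # Only the selected experts run; the rest stay idle for this input
    return sum(w * experts[i](x) for w, i in zip(weights, top))

rng = np.random.default_rng(0)
d = 8
# Four toy "experts", each just a random linear map for demonstration
experts = [lambda x, W=rng.standard_normal((d, d)): x @ W for _ in range(4)]
gate_weights = rng.standard_normal((d, 4))

x = rng.standard_normal(d)
y = moe_layer(x, experts, gate_weights)
print(y.shape)
```

The design choice this illustrates is sparsity: with `top_k=2` of four experts, only half the expert parameters do work for any given input, which is what lets MoE models scale capacity more cheaply than dense ones.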

The biggest improvement with Gemini 1.5 is its capability to process long-context information. The standard Pro version comes with a 1,28,000-token context window. In comparison, Gemini 1.0 had a context window of 32,000 tokens. Tokens can be understood as entire parts or subsections of words, images, videos, audio or code, which act as the building blocks a foundation model uses to process information. “The bigger a model's context window, the more information it can take in and process in a given prompt — making its output more consistent, relevant and useful,” Hassabis explained.
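To get a feel for what these window sizes mean in practice, a common rule of thumb (an assumption here, not a figure from Google — the real ratio depends on the tokenizer and the text) is that English prose averages roughly 0.75 words per token:

```python
def tokens_to_words(tokens, words_per_token=0.75):
    """Rough heuristic: ~0.75 English words per token (tokenizer-dependent)."""
    return int(tokens * words_per_token)

# Compare the windows discussed in the article
for window in (32_000, 128_000, 1_000_000):
    print(f"{window:>9} tokens ~ {tokens_to_words(window):,} words")
```

Under this heuristic, a 1-million-token window works out to about 7,50,000 words — consistent with Google's claim that the preview model can take in over 7,00,000 words in one go.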


Alongside the standard Pro version, Google is also releasing a special model with a context window of up to 1 million tokens. This is being offered to a limited group of developers and enterprise clients in a private preview. While there is no dedicated platform for it, it can be tried out via Google's AI Studio, a cloud console tool for testing generative AI models, and Vertex AI. Google says this version can process one hour of video, 11 hours of audio, codebases with over 30,000 lines of code, or over 7,00,000 words in one go.


Meta V-JEPA details

In a post on X (formerly known as Twitter), Meta publicly released V-JEPA. It is not a generative AI model, but a teaching method that enables ML systems to understand and model the physical world by watching videos. The company called it an important step towards advanced machine intelligence (AMI), a vision of one of the three 'Godfathers of AI', Yann LeCun.

In essence, it is a predictive analysis model that learns entirely from visual media. It not only understands what is going on in a video but can also predict what comes next. To train it, the company says it used a new masking technique in which parts of the video were masked in both time and space. This means some frames were removed entirely, while other frames had blacked-out fragments, forcing the model to predict both the current frame and the next one. As per the company, the model was able to do both efficiently. Notably, the model can predict and analyse videos of up to 10 seconds in length.
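The masking described above can be sketched in a few lines. The code below is an illustration of the general idea — dropping whole frames (time) and blacking out patches within surviving frames (space) — not Meta's actual V-JEPA pipeline; all function names, shapes and drop rates are assumptions for demonstration.

```python
import numpy as np

def mask_video(video, frame_drop=0.25, patch_drop=0.25, patch=4, seed=0):
    """Mask a (T, H, W) video in time and space; return the masked copy
    and a boolean mask marking the regions the model must predict."""
    rng = np.random.default_rng(seed)
    T, H, W = video.shape
    masked = video.astype(float).copy()
    mask = np.zeros(video.shape, dtype=bool)

    # Temporal masking: some frames are removed entirely
    dropped = rng.random(T) < frame_drop
    masked[dropped] = 0.0
    mask[dropped] = True

    # Spatial masking: black out random patches in the surviving frames
    for t in np.flatnonzero(~dropped):
        for i in range(0, H, patch):
            for j in range(0, W, patch):
                if rng.random() < patch_drop:
                    masked[t, i:i + patch, j:j + patch] = 0.0
                    mask[t, i:i + patch, j:j + patch] = True
    return masked, mask

# A tiny 16-frame, 8x8-pixel "video" of dummy values
video = np.arange(16 * 8 * 8, dtype=float).reshape(16, 8, 8)
masked, mask = mask_video(video)
print(mask.mean())  # fraction of pixels hidden from the model
```

Because the mask spans both dimensions, a model trained against it cannot rely on copying nearby pixels; it has to learn what typically happens across frames, which is the intuition behind learning "by watching" rather than by generating.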


“For example, if the model needs to be able to distinguish between someone putting down a pen, picking up a pen, and pretending to put down a pen but not actually doing it, V-JEPA is quite good compared to previous methods for that high-grade action recognition task,” Meta said in a blog post.

At present, the V-JEPA model only uses visual data, which means the videos do not contain any audio input. Meta is now planning to incorporate audio alongside video in the ML model. Another goal for the company is to extend the model's capabilities to longer videos.
