Google DeepMind Is Integrating Gemini 1.5 Pro in Robots That Can Navigate Real-World Environments

Google DeepMind shared a video demonstration of a Gemini AI integrated robot that can guide users to the desired destination.

Written by Akash Dutta, Edited by Siddharth Suvarna | Updated: 12 July 2024 15:33 IST
Highlights
  • Google DeepMind has published a paper sharing the impact of Gemini on robots
  • The robots were leveraging Gemini 1.5 Pro's 2 million token context window
  • DeepMind also used its Robotic Transformer 2 (RT-2) AI model

DeepMind said that the Gemini-integrated robots were able to perform Multimodal Instruction Navigation

Photo Credit: Google DeepMind

Google DeepMind shared new advancements in robotics and vision language models (VLMs) on Thursday. The artificial intelligence (AI) research division of the tech giant has been working with advanced vision models to develop new capabilities in robots. In a new study, DeepMind highlighted that Gemini 1.5 Pro and its long context window have enabled breakthroughs in its robots' navigation and real-world understanding. Earlier this year, Nvidia also unveiled new AI technology that powers advanced capabilities in humanoid robots.

Google DeepMind Uses Gemini AI to Improve Robots

In a post on X (formerly known as Twitter), Google DeepMind revealed that it has been training its robots using Gemini 1.5 Pro's 2 million token context window. A context window is the amount of information, measured in tokens, that an AI model can take in at once while processing a query.


For instance, if a user asks an AI model about the “most popular ice cream flavours”, the model looks for information relevant to the keywords “ice cream” and “flavours”. If its context window is small, the model can only consider a handful of sources at a time and might simply list flavour names. With a larger window, it can take in many articles at once, see how often each flavour is mentioned across them, and deduce the “popularity factor”.
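The effect of a token budget can be sketched in a few lines of Python. This is purely illustrative: Gemini's tokenizer and serving stack are not public, so the hypothetical `fit_documents` helper below approximates tokens as whitespace-separated words just to show why a larger window lets a model consider more source material at once.

```python
# Illustrative sketch only: real models use subword tokenizers, and the
# helper below is an invented example, not Gemini's actual behaviour.

def fit_documents(documents, context_window):
    """Greedily pack documents into a prompt until the token budget is spent."""
    packed, used = [], 0
    for doc in documents:
        cost = len(doc.split())  # crude token estimate: one token per word
        if used + cost > context_window:
            break  # budget exhausted; remaining documents are invisible
        packed.append(doc)
        used += cost
    return packed

articles = [
    "vanilla remains a favourite flavour",
    "chocolate tops many surveys",
    "pistachio gains popularity",
]

# A tiny window admits only one article; a larger one admits them all,
# letting the model compare flavour mentions across sources.
small = fit_documents(articles, context_window=6)    # 1 article fits
large = fit_documents(articles, context_window=100)  # all 3 articles fit
```

The same budgeting applies to a robot's video tour of a building: a 2 million token window can hold far more of the tour, so more of the environment stays "visible" to the model at once.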

DeepMind is taking advantage of this long context window to train its robots in real-world environments. The division aims to see if the robot can remember the details of an environment and assist users when asked about the environment with contextual or vague terms. In a video shared on Instagram, the AI division showcased that a robot was able to guide a user to a whiteboard when he asked it for a place where he could draw.


“Powered with 1.5 Pro's 1 million token context length, our robots can use human instructions, video tours, and common sense reasoning to successfully find their way around a space,” Google DeepMind stated in a post.

In a study published on arXiv (a preprint server whose papers are not peer reviewed), DeepMind explained the technology behind the breakthrough. In addition to Gemini, it is also using its own Robotic Transformer 2 (RT-2) model, a vision-language-action (VLA) model that learns from both web and robotics data. RT-2 uses computer vision to process real-world environments and turns that information into datasets, which the generative AI model can later process to break down contextual commands and produce the desired outcomes.
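The loop described above, where a vision stage summarises the environment into a dataset and a language stage maps a vague command onto a destination, can be sketched as follows. Every name here (`Observation`, `build_environment_dataset`, `navigate`) is invented for illustration and is not DeepMind's API; the real system works on video frames and learned representations, not string matching.

```python
# Hypothetical sketch of a vision-language-action loop: not DeepMind's code.
from dataclasses import dataclass

@dataclass
class Observation:
    landmark: str      # something the robot saw on its tour, e.g. "whiteboard"
    affordance: str    # what a user could do there, e.g. "draw"

def build_environment_dataset(tour):
    """Index landmarks seen during a video tour by their affordances."""
    return {obs.affordance: obs.landmark for obs in tour}

def navigate(instruction, dataset):
    """Resolve a vague request ("somewhere I can draw") to a landmark."""
    for affordance, landmark in dataset.items():
        if affordance in instruction:
            return f"guiding user to the {landmark}"
    return "destination not found"

tour = [Observation("whiteboard", "draw"), Observation("desk", "write")]
env = build_environment_dataset(tour)
print(navigate("find a place where I can draw", env))
```

In the demonstration from the article, this is roughly the flow that lets the robot answer "where can I draw?" with a route to the whiteboard, even though the word "whiteboard" never appears in the request.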


At present, Google DeepMind is using this architecture to train its robots on a broad category of tasks known as Multimodal Instruction Navigation (MIN), which includes environment exploration and instruction-guided navigation. If the shared demonstration is representative of the robots' typical performance, this technology could further advance robotics.

 

