Elon Musk’s xAI Unveils Grok 1.5 Vision AI Model in Preview, To Compete With GPT-4 Vision and Gemini 1.5 Pro

xAI said Grok 1.5 Vision can process a wide variety of visual information, including documents, diagrams, charts, and more.

Written by Akash Dutta, Edited by Manas Mitul | Updated: 15 April 2024 14:09 IST
Highlights
  • This is xAI’s first-generation multimodal AI model
  • Grok 1.5 Vision has a context length of 128,000 tokens
  • Grok AI was recently made open source

xAI said Grok 1.5 Vision outperforms existing AI models in the company’s new RealWorldQA benchmark

Photo Credit: xAI

Elon Musk's artificial intelligence (AI) firm xAI has unveiled a new AI model dubbed Grok 1.5 Vision. This large language model (LLM) is an enhanced version of the recently released Grok 1.5 model. With this upgrade, the AI model is now equipped with computer vision, making it capable of accepting visual media as input. It can process images and answer questions about them. Notably, the announcement came just days after OpenAI introduced its own computer vision-powered GPT-4 model.

The announcement was made through xAI's official account on X (formerly known as Twitter). The firm shared a blog post detailing the new AI model along with some of its benchmark scores. Since the vision capabilities were added to the recently unveiled Grok 1.5 model, most of the details remain the same. It has the same context window of 128,000 tokens, and the general benchmark scores are also likely to remain the same.

xAI also shared Grok 1.5 Vision's scores on a benchmark developed by the company itself. The AI firm calls it the RealWorldQA benchmark, and it measures “real-world spatial understanding”. The model was also tested on several other benchmarks such as MMMU, MathVista, ChartQA, and more. While Grok outperformed OpenAI's GPT-4 with Vision and Gemini 1.5 Pro in RealWorldQA, it scored lower in MMMU and ChartQA.

For the uninitiated, computer vision is a branch of computer science that deals with equipping computers (and AI models) with the ability to identify and understand objects in the real world using images and videos. The aim is to help computers see and process visual signals the way humans do. With the rise of multimodal AI models, many firms are now focusing on developing vision-focused models. Google's Gemini 1.5 Pro and OpenAI's GPT-4 with Vision both have this capability.


This technology also offers a wide range of applications. The Indian calorie tracking and nutrition feedback platform Healthify recently added a feature called Snap, where users can take a picture of a food item or dish, and a GPT-4 with Vision-powered AI chatbot suggests how the recipe can be made healthier and how much exercise one needs to do to burn the extra calories. In the future, AI models with computer vision could assist in diagnosing diseases, building self-driving cars, and more.

