Grab Superapp Says AI Models Struggle to Understand Asian Languages

Grab said that it had to develop an in-house AI model due to the unreliability of both proprietary and open-source AI models.

Written by Akash Dutta, Edited by Ketan Pratap | Updated: 4 November 2025 13:41 IST
Highlights
  • Grab has now built a specialised vision LLM for the eKYC process
  • The model is extracting information from user-submitted documents
  • Grab used both online and synthetic datasets to train the model

AI models struggle to understand non-English languages due to the limited datasets

Photo Credit: Unsplash/Rohan Solankurkar

Grab, the Singapore-based superapp company, said on Monday that it was forced to develop an in-house artificial intelligence (AI) model for internal use. It is a lightweight vision large language model (LLM) that can scan documents and extract information from them. The company said it decided to develop the model because both proprietary and open-source models were not good at understanding Southeast Asian languages. The company's statement has raised fresh concerns around the accessibility of frontier models from Google, OpenAI, and Anthropic.

AI Models' Struggle With Non-English Languages

In a blog post detailing the architecture and training process of their in-house vision model, Grab highlighted the shortcomings they experienced when they tried to outsource the technology. “While powerful proprietary Large Language Models (LLMs) were an option, they often fell short in understanding SEA languages, produced errors, hallucinations, and had high latency. On the other hand, open-sourced Vision LLMs were more efficient but not accurate enough for production,” the post mentioned.

AI models' struggle with non-English languages is not a new finding. Researchers have pointed it out for years, and AI players have tried to fix the issue. However, despite gaining basic competence in widely spoken non-English languages such as Hindi, Japanese, Spanish (Latin America and Spain), and Chinese, the models still do not grasp these languages well enough to distinguish their nuances. So, they might be useful in general conversations, but they fall short for enterprise or research-based needs.

For instance, a paper published earlier this year found that even AI models developed by Chinese companies perform as poorly on Chinese minority languages as Western models do. The issue persists in proprietary models from Google, OpenAI, Meta, and Anthropic, as well as in open-source models.

The reason behind this struggle is the lack of readily available, adequate datasets to train models on these languages. This is one of the reasons major AI companies are partnering with Indian companies and institutions to collect more Indic language datasets. In July, Google teamed up with IIT Bombay to develop Indic language AI speech models. Meta is reportedly paying $55 an hour to contractors to train its models in the Hindi language, and OpenAI has announced a research collaboration with IIT Madras, backed by $500,000 from the ChatGPT maker.

While collecting data this way is expensive, it is still possible to eventually build large enough datasets in prominent Asian and other languages. However, minority languages, such as the non-scheduled Indian languages, will remain difficult for these models to gain competence in. And unless the models can learn these languages, their accessibility and functionality will always be limited.

 
