Grab Superapp Says AI Models Struggle to Understand Asian Languages

Grab said that it had to develop an in-house AI model due to the unreliability of both proprietary and open-source AI models.

Written by Akash Dutta, Edited by Ketan Pratap | Updated: 4 November 2025 13:41 IST

Photo Credit: Unsplash/Rohan Solankurkar

AI models struggle to understand non-English languages because of limited training datasets

Highlights

Grab has now built a specialised vision LLM for the eKYC process
The model extracts information from user-submitted documents
Grab used both online and synthetic datasets to train the model

Grab, the Singapore-based superapp company, said on Monday that it was forced to develop an in-house artificial intelligence (AI) model for internal use. It is a lightweight vision large language model (LLM) that can scan documents and extract information from them. The company said it decided to build the model because both proprietary and open-source models were not good at understanding Southeast Asian languages. The company's statement has raised fresh concerns about the accessibility of frontier models from Google, OpenAI, and Anthropic.

AI Models' Struggle With Non-English Languages

In a blog post detailing the architecture and training process of its in-house vision model, Grab highlighted the shortcomings it experienced when it tried to outsource the technology. “While powerful proprietary Large Language Models (LLMs) were an option, they often fell short in understanding SEA languages, produced errors, hallucinations, and had high latency. On the other hand, open-sourced Vision LLMs were more efficient but not accurate enough for production,” the post said.

AI models' struggle with non-English languages is not a new finding. Researchers have pointed it out for years, and AI players have tried to fix the issue. However, despite gaining basic competence in widely spoken languages such as Hindi, Japanese, Spanish (Latin America and Spain), and Chinese, the models do not yet grasp the vocabulary well enough to pick up on nuance. So, they might be useful in general conversations, but they fall short for enterprise or research-based needs.

For instance, a paper published earlier this year found that even AI models developed by Chinese companies perform as poorly on Chinese minority languages as Western models do. And the issue persists in proprietary models from Google, OpenAI, Meta, and Anthropic, as well as in open-source models.

The reason behind this struggle is the lack of readily available, adequate datasets to train models on these languages. This is one of the reasons major AI companies are partnering with Indian companies and institutions to collect more Indic language datasets. In July, Google teamed up with IIT Bombay to develop Indic language AI speech models. Meta is reportedly paying contractors $55 an hour to train its models in Hindi, and OpenAI has announced a research collaboration with IIT Madras, backed by $500,000 from the ChatGPT maker.

While collecting data this way is expensive, it is still possible to eventually build large enough datasets in prominent Asian and other languages. However, these models will still struggle to gain competence in minority languages, such as the non-scheduled Indian languages. And unless they can learn these languages, their accessibility and functionality will remain limited.
