Android Bench acts as a leaderboard, ranking the AI models that perform best at developing Android apps.
Android Bench’s methodology, dataset, and tests are publicly available on GitHub.
Google introduced a new benchmark last week that evaluates artificial intelligence (AI) models on their proficiency in developing Android apps. Dubbed Android Bench, the platform also ranks the best-performing models in these tests, to help the developer community pick the right AI tools when building new apps and experiences for Android. The Mountain View-based tech giant said the curated set of tests and the evaluation system were validated by several AI model developers. Additionally, the methodology, dataset, and tests have been made publicly available.
In a post on the Android Developers Blog, the company announced the release of Android Bench. It is described as the operating system's official leaderboard of large language models (LLMs) for Android development. Google says the benchmark was developed to provide developers of AI models with “a clear, reliable baseline for what high-quality Android development looks like.”
The benchmark is said to be built from a set of tasks spanning common Android development areas, such as networking on wearables and migrating to the latest version of Jetpack Compose. These tasks were sourced from public Android repositories on GitHub, the post added. The company said the tasks were validated by several LLM makers.
The initial version of Android Bench focuses only on model performance and does not cover agentic capabilities or tool use. Additionally, the methodology, dataset, and test harness are publicly available on GitHub. To avoid data contamination (where a benchmark's answers leak into an AI model's training data), the tasks are said to emphasise reasoning over memorisation or guessing.
Currently, Gemini 3.1 Pro tops the Android Bench leaderboard, followed by Claude Opus 4.6, GPT-5.2-Codex, Opus 4.5, and Gemini 3 Pro, respectively. The tech giant says developers can try out all of the listed AI models using API keys in the latest stable version of Android Studio.
Google says it will continue to refine the methodology to preserve the integrity of the dataset and is planning further improvements for future releases of the benchmark. The next iteration of Android Bench will feature a greater number of tasks of higher complexity.