Android Bench acts as a leaderboard, ranking the AI models that perform best at developing Android apps.
Android Bench’s methodology, dataset, and tests are publicly available on GitHub.
Google introduced a new benchmark last week that evaluates artificial intelligence (AI) models on their proficiency in developing Android apps. Dubbed Android Bench, the platform also ranks the best-performing models in these tests, to help the developer community pick the right AI tools when building new apps and experiences for Android. The Mountain View-based tech giant said the curated set of tests and the evaluation system were validated by several AI model developers. Additionally, the methodology, dataset, and tests have been made publicly available.
In a post on the Android Developers Blog, the company announced the release of Android Bench. It is described as the operating system's official leaderboard of large language models (LLMs) for Android development. Google says the benchmark was developed to provide developers of AI models with “a clear, reliable baseline for what high-quality Android development looks like.”
The benchmark is said to be built from a set of tasks spanning common Android development areas, such as networking on wearables and migrating to the latest version of Jetpack Compose. These tasks were sourced from public Android repositories on GitHub, the post added. The company said the tasks were validated by several LLM makers.
The initial version of Android Bench focuses only on model performance and does not cover agentic capabilities or tool use. Additionally, the methodology, dataset, and test harness are publicly available on GitHub. To avoid data contamination (where a benchmark's answers leak into an AI model's training data), the tasks are said to emphasise reasoning over memorisation or guessing.
Currently, Gemini 3.1 Pro tops the Android Bench leaderboard, followed by Claude Opus 4.6, GPT-5.2-Codex, Opus 4.5, and Gemini 3 Pro, respectively. The tech giant says developers can try out all of the listed AI models using API keys in the latest stable version of Android Studio.
Google says it will continue to refine the methodology to preserve the integrity of the dataset and is planning further improvements for future releases of the benchmark. The next iteration of Android Bench will feature a greater number of tasks of higher complexity.