Google Introduces Gemini 3.1 Flash-Lite as Its Fastest and Most Cost-Efficient AI Model

The Gemini 3.1 Flash-Lite is rolling out in preview via the Gemini API in AI Studio and Vertex AI.

Written by Akash Dutta, Edited by Rohan Pal | Updated: 5 March 2026 15:32 IST

Highlights

The AI model costs $0.25 per million input tokens
Gemini 3.1 Flash-Lite costs $1.50 per million output tokens
Google claims the new model outperforms 2.5 Flash in response speed

Gemini 3.1 Flash-Lite achieved an Elo score of 1432 on the Arena.ai Leaderboard

Photo Credit: Google

Google introduced the Gemini 3.1 Flash-Lite artificial intelligence (AI) model on Thursday. Calling it the fastest and most cost-efficient AI model in the Gemini 3 series, the Mountain View-based tech giant said it is designed for high-volume developer workloads. The model is not yet available to end users; for now, it is offered only in preview to developers and enterprises via specific channels. The company also claimed that the model's output speed is higher than that of Gemini 2.5 Flash.

Gemini 3.1 Flash-Lite Is Here

In a blog post, the tech giant announced and detailed its latest Gemini 3.1 series large language model (LLM). Currently, the Gemini 3.1 Flash-Lite can be accessed in preview via the Gemini application programming interface (API) in Google AI Studio, and via Vertex AI for enterprises.

On performance, the company said the 3.1 Flash-Lite outperforms 2.5 Flash with a “2.5X faster Time to First Answer Token” and a 45 percent increase in output speed, citing the Artificial Analysis benchmark. The model is also said to have achieved an Elo score of 1432 on the Arena.ai leaderboard, and Google claims it beats GPT-5 mini, Claude 4.5 Haiku, and Grok 4.1 Fast on output speed.

In AI Studio and Vertex AI, developers will be able to access the LLM in standard and thinking modes, with the latter letting them control the thinking time for a task. Highlighting some use cases, Google said the model can handle high-volume translation and content moderation, and can also be used for more complex tasks, such as generating user interfaces and dashboards, creating simulations, and following instructions.

The company also claimed that the Gemini 3.1 Flash-Lite is a cost-efficient AI model, priced at $0.25 (roughly Rs. 23) per million input tokens and $1.50 (roughly Rs. 137) per million output tokens. In comparison, the Gemini 2.5 Flash costs $0.30 (roughly Rs. 27.5) per million input tokens and $2.50 (roughly Rs. 229) per million output tokens.
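To put the per-million-token prices in perspective, here is a minimal Python sketch that estimates the bill for a workload on each model. The prices are the ones quoted above; the `Pricing` class, the `workload_cost` helper, and the sample workload of 100 million input and 20 million output tokens are hypothetical, chosen only for illustration, and the sketch ignores any discounts, caching, or tiered pricing that may apply in practice.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pricing:
    """US dollar price per one million tokens (figures quoted in the article)."""
    input_per_million: float
    output_per_million: float


# Prices as reported in the article.
FLASH_LITE_3_1 = Pricing(input_per_million=0.25, output_per_million=1.50)
FLASH_2_5 = Pricing(input_per_million=0.30, output_per_million=2.50)


def workload_cost(pricing: Pricing, input_tokens: int, output_tokens: int) -> float:
    """Return the estimated cost in US dollars for the given token counts."""
    input_cost = (input_tokens / 1_000_000) * pricing.input_per_million
    output_cost = (output_tokens / 1_000_000) * pricing.output_per_million
    return input_cost + output_cost


# Hypothetical workload: 100 million input tokens, 20 million output tokens.
INPUT_TOKENS = 100_000_000
OUTPUT_TOKENS = 20_000_000

lite_cost = workload_cost(FLASH_LITE_3_1, INPUT_TOKENS, OUTPUT_TOKENS)
flash_cost = workload_cost(FLASH_2_5, INPUT_TOKENS, OUTPUT_TOKENS)

print(f"Gemini 3.1 Flash-Lite: ${lite_cost:.2f}")   # $55.00
print(f"Gemini 2.5 Flash:      ${flash_cost:.2f}")  # $80.00
print(f"Difference:            ${flash_cost - lite_cost:.2f}")  # $25.00
```

For this assumed workload, the newer model would come out about 31 percent cheaper. The gap depends on the mix of tokens: the output price drops by 40 percent while the input price drops by roughly 17 percent, so output-heavy workloads would see the larger saving.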
