Gemini 1.5 Flash-8B With Lowest Token Cost Among Gemini Family Now Available

Gemini 1.5 Flash-8B was first released last month as an experimental variant of Gemini 1.5 Flash.

Written by Akash Dutta, Edited by Siddharth Suvarna | Updated: 4 October 2024 14:22 IST

Highlights

Google has doubled the rate limits for Gemini 1.5 Flash-8B
The AI model costs $0.15 (roughly Rs. 12.5) per 1 million output tokens
Gemini 1.5 Flash-8B is said to be optimised for speed and efficiency

Developers can access Gemini 1.5 Flash-8B for free via Google AI Studio and the Gemini API

Photo Credit: Google

Gemini 1.5 Flash-8B, the latest entrant in the Gemini family of artificial intelligence (AI) models, is now generally available for production use. Google announced the general availability on Thursday, describing the model as a smaller and faster version of Gemini 1.5 Flash, which was introduced at Google I/O. Its smaller size is said to give it lower-latency inference and more efficient output generation. More importantly, the tech giant stated that the Flash-8B AI model has the “lowest cost per intelligence of any Gemini model”.

Gemini 1.5 Flash-8B Now Generally Available

In a developer blog post, the Mountain View-based tech giant detailed the new AI model. Gemini 1.5 Flash-8B was distilled from Gemini 1.5 Flash, a model that itself focused on faster processing and more efficient output generation. According to the company, Google DeepMind developed this even smaller and faster version over the last few months.

The tech giant claims that, despite being smaller, the model “nearly matches” the performance of the 1.5 Flash model across multiple benchmarks. These cover tasks such as chat, transcription, and long-context language translation.

One major benefit of the AI model is its cost-effectiveness. Google said that Gemini 1.5 Flash-8B will offer the lowest token pricing in the Gemini family. Developers will pay $0.0375 (roughly Rs. 3) per one million input tokens, $0.15 (roughly Rs. 12.5) per one million output tokens, and $0.01 (roughly Rs. 0.8) per one million tokens on cached prompts.
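To put these rates in perspective, the short Python sketch below estimates the bill for a given workload. The function name and the flat per-token-type rates are assumptions made for illustration; the code only uses the three prices quoted above and does not model any tiering or other billing rules Google may apply.

```python
# Prices in USD per one million tokens, as quoted in this article.
PRICE_PER_MILLION_USD = {
    "input": 0.0375,
    "output": 0.15,
    "cached": 0.01,
}


def estimate_cost_usd(input_tokens: int, output_tokens: int, cached_tokens: int = 0) -> float:
    """Estimate the cost of a workload (hypothetical helper, not part of any Google SDK)."""
    if min(input_tokens, output_tokens, cached_tokens) < 0:
        raise ValueError("token counts must be non-negative")
    total = (
        input_tokens * PRICE_PER_MILLION_USD["input"]
        + output_tokens * PRICE_PER_MILLION_USD["output"]
        + cached_tokens * PRICE_PER_MILLION_USD["cached"]
    )
    # The quoted rates are per one million tokens, so scale down at the end.
    return total / 1_000_000


if __name__ == "__main__":
    # Example workload: 2 million input tokens and 500,000 output tokens.
    print(f"${estimate_cost_usd(2_000_000, 500_000):.4f}")
```

At these rates, a workload of two million input tokens and half a million output tokens would come to roughly $0.15.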

Additionally, Google is doubling the rate limits of the 1.5 Flash-8B AI model. Now, developers can send up to 4,000 requests per minute (RPM) while using this model. Explaining the decision, the tech giant stated that the model is suited for simple, high-volume tasks. Developers who wish to try out the model can do so via Google AI Studio and the Gemini API free of charge.
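For developers planning high-volume workloads, a cap of 4,000 RPM works out to one request every 15 milliseconds on average. The sketch below shows one simple way a client could pace itself under that cap. The class and method names are hypothetical, and evenly spacing requests is an assumption that is stricter than necessary; the article does not describe how Google actually enforces the limit.

```python
class RequestPacer:
    """Spaces requests evenly to stay under a requests-per-minute cap (illustrative only)."""

    def __init__(self, requests_per_minute: int = 4000):
        if requests_per_minute <= 0:
            raise ValueError("requests_per_minute must be positive")
        # Minimum gap between two consecutive requests, in seconds.
        self.min_interval = 60.0 / requests_per_minute
        self._next_allowed = 0.0

    def reserve(self, now: float) -> float:
        """Reserve the next send slot and return how long the caller should wait.

        `now` is the current time in seconds, for example from time.monotonic().
        """
        start = max(now, self._next_allowed)
        self._next_allowed = start + self.min_interval
        return start - now


if __name__ == "__main__":
    import time

    pacer = RequestPacer(4000)
    for i in range(3):
        time.sleep(pacer.reserve(time.monotonic()))
        print(f"request {i} would be sent now")
```

A caller would sleep for the returned duration before each API call, so bursts are smoothed out rather than rejected.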
