Google Releases Gemma 3n Open-Source AI Model That Can Run Locally on 2GB RAM

Google says Gemma 3n natively supports image, audio, video, and text inputs as well as text outputs.

Written by Akash Dutta, Edited by Siddharth Suvarna | Updated: 27 June 2025 14:23 IST
Highlights
  • Gemma 3n was released as an early preview in May
  • The AI model is available in two variants — E2B and E4B
  • It is built on the MatFormer architecture

The open-source Gemma 3n AI model is available to download on Hugging Face and Kaggle

Photo Credit: Google

Google released the full version of Gemma 3n, the latest open-source model in its Gemma 3 family of artificial intelligence (AI) models, on Thursday. First announced as an early preview in May, the new model is designed and optimised for on-device use cases and introduces several architecture-level improvements. Notably, the large language model (LLM) can run locally on as little as 2GB of RAM, which means it can be deployed and operated even on a smartphone, provided the device has adequate AI processing hardware.

Gemma 3n Is a Multimodal AI Model

In a blog post, the Mountain View-based tech giant announced the release of the full version of Gemma 3n. The model follows the launch of Gemma 3 and SignGemma, and joins the Gemmaverse. Since it is an open-source model, the company has released its model weights along with a cookbook for the community. The model itself is available under the permissive Gemma license, which permits both academic and commercial use.


Gemma 3n is a multimodal AI model. It natively supports image, audio, video, and text inputs. However, it can only generate text outputs. It is also a multilingual model and supports 140 languages for text, and 35 languages when the input is multimodal.

Google says that Gemma 3n has a “mobile-first architecture” built on the Matryoshka Transformer, or MatFormer, architecture. It is a nested transformer, named after Russian matryoshka nesting dolls, in which one model fits inside another. This design offers a way to train AI models of different parameter sizes within a single model.


Gemma 3n comes in two sizes, E2B and E4B, where the “E” stands for effective parameters. Although the models have five billion and eight billion parameters in total, respectively, only about two billion and four billion parameters are active at a time.

This is achieved using a technique called Per-Layer Embeddings (PLE), in which only the core transformer parameters need to be loaded into fast accelerator memory (VRAM). The remaining per-layer embedding parameters stay in ordinary memory and can be handled by the CPU.
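The memory arithmetic behind this can be sketched as follows. This is a toy illustration, not Google's implementation; the 4-bit weight size and the exact parameter split are assumptions made for the example:

```python
# Toy illustration (not Google's implementation) of how Per-Layer
# Embeddings (PLE) reduce accelerator memory: only the "effective"
# parameters live in VRAM, while the remaining per-layer embedding
# parameters stay in ordinary RAM and are handled by the CPU.

BYTES_PER_PARAM = 0.5  # assumption: 4-bit quantised weights

def memory_split(total_params, effective_params):
    """Return (vram_gb, cpu_ram_gb) for a given parameter split."""
    vram = effective_params * BYTES_PER_PARAM / 1e9
    cpu = (total_params - effective_params) * BYTES_PER_PARAM / 1e9
    return vram, cpu

# E2B: ~5 billion raw parameters, ~2 billion effective (active) parameters
vram_gb, cpu_gb = memory_split(5e9, 2e9)
print(f"E2B: ~{vram_gb:.1f} GB in VRAM, ~{cpu_gb:.1f} GB handled by CPU")
# prints: E2B: ~1.0 GB in VRAM, ~1.5 GB handled by CPU
```

Under these assumptions, the accelerator-resident footprint stays in the low single-digit gigabytes, which is consistent with Google's claim that the model can run on devices with as little as 2GB of RAM.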


With the MatFormer design, the E4B variant nests the E2B model inside it, so training the larger model simultaneously trains the smaller one. Users can therefore choose E4B for more advanced operations or E2B for faster outputs, without any noticeable difference in the quality of the processing or output.
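The nesting idea can be illustrated with a minimal sketch. The dimensions, the single feed-forward layer, and the prefix-slicing scheme below are illustrative assumptions, not Gemma 3n's actual internals:

```python
# Minimal sketch of the Matryoshka idea behind MatFormer: a smaller
# sub-model reuses a prefix slice of the larger model's feed-forward
# weights, so both can be trained together and served independently.
# All sizes here are toy values, not Gemma 3n's real dimensions.
import numpy as np

rng = np.random.default_rng(0)
d_model, d_ff_large, d_ff_small = 8, 32, 16  # hypothetical sizes

# One feed-forward layer of the "large" model.
W_in = rng.standard_normal((d_model, d_ff_large))
W_out = rng.standard_normal((d_ff_large, d_model))

def ffn(x, d_ff):
    """Run the layer using only the first d_ff hidden units (the nested sub-model)."""
    h = np.maximum(x @ W_in[:, :d_ff], 0)  # ReLU activation
    return h @ W_out[:d_ff, :]

x = rng.standard_normal((1, d_model))
y_large = ffn(x, d_ff_large)  # full "E4B-like" model
y_small = ffn(x, d_ff_small)  # nested "E2B-like" sub-model, same weights
print(y_large.shape, y_small.shape)  # prints: (1, 8) (1, 8)
```

Because the small model is literally a slice of the large one, a single training run updates both, which is what lets users pick either variant at inference time.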

Google is also letting users create custom-sized models by mixing and matching the model's internal components. For this, the company is releasing a tool called MatFormer Lab, which lets developers test different configurations to find custom model sizes suited to their hardware.


Currently, Gemma 3n is available to download via Google's listings on Hugging Face and Kaggle. Users can also try Gemma 3n in Google AI Studio. Notably, Gemma models can also be deployed directly from AI Studio to Cloud Run.

