Google VaultGemma: 5 Things to Know About the AI Model That Puts Privacy First

Last week, Google Research introduced VaultGemma, an AI model trained with differential privacy.

Written by Akash Dutta, Edited by Ketan Pratap | Updated: 15 September 2025 13:51 IST

Highlights

Google added calibrated noise during training to prevent memorisation
The model’s privacy approach comes with some performance trade-offs
Google said the AI model requires more compute and data

VaultGemma’s stronger privacy focus can result in lower accuracy in responses

Photo Credit: Google

Privacy has been a long-debated topic in the artificial intelligence (AI) space. While companies have taken steps to safeguard user privacy in the post-deployment phase, not a lot has been done in the pre-deployment or pre-training phase of AI models. To tackle this, Google, on Friday, released a privacy-centric large language model (LLM), which has been trained using a differential privacy technique so that the model cannot memorise sensitive information during training. This measure is meant to stop prompt hackers from tricking the AI into spilling identifiable information.

Google's VaultGemma: 5 Things You Should Know

1. Google's VaultGemma is a one-billion-parameter AI model. The tech giant applied differential privacy in the pre-training phase, adding calibrated noise during training so that sensitive data containing identifiers such as people's names, addresses, emails, and similar information is not memorised. The noise prevents the AI model from memorising the identifiers.

2. So, what does it really protect? VaultGemma's training prevents the model from memorising and regurgitating sensitive snippets, such as credit card numbers or someone's address, that were present in the training data. The noise-batch ratio (the amount of added noise relative to the size of each training batch) also limits how much any one document, sentence, or person's data can influence the responses generated by the model. Essentially, this training strategy means an attacker cannot reliably figure out whether or not the target's data was present in the dataset.
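To make the idea of calibrated noise more concrete, here is a minimal Python sketch of one differentially private training step. It assumes a DP-SGD-style recipe (clip each example's gradient, then add Gaussian noise to the sum), which the article itself does not spell out. The function names and the parameters max_norm and noise_multiplier are illustrative, not taken from Google's code.

```python
import math
import random


def clip(grad, max_norm):
    """Scale one example's gradient so its L2 norm is at most max_norm."""
    norm = math.sqrt(sum(g * g for g in grad))
    scale = min(1.0, max_norm / norm) if norm > 0 else 1.0
    return [g * scale for g in grad]


def private_mean_gradient(per_example_grads, max_norm, noise_multiplier, rng):
    """Clip every example's gradient, sum them, add noise, then average.

    Assumption: Gaussian noise with standard deviation
    noise_multiplier * max_norm is added to each coordinate of the sum.
    """
    dim = len(per_example_grads[0])
    total = [0.0] * dim
    for grad in per_example_grads:
        clipped = clip(grad, max_norm)
        total = [t + c for t, c in zip(total, clipped)]

    sigma = noise_multiplier * max_norm
    noisy = [t + rng.gauss(0.0, sigma) for t in total]
    return [n / len(per_example_grads) for n in noisy]


if __name__ == "__main__":
    rng = random.Random(0)
    # The first toy gradient is large and gets clipped; the second is left as is.
    grads = [[3.0, 4.0], [0.3, 0.4]]
    print(private_mean_gradient(grads, max_norm=1.0, noise_multiplier=1.0, rng=rng))
```

Clipping caps how much any single example can move the model, and the noise hides whatever contribution remains, which is why one person's data is hard to detect afterwards.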

3. The privacy focus comes with certain performance trade-offs. The first thing it impacts is accuracy. To increase privacy, researchers have to add more noise during training. This means the AI model is not able to learn finer details, which somewhat reduces the accuracy of its responses compared to non-private models.

For instance, without privacy, an AI model might know exact Shakespeare quotes, but with the differential privacy strategy, it will capture the style while struggling to reproduce the exact words.

4. There are trade-offs with compute and model size as well. To balance out the noise with performance, a model needs to be trained on larger datasets and with more powerful computers. This makes differential privacy training slower and more expensive.
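A rough way to see why more data and compute help is sketched below in Python. It assumes a DP-SGD-style step in which a fixed amount of Gaussian noise is added to the summed gradient of a batch before averaging; the article does not describe the exact recipe, and the function and parameter names are illustrative. Under that assumption, the noise left in the averaged gradient shrinks as the batch grows.

```python
def noise_std_per_coordinate(noise_multiplier, max_norm, batch_size):
    """Standard deviation of the noise in each coordinate of the averaged gradient.

    Assumption: noise with standard deviation noise_multiplier * max_norm
    is added once to the summed gradient, then the sum is divided by batch_size.
    """
    return noise_multiplier * max_norm / batch_size


if __name__ == "__main__":
    # Same privacy noise, progressively larger batches.
    for batch_size in (100, 1_000, 10_000):
        std = noise_std_per_coordinate(1.0, 1.0, batch_size)
        print(f"batch size {batch_size}: noise std {std}")
```

Larger batches dilute the same noise across more examples, but each step then processes more data, which is where the extra compute cost comes from.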

Coming to the model size, Google noted that with differential privacy, a larger model does not necessarily mean better performance, unlike what has been observed with the scaling laws of traditional model training. Smaller models, when trained with the right settings, can outperform a model with more parameters. This requires rethinking the scaling laws of an LLM, as applying the traditional ones unchanged would give diminished results.

Google has also compared the performance of VaultGemma with Gemma 3 (a non-private model with the same number of parameters) and GPT-2, an older baseline model.

VaultGemma performance
Photo Credit: Google

5. So, what is the advantage to the end consumer? One privacy-focused model in itself is not going to change anything for the consumer. However, what Google has shown here is that it is possible to train and build a privacy-focused AI model that still delivers relatively decent performance.

If this approach is adopted by all major AI players, it will significantly contribute to protecting people's data globally. This is important at a time when companies such as Google, OpenAI, and Anthropic are training their models on users' conversations.
