
Google VaultGemma: 5 Things to Know About the AI Model That Puts Privacy First

Last week, Google Research introduced VaultGemma, an AI model trained with differential privacy.

Photo Credit: Google

VaultGemma’s stronger privacy focus can result in lower accuracy in responses

Highlights
  • Google added calibrated noise in the model to prevent memorisation
  • The model’s privacy approach comes with some performance trade-offs
  • Google said the AI model requires more compute and data

Privacy has long been a debated topic in the artificial intelligence (AI) space. While companies have taken steps to safeguard user privacy in the post-deployment phase, little has been done in the pre-deployment or pre-training phase of AI models. To tackle this, Google, on Friday, released a privacy-centric large language model (LLM) trained using differential privacy, a technique intended to keep the model from memorising sensitive information during training. This makes it much harder for prompt hackers to trick the AI into spilling identifiable information.

Google's VaultGemma: 5 Things You Should Know

1. Google's VaultGemma is a one-billion-parameter AI model. The tech giant used differential privacy in the pre-training phase, adding calibrated noise during training so that sensitive data, including identifiers such as people's names, addresses, and emails, does not leave a recoverable imprint on the model. The noise prevents the AI model from memorising the identifiers.
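
To make the idea of calibrated noise concrete, here is a minimal sketch of one noisy training step in the style of differentially private stochastic gradient descent (DP-SGD). The article does not spell out Google's exact procedure, so the function name `dp_sgd_step` and the parameters `clip_norm` and `noise_multiplier` are illustrative assumptions, not VaultGemma's actual code or settings.

```python
import math
import random


def dp_sgd_step(per_example_grads, clip_norm, noise_multiplier, rng):
    """Return a noisy average gradient for one batch (illustrative only)."""
    num_params = len(per_example_grads[0])
    total = [0.0] * num_params

    # 1. Clip each example's gradient so its L2 norm is at most clip_norm.
    #    This bounds how much any single example can move the model.
    for grad in per_example_grads:
        norm = math.sqrt(sum(g * g for g in grad))
        scale = min(1.0, clip_norm / max(norm, 1e-12))
        for i, g in enumerate(grad):
            total[i] += g * scale

    # 2. Add Gaussian noise whose scale is calibrated to the clipping bound.
    sigma = noise_multiplier * clip_norm
    noisy = [t + rng.gauss(0.0, sigma) for t in total]

    # 3. Average over the batch.
    batch_size = len(per_example_grads)
    return [n / batch_size for n in noisy]


# Hypothetical gradients: the third example is an outlier that clipping tames.
grads = [[3.0, 4.0], [0.3, 0.4], [100.0, 0.0]]
update = dp_sgd_step(grads, clip_norm=1.0, noise_multiplier=1.0, rng=random.Random(0))
print(update)
```

Clipping is what keeps an outlier such as the third gradient from dominating the update, and the added noise masks whatever contribution remains from any single example.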

2. So, what does it really protect? VaultGemma's training is designed to stop the model from memorising and regurgitating sensitive snippets, such as credit card numbers or someone's address, that were present in the training data. The noise-batch ratio also limits how much any one document, sentence, or person's data can influence the responses generated by the model. Essentially, this training strategy would not let an attacker reliably figure out whether or not the target's data was present in the dataset.
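
The guarantee described above is usually stated through the standard definition of differential privacy. The notation below is the textbook form, not something quoted from Google's announcement, and the article does not give the specific privacy budget VaultGemma uses. A training procedure M is (ε, δ)-differentially private if, for any two datasets D and D′ that differ in a single record and any set S of possible outcomes:

```latex
\Pr[\,M(D) \in S\,] \;\le\; e^{\varepsilon} \, \Pr[\,M(D') \in S\,] + \delta
```

The smaller ε and δ are, the less the trained model can differ depending on whether any one record was included, which is why an attacker cannot reliably tell if a target's data was in the dataset.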

3. The privacy focus comes with certain performance trade-offs. The first thing it impacts is accuracy. To increase privacy, the researchers have to add more noise during training. This means the AI model is not able to learn finer details, which somewhat reduces the accuracy of its responses compared to non-private models.

For instance, without privacy protections, an AI model might know exact Shakespeare quotes, but with the differential privacy strategy, it would capture the style while struggling to reproduce the exact words.

4. There are trade-offs with compute and model size as well. To offset the effect of the noise on performance, a model needs to be trained on larger datasets and with more powerful computers. This makes differential privacy training slower and more expensive.
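
A rough way to see why more data and compute help: if the noise-batch ratio is taken to be the noise standard deviation divided by the batch size (an assumption here, as the article does not define the term), then larger batches dilute the same amount of noise. The function name and the numbers below are hypothetical.

```python
def noise_batch_ratio(noise_std, batch_size):
    """Noise standard deviation per example in the batch (assumed definition)."""
    return noise_std / batch_size


# Same noise level, increasingly large (and more expensive) batches.
for batch_size in (256, 4096, 65536):
    print(batch_size, noise_batch_ratio(1.0, batch_size))
```

Under this assumption, each 16-fold increase in batch size cuts the ratio 16-fold, but every step then has to process 16 times as many examples.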

Coming to the model size, Google noted that with differential privacy, a larger model does not necessarily mean better performance, unlike what scaling laws have shown for traditional model training. Smaller models, when trained with the right settings, can outperform a model with more parameters. This requires rethinking the scaling laws for LLMs; applying the conventional ones unchanged would give diminished results.

Google has also compared the performance of VaultGemma with Gemma 3 (a non-private model with the same number of parameters) and GPT-2, an older baseline model.

VaultGemma performance
Photo Credit: Google

5. So, what is the advantage for the end consumer? One privacy-focused model on its own is not going to change anything for the consumer. However, what Google has shown here is that it is possible to train and build a privacy-focused AI model that still delivers relatively decent performance.

If this standard is adopted by all major AI players, it will significantly contribute to protecting the data of people globally. This is important at a time when companies such as Google, OpenAI, and Anthropic are training their models on users' conversations.


Akash Dutta
Akash Dutta is a Chief Sub Editor at Gadgets 360. He is particularly interested in the social impact of technological developments and loves reading about emerging fields such as AI, metaverse, and fediverse. In his free time, he can be seen supporting his favourite football club - Chelsea, watching movies and anime, and sharing passionate opinions on food.

© Copyright Red Pixels Ventures Limited 2025. All rights reserved.