Anthropic Warns That Minimal Data Contamination Can ‘Poison’ Large AI Models

As few as 250 malicious documents can produce a "backdoor" vulnerability in a large AI model, says Anthropic.

Advertisement
Written by Akash Dutta, Edited by Ketan Pratap | Updated: 11 October 2025 13:04 IST
Highlights
  • LLMs can exfiltrate sensitive data when attacker adds a trigger phrase
  • Anthropic says the size of the total dataset does not matter
  • Study breaks the belief that attackers need to control large data portion

UK AI Security Institute and the Alan Turing Institute partnered with Anthropic on this study

Photo Credit: Anthropic

Anthropic, on Thursday, warned developers that even a small data sample contaminated by bad actors can open a backdoor in an artificial intelligence (AI) model. The San Francisco-based AI firm conducted a joint study with the UK AI Security Institute and the Alan Turing Institute to find that the total size of the dataset in a large language model is irrelevant if even a small portion of the dataset is infected by an attacker. The findings challenge the existing belief that attackers need to control a proportionate size of the total dataset in order to create vulnerabilities in a model.

Anthropic's Study Highlights AI Models Can Be Poisoned Relatively Easily

The new study, titled “Poisoning Attacks on LLMs Require a Near-constant Number of Poison Samples,” has been published on the online pre-print journal arXiv. Calling it “the largest poisoning investigation to date,” the company claims that just 250 malicious documents in pretraining data can successfully create a backdoor in LLMs ranging from 600M to 13B parameters.

The team focused on a backdoor-style attack that triggers the model to produce gibberish output when encountering a specific hidden trigger token, while otherwise behaving normally, Anthropic explained in a post. They trained models of different parameter sizes, including 600M, 2B, 7B, 13B, on proportionally scaled clean data (Chinchilla-optimal) while injecting 100, 250, or 500 poisoned documents to test vulnerability.

Advertisement

Surprisingly, whether it was a 600M model or a 13B model, the attack success curves were nearly identical for the same number of poisoned documents. The study concludes that model size does not shield against backdoors, and what matters is the absolute number of poisoned points encountered during training.

Advertisement

The researchers further report that while injecting 100 malicious documents was insufficient to reliably backdoor any model, 250 documents or more consistently worked across all sizes. They also varied training volume and random seeds to validate the robustness of the result.

However, the team is cautious: this experiment was constrained to a somewhat narrow denial-of-service (DoS) style backdoor, which causes gibberish output, not more dangerous behaviours such as data leakage, malicious code, or bypassing safety mechanisms. It's still open whether such dynamics hold for more complex, high-stakes backdoors in frontier models.

 

Get your daily dose of tech news, reviews, and insights, in under 80 characters on Gadgets 360 Turbo. Connect with fellow tech lovers on our Forum. Follow us on X, Facebook, WhatsApp, Threads and Google News for instant updates. Catch all the action on our YouTube channel.

Advertisement

Related Stories

Popular Mobile Brands
  1. iQOO 15: Everything You Need to Know Ahead of Launch in India
  2. Poco Pad X1 Design, Key Features Leaked Ahead of November 26 Launch
  3. Huawei MatePad Edge, Watch Ultimate 2 Launched: Check Prices
  4. Kevin Hart Brings Big Laughs in Acting My Age, Now Streaming on Netflix
  5. Steam Black Friday Deals: Best Games Under Rs. 500 and More
  6. Claude Opus 4.5 Arrives With Upgraded Coding and Agentic Performance
  1. Missing: Dead or Alive Season 2 OTT Release on Netflix: Everything You Need to Know
  2. 3 Roses Season 2 OTT Release: Know When and Where to Watch This Telugu Series Online
  3. Tell Me Softly OTT Release Date Revealed: Know When and Where to Watch the Spanish Rom-Com Film Online
  4. Comet C/2025 K1 (ATLAS) Breaks Into Three Pieces Following Close Approach to the Sun
  5. James Webb Telescope May Have Discovered Universe’s Earliest Supermassive Black Hole
  6. NASA’s Nancy Grace Roman Space Telescope Surpassing Expectations Even Before Launch, Reveals Research
  7. Airtel Ramps Up Xstream Fiber Rollout Amid Surge in India’s Connected Homes
  8. OnePlus Ace 6T Charging Speed, Cooling System, Other Specifications Confirmed Ahead of Launch
  9. Samsung Galaxy S25 Series Could Get One UI 8.5 Beta Soon; Update Spotted on Samsung Server: Report
  10. Sam Altman and Jony Ive’s AI Device Prototype Finalised, Could Launch Within Two Years
Gadgets 360 is available in
Download Our Apps
Available in Hindi
© Copyright Red Pixels Ventures Limited 2025. All rights reserved.