Anthropic Thwarts Hacker Attempts to Misuse Claude AI for Cybercrime

Anthropic's report said its internal systems had stopped the attacks and it was sharing the case studies to help others understand the risks.

By Reuters | Updated: 28 August 2025 18:19 IST

Highlights

Anthropic said it had banned the accounts involved, tightened its filters
AI tools are being increasingly exploited in cybercrime
Attackers attempted to use Claude to produce harmful content

Anthropic said on Wednesday it had detected and blocked hackers attempting to misuse its Claude AI system to write phishing emails, create malicious code and circumvent safety filters.

The company's findings, published in a report, highlight growing concerns that AI tools are increasingly exploited in cybercrime, intensifying calls for tech firms and regulators to strengthen safeguards as the technology spreads.

Anthropic's report said its internal systems had stopped the attacks and it was sharing the case studies - showing how attackers had attempted to use Claude to produce harmful content - to help others understand the risks.

The report cited attempts to use Claude to draft tailored phishing emails, write or fix snippets of malicious code and sidestep safeguards through repeated prompting.

It also described efforts to script influence campaigns by generating persuasive posts at scale and helping low-skill hackers with step-by-step instructions.

The company, backed by Amazon.com and Alphabet, did not publish technical indicators such as IPs or prompts, but said it had banned the accounts involved and tightened its filters after detecting the activity.

Experts say criminals are increasingly turning to AI to make scams more convincing and to speed up hacking attempts. These tools can help write realistic phishing messages, automate parts of malware development and even potentially assist in planning attacks.

Security researchers warn that as AI models become more powerful, the risk of misuse will grow unless companies and governments act quickly.

Anthropic said it follows strict safety practices, including regular testing and outside reviews, and plans to keep publishing reports when it finds major threats.

OpenAI, which is backed by Microsoft and SoftBank, and Google have faced similar scrutiny over fears that their AI models could be exploited for hacking or scams, prompting calls for stronger safeguards.

Governments are also moving to regulate the technology, with the European Union advancing its Artificial Intelligence Act and the United States pushing for voluntary safety commitments from major developers.
