OpenAI Partners With Anthropic to Find Safety Flaws in Each Other’s AI Models

The two AI firms said that the science of alignment evaluations is new, and such collaborations prevent blind spots.

Written by Akash Dutta, Edited by Rohan Pal | Updated: 28 August 2025 12:11 IST
Highlights
  • OpenAI found Claude models prone to jailbreaking attempts
  • Claude is said to have a higher rate of refusals to avoid hallucinations
  • Several models from both developers showed a tendency for sycophancy

GPT-4o, GPT-4.1 and o4-mini were found to be more cooperative with simulated human misuse

Photo Credit: Unsplash/Markus Winkler

OpenAI has partnered with Anthropic for a first-of-its-kind alignment evaluation exercise, in which each company tested the other's models to find gaps in its internal safety measures. The findings from this collaboration were shared publicly on Wednesday, highlighting behavioural insights about their popular artificial intelligence (AI) models. OpenAI found the Claude models to be more prone to jailbreaking attempts than its own o3 and o4-mini models. On the other hand, Anthropic found several models from the ChatGPT maker to be more cooperative with simulated human misuse.

OpenAI, Anthropic Find Several Concerns in Each Other's Models

In a blog post, OpenAI stated that the goal behind this joint exercise was to identify concerning behaviours that could lead to a model generating harmful content or being vulnerable to attacks. Anthropic has also published its findings, after first sharing them with the ChatGPT maker.

OpenAI's findings suggest that Claude 4 models are well aligned when it comes to instruction hierarchy, the ability of a large language model (LLM) to respond appropriately to messages that can create a conflict between being helpful to humans and not breaking the developer's policies. The company said Claude 4 outperformed o3 marginally, and other OpenAI models by a wider margin.

However, the company found Claude models to be more prone to jailbreaking attempts. The risk was said to be higher when the models had reasoning enabled. To mitigate hallucinations, these models also refused to answer at a high rate of 70 percent. OpenAI said the trade-off negatively impacts the utility of the model as “the overall accuracy rate for the examples in these evaluations where the models did choose to answer is still low.”

On the other hand, Anthropic found that GPT-4o, GPT-4.1 and o4-mini models were more willing than Claude models or o3 to cooperate with simulated human misuse. These models were found to comply with requests about drug synthesis, bioweapon development, and operational planning for terrorist attacks.

Additionally, models from both companies exhibited signs of sycophancy towards users. In some cases, they even validated harmful decisions by users who showed delusional behaviour.

In light of the recent lawsuit against OpenAI and its CEO, Sam Altman, over the alleged wrongful death of a teenager who died by suicide, such cross-examination of major developers' AI models could pave the way for better safety measures in future AI products.

© Copyright Red Pixels Ventures Limited 2026. All rights reserved.