OpenAI Partners With Anthropic to Find Safety Flaws in Each Other’s AI Models

The two AI firms said that the science of alignment evaluations is new, and such collaborations prevent blind spots.

Written by Akash Dutta, Edited by Rohan Pal | Updated: 28 August 2025 12:11 IST
Highlights
  • OpenAI found Claude models prone to jailbreaking attempts
  • Claude is said to have a higher rate of refusals to avoid hallucinations
  • Several models from both developers showed a tendency for sycophancy

GPT-4o, GPT-4.1 and o4-mini were found to be more cooperative with simulated human misuse

Photo Credit: Unsplash/Markus Winkler

OpenAI has partnered with Anthropic for a first-of-its-kind alignment evaluation exercise, in which each company aimed to find gaps in the other's internal safety measures. The findings from this collaboration were shared publicly on Wednesday, highlighting behavioural insights about their popular artificial intelligence (AI) models. OpenAI found the Claude models to be more prone to jailbreaking attempts than its own o3 and o4-mini models. On the other hand, Anthropic found several models from the ChatGPT maker to be more cooperative with simulated human misuse.

OpenAI, Anthropic Find Several Concerns in Each Other's Models

In a blog post, OpenAI stated that the goal behind this joint exercise was to identify concerning model behaviours that could lead to a model generating harmful content or being vulnerable to attacks. Anthropic has also published its findings, after first sharing them with the ChatGPT maker.

OpenAI's findings suggest that Claude 4 models are well aligned when it comes to instruction hierarchy, the ability of a large language model (LLM) to respond appropriately to messages that can create a conflict between being helpful to humans and not breaking the developer's policies. The company said Claude 4 outperformed o3 marginally, and other OpenAI models by a wider margin.

However, the company found Claude models to be more prone to jailbreaking attempts. The risk was said to be higher when the models had reasoning enabled. These models also had a high refusal rate of 70 percent, declining to answer in order to avoid hallucinating. OpenAI said the trade-off negatively impacts the utility of the model as “the overall accuracy rate for the examples in these evaluations where the models did choose to answer is still low.”

On the other hand, Anthropic found that GPT-4o, GPT-4.1 and o4-mini models were more willing than Claude models or o3 to cooperate with simulated human misuse. These models were found to comply with requests about drug synthesis, bioweapon development, and operational planning for terrorist attacks.

Additionally, several models from both developers exhibited signs of sycophancy towards users. In some cases, they even validated harmful decisions made by users who showed delusional behaviour.

In light of the recent lawsuit against OpenAI and its CEO, Sam Altman, over the alleged wrongful death of a teenager by suicide, such cross-evaluation of major developers' AI models could pave the way for better safety measures in future AI products.

© Copyright Red Pixels Ventures Limited 2025. All rights reserved.