OpenAI Partners With Anthropic to Find Safety Flaws in Each Other’s AI Models

The two AI firms said that the science of alignment evaluations is new, and such collaborations prevent blind spots.

Written by Akash Dutta, Edited by Rohan Pal | Updated: 28 August 2025 12:11 IST
Highlights
  • OpenAI found Claude models prone to jailbreaking attempts
  • Claude is said to have a higher rate of refusals to avoid hallucinations
  • Several models from both developers showed a tendency for sycophancy

GPT-4o, GPT-4.1 and o4-mini were found to be more cooperative with simulated human misuse

Photo Credit: Unsplash/Markus Winkler

OpenAI has partnered with Anthropic for a first-of-its-kind alignment evaluation exercise, with each company aiming to find gaps in the other's internal safety measures. The findings from this collaboration were shared publicly on Wednesday, highlighting interesting behavioural insights about their popular artificial intelligence (AI) models. OpenAI found the Claude models to be more prone to jailbreaking attempts than its o3 and o4-mini models. On the other hand, Anthropic found several models from the ChatGPT maker to be more cooperative with simulated human misuse.

OpenAI, Anthropic Find Several Concerns in Each Other's Models

In a blog post, OpenAI stated that the goal behind this joint exercise was to identify concerning behaviours that could lead to a model generating harmful content or being vulnerable to attacks. Anthropic has also shared its findings publicly, after sharing them with the ChatGPT maker.

OpenAI's findings suggest that Claude 4 models are well aligned when it comes to instruction hierarchy, the ability of a large language model (LLM) to respond appropriately to messages that can create a conflict between being helpful to humans and not breaking the developer's policies. The company said Claude 4 outperformed o3 marginally, and other OpenAI models by a wider margin.

However, OpenAI found Claude models to be more prone to jailbreaking attempts. The risk was said to be higher when the models had reasoning enabled. These models also showed a high refusal rate of 70 percent, a behaviour meant to mitigate hallucinations. OpenAI said the trade-off negatively impacts the utility of the model as “the overall accuracy rate for the examples in these evaluations where the models did choose to answer is still low.”

On the other hand, Anthropic found that GPT-4o, GPT-4.1 and o4-mini models were more willing than Claude models or o3 to cooperate with simulated human misuse. These models were found to comply with requests about drug synthesis, bioweapon development, and operational planning for terrorist attacks.

Additionally, several models from both companies exhibited signs of sycophancy towards users. In some cases, they even validated harmful decisions made by users who showed delusional behaviour.

In light of the recent lawsuit against OpenAI and its CEO, Sam Altman, over the alleged wrongful death of a teenager by suicide, such cross-evaluation of major developers' AI models could pave the way for better safety measures in future AI products.
