Anthropic to Fund Initiative to Develop New Third-Party AI Benchmarks to Assess AI Models

Anthropic has invited applications from interested entities to develop new benchmarks for AI models.

Written by Akash Dutta, Edited by Siddharth Suvarna | Updated: 2 July 2024 17:34 IST
Highlights
  • Anthropic said AI Safety Level assessment is one of its priorities
  • The new benchmarks will also focus on advanced AI capabilities
  • Anthropic said it prefers evaluations with between 1,000 and 10,000 tasks

Anthropic said it wants evaluations that assess impacts such as harmful biases, discrimination, and more


On Tuesday, Anthropic announced an initiative to develop new benchmarks that test the capabilities of advanced artificial intelligence (AI) models. The AI firm will fund the project and has invited applications from interested entities. The company said that existing benchmarks are not enough to fully test the capabilities and impact of newer large language models (LLMs). As a result, Anthropic stated, a new set of evaluations focused on AI safety, advanced capabilities, and societal impact needs to be developed.

Anthropic to fund new benchmarks for AI models

In a newsroom post, Anthropic highlighted the need for a comprehensive third-party evaluation ecosystem to overcome the limited scope of current benchmarks. Through the initiative, the AI firm will fund third-party organisations that want to develop high-quality, safety-relevant assessments for AI models.


For Anthropic, the high-priority areas include tasks and questions that can measure an LLM's AI Safety Level (ASL), its advanced capabilities in generating ideas and responses, and the societal impact of these capabilities.

Under the ASL category, the company highlighted several parameters. These include the ability of AI models to assist in, or autonomously carry out, cyberattacks; their potential to help create chemical, biological, radiological and nuclear (CBRN) threats or to enhance knowledge about them; national security risk assessment; and more.


In terms of advanced capabilities, Anthropic said the benchmarks should be able to assess AI's potential to transform scientific research, whether models engage with or refuse harmful requests, and their multilingual capabilities. Further, the AI firm said it is necessary to understand an AI model's potential to impact society. For this, the evaluations should target concepts such as "harmful biases, discrimination, over-reliance, dependence, attachment, psychological influence, economic impacts, homogenization, and other broad societal impacts."

Apart from this, the AI firm also listed some principles for good evaluations. It said evaluations should not appear in the data used to train AI models, as they otherwise often turn into memorisation tests. It also recommended including between 1,000 and 10,000 tasks or questions. Finally, it asked organisations to involve subject matter experts in creating tasks that test performance in a specific domain.


© Copyright Red Pixels Ventures Limited 2026. All rights reserved.