
Epoch AI Launches FrontierMath AI Benchmark to Test Capabilities of AI Models

FrontierMath is a benchmark for evaluating advanced mathematical reasoning in AI.

Highlights
  • FrontierMath was created in collaboration with over 60 mathematicians
  • The test covers topics ranging from algebraic geometry to Zermelo–Fraenkel set theory
  • Epoch AI said older benchmarks do not truly test AI capabilities

The problems in FrontierMath are new and unpublished to avoid data contamination

Photo Credit: Epoch AI

Epoch AI, a California-based research institute, launched a new artificial intelligence (AI) benchmark last week. Dubbed FrontierMath, the benchmark tests large language models (LLMs) on their reasoning and mathematical problem-solving abilities. Epoch AI claims that existing math benchmarks are of limited use because of factors such as data contamination and because AI models already achieve very high scores on them. According to the institute, even the leading LLMs have scored less than two percent on the new benchmark.

Epoch AI Launches FrontierMath Benchmark

In a post on X (formerly known as Twitter), Epoch AI explained that it collaborated with more than 60 mathematicians to create hundreds of original, unpublished math problems. Epoch AI claims that these questions would take even professional mathematicians hours to solve. It cited the limitations of existing benchmarks such as GSM8K and MATH, on which AI models generally score highly, as the reason for developing the new benchmark.

The institute claimed that the high scores LLMs achieve on these benchmarks are largely due to data contamination. This means the questions had already made their way into the models' training data, allowing the models to solve them easily.

FrontierMath addresses this problem by using new, unique problems that have not been published anywhere, which mitigates the risk of data contamination. The benchmark also covers a wide range of questions, including computationally intensive problems in number theory, real analysis, and algebraic geometry, as well as topics such as Zermelo–Fraenkel set theory. Epoch AI says all the questions are "guess proof", meaning they cannot be solved by chance without strong reasoning.

Epoch AI highlighted that, to measure an AI model's aptitude, benchmarks should be built around creative problem-solving in which the AI has to sustain its reasoning over multiple steps. Notably, many industry veterans believe that existing benchmarks are not sufficient to accurately measure how advanced an AI model is.

Responding to the new benchmark in a post, Noam Brown, the OpenAI researcher behind the company's o1 model, welcomed it and said, "I love seeing a new eval with such low pass rates for frontier models."
