Epoch AI Launches FrontierMath AI Benchmark to Test Capabilities of AI Models

FrontierMath is a benchmark for evaluating advanced mathematical reasoning in AI.

Written by Akash Dutta, Edited by Siddharth Suvarna | Updated: 12 November 2024 18:47 IST
Highlights
  • FrontierMath was created in collaboration with over 60 mathematicians
  • The test spans topics from algebraic geometry to Zermelo–Fraenkel set theory
  • The company said older benchmarks do not truly test AI capabilities

The problems in FrontierMath are new and unpublished to avoid data contamination

Photo Credit: Epoch AI

Epoch AI, a California-based research institute, launched a new artificial intelligence (AI) benchmark last week. Dubbed FrontierMath, the benchmark tests large language models (LLMs) on their reasoning and mathematical problem-solving capabilities. The AI firm claims that existing math benchmarks are not very useful due to factors such as data contamination and AI models already achieving very high scores on them. Epoch AI claims that even the leading LLMs have scored less than two percent on the new benchmark.

Epoch AI Launches FrontierMath Benchmark

In a post on X (formerly known as Twitter), the AI firm explained that it collaborated with more than 60 mathematicians to create hundreds of original and unpublished math problems. Epoch AI claims that these questions would take even mathematicians hours to solve. It cited the limitations of existing benchmarks such as GSM8K and MATH, on which AI models generally score highly, as the reason for developing the new benchmark.


The company claimed that the high scores achieved by LLMs are largely due to data contamination. This means the benchmark questions had already appeared in the models' training data, making them easy for the models to solve.

FrontierMath addresses this by using problems that are new and have not been published anywhere, which reduces the risk of data contamination. The benchmark also covers a wide range of questions, including computationally intensive problems in number theory, real analysis, and algebraic geometry, as well as topics such as Zermelo–Fraenkel set theory. The AI firm says all the questions are “guess proof”, meaning they cannot be solved accidentally without strong reasoning.


Epoch AI highlighted that to measure an AI model's aptitude, benchmarks should be built around creative problem-solving in which the AI has to sustain its reasoning over multiple steps. Notably, many industry veterans believe that existing benchmarks are not sufficient to correctly measure how advanced an AI model is.

Responding to the new benchmark in a post, Noam Brown, an OpenAI researcher who was behind the company's o1 model, welcomed it and said, “I love seeing a new eval with such low pass rates for frontier models.”

 
