Hugging Face Showcases How Test-Time Compute Scaling Can Help SLMs Outperform Larger AI Models

The researchers improved the capabilities of open AI models by building on a Google DeepMind study.

Written by Akash Dutta, Edited by Siddharth Suvarna | Updated: 24 December 2024 17:37 IST
Highlights
  • Hugging Face made the Llama 3B model outperform the much larger Llama 70B model
  • Test-time compute scaling allows models to “think longer” on problems
  • The researchers reverse-engineered closed models to develop the technique

Reasoning models such as OpenAI’s o1 use test-time scaling to improve their output

Photo Credit: Hugging Face

Hugging Face shared a new case study last week showing how small language models (SLMs) can outperform larger models. In the post, the platform's researchers claimed that focusing on test-time compute, rather than increasing the training time of artificial intelligence (AI) models, can deliver better results. Test-time compute scaling is an inference strategy that lets AI models spend more time solving a problem, using approaches such as self-refinement and searching against a verifier to improve their answers.

How Test-Time Compute Scaling Works

In the post, Hugging Face highlighted that the traditional approach to improving an AI model's capabilities is often resource-intensive and extremely expensive. Typically, developers rely on a technique dubbed train-time compute, in which pretraining data and algorithms are used to improve the way a foundation model breaks down a query and arrives at a solution.

Alternatively, the researchers claimed that focusing on test-time compute scaling, a technique that allows AI models to spend more time solving a problem and to correct themselves, can show similar results.

Citing the example of OpenAI's reasoning-focused o1 model, which uses test-time compute, the researchers stated that this technique can let AI models display enhanced capabilities without any changes to the training data or pretraining methods. However, there was one problem: since most reasoning models are closed, there is no way to know which strategies they use.

The researchers used a Google DeepMind study and reverse-engineering techniques to work out how exactly LLM developers can scale test-time compute in the post-training phase. According to the case study, simply increasing the processing time does not significantly improve outputs for complex queries.

Instead, the researchers recommend using a self-refinement algorithm that lets AI models assess their own responses over subsequent iterations, identifying and correcting potential errors. Additionally, having models search against a verifier can further improve the responses. Such verifiers can be a learned reward model or hard-coded heuristics.
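
The self-refinement loop and verifier described above can be sketched as follows. This is a minimal illustration under stated assumptions, not Hugging Face's implementation: `self_refine`, `verify_isqrt` and `revise_isqrt` are hypothetical names, the drafting and revising steps would be language-model calls in a real system, and the verifier here is a hard-coded heuristic rather than a learned reward model. The toy task, finding the integer square root of a number, was chosen only because a proposed answer is cheap to check.

```python
from typing import Callable, Optional, Tuple

# Hypothetical stand-ins: in a real system `draft` and `revise` would call a
# language model, and `verify` could be a learned reward model.


def self_refine(
    draft: Callable[[], int],
    revise: Callable[[int, str], int],
    verify: Callable[[int], Optional[str]],
    max_rounds: int = 10,
) -> Tuple[int, bool]:
    """Draft an answer, then repeatedly revise it using verifier feedback."""
    answer = draft()
    for _ in range(max_rounds):
        feedback = verify(answer)
        if feedback is None:  # the verifier accepts the answer
            return answer, True
        answer = revise(answer, feedback)
    return answer, verify(answer) is None


# Toy task (assumption): find the integer square root of n.
n = 1000


def verify_isqrt(x: int) -> Optional[str]:
    """Hard-coded heuristic verifier: x is correct iff x*x <= n < (x+1)**2."""
    if x * x > n:
        return "too high"
    if (x + 1) * (x + 1) <= n:
        return "too low"
    return None


def revise_isqrt(x: int, feedback: str) -> int:
    """Toy 'model' that corrects its previous answer using the feedback."""
    if feedback == "too high":
        return (x + n // x) // 2  # integer Newton step toward sqrt(n)
    return x + 1


# Start from a deliberately poor first draft (n itself).
answer, accepted = self_refine(lambda: n, revise_isqrt, verify_isqrt)
print(answer, accepted)  # 31 True
```

The loop stops as soon as the verifier accepts an answer, so easy problems consume fewer rounds than hard ones, while `max_rounds` caps the test-time budget.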

More advanced techniques include a best-of-N approach, where a model generates multiple responses per problem and scores each one to judge which is best suited. Such approaches can be paired with a reward model. Another strategy highlighted by the researchers is beam search, which prioritises step-by-step reasoning and assigns a score to each step.
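
Best-of-N and beam search can be contrasted on a toy problem. The sketch below is an illustration under stated assumptions, not the researchers' code: a "solution" is a sequence of three arithmetic operations meant to turn 1 into 10, with each operation standing in for one reasoning step. The `score` function is a hypothetical hand-written stand-in for a reward model, and `OPS`, `TARGET`, `run` and `sample_solution` are likewise illustrative names.

```python
import random
from typing import Callable, Dict, List, Sequence

# Toy task (hypothetical): starting from 1, apply exactly NUM_STEPS
# operations to reach TARGET. Each operation stands in for a reasoning step.
OPS: Dict[str, Callable[[int], int]] = {
    "+1": lambda v: v + 1,
    "+2": lambda v: v + 2,
    "*2": lambda v: v * 2,
    "*3": lambda v: v * 3,
}
TARGET, NUM_STEPS = 10, 3


def run(steps: Sequence[str]) -> int:
    """Apply the operations in order, starting from 1."""
    value = 1
    for op in steps:
        value = OPS[op](value)
    return value


def score(steps: Sequence[str]) -> int:
    """Stand-in reward model: higher is better, 0 means TARGET was hit."""
    return -abs(TARGET - run(steps))


def best_of_n(generate: Callable[[], List[str]], n: int) -> List[str]:
    """Best-of-N: sample n complete solutions and keep the top-scoring one."""
    candidates = [generate() for _ in range(n)]
    return max(candidates, key=score)


def beam_search(beam_width: int) -> List[str]:
    """Beam search: extend partial solutions one step at a time, score each
    partial solution, and keep only the best `beam_width` of them."""
    beams: List[List[str]] = [[]]
    for _ in range(NUM_STEPS):
        expanded = [beam + [op] for beam in beams for op in OPS]
        # sorted() is stable, so ties keep their expansion order
        beams = sorted(expanded, key=score, reverse=True)[:beam_width]
    return beams[0]


rng = random.Random(0)


def sample_solution() -> List[str]:
    """Stand-in for sampling one complete answer from a model."""
    return [rng.choice(list(OPS)) for _ in range(NUM_STEPS)]


print("best-of-8:", best_of_n(sample_solution, 8))
print("beam search:", beam_search(2))  # ['+2', '*3', '+1']
```

Best-of-N only scores finished answers, whereas beam search prunes weak partial solutions after every step. Every operation is enumerated at each step here because the action space is tiny; a language model would instead sample a handful of candidate next steps.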

Using the above-mentioned strategies, the Hugging Face researchers made the Llama 3B SLM outperform Llama 70B, a much larger model, on the MATH-500 benchmark.

© Copyright Red Pixels Ventures Limited 2026. All rights reserved.