DeepSeek and Tsinghua Developing Self-Improving AI Models

DeepSeek is calling these new models DeepSeek-GRM — short for “generalist reward modeling”.

By Saritha Rai, Bloomberg | Updated: 7 April 2025 13:38 IST

Highlights

DeepSeek is exploring ways to make AI models more efficient
The aim is to bring AI models in alignment with human preferences
DeepSeek's AI revamp strategy uses fewer computing resources

DeepSeek roiled markets with its low-cost reasoning AI model back in January this year

Photo Credit: Reuters

DeepSeek is working with Tsinghua University on reducing the training its AI models need in an effort to lower operational costs.

The Chinese startup, which roiled markets with its low-cost reasoning model that emerged in January, collaborated with researchers from the Beijing institution on a paper detailing a novel approach to reinforcement learning to make models more efficient.

The new method aims to help artificial intelligence models better adhere to human preferences by offering rewards for more accurate and understandable responses, the researchers wrote. Reinforcement learning has proven effective in speeding up AI tasks in narrow applications and spheres. However, expanding it to more general applications has proven challenging, and that is the problem DeepSeek's team is trying to solve with a technique it calls self-principled critique tuning. The strategy outperformed existing methods and models on various benchmarks, and the results showed better performance with fewer computing resources, according to the paper.

DeepSeek is calling these new models DeepSeek-GRM, short for “generalist reward modeling,” and will release them on an open source basis, the company said. Other AI developers, including Chinese tech giant Alibaba Group Holding Ltd. and San Francisco-based OpenAI, are also pushing into a new frontier of improving reasoning and self-refining capabilities while an AI model is performing tasks in real time.

Menlo Park, California-based Meta Platforms Inc. released its latest family of AI models, Llama 4, over the weekend and marked them as its first to use the Mixture of Experts (MoE) architecture. DeepSeek's models rely significantly on MoE to make more efficient use of resources, and Meta benchmarked its new release against the Hangzhou-based startup. DeepSeek hasn't specified when it might release its next flagship model.

(This story has not been edited by NDTV staff and is auto-generated from a syndicated feed.)
