DeepSeek-V3 Open-Source AI Model With Mixture-of-Experts Architecture Released

The model features 671 billion parameters, far more than the 405 billion parameters of Meta's Llama 3.1 model.

Written by Akash Dutta, Edited by Siddharth Suvarna | Updated: 27 December 2024 16:38 IST
Highlights
  • DeepSeek-V3 was pre-trained on 14.8 trillion tokens
  • The AI model also comes with advanced reasoning capabilities
  • It scored 87.1 percent on the MMLU benchmark

The AI model adopts Multi-head Latent Attention (MLA) and DeepSeekMoE architectures

Photo Credit: DeepSeek

DeepSeek, a Chinese artificial intelligence (AI) firm, released the DeepSeek-V3 AI model on Thursday. The new open-source large language model (LLM) features a massive 671 billion parameters, surpassing Meta's Llama 3.1 model, which has 405 billion parameters. Despite its size, the researchers claimed that the LLM is designed for efficiency thanks to its mixture-of-experts (MoE) architecture, which activates only the parameters relevant to a given task, keeping the model both efficient and accurate. Notably, it is a text-based model and does not have multimodal capabilities.

DeepSeek-V3 AI Model Released

The open-source DeepSeek-V3 AI model is currently being hosted on Hugging Face. According to the listing, the LLM is geared towards efficient inference and cost-effective training. For this, the researchers adopted Multi-head Latent Attention (MLA) and DeepSeekMoE architectures.

Essentially, the AI model only activates the parameters relevant to the topic of the prompt, ensuring faster processing and higher accuracy compared to typical models of this size. Pre-trained on 14.8 trillion tokens, DeepSeek-V3 also uses techniques such as supervised fine-tuning and reinforcement learning to generate high-quality responses.
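
For readers curious about what this selective activation looks like in practice, here is a minimal, illustrative top-k mixture-of-experts layer in Python. It is a simplified sketch and not DeepSeek's actual implementation; the dimensions, expert count, and top-k value are arbitrary assumptions.

```python
# Minimal sketch of mixture-of-experts routing (illustrative only; not
# DeepSeek's actual implementation). A router scores each expert for a
# token and only the top-k experts run, so most parameters stay inactive.
import torch
import torch.nn as nn


class TinyMoELayer(nn.Module):
    def __init__(self, dim=64, num_experts=8, top_k=2):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(dim, num_experts)            # gating network
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, dim * 4), nn.GELU(), nn.Linear(dim * 4, dim))
            for _ in range(num_experts)
        )

    def forward(self, x):                                    # x: (tokens, dim)
        scores = self.router(x).softmax(dim=-1)              # affinity per expert
        weights, chosen = scores.topk(self.top_k, dim=-1)    # keep only top-k experts
        out = torch.zeros_like(x)
        for slot in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = chosen[:, slot] == e                  # tokens routed to expert e
                if mask.any():
                    out[mask] += weights[mask, slot, None] * expert(x[mask])
        return out


layer = TinyMoELayer()
print(layer(torch.randn(4, 64)).shape)  # torch.Size([4, 64])
```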

The Chinese firm claimed that, despite its size, the AI model was fully trained using 2.788 million Nvidia H800 GPU hours. DeepSeek-V3's architecture also includes a load-balancing technique, first used in its predecessor, to minimise performance degradation.
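
How such load balancing might work is shown in the rough sketch below. It assumes a bias-based scheme in which overloaded experts are made less attractive to the router and underloaded ones more attractive; the update rule and step size here are illustrative assumptions, not DeepSeek's exact recipe.

```python
# Rough sketch of bias-based expert load balancing (illustrative; the step
# size and update rule are assumptions, not DeepSeek's exact method).
# Overused experts get a lower routing bias, underused ones a higher bias,
# nudging future tokens toward idle experts without an extra loss term.
import torch

num_experts, top_k, gamma = 8, 2, 0.01
bias = torch.zeros(num_experts)                   # one routing bias per expert

def route(affinity):                              # affinity: (tokens, num_experts)
    # The bias influences which experts are picked, not their mixing weights.
    chosen = (affinity + bias).topk(top_k, dim=-1).indices
    load = torch.bincount(chosen.flatten(), minlength=num_experts).float()
    target = chosen.numel() / num_experts         # ideal even load per expert
    bias.add_(gamma * torch.sign(target - load))  # push routing toward balance
    return chosen

print(route(torch.rand(16, num_experts)))
```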


Coming to performance, the researchers shared evals from internal testing of the model and claimed that it outperforms the Meta Llama 3.1 and Qwen 2.5 models on Big-Bench Hard (BBH), Massive Multitask Language Understanding (MMLU), HumanEval, MATH, and several other benchmarks. However, these results have not yet been verified by third-party researchers.

One of the main highlights of DeepSeek-V3 is its massive size of 671 billion parameters. While larger models exist (Gemini 1.5 Pro, for instance, is reported to have more than a trillion parameters), models of this scale are rare in the open-source space. Prior to this, the largest open-source AI model was Meta's Llama 3.1 with 405 billion parameters.


At present, DeepSeek-V3's code can be accessed via its Hugging Face listing under an MIT license for personal and commercial use. Additionally, the AI model can be tested via the company's online chatbot platform. Those looking to build with the AI model can also access its API.
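
As a brief illustration of API access, the sketch below uses the OpenAI-compatible Python SDK pattern that DeepSeek documents; the base URL and model identifier shown are assumptions and should be checked against the official documentation.

```python
# Minimal sketch of calling the DeepSeek API through the OpenAI-compatible
# Python SDK. The base URL and model name below are assumptions taken from
# DeepSeek's public documentation; confirm them before use.
from openai import OpenAI

client = OpenAI(
    api_key="YOUR_DEEPSEEK_API_KEY",          # issued from DeepSeek's platform
    base_url="https://api.deepseek.com",      # assumed OpenAI-compatible endpoint
)

response = client.chat.completions.create(
    model="deepseek-chat",                    # assumed identifier for DeepSeek-V3
    messages=[{"role": "user", "content": "Summarise mixture-of-experts in one line."}],
)
print(response.choices[0].message.content)
```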

 
