Microsoft is also inviting developers and AI startups to explore model and workload optimisation with the new Maia 200 SDK.
Microsoft says Maia 200 can run the largest AI models in existence, with room for bigger future models
Microsoft unveiled its newest artificial intelligence (AI) accelerator, the Maia 200 chip, on Monday. The purpose-built chipset is designed for faster AI inference and is said to cut the cost of running large language models (LLMs) at scale. The enterprise-focused processor succeeds the Maia 100, which launched in 2023, and is currently being deployed in Microsoft's Azure cloud data centres, starting in the US. The company highlighted that its new chip will power the latest models, such as OpenAI's GPT 5.2.
In a blog post, the Redmond-based tech giant announced and detailed its latest AI chipset. Maia 200 is built on Taiwan Semiconductor Manufacturing Company's (TSMC) 3nm process, and each chip contains more than 140 billion transistors. Microsoft said the chips also feature a custom memory and communication architecture tailored specifically for inference workloads, a design that helps maximise the speed at which the chip can process data and keeps AI models “fed” with information.
A key part of Maia 200's performance comes from its support for low-precision compute formats such as 4-bit (FP4) and 8-bit (FP8) operations. These formats allow AI models to generate responses more quickly and with lower energy use compared with traditional higher-precision computing. Microsoft said Maia 200 delivers in excess of 10 petaFLOPS (quadrillions of floating-point operations per second) in FP4 mode and over 5 petaFLOPS in FP8 mode, making it well-suited for modern LLMs and other AI systems that are used in real-time applications.
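For readers who want a rough sense of why low-precision formats speed up inference, the sketch below quantises a weight matrix to 8-bit values in plain PyTorch. It is a generic illustration of the technique (using integers as a stand-in for the FP4/FP8 floating-point formats), not Maia 200 or Azure-specific code, and the tensor shapes are invented for demonstration.

```python
import torch

# Hypothetical weight matrix for one model layer (shapes are illustrative only).
weights_fp32 = torch.randn(4096, 4096)

# Symmetric 8-bit quantisation: map the float range onto int8 [-127, 127].
scale = weights_fp32.abs().max() / 127.0
weights_int8 = torch.clamp((weights_fp32 / scale).round(), -127, 127).to(torch.int8)

# Storing 8-bit values instead of FP32 cuts the data that must move through memory
# by 4x, which is the main reason low-precision formats raise inference throughput.
print(f"FP32 size: {weights_fp32.nelement() * 4 / 2**20:.1f} MiB")
print(f"INT8 size: {weights_int8.nelement() * 1 / 2**20:.1f} MiB")

# At inference time the weights are dequantised (or multiplied directly in low
# precision on hardware that supports it) before the matrix multiply.
activations = torch.randn(1, 4096)
output = activations @ (weights_int8.float() * scale)
```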
Maia 200 also includes 216GB of high-bandwidth memory (HBM3e) with 7TBps bandwidth and 272MB of on-chip SRAM. High-bandwidth memory lets the chip quickly access and move large amounts of data, which is a common bottleneck in AI workloads. The addition of on-chip SRAM helps reduce delays when models need frequent access to smaller, critical data sets, improving responsiveness for inference tasks.
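A back-of-envelope calculation shows why that bandwidth figure matters. If every model weight has to be read from HBM once per generated token, memory bandwidth sets a hard ceiling on output speed; the model size below is an assumption for illustration, not a Microsoft figure.

```python
# Lower bound on per-token latency set purely by HBM bandwidth.
hbm_bandwidth_bytes_per_s = 7e12       # 7TBps, as stated for Maia 200
model_size_bytes = 200e9 * 0.5         # hypothetical 200-billion-parameter model at 4 bits per weight

seconds_per_token = model_size_bytes / hbm_bandwidth_bytes_per_s
print(f"Bandwidth-limited floor: {seconds_per_token * 1000:.1f} ms per token "
      f"(~{1 / seconds_per_token:.0f} tokens/s) if every weight is read once per token")
```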
At the system level, Microsoft has designed Maia 200 to scale efficiently across large clusters. Each chip supports 2.8TBps bi-directional bandwidth, and groups of up to 6,144 accelerators can be connected together using standard Ethernet networking. This scalable architecture allows data centre operators to deploy many Maia 200 chips in a rack or across nodes, increasing the throughput available for demanding AI services while keeping power use and costs under control.
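Taking the company's per-chip figures at face value, the headline compute of a fully populated cluster is simple multiplication; the number below is a theoretical ceiling, not a measured throughput.

```python
# Theoretical peak of one maximum-size Maia 200 cluster, using the stated figures.
chips_per_cluster = 6144
fp4_petaflops_per_chip = 10            # "in excess of 10 petaFLOPS" in FP4 mode

peak_exaflops = chips_per_cluster * fp4_petaflops_per_chip / 1000
print(f"Peak FP4 compute: ~{peak_exaflops:.0f} exaFLOPS per cluster (theoretical ceiling)")
```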
One of the central goals behind Maia 200 is to improve performance per dollar, a key metric for inference infrastructure where organisations pay for both compute and energy. Microsoft said Maia 200 delivers around 30 percent better performance per dollar than the hardware the company currently uses in its fleet.
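In concrete terms, a 30 percent improvement in performance per dollar means the cost of serving a fixed workload falls by roughly 23 percent. The sketch below simply restates that arithmetic with an illustrative baseline, not a real Microsoft pricing figure.

```python
# What "30 percent better performance per dollar" means for serving cost.
baseline_tokens_per_dollar = 1_000_000          # illustrative baseline, not a Microsoft number
maia200_tokens_per_dollar = baseline_tokens_per_dollar * 1.30

cost_ratio = baseline_tokens_per_dollar / maia200_tokens_per_dollar
print(f"Cost per token falls to {cost_ratio:.2f}x of the baseline "
      f"(about {(1 - cost_ratio) * 100:.0f} percent cheaper for the same workload)")
```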
Microsoft is currently previewing a Maia software development kit (SDK) that includes tools such as a Triton compiler, PyTorch support, an optimised kernel library and low-level programming support, enabling developers to build and tune models for the Maia 200 platform.
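Microsoft has not shared Maia-specific code samples here, but since the SDK is described as including a Triton compiler and PyTorch support, a kernel written in the open-source Triton language gives a flavour of the programming model. The vector-add kernel below is generic Triton targeting a GPU backend; nothing in it is specific to Maia 200.

```python
import torch
import triton
import triton.language as tl

@triton.jit
def add_kernel(x_ptr, y_ptr, out_ptr, n_elements, BLOCK_SIZE: tl.constexpr):
    # Each program instance handles one contiguous block of elements.
    pid = tl.program_id(axis=0)
    offsets = pid * BLOCK_SIZE + tl.arange(0, BLOCK_SIZE)
    mask = offsets < n_elements
    x = tl.load(x_ptr + offsets, mask=mask)
    y = tl.load(y_ptr + offsets, mask=mask)
    tl.store(out_ptr + offsets, x + y, mask=mask)

# Kernels like this are compiled by Triton for whatever backend the vendor's
# compiler targets; on today's GPUs that is CUDA or ROCm ("cuda" below).
x = torch.randn(8192, device="cuda")
y = torch.randn(8192, device="cuda")
out = torch.empty_like(x)
grid = lambda meta: (triton.cdiv(x.numel(), meta["BLOCK_SIZE"]),)
add_kernel[grid](x, y, out, x.numel(), BLOCK_SIZE=1024)
```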