Microsoft said its new cluster of Nvidia GB300 NVL72 systems packs more than 4,600 Blackwell Ultra GPUs.
Photo Credit: Microsoft
Microsoft Azure unveiled the new NDv6 GB300 virtual machine (VM) series on Thursday. Claimed to be the industry's first supercomputing-scale production cluster of Nvidia GB300 NVL72 systems, it will serve OpenAI's “most demanding artificial intelligence (AI) inference workloads.” The Redmond-based tech giant says the VMs are optimised for reasoning models, agentic AI systems, and multimodal generative AI workflows. Notably, the new series supersedes the ND GB200 v6 VMs, which were introduced less than a year ago.
In a blog post, Microsoft's cloud division, Azure, announced the new virtual machines. The cluster is powered by more than 4,600 Nvidia Blackwell Ultra GPUs, housed in GB300 NVL72 rack-scale systems and connected via Nvidia's InfiniBand networking. Microsoft claims the cluster will enable model training in weeks instead of months, deliver high throughput for inference workloads, and support training models with “hundreds of trillions of parameters.”
Breaking down the system, the cloud division has implemented a rack-scale architecture in which each rack hosts 18 virtual machines with a total of 72 GPUs and 36 Nvidia Grace CPUs. For traffic between racks, each GPU gets 800Gbps of scale-out bandwidth via Nvidia's Quantum-X800 InfiniBand platform, double what the earlier GB200 NVL72 systems offered.
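Taken together, those figures imply a simple breakdown, sketched below in Python as a back-of-envelope check. The constants come from the article; the per-VM split of four GPUs and two Grace CPUs is our inference from the totals, not a figure Microsoft has published.

```python
# Back-of-envelope sketch of the rack-scale figures quoted above.
# Rack-level constants are from the article; the per-VM shares are
# an illustrative assumption derived by simple division.

VMS_PER_RACK = 18
GPUS_PER_RACK = 72             # Blackwell Ultra GPUs
GRACE_CPUS_PER_RACK = 36
SCALE_OUT_GBPS_PER_GPU = 800   # Quantum-X800 InfiniBand, per GPU

gpus_per_vm = GPUS_PER_RACK // VMS_PER_RACK        # 4 GPUs per VM
cpus_per_vm = GRACE_CPUS_PER_RACK // VMS_PER_RACK  # 2 Grace CPUs per VM

# Aggregate cross-rack bandwidth leaving one rack, in terabits/s.
rack_scale_out_tbps = GPUS_PER_RACK * SCALE_OUT_GBPS_PER_GPU / 1000

print(f"GPUs per VM:           {gpus_per_vm}")
print(f"Grace CPUs per VM:     {cpus_per_vm}")
print(f"Rack scale-out (Tb/s): {rack_scale_out_tbps:.1f}")  # 57.6 Tb/s
```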
Inside each rack, the chips are connected with ultra-fast links that can move 130TB of data per second, and they share 37TB of fast memory for massive calculations. Overall, a rack can deliver up to 1,440 petaflops (PFLOPS) of FP4 Tensor Core performance, making it one of the fastest systems in the world for AI tasks.
Within each rack, NVLink and NVSwitch, the special high-speed connections that let GPUs talk to each other extremely quickly, turn that 37TB of memory into a unified pool that exchanges data at up to 130TB per second. This tight integration means AI models can process larger tasks faster, handle longer sequences of information, and run complex agentic workloads (AI that can make decisions on its own) or multimodal workloads (AI that processes multiple types of data, such as text, images, and audio, together) with minimal delays.
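To see why the rack behaves like one large accelerator, divide those aggregates by its 72 GPUs. The short Python sketch below does the arithmetic; the per-GPU shares are derived from the article's rack-level totals, not from a published per-GPU spec sheet.

```python
# Per-GPU shares implied by the rack-level numbers in the article.
# A rough consistency check, not vendor-published per-GPU specs.

GPUS = 72
NVLINK_AGG_TBPS = 130      # intra-rack NVLink fabric, TB/s
POOLED_MEMORY_TB = 37      # unified fast memory per rack
RACK_FP4_PFLOPS = 1440     # FP4 Tensor Core performance per rack

print(f"NVLink per GPU:  {NVLINK_AGG_TBPS / GPUS * 1000:.0f} GB/s")  # ~1806 GB/s
print(f"Memory per GPU:  {POOLED_MEMORY_TB / GPUS * 1000:.0f} GB")   # ~514 GB
print(f"FP4 per GPU:     {RACK_FP4_PFLOPS / GPUS:.0f} PFLOPS")       # 20 PFLOPS
```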
To expand beyond a single rack, Microsoft says Azure uses a full fat-tree, non-blocking network, a design that ensures all racks can communicate without slowdowns, powered by InfiniBand. This allows AI training to scale efficiently across tens of thousands of GPUs while keeping communication delays minimal. By reducing synchronisation overhead (the time GPUs spend waiting for each other), GPUs spend more time computing, helping researchers train massive AI models faster and at lower cost.
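As a rough illustration of why that synchronisation overhead matters at scale, the sketch below applies the textbook bandwidth-only cost model for a ring all-reduce, the collective operation used to average gradients in data-parallel training. The GPU count, model size, and the assumption that a collective can use the full 800Gbps per GPU are all hypothetical, chosen only to show the shape of the calculation.

```python
# Illustrative estimate of one gradient all-reduce, using the
# standard bandwidth-only ring model:  t ~= 2 * (N - 1) / N * S / B.
# All inputs below are hypothetical, not Azure's measured figures.

def ring_allreduce_seconds(num_gpus: int, payload_bytes: float,
                           bandwidth_bytes_per_s: float) -> float:
    """Bandwidth-only cost of a ring all-reduce (latency ignored)."""
    return 2 * (num_gpus - 1) / num_gpus * payload_bytes / bandwidth_bytes_per_s

gpus = 4608             # hypothetical cluster slice
grad_bytes = 2 * 70e9   # e.g. a 70B-parameter model's gradients in FP16
bw = 800e9 / 8          # 800Gb/s per GPU -> 100 GB/s

print(f"All-reduce step: {ring_allreduce_seconds(gpus, grad_bytes, bw):.2f} s")
```

The point of the model is that the per-step cost is dominated by payload size over per-GPU bandwidth, which is why doubling scale-out bandwidth directly cuts the time GPUs spend waiting on each other.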
Azure's co-designed stack combines custom protocols, collective libraries, and in-network computing to keep the network reliable and fully utilised. Additionally, Microsoft's cooling systems use standalone heat exchanger units along with facility cooling to reduce water use. On the software side, the company says it has re-engineered its stacks for storage, orchestration, and scheduling.