Microsoft said its new cluster of Nvidia GB300 NVL72 systems packs more than 4,600 Blackwell Ultra GPUs.
Photo Credit: Microsoft
Microsoft Azure unveiled the new NDv6 GB300 virtual machine (VM) series on Thursday. Claimed to be the industry's first supercomputing-scale production cluster of Nvidia GB300 NVL72 systems, it will serve OpenAI's “most demanding artificial intelligence (AI) inference workloads.” The Redmond-based tech giant says the VMs are optimised for reasoning models, agentic AI systems, and multimodal generative AI workflows. Notably, the new series supersedes the ND GB200 v6 VMs, which were introduced less than a year ago.
In a blog post, Microsoft's cloud division, Azure, announced the new virtual machines. The cluster is powered by more than 4,600 Nvidia Blackwell Ultra GPUs, housed in GB300 NVL72 rack-scale systems and connected via Nvidia's InfiniBand networking. Microsoft claims the cluster will enable model training in weeks instead of months, deliver high throughput for inference workloads, and support training models with “hundreds of trillions of parameters.”
Breaking down the system, the cloud division has implemented a rack-scale architecture in which each rack hosts 18 virtual machines with a total of 72 GPUs and 36 Nvidia Grace CPUs. For traffic between racks, each GPU gets 800Gbps of scale-out bandwidth via Nvidia's Quantum-X800 InfiniBand platform, double what the earlier GB200 NVL72 systems offered.
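Taken together, those figures imply a simple breakdown, sketched below in Python as a back-of-envelope check. The constants come from the article; the per-VM split of four GPUs and two Grace CPUs is our inference from the totals, not a figure Microsoft has published.

```python
# Back-of-envelope sketch of the rack-scale figures quoted above.
# Rack-level constants are from the article; the per-VM shares are
# an illustrative assumption derived by simple division.

VMS_PER_RACK = 18
GPUS_PER_RACK = 72             # Blackwell Ultra GPUs
GRACE_CPUS_PER_RACK = 36
SCALE_OUT_GBPS_PER_GPU = 800   # Quantum-X800 InfiniBand, per GPU

gpus_per_vm = GPUS_PER_RACK // VMS_PER_RACK        # 4 GPUs per VM
cpus_per_vm = GRACE_CPUS_PER_RACK // VMS_PER_RACK  # 2 Grace CPUs per VM

# Aggregate cross-rack bandwidth leaving one rack, in terabits/s.
rack_scale_out_tbps = GPUS_PER_RACK * SCALE_OUT_GBPS_PER_GPU / 1000

print(f"GPUs per VM:           {gpus_per_vm}")
print(f"Grace CPUs per VM:     {cpus_per_vm}")
print(f"Rack scale-out (Tb/s): {rack_scale_out_tbps:.1f}")  # 57.6 Tb/s
```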
Inside each rack, the chips are connected with ultra-fast links that can move 130TB of data per second, and they share 37TB of fast memory for massive calculations. Overall, a rack can deliver up to 1,440 petaflops (PFLOPS) of FP4 Tensor Core performance, making it one of the fastest systems in the world for AI tasks.
Within each rack, NVLink and NVSwitch, the special high-speed connections that let GPUs talk to each other extremely quickly, turn that 37TB of memory into a unified pool that exchanges data at up to 130TB per second. This tight integration means AI models can process larger tasks faster, handle longer sequences of information, and run complex agentic workloads (AI that can make decisions on its own) or multimodal workloads (AI that processes multiple types of data, such as text, images, and audio, together) with minimal delays.
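To see why the rack behaves like one large accelerator, divide those aggregates by its 72 GPUs. The short Python sketch below does the arithmetic; the per-GPU shares are derived from the article's rack-level totals, not from a published per-GPU spec sheet.

```python
# Per-GPU shares implied by the rack-level numbers in the article.
# A rough consistency check, not vendor-published per-GPU specs.

GPUS = 72
NVLINK_AGG_TBPS = 130      # intra-rack NVLink fabric, TB/s
POOLED_MEMORY_TB = 37      # unified fast memory per rack
RACK_FP4_PFLOPS = 1440     # FP4 Tensor Core performance per rack

print(f"NVLink per GPU:  {NVLINK_AGG_TBPS / GPUS * 1000:.0f} GB/s")  # ~1806 GB/s
print(f"Memory per GPU:  {POOLED_MEMORY_TB / GPUS * 1000:.0f} GB")   # ~514 GB
print(f"FP4 per GPU:     {RACK_FP4_PFLOPS / GPUS:.0f} PFLOPS")       # 20 PFLOPS
```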
To expand beyond a single rack, Microsoft says Azure uses a full fat-tree, non-blocking network, a design that ensures all racks can communicate without slowdowns, powered by InfiniBand. This allows AI training to scale efficiently across tens of thousands of GPUs while keeping communication delays minimal. By reducing synchronisation overhead (the time GPUs spend waiting for each other), GPUs spend more time computing, helping researchers train massive AI models faster and at lower cost.
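As a rough illustration of why that synchronisation overhead matters at scale, the sketch below applies the textbook bandwidth-only cost model for a ring all-reduce, the collective operation used to average gradients in data-parallel training. The GPU count, model size, and the assumption that a collective can use the full 800Gbps per GPU are all hypothetical, chosen only to show the shape of the calculation.

```python
# Illustrative estimate of one gradient all-reduce, using the
# standard bandwidth-only ring model:  t ~= 2 * (N - 1) / N * S / B.
# All inputs below are hypothetical, not Azure's measured figures.

def ring_allreduce_seconds(num_gpus: int, payload_bytes: float,
                           bandwidth_bytes_per_s: float) -> float:
    """Bandwidth-only cost of a ring all-reduce (latency ignored)."""
    return 2 * (num_gpus - 1) / num_gpus * payload_bytes / bandwidth_bytes_per_s

gpus = 4608             # hypothetical cluster slice
grad_bytes = 2 * 70e9   # e.g. a 70B-parameter model's gradients in FP16
bw = 800e9 / 8          # 800Gb/s per GPU -> 100 GB/s

print(f"All-reduce step: {ring_allreduce_seconds(gpus, grad_bytes, bw):.2f} s")
```

The point of the model is that the per-step cost is dominated by payload size over per-GPU bandwidth, which is why doubling scale-out bandwidth directly cuts the time GPUs spend waiting on each other.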
Azure's co-designed stack combines custom protocols, collective libraries, and in-network computing to keep the network reliable and fully utilised. Additionally, Microsoft's cooling systems use standalone heat exchanger units along with facility cooling to reduce water use. On the software side, the company says it has re-engineered its stacks for storage, orchestration, and scheduling.