The rumors are true: Microsoft has developed its own custom AI chip that can train large language models and potentially avoid a costly dependency on Nvidia. Microsoft has also developed its own Arm-based CPU for cloud workloads. Both custom silicon chips are designed to power Azure data centers and prepare the company and its enterprise customers for a future full of AI.
Microsoft’s Azure Maia AI chip and Arm-powered Azure Cobalt CPU are coming to market in 2024, arriving in a year that has seen surging demand for Nvidia’s H100 GPUs, which are widely used to train and run generative image tools and large language models. Demand for these GPUs is so high that some have fetched more than $40,000 on eBay.
“Microsoft actually has a long history in silicon development,” explains Rani Borkar, head of Azure hardware systems and infrastructure at Microsoft, in an interview with The Verge. Microsoft helped develop silicon for the Xbox more than 20 years ago and even helped develop chips for its Surface devices. “This effort is based on that experience,” says Borkar. “In 2017, we began architecting the cloud hardware stack and set out to build our new custom chips.”
The new Azure Maia AI chip and Azure Cobalt CPU are both being developed in-house at Microsoft, combined with a major overhaul of the entire cloud server stack to optimize power, performance and cost. “We are rethinking cloud infrastructure for the age of AI and optimizing literally every layer of that infrastructure,” says Borkar.
The first two custom silicon chips Microsoft developed for its cloud infrastructure. Image: Microsoft
The Azure Cobalt CPU, named after the blue pigment, is a 128-core chip built on an Arm Neoverse CSS design and customized for Microsoft. It is designed to power general cloud services on Azure. “We put a lot of thought into not only making it high performance, but also making sure we pay attention to power management,” explains Borkar. “We made some very conscious design decisions, including the ability to control performance and power consumption per core and on each individual virtual machine.”
Microsoft is currently testing its Cobalt CPU on workloads like Microsoft Teams and SQL Server and plans to make virtual machines available to customers for a variety of workloads next year. While Borkar wouldn’t be drawn into direct comparisons with Amazon’s Graviton 3 servers available on AWS, there should be some noticeable performance improvements over the Arm-based servers that Microsoft currently uses for Azure. “Our initial tests show that our performance is up to 40 percent better than current performance in our data centers that use commercial Arm servers,” says Borkar. Microsoft is not yet releasing full system specifications or benchmarks.
Microsoft’s Maia 100 AI accelerator, named after a bright blue star, is designed to run cloud AI workloads such as training and inference of large language models. It is used to power some of the company’s largest AI workloads on Azure, including parts of its multi-billion dollar partnership with OpenAI, where Microsoft runs all of OpenAI’s workloads. The software giant collaborated with OpenAI in the design and testing phase of Maia.
“We were excited when Microsoft first unveiled its designs for the Maia chip, and we worked together to refine it and test it with our models,” said Sam Altman, CEO of OpenAI. “Azure’s end-to-end AI architecture, now optimized all the way to silicon with Maia, paves the way for training more powerful models and making those models more cost-effective for our customers.”
Maia is manufactured on a 5-nanometer TSMC process and packs 105 billion transistors, around 30 percent fewer than the 153 billion in AMD’s MI300X AI GPU, its own answer to Nvidia. “Maia supports our first implementation of sub-8-bit data types, MX data types, to co-design hardware and software,” says Borkar. “This helps us support faster model training and inference times.”
Microsoft is part of a group, which also includes AMD, Arm, Intel, Meta, Nvidia and Qualcomm, that is standardizing the next generation of data formats for AI models. Microsoft is building on the collaborative and open work of the Open Compute Project (OCP) to adapt entire systems to the needs of AI.
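The MX formats Borkar mentions come out of the OCP microscaling work: instead of every value carrying its own exponent, a small block of values shares a single power-of-two scale, so each element can be stored in well under 8 bits. The NumPy sketch below is only a toy illustration of that block-scaling idea, not Maia’s implementation; the 4-bit integer elements and rounding choices are simplifying assumptions (the published MX element types are narrow floats or 8-bit integers, grouped in 32-element blocks).

```python
# Toy block-scaled quantizer, loosely in the spirit of OCP MX formats.
# Assumptions for illustration: 4-bit signed integer elements, round-to-nearest,
# and input length divisible by the block size. Not Maia's actual scheme.
import numpy as np

BLOCK = 32   # elements that share one scale, per the MX spec
BITS = 4     # illustrative sub-8-bit element width

def quantize_blocks(x: np.ndarray):
    """Split a 1-D float array into 32-element blocks, each stored as a shared
    power-of-two scale plus low-bit signed integers."""
    x = x.reshape(-1, BLOCK)
    max_int = 2 ** (BITS - 1) - 1                       # 7 for 4-bit signed
    amax = np.abs(x).max(axis=1, keepdims=True)
    amax = np.where(amax == 0, 1.0, amax)               # avoid log2(0)
    scales = 2.0 ** np.ceil(np.log2(amax / max_int))    # one scale per block
    elems = np.clip(np.round(x / scales), -max_int, max_int).astype(np.int8)
    return scales, elems

def dequantize_blocks(scales, elems):
    return (elems.astype(np.float32) * scales).reshape(-1)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    weights = rng.normal(size=4 * BLOCK).astype(np.float32)
    scales, elems = quantize_blocks(weights)
    error = np.abs(weights - dequantize_blocks(scales, elems)).max()
    print(f"max abs quantization error: {error:.4f}")
```

The appeal of formats like this is that the narrow elements roughly halve memory traffic and silicon area per operation compared with 8-bit formats, which is one reason sub-8-bit arithmetic can translate into the faster training and inference Borkar describes.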
A station used for testing Microsoft’s Azure Cobalt system-on-chip. Image: Microsoft
“Maia is Microsoft’s first fully liquid-cooled server processor,” reveals Borkar. “The goal was to enable higher server density with greater efficiency. As we rethink the entire stack, we are intentionally thinking about each layer so that these systems actually fit within the footprint of our current data center.”
This is critical for Microsoft to be able to spin up these AI servers more quickly without having to make space in data centers around the world. Microsoft has built a unique rack to house Maia server boards, complete with a “sidekick” liquid cooler that works like a radiator in your car or a fancy gaming PC to cool the surface of the Maia chips.
In addition to sharing MX data types, Microsoft also shares its rack designs with its partners so they can use them on systems with other integrated chips. But the Maia chip designs are not being distributed more widely; Microsoft is keeping them in-house.
Maia 100 is currently being tested on GPT-3.5 Turbo, the same model that powers ChatGPT, Bing AI workloads and GitHub Copilot. Microsoft is in the early stages of deployment and, much like with Cobalt, isn’t yet ready to release exact Maia specifications or performance benchmarks.
The Maia 100 server rack and the “Sidekick” cooling. Image: Microsoft
As such, it’s difficult to see exactly how Maia will compare to Nvidia’s popular H100 GPU, the recently announced H200, or even AMD’s latest MI300X. Borkar didn’t want to get into comparisons, instead reiterating that partnerships with Nvidia and AMD remain critical to the future of Azure’s AI cloud. “Given the scale at which the cloud operates, it’s really important to optimize and integrate every layer of the stack to maximize performance, diversify the supply chain and, frankly, give our customers infrastructure choices,” says Borkar.
This supply chain diversification is important for Microsoft, especially since Nvidia is currently the main supplier of AI server chips and companies are competing to buy up those chips. It is estimated that OpenAI needed more than 30,000 of Nvidia’s older A100 GPUs to commercialize ChatGPT, so Microsoft’s own chips could help reduce the cost of AI for its customers. Microsoft also designed these chips for its own Azure cloud workloads, rather than to sell them to others, as Nvidia, AMD, Intel and Qualcomm do.
“I see this as more of a complement and not a competition,” emphasizes Borkar. “We have both Intel and AMD in our cloud computing today, and similarly in AI we are announcing AMD where today we already have Nvidia. These partners are very important to our infrastructure and we really want to give our customers the choice.”
You may have noticed the Maia 100 and Cobalt 100 naming, which suggests Microsoft is already designing second-generation versions of these chips. “This is a series, it’s not just 100 and done… but we won’t share our roadmaps,” says Borkar. It’s not yet clear how often Microsoft will deliver new versions of Maia and Cobalt, but given the pace of AI, I wouldn’t be surprised if a Maia 100 successor arrives at a cadence similar to Nvidia’s H200 announcement (around 20 months).
What will now matter is how quickly Microsoft puts Maia into action to accelerate the rollout of its broader AI ambitions, and how these chips will impact the prices for using AI cloud services. Microsoft isn’t ready to talk about these new server prices yet, but we’ve already seen the company quietly launch its Copilot for Microsoft 365 at a premium of $30 per month per user.
Copilot for Microsoft 365 is currently limited to Microsoft’s largest customers: enterprise users have to commit to at least 300 seats to get on the list for the new AI-powered Office assistant. As Microsoft pushes even more Copilot features and a rebranding of Bing Chat this week, Maia could soon help balance the demand for the AI chips that power these new experiences.