Six months after unveiling the new GH100 accelerator chip, the H100 GPU card built around it, and the L40 GPU SoC, Nvidia used its late-March event to detail the products it will market around them. Above all, the GPU cards come in adjusted versions, and the DGX and OVX servers that embed them are being updated.
Unfortunately, Nvidia’s GTC Spring Event didn’t kick off the commercialization of much-anticipated products, including the high-end version of the H100s or the Grace processor.
For the record, the GH100/H100 pair succeeds the GA100/A100 pair with the promise of training machine learning engines nine times faster and running inference thirty times faster. In addition, the GH100 chip gains new Transformer Engine circuits, which according to Nvidia accelerate generative AI algorithms, i.e. those behind services such as ChatGPT.
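For readers curious about what the Transformer Engine means in practice, Nvidia exposes it through its transformer-engine Python library. The following is a minimal sketch, assuming an H100-class GPU and the PyTorch bindings of that package; the layer sizes and scaling recipe are illustrative choices, not figures from Nvidia’s announcement.

```python
# Minimal sketch: running a layer in FP8 via Nvidia's Transformer Engine.
# Assumes the `transformer-engine` package and a Hopper-class GPU; on older
# hardware the FP8 path is unavailable and this will raise an error.
import torch
import transformer_engine.pytorch as te
from transformer_engine.common import recipe

# A single FP8-capable linear layer (stand-in for a full transformer block).
layer = te.Linear(1024, 1024, bias=True).cuda()

# FP8 recipe: delayed scaling with the E4M3 format (illustrative settings).
fp8_recipe = recipe.DelayedScaling(margin=0, fp8_format=recipe.Format.E4M3)

x = torch.randn(16, 1024, device="cuda")

with te.fp8_autocast(enabled=True, fp8_recipe=fp8_recipe):
    y = layer(x)  # the underlying GEMM runs on the FP8 Transformer Engine

print(y.shape)  # torch.Size([16, 1024])
```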
Unlike the A100, the H100 GPU can no longer be repurposed as a graphics accelerator. That role now falls to the L40 GPU, which is more versatile: it can be used in workstations to accelerate display (in the RTX 4000 and RTX 6000 graphics cards) as well as in servers to speed up simulations built on 3D objects.
A “dual” H100 NVL card
Until now, Nvidia has only marketed the H100 card in its CNX version, i.e. as a PCIe card with two NVLink network connections, each capable of communicating at 300 Gbps with H100 cards installed in other servers. The current version of the card carries 80 GB of RAM.
Nvidia now also offers this card in a “dual” H100 NVL model: two H100s mounted on top of each other in the same PCIe assembly, still communicating with each other over their NVLink interconnect. Each chip on the H100 NVL card also outclasses the one on the H100 CNX, since the NVL is in fact built from two H100 SXM GPUs mounted back to back.
The H100 SXM version is the one everyone has been waiting for. It is reportedly available only in small quantities – for a limited number of companies, or perhaps only for hyperscalers. The SXM version replaces the PCIe connector (on the edge of the card) with an NVLink socket (under the GPU) capable of communicating at 900 GB/s. SXM GPUs are installed on an NVSwitch backplane that interconnects up to eight H100 GPUs.
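As an illustration of what that backplane buys you, here is a minimal sketch, assuming PyTorch on a multi-GPU node, that checks which GPU pairs can use direct peer-to-peer access. On an HGX/DGX-style machine that traffic rides NVLink/NVSwitch, although PyTorch itself only reports the capability, not the link type.

```python
# Minimal sketch: list which GPU pairs in a node support direct peer-to-peer
# access. On HGX/DGX-style machines this path is carried by NVLink/NVSwitch;
# `nvidia-smi topo -m` shows the actual link type for each pair.
import torch

n = torch.cuda.device_count()
print(f"{n} CUDA devices visible")

for i in range(n):
    for j in range(n):
        if i != j:
            ok = torch.cuda.can_device_access_peer(i, j)
            print(f"GPU {i} -> GPU {j}: peer access {'yes' if ok else 'no'}")
```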
Servers built around an NVSwitch backplane are traditionally referred to as HGX machines when sold by third parties; Nvidia itself markets such servers under the DGX name. As of this writing, no new HGX or DGX servers appear to be available with H100 cards. While it would in principle be possible to swap the A100 cards for H100s in the HGX and DGX servers sold to date, Nvidia states that its DGX H100 machines will feature the latest Xeon processors that Intel brought to market earlier this year.
The GH100 chips on the H100 NVL and SXM boards run at 1.98 GHz and communicate with their onboard HBM3 memory at 3.35 TB/s. The GH100 chip on the H100 CNX card runs at 1.75 GHz and communicates with its onboard HBM2e memory at 2 TB/s. In addition, the chip on the CNX card has about 15% fewer processing cores than the one on the NVL and SXM cards.
The NVL card reportedly carries 94 GB of RAM per GPU (188 GB in total), while the other two versions have 80 GB of RAM.
Many products are still pending
Eventually it will be possible to plug a card carrying one or two Grace processors, or even a card pairing a Grace processor with a GH100 chip, into an NVSwitch card socket. This long-awaited ARM processor, developed by Nvidia, is to contain 72 cores communicating with one another at 3.2 TB/s. Grace’s appeal lies in excelling where a GPU cannot: executing application code sequentially. Nvidia has been announcing Grace for a long time.
The somewhat disappointing news this March is that the chip is only now said to be entering production, which implies that the announcements made over the previous two years were premature.
Likewise, the BlueField-3 DPU, which accelerates networking (including NVMe-over-Fabrics), is not “finally available” either; Nvidia was merely pleased to announce that it is “finally in production”.
In this context, the announcement of a DGX Quantum server should be treated with caution. More a concept than a truly functional product, the DGX Quantum is meant to serve as a bridge between conventional processing and workloads run on a quantum computer. Remember that the latter does not exist yet, or exists only in such an embryonic state that it is far from competing with a conventional computer.
According to Nvidia’s marketing presentations, the DGX Quantum will rely on Grace processors – themselves still in development – and will include tools to distribute algorithms between Grace and a quantum processor, as well as a CUDA development kit adapted to quantum processors. The rest of Nvidia’s documentation on the subject merely paraphrases what everyone has been writing about quantum computing for years.
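Nvidia’s development kit for quantum processors is published as CUDA Quantum (the cudaq Python package). Purely as an illustration of the hybrid classical/quantum programming style the DGX Quantum targets, here is a minimal sketch assuming the cudaq kernel-builder API, with the bundled simulator standing in for a real quantum processor.

```python
# Minimal sketch of a hybrid program in the style CUDA Quantum targets,
# using the `cudaq` kernel-builder API. The "quantum processor" here is
# just the simulator shipped with the package.
import cudaq

# Build a 2-qubit Bell-state kernel.
kernel = cudaq.make_kernel()
qubits = kernel.qalloc(2)
kernel.h(qubits[0])               # put qubit 0 into superposition
kernel.cx(qubits[0], qubits[1])   # entangle qubit 1 with qubit 0
kernel.mz(qubits)                 # measure both qubits

# Classical side: sample the kernel and post-process the counts on the CPU,
# which is the role a Grace host would play in Nvidia's DGX Quantum picture.
counts = cudaq.sample(kernel, shots_count=1000)
print(counts)
```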
The L40 GPU comes in an economical L4 version
Finally, with regard to the L40 GPU, Nvidia is deriving from the already marketed PCIe variant (also called “CNX”) a more economical L4 model, which has about half as many compute units.
The L40 card’s chip is a 1 GHz AD102 that communicates with its 24 GB of GDDR6 memory at 864 GB/s over a 384-bit bus. The chip on the L4 card is a 795 MHz AD104 with the same memory capacity, but it only communicates at 504 GB/s over a 192-bit bus.
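The 864 GB/s figure follows directly from the bus width and the per-pin data rate of the GDDR6 chips. A quick back-of-the-envelope check, assuming the commonly quoted 18 Gbps per pin for the L40 (a figure not stated above):

```python
# Back-of-the-envelope check of the L40's memory bandwidth figure.
# Assumption: 18 Gbps per pin as the GDDR6 data rate (not stated above).
bus_width_bits = 384
data_rate_gbps_per_pin = 18

bandwidth_gbs = bus_width_bits / 8 * data_rate_gbps_per_pin
print(f"{bandwidth_gbs:.0f} GB/s")  # 864 GB/s, matching the L40 spec
```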
According to Nvidia, the L4 cards are better suited to accelerating video processing, while the L40 remains the best fit for the mathematical calculations behind simulations.
Nvidia markets these graphics cards individually or inside OVX servers, which now move to an “OVX 3.0” generation. Essentially, it is just an update that bases the machines on the latest generation of Intel Xeon processors.
DGX Cloud: renting computing power instead of buying it
Instead of paying $200,000 to buy a DGX server with eight H100 cards, it is now possible to rent one in the cloud for $37,000 a month. The aim is to serve organizations that only have a one-off need for computing power, for example to run a machine learning engine on a limited data set.
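A quick bit of arithmetic on the two figures quoted above shows how short the break-even window is (ignoring power, hosting and staffing costs, which are not part of Nvidia’s comparison):

```python
# Quick rent-vs-buy arithmetic using only the figures quoted above.
# Simplification: ignores power, hosting, staffing and financing costs.
purchase_price = 200_000   # DGX server with eight H100 cards, USD
monthly_rent = 37_000      # DGX Cloud, USD per month

break_even_months = purchase_price / monthly_rent
print(f"Buying pays for itself after about {break_even_months:.1f} months")
# -> about 5.4 months, so renting only makes sense for short-term needs
```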
“To put it simply, we offer the ultimate supercomputer in push-button mode: you open the service in your browser, load your computation program, indicate where the data to be processed is located, press the Go button, and we take care of everything else,” summarized Manuvir Das, Head of Enterprise Products at Nvidia, during the GTC Spring Event press conference.
The offering will be marketed either directly by Nvidia, with DGX servers hosted in Equinix data centers, or through OCI, Oracle’s public cloud. Microsoft Azure and Google GCP are eventually supposed to host the service as well. According to Nvidia, integrators can be expected to subscribe to the offer in order to resell it to their customers, since it is multi-tenant, i.e. able to manage and bill several groups of simultaneous users. In any case, the resources used are dedicated, i.e. there is no risk of a random drop in performance depending on how many applications are running at the same time.
AWS and Azure, the two leading public cloud infrastructure providers, are not to be outdone and have announced that they have already deployed server farms equipped with H100 cards. Unlike the DGX Cloud offerings, which replicate physical DGX servers online, the AWS and Azure offerings are marketed either as farms of bare-metal VMs or as off-the-shelf AI services relying on Nvidia’s latest accelerators.
The owner of an on-premises DGX server can of course extend its capacity with online DGXs from OCI or Equinix – the software is the same – whereas the AWS and Azure offerings stand on their own.
In addition to the infrastructure itself, Nvidia also markets ready-made functional services in the form of PaaS resources that can be plugged into other applications: NeMo for interpreting and generating text, Picasso for interpreting and generating images, videos or 3D objects, and BioNeMo for producing scientific reports from research data.