Nvidia’s next-gen AI GPU is 4X faster than Hopper: Blackwell B200 GPU delivers up to 20 petaflops of compute and other massive improvements

Nvidia currently sits atop the AI world, with data center GPUs that everybody wants. Its Hopper H100 and GH200 Grace Hopper superchip are in serious demand and power many of the most powerful supercomputers in the world. Well, hang on to your seats, because Nvidia just revealed the successor to Hopper. CEO Jensen Huang today dropped the Blackwell B200 bomb, the next-generation data center and AI GPU that will provide a massive generational leap in computational power.

The Blackwell architecture and B200 GPU take over from the H100/H200. There will also be a Grace Blackwell GB200 superchip, which, as you can guess from the name, keeps the Grace CPU architecture but pairs it with the updated Blackwell GPU. We anticipate Nvidia will eventually have consumer-class Blackwell GPUs as well, but those may not arrive until 2025 and will be quite different from the data center chips.

Nvidia Blackwell GPU

At a high level, the B200 GPU more than doubles the transistor count of the existing H100. There are some caveats that we’ll get to momentarily, but B200 packs 208 billion transistors (versus 80 billion on H100/H200). It also provides 20 petaflops of AI performance from a single GPU, where a single H100 topped out at 4 petaflops of AI compute. And last but not least, it will feature 192GB of HBM3e memory offering up to 8 TB/s of bandwidth.
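
For readers who want the headline numbers side by side, here is a minimal Python sketch that computes the generational ratios from the figures quoted above. It uses only the article’s stated specs, and note that the two peak-petaflops numbers are not necessarily quoted at the same precision, so this is a rough comparison rather than an apples-to-apples benchmark.

```python
# Quick sanity check on the generational uplift, using only the
# figures quoted in the paragraph above (H100 -> B200).
specs = {
    "transistors (billions)": (80, 208),   # per the article
    "peak AI petaflops":      (4, 20),     # peak figures; precisions may differ
}

for name, (hopper, blackwell) in specs.items():
    print(f"{name}: {hopper} -> {blackwell} ({blackwell / hopper:.1f}x)")
```

Running this prints a 2.6x jump in transistor count and a 5.0x jump in peak AI petaflops, which is where the "massive generational leap" framing comes from.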

Now, let’s talk about some of the caveats. First and foremost, as the rumors indicated, Blackwell B200 is not a single GPU in the traditional sense. Instead, it’s composed of two tightly coupled dies, though they function as one unified CUDA GPU, according to Nvidia. The two chips are linked via a 10 TB/s NV-HBI (Nvidia High Bandwidth Interface) connection to ensure they can properly function as a single, fully coherent chip.
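
In practical terms, "one unified CUDA GPU" means software should not have to treat the two dies as separate devices. As a purely illustrative sketch (assuming a standard CUDA runtime plus PyTorch, with no Blackwell-specific API implied), this is the kind of ordinary device query that would report such a part as a single GPU with the combined memory pool:

```python
# Illustrative only: a standard PyTorch/CUDA device query. On a part that
# presents its two dies as one coherent GPU, this would report a single
# device with the combined memory, rather than two separate devices.
import torch

if torch.cuda.is_available():
    print("visible CUDA devices:", torch.cuda.device_count())
    props = torch.cuda.get_device_properties(0)
    print("name:", props.name)
    print("total memory (GB):", round(props.total_memory / 1e9, 1))
```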

The reason for this dual-die configuration is simple: Blackwell B200 will use TSMC’s 4NP process node, a refined version of the 4N process used by the existing Hopper H100 and Ada Lovelace architecture GPUs. We don’t have many details on TSMC 4NP, but it likely doesn’t offer a major improvement in feature density, which means that if you want a more powerful chip, you need a way to go bigger. That’s difficult, as the H100 was already basically a full-reticle-size chip: it has a die size of 814 mm², while the theoretical reticle maximum is around 858 mm².
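
To make the reticle-limit point concrete, here is a small sketch of the arithmetic using the die sizes from this paragraph: a single die simply has almost no room left to grow, which is what pushes Nvidia toward two dies.

```python
# How little headroom a single die has left: H100 is already close to
# the ~858 mm^2 reticle limit, so doubling compute on one die isn't an option.
h100_die_mm2 = 814
reticle_limit_mm2 = 858

headroom = reticle_limit_mm2 - h100_die_mm2
print(f"headroom: {headroom} mm^2 ({headroom / h100_die_mm2:.1%} larger at most)")
```

That works out to roughly 44 mm², or only about 5 percent of extra area, so the only way to roughly double the transistor budget is to use two reticle-sized dies and stitch them together with a fast, coherent link.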
