Intel has announced new processors with high-bandwidth memory (HBM) geared toward high-performance computing (HPC), supercomputing, and artificial intelligence (AI).
The products are known as the Xeon CPU Max series and GPU Max series. The chips are based on existing technology; the CPU is 4th Generation Xeon Scalable, aka Sapphire Rapids, and the GPU is Ponte Vecchio, the data center version of Intel’s Xe GPU technology.
The difference is that the two processors come with HBM in the processor package rather than relying only on standard DRAM. HBM offers considerably more bandwidth than DDR4 or DDR5 memory and sits in the package right next to the CPU or GPU die, connected by a high-speed interconnect, rather than on memory sticks like DDR memory.
“If you look at the overall workloads in the HPC and AI domain, there’s a wide diversity of workloads,” said Jeff McVeigh, vice president and general manager of supercomputing at Intel. “Traditionally, there’s been two routes up this summit. One is the CPU route and the other is the GPU route, and they each have their own obstacles. And our goal is to really go forward and address them holistically.”
The focus of CPU Max and GPU Max is on maximizing bandwidth, maximizing compute, and maximizing the capabilities and possibilities they offer for addressing the breadth of workloads, McVeigh said.
CPU Max comes in three server configurations. The first is without DRAM, so the only memory in the system is the 64GB of HBM on the CPU Max chip. This is how Japan’s Fugaku supercomputer, for some time one of the fastest supercomputers in the world, operates. In a two-socket system, that’s 128GB of memory, which McVeigh said “for many applications and workloads is sufficient.” In this scenario, applications can run unchanged.
The second configuration is called HBM flat mode, which combines HBM in the CPU package with standard DDR5 memory sticks in the system. With both HBM and DDR present, software needs to be optimized to move data between the two memory regions.
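As a rough illustration of what that optimization involves, the Python sketch below decides which of an application's buffers should live in HBM and which in DDR5. It ranks buffers by bandwidth demand per gigabyte and fills the 64GB of HBM greedily. The buffer names, sizes, and traffic figures are hypothetical, and the greedy policy is an assumption chosen for illustration, not Intel's method. Real placement would be done with NUMA-aware tools or allocators rather than a script like this.

```python
HBM_CAPACITY_GB = 64  # HBM per CPU Max chip, per the article


def place_buffers(buffers, hbm_capacity_gb=HBM_CAPACITY_GB):
    """Greedily assign buffers to HBM or DDR5.

    buffers: list of (name, size_gb, traffic_gb_s) tuples with positive
    sizes. traffic_gb_s is an assumed bandwidth demand, not a measurement.
    """
    # Rank by bandwidth demand per GB so small, hot buffers win HBM space.
    ranked = sorted(buffers, key=lambda b: b[2] / b[1], reverse=True)
    free_gb = hbm_capacity_gb
    placement = {}
    for name, size_gb, _traffic in ranked:
        if size_gb <= free_gb:
            placement[name] = "HBM"
            free_gb -= size_gb
        else:
            # Does not fit in the remaining HBM, so it stays in DDR5.
            placement[name] = "DDR5"
    return placement


# Hypothetical workload; names and numbers are made up for illustration.
demo = [
    ("stencil_grid", 40, 300),
    ("lookup_table", 16, 200),
    ("checkpoint_staging", 100, 5),
    ("halo_exchange", 12, 30),
]

for name, tier in place_buffers(demo).items():
    print(f"{name}: {tier}")
```

With these made-up figures, the two most bandwidth-hungry buffers take 56GB of HBM and the other two fall back to DDR5.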
The third configuration is HBM caching mode, where the HBM acts as a cache for the DDR memory in the system. In this mode, no software code changes are required. “You might want to do some tuning to utilize that very large cache that you now have, but you don’t have to when you get immediate benefits,” said McVeigh.
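The three CPU Max memory configurations can be summarized in a short Python sketch that computes how much memory an application can address in each. The 64GB-per-socket HBM figure comes from Intel. The DDR5 capacity in the example is hypothetical, and the treatment of caching mode (HBM adds no addressable capacity because it only fronts the DDR) is an assumption based on how memory-side caches usually behave.

```python
HBM_PER_SOCKET_GB = 64  # from the article


def addressable_memory_gb(mode, sockets=2, ddr_per_socket_gb=0):
    """Return (addressable_gb, hbm_role) for one of the three modes."""
    hbm_total = sockets * HBM_PER_SOCKET_GB
    ddr_total = sockets * ddr_per_socket_gb
    if mode == "hbm_only":
        # No DRAM installed: HBM is the only memory in the system.
        return hbm_total, "main memory"
    if mode == "flat":
        # HBM and DDR5 are both visible, as separate memory regions.
        return hbm_total + ddr_total, "separate memory region"
    if mode == "cache":
        # Assumption: HBM caches DDR5 and adds no addressable capacity.
        return ddr_total, "cache in front of DDR5"
    raise ValueError(f"unknown mode: {mode}")


# Two-socket example; 512GB of DDR5 per socket is a hypothetical figure.
print(addressable_memory_gb("hbm_only"))
print(addressable_memory_gb("flat", ddr_per_socket_gb=512))
print(addressable_memory_gb("cache", ddr_per_socket_gb=512))
```

The first call reproduces the 128GB two-socket figure McVeigh cited for the DRAM-less configuration.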
GPU Max also comes in three configurations: the 1100, 1350, and 1550 models. The 1100 is a 300-Watt double-wide PCIe card with 56 Xe cores and 48GB of HBM2e memory. Multiple cards can be connected via Intel Xe Link bridges.
The other two configurations use the Open Compute Project (OCP) accelerator module, known as OAM, a form factor that offers a faster alternative to the PCIe card interface.
The 1350 GPU is a 450-Watt OAM module with 112 Xe cores and 96GB of HBM. The 1550 GPU is a 600-Watt OAM module with 128 Xe cores and 128GB of HBM.
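For readers comparing the three GPU Max models, the Python sketch below collects the published figures and derives two simple ratios: rated watts per Xe core and HBM capacity per Xe core. The dictionary layout and the function name are illustrative choices, and the ratios are plain arithmetic on the quoted specifications, not performance or efficiency measurements.

```python
# Specifications as quoted in the article.
GPU_MAX_MODELS = {
    "1100": {"form_factor": "PCIe", "watts": 300, "xe_cores": 56, "hbm_gb": 48},
    "1350": {"form_factor": "OAM", "watts": 450, "xe_cores": 112, "hbm_gb": 96},
    "1550": {"form_factor": "OAM", "watts": 600, "xe_cores": 128, "hbm_gb": 128},
}


def per_core_ratios(spec):
    """Return rated watts and HBM gigabytes per Xe core, rounded to 2 places."""
    return {
        "watts_per_core": round(spec["watts"] / spec["xe_cores"], 2),
        "hbm_gb_per_core": round(spec["hbm_gb"] / spec["xe_cores"], 2),
    }


for model, spec in GPU_MAX_MODELS.items():
    print(model, spec["form_factor"], per_core_ratios(spec))
```

On these numbers, the 1550 is the only model with a full gigabyte of HBM per Xe core.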
The PCI Express card is well suited to standard servers and even workstations, but the OAM modules are oriented toward higher-density environments, McVeigh said. He said Intel has a number of system designs in development with OEMs and system solution providers, who will bring out servers with OAM starting in 2023.
Intel is building a supercomputer for Argonne National Laboratory with CPU Max and GPU Max processors that, when it goes online in 2023, will exceed 2 exaFLOPS of performance. That’s roughly double the speed of Frontier, the current leader in the supercomputer race. McVeigh said it’s the combination of the two processors that makes it happen.
“You’d argue, well, we’ve offloaded everything to the GPU, we don’t need that highest end CPU, right? We don’t need the HBM memory, right? Wrong. We can gain significant performance improvement by turning on the HBM that’s integrated within the CPU because there’s still a lot of code that runs on the CPU, even if we’ve offloaded some of our larger kernels off to the GPU,” he said.
The new processors have already begun shipping to initial customers, including Argonne. The Max Series is slated to launch in January 2023.