Gaudi2

Introduction

Gaudi2 is Intel’s second-generation deep learning accelerator, developed by Habana Labs (now part of Intel). Unlike general-purpose GPUs, Gaudi2 was designed from the ground up for large-scale AI training. Each accelerator is referred to as a Habana Processing Unit (HPU). Its memory-centric architecture and Ethernet-based scale-out are intended to support efficient training of large, complex models at a favorable performance-per-watt ratio.

Each device provides 96 GB of on-package high-bandwidth memory and 24×100 Gbps standard Ethernet interfaces. This combination removes the need for proprietary interconnects and allows flexible integration into existing cluster infrastructures. On our Future Technology Platform (FTP), we currently host a single Gaudi2 node equipped with 8 HL-225 HPUs, available for researchers and developers to evaluate distributed AI training.

Key Features

Memory-Centric Design

Gaudi2 features 96 GB of HBM2E (High Bandwidth Memory 2E) with 2.45 TB/s of bandwidth, providing the fast memory access essential for large model training. Unlike conventional external DRAM modules, HBM2E is stacked in the same package as the processor, close to the compute cores, which shortens data paths and reduces latency and power consumption. Tom’s Hardware has an accessible article explaining HBM.
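To give a feel for what 96 GB and 2.45 TB/s mean in practice, the following back-of-the-envelope sketch estimates whether the training state of a model fits on one device and how long one full pass over the memory takes at peak bandwidth. The figure of 16 bytes per parameter (bf16 weights and gradients plus fp32 Adam state) is a commonly used rule of thumb and an assumption here, not a Gaudi2 specification; activations and framework overhead are ignored, and the function names are illustrative only.

```python
# Rough memory sizing for a single Gaudi2 device (illustrative sketch).
HBM_BYTES = 96 * 10**9           # 96 GB, taken as decimal gigabytes (assumption)
HBM_BANDWIDTH = 2.45 * 10**12    # 2.45 TB/s peak bandwidth, in bytes per second

# Assumed bytes per parameter for mixed-precision Adam training:
# 2 (bf16 weights) + 2 (bf16 gradients) + 12 (fp32 master weights, momentum, variance).
BYTES_PER_PARAM = 16


def model_state_bytes(num_params: float) -> float:
    """Bytes needed for weights, gradients, and optimizer state (no activations)."""
    return num_params * BYTES_PER_PARAM


def fits_on_one_device(num_params: float) -> bool:
    """True if the model state alone fits into the HBM of one device."""
    return model_state_bytes(num_params) <= HBM_BYTES


def full_sweep_seconds() -> float:
    """Time to read the entire HBM once at peak bandwidth."""
    return HBM_BYTES / HBM_BANDWIDTH
```

Under these assumptions, a 1-billion-parameter model fits on a single device with room to spare, while a 7-billion-parameter model already needs its state sharded across several HPUs.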

Ethernet-Based Scaling

Instead of using proprietary interconnects, Gaudi2 integrates 24×100 Gbps RoCE v2 (RDMA over Converged Ethernet) network interfaces directly on-chip. RoCE v2 enables remote direct memory access between nodes across standard Ethernet, allowing data to move directly between device memories without involving the CPU. This reduces latency, lowers CPU overhead, and provides a combined networking capacity of 2.4 Tbps per accelerator. Because it relies on standard Ethernet, distributed AI training becomes more flexible, cost-effective, and easier to deploy in existing cluster environments.
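The 2.4 Tbps figure follows directly from the port count. The short sketch below reproduces the arithmetic and converts it to bytes per second; the helper `ideal_transfer_seconds` is a hypothetical name, and it computes only a lower bound at raw line rate, ignoring protocol overhead and network topology.

```python
# Aggregate RoCE v2 bandwidth per Gaudi2 accelerator, from the figures above.
PORTS_PER_DEVICE = 24
GBIT_PER_PORT = 100  # Gbps per port

total_gbit = PORTS_PER_DEVICE * GBIT_PER_PORT   # 2400 Gbps
total_tbit = total_gbit / 1000                  # 2.4 Tbps
total_gbyte = total_gbit / 8                    # 300 GB/s at raw line rate


def ideal_transfer_seconds(payload_gb: float, ports: int = PORTS_PER_DEVICE) -> float:
    """Lower bound on the time to move payload_gb gigabytes over the given ports."""
    return payload_gb * 8 / (ports * GBIT_PER_PORT)
```

For example, `ideal_transfer_seconds(96)` gives the theoretical minimum time to ship the full contents of one device's memory over all 24 ports; real transfers will be slower.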

Warning

Please note that only one Gaudi2 node with 8 HPUs is currently available on FTP.

Framework Compatibility

Gaudi2 supports popular AI frameworks such as PyTorch through the SynapseAI software stack; TensorFlow support was deprecated after SynapseAI version 1.15. The hardware integrates with scheduling systems such as SLURM. See our Getting Started guide for detailed instructions on accessing and using the Gaudi2 node on our FTP cluster.

Application Areas

Gaudi2 is particularly suited for:

Natural Language Processing (NLP)
Computer Vision (CV)
Large Language Model (LLM) training
Generative AI models (e.g., diffusion-based image synthesis)

Practical tutorials and model examples are available in the Gaudi Tutorials and Examples section.

Software Stack: SynapseAI

SynapseAI is Habana Labs’ comprehensive software ecosystem for Gaudi processors, providing everything needed to program, optimize, and run machine learning workloads efficiently.

Framework Integration

PyTorch support: Full compatibility with PyTorch through optimized plugins
TensorFlow: Support deprecated after SynapseAI version 1.15
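As a minimal sketch of what the PyTorch integration looks like from user code: the Habana bridge is shipped as the `habana_frameworks` package, and importing `habana_frameworks.torch.core` makes the `hpu` device available to PyTorch. The helper names below are hypothetical, and the fallback to CPU is an assumption added so the snippet also runs on machines without Gaudi software installed.

```python
import importlib.util


def select_device_name() -> str:
    """Return "hpu" when the Habana PyTorch bridge is installed, else "cpu"."""
    if importlib.util.find_spec("habana_frameworks") is not None:
        return "hpu"
    return "cpu"


def run_tiny_matmul() -> None:
    """Multiply two small matrices on the selected device (requires PyTorch)."""
    import torch

    name = select_device_name()
    if name == "hpu":
        # Importing the bridge registers the "hpu" device with PyTorch.
        import habana_frameworks.torch.core  # noqa: F401

    x = torch.randn(4, 4, device=name)
    y = (x @ x).cpu()  # copying back to the host forces the computation to finish
    print(name, tuple(y.shape))
```

Calling `run_tiny_matmul()` inside a container that provides the SynapseAI PyTorch build is a quick smoke test that the device is usable; on a machine without the bridge it simply runs on the CPU.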

Optimized Libraries

Pre-optimized computation kernels for matrix operations and convolutions
Runtime software for scheduling, memory management, and multi-processor communication
Development tools, including profilers, debuggers, and performance analyzers

GitHub Resources

Habana Labs GitHub

How to Access

Access to Gaudi2 is currently possible through our Future Technology Platform (FTP). To request access, contact support, include “FTP” in the subject line, and mention that you need access to Gaudi2. An account is required (usually a GWDG account, or an AcademicCloudID for external users), and the admins must explicitly enable it for access to the FTP nodes. For more information, check our documentation on getting an account.

Access requests currently run through KISSKI (researchers and companies) and NHR (researchers at universities only). Please consult their documentation and, if appropriate, request a project to test and use Gaudi2. If you have related questions, you can also reach out through one of our support channels.

After gaining access to the FTP, log in and check whether you have access to the Gaudi2 reservation:

scontrol show res Gaudi2

Once you have confirmed access, follow our Gaudi2 tutorial to learn how to use the Gaudi2 node with Apptainer. If you don’t have access to Gaudi2, please reach out through one of our support channels.
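For scripted checks, the reservation query shown above can be wrapped in a few lines of Python. This is only a convenience sketch: the function names are hypothetical, and the wrapper returns the raw text printed by `scontrol` without trying to interpret it, since how access is granted in the reservation is site-specific.

```python
import shutil
import subprocess
from typing import List, Optional


def build_command(reservation: str = "Gaudi2") -> List[str]:
    """Build the argument list for the reservation query."""
    return ["scontrol", "show", "res", reservation]


def show_reservation(reservation: str = "Gaudi2") -> Optional[str]:
    """Return the text printed by scontrol, or None if scontrol is not on PATH."""
    if shutil.which("scontrol") is None:
        return None
    result = subprocess.run(
        build_command(reservation),
        capture_output=True,
        text=True,
        check=False,  # do not raise on a non-zero exit; the caller inspects the text
    )
    return result.stdout + result.stderr
```

On a login node, `print(show_reservation())` displays the same information as running the command by hand; a return value of `None` means the SLURM client tools are not available in the current environment.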

Gaudi2 FTP Node Configuration

Internal IP: 10.238.3.35
Server Chassis: Supermicro model SYS-820GH-TNR2
Motherboard: Supermicro X12DPG-OA6-GD2
High Speed Ethernet: 2x BCM57416 NetXtreme-E Dual-Media 10G RDMA Ethernet Controller
InfiniBand cards: 2x InfiniBand MT27800 Family
RAM: 16 × 64 GiB = 1024 GiB = 1 TiB
CPU: Intel Xeon Platinum 8380, 2 sockets × 40 cores per socket × 2 threads per core = 160 logical CPUs
Storage: 2x NVMe 3.84 TB (≈ 3.5 TiB) Micron_7450_MTFDKCB3T8TFR
OS: Ubuntu 22.04.4 LTS
Gaudi accelerators (HPUs): 8x Gaudi2 HL-225 accelerators
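The node-level totals implied by this configuration can be checked with a few lines of arithmetic. The sketch below only combines figures stated on this page; the aggregate HBM bandwidth is a simple sum of per-device peak values and should be read as an upper bound, not a measured number.

```python
# Node-level totals derived from the per-component figures listed above.
HPUS = 8
HBM_GB_PER_HPU = 96
HBM_TBPS_PER_HPU = 2.45

SOCKETS = 2
CORES_PER_SOCKET = 40
THREADS_PER_CORE = 2

DIMMS = 16
GIB_PER_DIMM = 64

total_hbm_gb = HPUS * HBM_GB_PER_HPU                             # 768 GB of HBM2E
total_hbm_tbps = HPUS * HBM_TBPS_PER_HPU                         # 19.6 TB/s, sum of peaks
logical_cpus = SOCKETS * CORES_PER_SOCKET * THREADS_PER_CORE     # 160
host_ram_gib = DIMMS * GIB_PER_DIMM                              # 1024 GiB = 1 TiB

# Logical CPUs available per accelerator if the host is divided evenly.
cpus_per_hpu = logical_cpus // HPUS                              # 20
```

These numbers are useful when sizing job requests, for example when deciding how many CPU threads to request per HPU for data loading.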