Gaudi2
Introduction
Gaudi2 is Intel’s second-generation deep learning accelerator, developed by Habana Labs (now part of Intel). Unlike traditional GPUs, Gaudi2 has been designed from the ground up for large-scale AI training. Each device is powered by Habana Processing Units (HPUs), its purpose-built AI training cores. The memory-centric architecture and Ethernet-based scale-out enable efficient training of today’s large and complex models, while offering a favorable power-to-performance ratio. The platform provides 96 GB of on-chip high-bandwidth memory per device, together with 24×100 Gbps standard Ethernet interfaces. This combination eliminates the need for proprietary interconnects and allows flexible integration into existing cluster infrastructures. On the FTP, we currently host a single Gaudi2 node equipped with 8 HL-225 HPUs, available for researchers and developers to evaluate distributed AI training.
Key Features
Memory-Centric Design
Gaudi2 features 96 GB of HBM2E (High Bandwidth Memory 2E) with 2.45 TB/s bandwidth, providing the fast memory access essential for large model training. Unlike conventional external DRAM, HBM2E consists of stacked memory dies placed in the same package as the compute cores, which reduces latency and power consumption. Tom’s Hardware has a nice article explaining HBM.
Ethernet-Based Scaling
Instead of using proprietary interconnects, Gaudi2 integrates 24×100 Gbps RoCE v2 (RDMA over Converged Ethernet) network interfaces directly on-chip. RoCE v2 enables remote direct memory access between nodes across standard Ethernet, allowing data to move directly between device memories without involving the CPU. This reduces latency, lowers CPU overhead, and provides a combined networking capacity of 2.4 Tbps per accelerator. Because it relies on standard Ethernet, distributed AI training becomes more flexible, cost-effective, and easier to deploy in existing cluster environments.
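The bandwidth figures above can be checked with a few lines of arithmetic. The following is a minimal sketch, not a benchmark: the 14 GB payload is a hypothetical example (roughly a 7-billion-parameter model stored in 16-bit precision), and the resulting time is an idealized lower bound that ignores protocol overhead, topology, and congestion.

```python
# Per-accelerator networking figures taken from the text above.
LINKS_PER_DEVICE = 24
GBPS_PER_LINK = 100


def aggregate_gbps(links=LINKS_PER_DEVICE, gbps_per_link=GBPS_PER_LINK):
    """Combined line rate of all on-chip Ethernet ports, in gigabits per second."""
    return links * gbps_per_link


def ideal_transfer_seconds(payload_gb, gbps):
    """Idealized lower bound: payload in gigabytes, rate in gigabits per second."""
    return payload_gb * 8 / gbps


total = aggregate_gbps()
print(f"{total / 1000} Tbps, i.e. {total / 8} GB/s")

# Hypothetical payload: 14 GB (an assumption for illustration only).
print(f"{ideal_transfer_seconds(14, total):.4f} s at full line rate")
```

Real collective operations will take longer than this bound, since usable throughput depends on how the ports are wired and on the communication pattern.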
Please note that only one Gaudi2 node with 8 HPUs is currently available on FTP.
Framework Compatibility
Gaudi2 supports popular AI frameworks such as PyTorch through the SynapseAI software stack; TensorFlow support was deprecated after SynapseAI version 1.15. The hardware integrates seamlessly with scheduling systems such as SLURM. See our Getting Started guide for detailed instructions on accessing and using the Gaudi2 node on our FTP cluster.
Application Areas
Gaudi2 is particularly suited for:
- Natural Language Processing (NLP)
- Computer Vision (CV)
- Large Language Model (LLM) training
- Generative AI models (e.g., diffusion-based image synthesis)
Practical tutorials and model examples are available in the Gaudi Tutorials and Examples section.
Software Stack: SynapseAI
SynapseAI is Habana Labs’ comprehensive software ecosystem for Gaudi processors, providing everything needed to program, optimize, and run machine learning workloads efficiently.
Framework Integration
- PyTorch support: Full compatibility with PyTorch through optimized plugins
- TensorFlow: Support deprecated after SynapseAI version 1.15
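To illustrate what the PyTorch integration looks like from user code, here is a minimal sketch. It assumes the Habana PyTorch bridge is installed as the `habana_frameworks` package and exposes the `hpu` device and `mark_step()`; these names come from Habana's public PyTorch integration, not from this page, so please verify them against the SynapseAI version in your container. The helper names `pick_device` and `tiny_matmul` are made up for this example.

```python
import importlib.util


def pick_device() -> str:
    """Return "hpu" if the Habana PyTorch bridge is importable, else "cpu"."""
    if importlib.util.find_spec("habana_frameworks") is not None:
        return "hpu"
    return "cpu"


def tiny_matmul(device: str):
    """Multiply two small random matrices on the given device (requires PyTorch)."""
    import torch

    htcore = None
    if device == "hpu":
        # Assumption: importing this module registers the "hpu" device.
        import habana_frameworks.torch.core as htcore

    a = torch.randn(4, 4).to(device)
    b = torch.randn(4, 4).to(device)
    c = a @ b
    if htcore is not None:
        # Assumption: in lazy mode, mark_step() triggers execution of queued ops.
        htcore.mark_step()
    return c.cpu()


if __name__ == "__main__":
    dev = pick_device()
    print("Using device:", dev)
    print(tiny_matmul(dev))
```

Falling back to the CPU keeps the same script usable on a login node or laptop where no HPU is present.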
Optimized Libraries
- Pre-optimized computation kernels for matrix operations and convolutions
- Runtime software for scheduling, memory management, and multi-processor communication
- Development tools, including profilers, debuggers, and performance analyzers
GitHub Resources
How to Access
Access to Gaudi2 is currently possible through our Future Technology Platform (FTP). To get access, contact support; please put FTP in the subject line and mention that you need access to Gaudi2. An account is required (usually a GWDG account, or an AcademicCloudID for external users), which the admins must then explicitly enable for the FTP nodes. For more information, check our documentation on getting an account.
Access requests currently run through KISSKI (researchers and companies) and NHR (researchers at universities only). Please consult their documentation and, if needed, request a project to test and use Gaudi2. If you have related questions, you can also reach out through one of our support channels.
After gaining access to the FTP, log in and check whether you have access to Gaudi2:
scontrol show res Gaudi2
Once you have confirmed access, follow our Gaudi2 tutorial to learn how to use the Gaudi2 node with Apptainer. If you don’t have access to Gaudi2, please reach out to one of our support channels.
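The reservation check above can also be scripted. The sketch below parses the `Key=Value` pairs that `scontrol show res` prints and looks for a user name in the `Users` field. The sample text, the node name, and the user names are invented for illustration, and the sketch only inspects `Users`; on a real system access may instead be granted through `Accounts`, so treat a negative result as a hint rather than a verdict.

```python
import re
import subprocess

# Illustrative sample only: node name, dates, and users are hypothetical.
SAMPLE = """ReservationName=Gaudi2 StartTime=2024-01-01T00:00:00 EndTime=2025-01-01T00:00:00 Duration=366-00:00:00
   Nodes=gaudi-node NodeCnt=1 CoreCnt=160 Features=(null) PartitionName=(null) Flags=SPEC_NODES
   Users=alice,bob Accounts=(null) Licenses=(null) State=ACTIVE
"""


def parse_reservation(text):
    """Turn whitespace-separated Key=Value tokens into a dictionary."""
    return dict(re.findall(r"(\w+)=(\S+)", text))


def user_listed(fields, user):
    """True if `user` appears in the comma-separated Users field."""
    users = fields.get("Users", "(null)")
    if users == "(null)":
        return False
    return user in users.split(",")


def show_reservation(name="Gaudi2"):
    """Run the command from the text above and return its raw output."""
    result = subprocess.run(
        ["scontrol", "show", "res", name],
        capture_output=True, text=True, check=False,
    )
    return result.stdout


fields = parse_reservation(SAMPLE)
print(fields["ReservationName"], fields["State"], user_listed(fields, "alice"))
```

On the cluster, `parse_reservation(show_reservation())` would replace the sample text.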
Gaudi2 FTP Node Configuration

- Public IP: 10.238.3.35
- Server Chassis: Supermicro model SYS-820GH-TNR2
- Motherboard: Supermicro X12DPG-OA6-GD2
- High Speed Ethernet: 2x BCM57416 NetXtreme-E Dual-Media 10G RDMA Ethernet Controller
- InfiniBand cards: 2x InfiniBand MT27800 Family
- RAM: 16 × 64 GiB = 1024 GiB = 1 TiB
- CPU: Xeon(R) Platinum 8380, 2 sockets × 40 cores per socket × 2 threads per core = 160 logical CPUs
- Storage: 2x NVMe 3.5 TB Micron_7450_MTFDKCB3T8TFR
- OS: Ubuntu 22.04.4 LTS
- Gaudi accelerators (HPUs): 8x Gaudi2 HL-225 accelerators
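As a quick sanity check of the figures in this list, the per-node totals can be derived as follows. This is only a sketch of the arithmetic; the 96 GB per accelerator comes from the Key Features section, and the combined HBM figure is derived here rather than stated in the list.

```python
# Values copied from the node configuration and the Key Features section.
SOCKETS, CORES_PER_SOCKET, THREADS_PER_CORE = 2, 40, 2
DIMMS, GIB_PER_DIMM = 16, 64
HPUS, HBM_GB_PER_HPU = 8, 96


def node_totals():
    """Return derived per-node totals as a dictionary."""
    return {
        "logical_cpus": SOCKETS * CORES_PER_SOCKET * THREADS_PER_CORE,
        "ram_gib": DIMMS * GIB_PER_DIMM,
        "hbm_gb": HPUS * HBM_GB_PER_HPU,
    }


for key, value in node_totals().items():
    print(f"{key}: {value}")
```

These numbers are useful when sizing SLURM requests, for example when deciding how many CPU threads to request per HPU.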