Nvidia HPC Compilers
The Nvidia HPC Compilers are the successors to the PGI compilers and have good CUDA support (see the official compiler documentation).
In all software stacks, the module name is nvhpc. The HLRN Modules (hlrn-tmod) software stack additionally provides the nvhpc-hpcx module, which includes the Nvidia HPC SDK OpenMPI for jobs spanning more than one node.
To load a specific version, run
module load nvhpc/VERSION
To load the default version, run
module load nvhpc
The nvhpc module (and similarly the nvhpc-hpcx module) either has CUDA built in or loads the respective cuda module, so you don't need to load a cuda module separately. If it did not load a cuda module, you can load one yourself and target that CUDA version X.Y by passing the -gpu=cudaX.Y option to the compiler.
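A minimal sketch, assuming CUDA is built into the loaded nvhpc module and that a hypothetical cuda/11.8 module and source file saxpy.c exist:

```
module load nvhpc
module load cuda/11.8
# Target CUDA 11.8 instead of the SDK's built-in CUDA version
nvc -acc -gpu=cuda11.8 saxpy.c -o saxpy
```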
Languages
The supported languages and the names of their compiler programs (and PGI compiler aliases) are in the table below.
| Language | Compiler Program | PGI Compiler Alias |
|---|---|---|
| C | nvc | pgcc |
| C++ | nvc++ | pgc++ |
| Fortran | nvfortran | pgfortran |
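For illustration (the source file names are placeholders), invoking each compiler on a small source file could look like this:

```
nvc       -O2 hello.c   -o hello_c    # C
nvc++     -O2 hello.cpp -o hello_cpp  # C++
nvfortran -O2 hello.f90 -o hello_f90  # Fortran
```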
OpenMP
The Nvidia HPC Compilers support the OpenMP (Open Multi-Processing) extension for C, C++, and Fortran. Enable it by passing the -mp or -mp=KIND option to the compiler, where KIND is multicore (the default if no kind is given) to use CPU cores, or gpu for GPU offloading on compatible GPUs (V100 and newer) with CPU fallback.
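A minimal compile-line sketch, assuming a hypothetical source file omp_loop.c:

```
nvc -mp     omp_loop.c -o omp_loop   # multicore (default): use CPU cores
nvc -mp=gpu omp_loop.c -o omp_loop   # GPU offloading with CPU fallback
```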
OpenACC
The Nvidia HPC Compilers support the OpenACC (Open ACCelerators) extension for C, C++, and Fortran. Enable it by passing the -acc or -acc=KIND option to the compiler, where KIND is gpu for GPU offloading (the default if no kind is given) or multicore to use CPU cores. Additional KIND values and other sub-options can be given, separated by commas.
See the Nvidia HPC Compilers OpenACC page for more information.
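A minimal compile-line sketch, assuming a hypothetical source file jacobi.f90:

```
nvfortran -acc           jacobi.f90 -o jacobi   # GPU offloading (default)
nvfortran -acc=multicore jacobi.f90 -o jacobi   # use CPU cores instead
```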
Targeting Architecture
GPU
By default, the Nvidia HPC Compilers will compile the GPU parts of the code for the compute capability of the GPUs attached to the node the compilers are run on, or all compute capabilities if none are present (most frontend nodes).
The former may mean that, when the program is run on a compute node, it won't support that node's GPUs (because it requires features they don't provide) or will perform suboptimally (because it was compiled for a much lower compute capability).
The latter takes more time to compile and makes the program bigger.
The compilers use the -gpu=OPTION1,OPTION2,... option to control the target GPU architecture, where the different options are separated by commas. The most important option is ccXY, where XY is the compute capability. It can be specified more than once to support more than one compute capability. The compute capabilities of the GPUs we provide are listed on the Spack page (cuda_arch is the compute capability).
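For illustration (the source file name is a placeholder; cc70 corresponds to V100 and cc80 to A100):

```
# Build GPU code for both compute capability 7.0 and 8.0
nvc -acc -gpu=cc70,cc80 saxpy.c -o saxpy
```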
CPU
By default, the Nvidia HPC Compilers will compile code targeting the generic version of the CPU the compilers are run on.
On an x86-64 node, this means staying compatible with the original 64-bit AMD and Intel processors from 2003, and thus no AVX/AVX2/AVX-512 code is generated without an SSE fallback.
The compilers use the -tp ARCH option to control the target architecture. The ARCH values for the different CPU architectures (Spack naming) we provide are:
| Architecture/Target (Spack naming) | ARCH value |
|---|---|
| Most generic version of the CPU the compiler is running on | px |
| The CPU of the node the compiler is running on | native or host |
| haswell | haswell |
| broadwell | haswell |
| skylake_avx512 | skylake |
| cascadelake | skylake |
| sapphirerapids | skylake |
| zen2 | zen2 |
| zen3 | zen3 |
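For illustration (the source file name is a placeholder), targeting the Cascade Lake row of the table and the CPU of the node doing the compiling:

```
nvc -O2 -tp skylake stream.c -o stream   # cascadelake nodes use the skylake value
nvc -O2 -tp native  stream.c -o stream   # match the CPU of the node compiling the code
```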