Nvidia HPC Compilers
The Nvidia HPC Compilers are the successors to the PGI compilers and have good CUDA support (see the official compiler documentation).
In all software stacks, the module name is nvhpc. The HLRN Modules (hlrn-tmod) software stack additionally provides the nvhpc-hpcx module, which includes the Nvidia HPC SDK OpenMPI for jobs spanning more than one node.
To load a specific version, run
module load nvhpc/VERSION
To load the default version, run
module load nvhpc
The nvhpc module (and similarly the nvhpc-hpcx module) either has CUDA built in or loads the respective cuda module, so you don't need to load a cuda module separately. If it did not load a cuda module, you can load one yourself and target that CUDA version X.Y by passing the -gpu=cudaX.Y option to the compiler.
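A minimal sketch, assuming CUDA is built into the loaded nvhpc module and that a hypothetical cuda/11.8 module and source file saxpy.c exist:

```
module load nvhpc
module load cuda/11.8
# Target CUDA 11.8 instead of the SDK's built-in CUDA version
nvc -acc -gpu=cuda11.8 saxpy.c -o saxpy
```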
Languages
The supported languages and the names of their compiler programs (and PGI compiler aliases) are in the table below.
| Language | Compiler Program | PGI Compiler Alias |
|---|---|---|
| C | nvc | pgcc |
| C++ | nvc++ | pgc++ |
| Fortran | nvfortran | pgfortran |
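For illustration (the source file names are placeholders), invoking each compiler on a small source file could look like this:

```
nvc       -O2 hello.c   -o hello_c    # C
nvc++     -O2 hello.cpp -o hello_cpp  # C++
nvfortran -O2 hello.f90 -o hello_f90  # Fortran
```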
OpenMP
The Nvidia HPC Compilers support the OpenMP (Open Multi-Processing) extension for C, C++, and Fortran. Enable it by passing the -mp or -mp=KIND option to the compiler, where KIND is multicore (the default if no kind is given) to use CPU cores, or gpu for GPU offloading on compatible GPUs (V100 and newer) with CPU fallback.
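A minimal compile-line sketch, assuming a hypothetical source file omp_loop.c:

```
nvc -mp     omp_loop.c -o omp_loop   # multicore (default): use CPU cores
nvc -mp=gpu omp_loop.c -o omp_loop   # GPU offloading with CPU fallback
```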
OpenACC
The Nvidia HPC Compilers support the OpenACC (Open ACCelerators) extension for C, C++, and Fortran. Enable it by passing the -acc or -acc=KIND option to the compiler, where KIND is gpu for GPU offloading (the default if no kind is given) or multicore to use CPU cores. Additional KIND values and other sub-options can be given, separated by commas.
See the Nvidia HPC Compilers OpenACC page for more information.
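A minimal compile-line sketch, assuming a hypothetical source file jacobi.f90:

```
nvfortran -acc           jacobi.f90 -o jacobi   # GPU offloading (default)
nvfortran -acc=multicore jacobi.f90 -o jacobi   # use CPU cores instead
```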
Targeting Architecture
GPU
By default, the Nvidia HPC Compilers will compile the GPU parts of the code for the compute capability of the GPUs attached to the node the compilers are run on, or all compute capabilities if none are present (most frontend nodes).
The former may mean that, when the program is run on a compute node, it won't support that node's GPUs (because it requires features they don't provide) or will perform suboptimally (because it was compiled for a much lower compute capability).
The latter takes more time to compile and makes the program bigger.
The compilers use the -gpu=OPTION1,OPTION2,... option to control the target GPU architecture, where the different options are separated by commas. The most important option is ccXY, where XY is the compute capability. It can be specified more than once to support more than one compute capability. The compute capabilities of the GPUs we provide are listed on the Spack page (cuda_arch is the compute capability).
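For illustration (the source file name is a placeholder; cc70 corresponds to V100 and cc80 to A100):

```
# Build GPU code for both compute capability 7.0 and 8.0
nvc -acc -gpu=cc70,cc80 saxpy.c -o saxpy
```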
CPU
By default, the Nvidia HPC Compilers will compile code targeting the generic version of the CPU the compilers are run on.
On an x86-64 node, this means staying compatible with the original 64-bit AMD and Intel processors from 2003, and thus no AVX/AVX2/AVX-512 code is generated without an SSE fallback.
The compilers use the -tp ARCH option to control the target architecture. The ARCH values for the different CPU architectures (Spack naming) we provide are:
| Architecture/Target (Spack naming) | ARCH value |
|---|---|
| Most generic version of the CPU the compiler is running on | px |
| The CPU of the node the compiler is running on | native or host |
| haswell | haswell |
| broadwell | haswell |
| skylake_avx512 | skylake |
| cascadelake | skylake |
| sapphirerapids | skylake |
| zen2 | zen2 |
| zen3 | zen3 |
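For illustration (the source file name is a placeholder), targeting the Cascade Lake row of the table and the CPU of the node doing the compiling:

```
nvc -O2 -tp skylake stream.c -o stream   # cascadelake nodes use the skylake value
nvc -O2 -tp native  stream.c -o stream   # match the CPU of the node compiling the code
```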