MPI

Message Passing Interface (MPI) is a popular standardized API for parallel processing both within a node and across many nodes. When using MPI, each task in a Slurm job runs the program in its own separate process, and the processes communicate with each other via MPI (generally using an MPI library).

Warning

Because MPI parallelization happens between separate processes, variables, threads, file handles, and other pieces of state are NOT shared between the processes, even on the same node. This is in contrast to OpenMP, which runs many threads in a single process. Nothing is shared except what is explicitly sent and received via the MPI API.
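
To make this concrete, below is a minimal sketch of an MPI program in C. The file name hello_mpi.c and the send/receive pattern are purely illustrative: each rank (process) has its own copy of every variable, and data only moves between ranks through explicit calls such as MPI_Send and MPI_Recv.

/* hello_mpi.c -- illustrative sketch: each rank is a separate process
   and data is only shared through explicit MPI calls. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);  /* this process's ID */
    MPI_Comm_size(MPI_COMM_WORLD, &size);  /* total number of processes */

    int value = 100 * rank;  /* local to this process only */

    if (rank == 1) {
        /* Explicitly send this rank's value to rank 0. */
        MPI_Send(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD);
    } else if (rank == 0 && size > 1) {
        /* Rank 0 only sees rank 1's value because it explicitly receives it. */
        MPI_Recv(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        printf("Rank 0 of %d received %d from rank 1\n", size, value);
    }

    MPI_Finalize();
    return 0;
}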

Implementations

Two MPI implementations are provided in the software stacks: OpenMPI and Intel MPI. Both of them have two variants. OpenMPI has the official variant and the Nvidia HPC SDK variant, which is tuned and built for Nvidia GPUs, NVLink (used between the GPUs on some nodes), and Mellanox InfiniBand. Intel MPI has the classic variant and the newer OneAPI variant. Their module names in each software stack are given in the table below. Note that if there are multiple versions of a module, it is important to specify which one with module load MODULE/VERSION. For example, the impi module in the HLRN Modules (hlrn-tmod) software stack has 10 different versions, 9 of them the classic Intel MPI and 1 of them Intel OneAPI MPI.

Implementation           | NHR Modules name                | SCC Modules name       | HLRN Modules name
OpenMPI (official)       | openmpi (CUDA support on Grete) | openmpi                | openmpi
OpenMPI (Nvidia HPC SDK) | nvhpc                           | nvhpc                  | nvhpc-hpcx
Intel MPI (OneAPI)       | intel-oneapi-mpi                | intel-oneapi-mpi       | impi/2021.6 and newer
Intel MPI (classic)      | –                               | intel-mpi or intel/mpi | impi/2019.9 and older
Warning

Do not mix up OpenMPI (an implementation of MPI) with OpenMP, which is a completely separate parallelization technology (it is even possible to use both at the same time). Their similar names are just a coincidence.

Compiling MPI Code

All MPI implementations work similarly for compiling code. You first load the module for the compilers you want to use and then the module for the MPI implementation (see table above).

Load MPI:

OpenMPI (official):

For a specific version, run

module load openmpi/VERSION

and for the default version, run

module load openmpi

OpenMPI (Nvidia HPC SDK):

Substitute nvhpc with nvhpc-hpcx for the HLRN Modules (hlrn-tmod) software stack.

For a specific version, run

module load nvhpc/VERSION

and for the default version, run

module load nvhpc

Intel MPI (OneAPI):

Substitute intel-oneapi-mpi/VERSION with impi/2021.6 or newer for the HLRN Modules (hlrn-tmod) software stack.

For a specific version, run

module load intel-oneapi-mpi/VERSION

and for the default version (not available for the HLRN Modules (hlrn-tmod) software stack), run

module load intel-oneapi-mpi

Intel MPI (classic):

In the HLRN Modules (hlrn-tmod) software stack, for a specific version (2019.9 or older), run

module load impi/VERSION

and for the default version, run

module load impi

Otherwise, for a specific version (in the rev/11.06 revision, substitute intel-mpi with intel/mpi), run

module load intel-mpi/VERSION

and for the default version, run

module load intel-mpi

The MPI modules provide compiler wrappers that wrap around the C, C++, and Fortran compilers to set up the compile and link options that the MPI library needs. As a general rule, it is best to use these wrappers for compiling. One major exception is if you are using HDF5 or NetCDF, which provide their own compiler wrappers that in turn wrap the MPI compiler wrappers. If the code uses a build system that is not MPI-aware, you might have to manually set environment variables to make the build system use the wrappers. The compiler wrappers and the environment variables you might have to set are given in the table below:

Language         | Wrapper          | Env. variable you might have to set
C                | mpicc            | CC
C++              | mpicxx           | CXX
Fortran (modern) | mpifort or mpifc | FC
Fortran (legacy) | mpif77           | F77
Tip

Note that Intel MPI also provides additional wrappers for the Intel compilers that double the “i”, such as mpiicc, mpiicpc, and mpiifort.
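
For example, compiling the illustrative hello_mpi.c from above could look like the following (the compiler module name here is only an example; load whichever compiler module you actually want before the MPI module):

module load gcc        # example compiler module; adjust to your setup
module load openmpi    # or intel-oneapi-mpi, nvhpc, etc. (see above)
mpicc -O2 -o hello_mpi hello_mpi.c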

MPI-naïve build systems can usually be convinced to use the MPI compiler wrappers like this:

passing wrappers to build system:
CC=mpicc CXX=mpicxx FC=mpifort F77=mpif77 BUILD_SYSTEM_COMMAND [OPTIONS]
CC=mpicc CXX=mpicxx FC=mpifort F77=mpif77 cmake [OPTIONS]
CC=mpicc CXX=mpicxx FC=mpifort F77=mpif77 ./configure [OPTIONS]

Running MPI Programs

All MPI implementations work similarly for running in Slurm jobs, though they have vastly different extra options and environment variables to tune their behavior. Each provides a launcher program, mpirun, to help run an MPI program. Both OpenMPI and Intel MPI read the environment variables that Slurm sets and communicate with Slurm via PMI or PMIx in order to set themselves up with the right processes on the right nodes and cores.

First load the module for the MPI implementation you are using:

Load MPI:

OpenMPI (official):

For a specific version, run

module load openmpi/VERSION

and for the default version, run

module load openmpi

OpenMPI (Nvidia HPC SDK):

Substitute nvhpc with nvhpc-hpcx for the HLRN Modules (hlrn-tmod) software stack.

For a specific version, run

module load nvhpc/VERSION

and for the default version, run

module load nvhpc

Intel MPI (OneAPI):

Substitute intel-oneapi-mpi/VERSION with impi/2021.6 or newer for the HLRN Modules (hlrn-tmod) software stack.

For a specific version, run

module load intel-oneapi-mpi/VERSION

and for the default version (not available for the HLRN Modules (hlrn-tmod) software stack), run

module load intel-oneapi-mpi

Intel MPI (classic):

In the HLRN Modules (hlrn-tmod) software stack, for a specific version (2019.9 or older), run

module load impi/VERSION

and for the default version, run

module load impi

Otherwise, for a specific version (in the rev/11.06 revision, substitute intel-mpi with intel/mpi), run

module load intel-mpi/VERSION

and for the default version, run

module load intel-mpi

Then, run your program using the mpirun launcher provided by your MPI implementation like so:

mpirun [MPI_OPTIONS] PROGRAM [OPTIONS]

where PROGRAM is the program you want to run, OPTIONS are the options for PROGRAM, and MPI_OPTIONS are options controlling MPI behavior (these are specific to each implementation).
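
As a sketch, a complete batch job running an MPI program could look like the following. The partition name, node and task counts, and the hello_mpi binary are placeholders rather than recommendations, and openmpi can be swapped for any of the MPI modules listed above:

#!/bin/bash
#SBATCH --job-name=mpi-example
#SBATCH --partition=PARTITION      # replace with a partition you can use
#SBATCH --nodes=2                  # number of nodes
#SBATCH --ntasks-per-node=16       # MPI processes (tasks) per node
#SBATCH --time=00:10:00

module load openmpi                # or intel-oneapi-mpi, nvhpc, etc.

# mpirun gets the task layout from Slurm via PMI/PMIx.
mpirun ./hello_mpi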

Info

In some cases, it can make sense to use Slurm’s srun as the launcher instead of mpirun in batch jobs, for example when you want to use only a subset of the tasks instead of all of them. Historically, there have been many bugs when launching MPI programs this way, so it is best avoided unless needed.
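
If you do need srun as the launcher, for example to run on only part of the allocation, a minimal sketch (reusing the placeholder binary from above) would be:

# Launch on only 4 of the job's tasks instead of all of them.
srun --ntasks=4 ./hello_mpi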