MPI
Message Passing Interface (MPI) is a popular standardized API for parallel processing, both within a node and across many nodes. When using MPI, each task in a Slurm job runs the program in its own separate process, and the processes communicate with each other via MPI (generally using an MPI library).
Because MPI parallelization is between separate processes, variables, threads, file handles, and various other pieces of state are NOT shared between the processes even on the same node, in contrast to OpenMP, which runs many threads within a single process. Nothing is shared except what is explicitly sent and received via the MPI API.
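To see this separation in action, here is a tiny sketch (it assumes OpenMPI, whose launcher exports the rank of each process in the `OMPI_COMM_WORLD_RANK` environment variable); each of the four launched shells is an independent operating-system process with its own PID:

```bash
# Launch 4 independent processes; each prints its MPI rank and its own PID
mpirun -np 4 bash -c 'echo "rank $OMPI_COMM_WORLD_RANK is process ID $$"'
```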
Implementations
Two MPI implementations are provided in the software stacks: OpenMPI and Intel MPI.
Both of them have two variants.
OpenMPI has the official variant and the Nvidia HPC SDK variant, which is tuned and built for Nvidia GPUs, Nvidia NVLink (used between the GPUs on some nodes), and Mellanox InfiniBand.
Intel MPI has the classic variant and the newer OneAPI variant.
Their module names in each software stack are in the table below.
Note that in the event that there are multiple versions of a module, it is important to specify which one with `module load MODULE/VERSION`. For example, the `impi` module in the HLRN Modules (hlrn-tmod) software stack has 10 different versions: 9 of them are the classic Intel MPI and 1 of them is Intel OneAPI MPI.
Implementation | NHR Modules name | SCC Modules name | HLRN Modules name |
---|---|---|---|
OpenMPI (official) | openmpi (CUDA support on Grete) | openmpi | openmpi |
OpenMPI (Nvidia HPC SDK) | nvhpc | nvhpc | nvhpc-hpcx |
Intel MPI (OneAPI) | intel-oneapi-mpi | intel-oneapi-mpi | impi/2021.6 and newer |
Intel MPI (classic) | | intel-mpi or intel/mpi | impi/2019.9 and older |
Do not mix up OpenMPI (an implementation of MPI) with OpenMP, which is a completely separate parallelization technology (it is even possible to use both at the same time). They just happen to have very similar names.
Compiling MPI Code
All MPI implementations work similarly for compiling code. You first load the module for the compilers you want to use and then the module for the MPI implementation (see table above).
For OpenMPI: for a specific version, run `module load openmpi/VERSION`, and for the default version, run `module load openmpi`.
For the Nvidia HPC SDK: for a specific version, run `module load nvhpc/VERSION`, and for the default version, run `module load nvhpc`. Substitute `nvhpc` with `nvhpc-hpcx` for the HLRN Modules (hlrn-tmod) software stack.
For Intel MPI (OneAPI): for a specific version, run `module load intel-oneapi-mpi/VERSION`, and for the default version (not available for the HLRN Modules (hlrn-tmod) software stack), run `module load intel-oneapi-mpi`. For the HLRN Modules (hlrn-tmod) software stack, substitute `intel-oneapi-mpi/VERSION` with `impi/2021.6`.
For Intel MPI (classic): for a specific version (2019.9 or older), run `module load impi/VERSION`, and for the default version, run `module load impi`. Where the module is named `intel-mpi` instead (see the table above), run `module load intel-mpi/VERSION` for a specific version and `module load intel-mpi` for the default version; in the rev/11.06 revision, substitute `intel-mpi` with `intel/mpi`.
The MPI modules provide compiler wrappers that wrap around the C, C++, and Fortran compilers to set up the compiler and linking options that the MPI library needs. As a general rule, it is best to use these wrappers for compiling. One major exception is if you are using HDF5 or NetCDF, which provide their own compiler wrappers that wrap over the MPI compiler wrappers. If the code uses a build system and is MPI-naïve, you might have to manually set environment variables to make the build system use the wrappers. The compiler wrappers and the environment variables you might have to set are given in the table below:
Language | Wrapper | Env. variable you might have to set |
---|---|---|
C | mpicc | CC |
C++ | mpicxx | CXX |
Fortran (modern) | mpifort or mpifc | FC |
Fortran (legacy) | mpif77 | F77 |
Note that Intel MPI also provides additional wrappers that double the “i”, such as `mpiicc`, `mpiicxx`, and `mpiifort`.
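As a minimal sketch (the source file names here are placeholders and the optimization flags are up to you), compiling a single-file MPI program with the wrappers looks like this:

```bash
# Compile a C MPI program with the MPI compiler wrapper
mpicc -O2 -o hello_mpi hello_mpi.c

# The same idea for modern Fortran
mpifort -O2 -o hello_mpi hello_mpi.f90
```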
MPI-naïve build systems can usually be convinced to use the MPI compiler wrappers like so:
CC=mpicc CXX=mpicxx FC=mpifort F77=mpif77 BUILD_SYSTEM_COMMAND [OPTIONS]
CC=mpicc CXX=mpicxx FC=mpifort F77=mpif77 cmake [OPTIONS]
CC=mpicc CXX=mpicxx FC=mpifort F77=mpif77 ./configure [OPTIONS]
Running MPI Programs
All MPI implementations work similarly for running in Slurm jobs, though they have vastly different extra options and environment variables to tune their behavior. Each provides a launcher program, `mpirun`, to help run an MPI program. Both OpenMPI and Intel MPI read the environment variables that Slurm sets and communicate with Slurm via PMI or PMIx in order to set themselves up with the right processes on the right nodes and cores.
First load the module for the MPI implementation you are using:
For OpenMPI: for a specific version, run `module load openmpi/VERSION`, and for the default version, run `module load openmpi`.
For the Nvidia HPC SDK: for a specific version, run `module load nvhpc/VERSION`, and for the default version, run `module load nvhpc`. Substitute `nvhpc` with `nvhpc-hpcx` for the HLRN Modules (hlrn-tmod) software stack.
For Intel MPI (OneAPI): for a specific version, run `module load intel-oneapi-mpi/VERSION`, and for the default version (not available for the HLRN Modules (hlrn-tmod) software stack), run `module load intel-oneapi-mpi`. For the HLRN Modules (hlrn-tmod) software stack, substitute `intel-oneapi-mpi/VERSION` with `impi/2021.6`.
For Intel MPI (classic): for a specific version (2019.9 or older), run `module load impi/VERSION`, and for the default version, run `module load impi`. Where the module is named `intel-mpi` instead (see the table above), run `module load intel-mpi/VERSION` for a specific version and `module load intel-mpi` for the default version; in the rev/11.06 revision, substitute `intel-mpi` with `intel/mpi`.
Then, run your program using the `mpirun` launcher provided by your MPI implementation, like so:
mpirun [MPI_OPTIONS] PROGRAM [OPTIONS]
where `PROGRAM` is the program you want to run, `OPTIONS` are the options for `PROGRAM`, and `MPI_OPTIONS` are options controlling MPI behavior (these are specific to each implementation).
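For example, a minimal batch script might look like the following sketch (the partition name, resource numbers, module, and program name are placeholders to replace with your own; both OpenMPI and Intel MPI pick up the task layout from Slurm as described above):

```bash
#!/bin/bash
#SBATCH --nodes=2                 # placeholder: number of nodes
#SBATCH --ntasks-per-node=96      # placeholder: MPI tasks (processes) per node
#SBATCH --time=01:00:00
#SBATCH --partition=standard96    # placeholder: use a partition available to you

# Load the same MPI module the program was compiled with
module load openmpi

# mpirun gets the number of tasks and their placement from Slurm
mpirun ./my_mpi_program --some-option
```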
In some cases, it can make sense to use Slurm's `srun` as the launcher in batch jobs instead of `mpirun`, for example when you want to use only a subset of the tasks instead of all of them. Historically, there have been many bugs when launching MPI programs this way, so it is best avoided unless needed.
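If you do need it, a hedged sketch of the subset case looks like this (the task count and program name are placeholders; depending on the cluster configuration, you may also need srun's `--mpi` option to select the right PMI interface):

```bash
# Launch the MPI program on only 4 of the job's allocated tasks
srun --ntasks=4 ./my_mpi_program
```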