Workflow Intel MPI
Code Compilation
For code compilation you can choose between the two compilers Intel and GNU. Both compilers can be combined with the Intel MPI library.
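The compile commands below assume an MPI source file named hello.c (or hello.f90 / hello.cpp). As an illustration only, a minimal C version of such a program could look like this sketch:

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);   /* rank of this process */
    MPI_Comm_size(MPI_COMM_WORLD, &size);   /* total number of processes */
    printf("Hello from rank %d of %d\n", rank, size);
    MPI_Finalize();
    return 0;
}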
Intel compiler
MPI only
module load intel/19.0.5
module load impi/2019.5
mpiicc -Wl,-rpath,$LD_RUN_PATH -o hello.bin hello.c
mpiifort -Wl,-rpath,$LD_RUN_PATH -o hello.bin hello.f90
mpiicpc -Wl,-rpath,$LD_RUN_PATH -o hello.bin hello.cpp
MPI, OpenMP
module load intel/19.0.5
module load impi/2019.5
mpiicc -qopenmp -Wl,-rpath,$LD_RUN_PATH -o hello.bin hello.c
mpiifort -qopenmp -Wl,-rpath,$LD_RUN_PATH -o hello.bin hello.f90
mpiicpc -qopenmp -Wl,-rpath,$LD_RUN_PATH -o hello.bin hello.cpp
GNU compiler
MPI only
module load gcc/9.3.0
module load impi/2019.5
mpigcc -Wl,-rpath,$LD_RUN_PATH -o hello.bin hello.c
mpif90 -Wl,-rpath,$LD_RUN_PATH -o hello.bin hello.f90
mpigxx -Wl,-rpath,$LD_RUN_PATH -o hello.bin hello.cpp
MPI, OpenMP
module load gcc/9.3.0
module load impi/2019.5
mpigcc -fopenmp -Wl,-rpath,$LD_RUN_PATH -o hello.bin hello.c
mpif90 -fopenmp -Wl,-rpath,$LD_RUN_PATH -o hello.bin hello.f90
mpigxx -fopenmp -Wl,-rpath,$LD_RUN_PATH -o hello.bin hello.cpp
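The -qopenmp and -fopenmp variants above are intended for hybrid MPI/OpenMP sources. A minimal hybrid sketch in C, again using the placeholder name hello.c, could be:

#include <mpi.h>
#include <omp.h>
#include <stdio.h>

int main(int argc, char **argv) {
    int rank;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    /* each MPI rank opens an OpenMP team; the team size follows OMP_NUM_THREADS */
    #pragma omp parallel
    printf("Hello from thread %d of %d on rank %d\n",
           omp_get_thread_num(), omp_get_num_threads(), rank);
    MPI_Finalize();
    return 0;
}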
Code Execution
To start the MPI-parallelized code on the compute nodes you can choose between two approaches, namely mpirun or srun.
Using mpirun
With mpirun, process pinning is controlled by the Intel MPI library. Pinning by Slurm must be switched off by adding export SLURM_CPU_BIND=none to the job script.
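If you want to check the resulting pinning, one possibility (not part of the original workflow, and glibc-specific) is to let each rank print the logical CPU it is currently running on:

#define _GNU_SOURCE
#include <mpi.h>
#include <sched.h>
#include <stdio.h>

int main(int argc, char **argv) {
    int rank;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    /* sched_getcpu() reports the logical CPU the calling process runs on right now */
    printf("rank %d runs on CPU %d\n", rank, sched_getcpu());
    MPI_Finalize();
    return 0;
}

Intel MPI can also print its pinning map at startup when the environment variable I_MPI_DEBUG is set to 4 or higher.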
MPI only
Full node, 96 processes per node:
#!/bin/bash
#SBATCH --nodes=2
#SBATCH --partition=standard96:test
module load impi/2019.5
export SLURM_CPU_BIND=none
mpirun -ppn 96 ./hello.bin
Half-filled node, 48 processes per node, each pinned to its own core and spread across both sockets:
#!/bin/bash
#SBATCH --nodes=2
#SBATCH --partition=standard96:test
module load impi/2019.5
export SLURM_CPU_BIND=none
export I_MPI_PIN_DOMAIN=core
export I_MPI_PIN_ORDER=scatter
mpirun -ppn 48 ./hello.bin
Full node with hyperthreading, 192 processes per node:
#!/bin/bash
#SBATCH --nodes=2
#SBATCH --partition=standard96:test
module load impi/2019.5
export SLURM_CPU_BIND=none
mpirun -ppn 192 ./hello.bin
MPI, OpenMP
You can run a code compiled with both MPI and OpenMP. The first example covers the setup
- 2 nodes,
- 4 processes per node, 24 threads per process.
#!/bin/bash
#SBATCH --nodes=2
#SBATCH --partition=standard96:test
module load impi/2019.5
export SLURM_CPU_BIND=none
export OMP_NUM_THREADS=24
mpirun -ppn 4 ./hello.bin
The next example covers the setup
- 2 nodes,
- 4 processes per node, 12 threads per process.
#!/bin/bash
#SBATCH --nodes=2
#SBATCH --partition=standard96:test
module load impi/2019.5
export SLURM_CPU_BIND=none
export OMP_PROC_BIND=spread
export OMP_NUM_THREADS=12
mpirun -ppn 4 ./hello.bin
The last mpirun example covers the setup
- 2 nodes,
- 4 processes per node using hyperthreading,
- 48 threads per process.
#!/bin/bash
#SBATCH --nodes=2
#SBATCH --partition=standard96:test
module load impi/2019.5
export SLURM_CPU_BIND=none
export OMP_PROC_BIND=spread
export OMP_NUM_THREADS=48
mpirun -ppn 4 ./hello.bin
Using srun
MPI only
Full node, 96 tasks per node:
#!/bin/bash
#SBATCH --nodes=2
#SBATCH --partition=standard96:test
srun --ntasks-per-node=96 ./hello.bin
Half-filled node, 48 tasks per node:
#!/bin/bash
#SBATCH --nodes=2
#SBATCH --partition=standard96:test
srun --ntasks-per-node=48 ./hello.bin
MPI, OpenMP
You can run a code compiled with both MPI and OpenMP. Note that --cpus-per-task counts logical CPUs, of which there are two per physical core on standard96 nodes. The first example covers the setup
- 2 nodes,
- 4 processes per node, 24 threads per process.
#!/bin/bash
#SBATCH --nodes=2
#SBATCH --partition=standard96:test
export OMP_PROC_BIND=spread
export OMP_NUM_THREADS=24
srun --ntasks-per-node=4 --cpus-per-task=48 ./hello.bin
The next example covers the setup
- 2 nodes,
- 4 processes per node, 12 threads per process.
#!/bin/bash
#SBATCH --nodes=2
#SBATCH --partition=standard96:test
export OMP_PROC_BIND=spread
export OMP_NUM_THREADS=12
srun --ntasks-per-node=4 --cpus-per-task=24 ./hello.bin
The last example covers the setup
- 2 nodes,
- 4 processes per node using hyperthreading,
- 48 threads per process.
#!/bin/bash
#SBATCH --nodes=2
#SBATCH --partition=standard96:test
export OMP_PROC_BIND=spread
export OMP_NUM_THREADS=48
srun --ntasks-per-node=4 --cpus-per-task=48 ./hello.bin