Hybrid MPI + OpenMP

Message Passing Interface (MPI) and OpenMP (Open Multi-Processing) are often used together in hybrid jobs, with MPI parallelizing between nodes and OpenMP parallelizing within each node. This means the code must be compiled with both and launched with Slurm options that set the number of tasks and the number of cores per task correctly.

Example Code

Here is an example code that uses both:

#include <stdio.h>

#include <omp.h>
#include <mpi.h>


int main(int argc, char** argv)
{
    // Initialize the MPI environment
    MPI_Init(NULL, NULL);

    // Get the number of processes
    int world_size;
    MPI_Comm_size(MPI_COMM_WORLD, &world_size);

    // Get the rank of the process
    int world_rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);

    // Get the name of the processor
    char processor_name[MPI_MAX_PROCESSOR_NAME];
    int name_len;
    MPI_Get_processor_name(processor_name, &name_len);

    int nthreads, tid;

    // Fork a team of threads giving them their own copies of variables
    #pragma omp parallel private(nthreads, tid)
    {

        // Obtain thread number
        tid = omp_get_thread_num();
        printf("Hello World from thread = %d, processor %s, rank %d out of %d processors\n", tid, processor_name, world_rank, world_size);

        // Only primary thread does this
        if (tid == 0)
        {
            nthreads = omp_get_num_threads();
            printf("Number of threads = %d\n", nthreads);
        }

    }  // All threads join primary thread

    // Finalize the MPI environment.
    MPI_Finalize();

    return 0;
}
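
If you run it with, say, 2 MPI ranks and 2 OpenMP threads per rank, the program prints one line per thread plus one line per rank. The output will look roughly like the following (the line order is nondeterministic and the host names shown are placeholders):

Hello World from thread = 0, processor node001, rank 0 out of 2 processes
Hello World from thread = 1, processor node001, rank 0 out of 2 processes
Number of threads = 2
Hello World from thread = 0, processor node002, rank 1 out of 2 processes
Hello World from thread = 1, processor node002, rank 1 out of 2 processes
Number of threads = 2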

Compilation

To compile it, load the compiler and MPI modules and then combine the MPI compiler wrapper (see MPI) with the OpenMP compiler option (see OpenMP). For the example above, you would use one of the following combinations, depending on the toolchain:

GCC + OpenMPI:

module load gcc
module load openmpi
mpicc -fopenmp -o hybrid_hello_world.bin hybrid_hello_world.c

Intel oneAPI compilers + Intel MPI:

module load intel-oneapi-compilers
module load intel-oneapi-mpi
mpiicx -qopenmp -o hybrid_hello_world.bin hybrid_hello_world.c

Intel oneAPI compilers + OpenMPI:

module load intel-oneapi-compilers
module load openmpi
mpicc -qopenmp -o hybrid_hello_world.bin hybrid_hello_world.c
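
As an optional sanity check, you can inspect the shared-library dependencies of the resulting binary to confirm that both the MPI library and the OpenMP runtime were linked in; the exact library names depend on the toolchain (for example, libgomp for GCC and libiomp5 for the Intel compilers):

ldd hybrid_hello_world.bin | grep -i -E 'mpi|gomp|iomp'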

Batch Job

When submitting the batch job, you have to decide how many separate MPI processes (tasks) you want to run per node and how many cores each task should get (usually the number of cores on the node divided by the number of tasks per node). The best way to do this is to explicitly set

  • -N <nodes> for the number of nodes
  • --tasks-per-node=<tasks-per-node> for the number of separate MPI processes you want on each node
  • -c <cores-per-task> if you want to specify the number of cores per task explicitly (if you leave it out, Slurm divides the available cores evenly among the tasks)
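
For example, assuming a node with 96 physical cores (an assumed value here; check the hardware of your partition), running 2 tasks per node leaves 96 / 2 = 48 cores for each task.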

and then run the code in the job script with mpirun, which receives all the required information from Slurm (we do not recommend using srun). To run the example above on two nodes, each running 2 tasks that together use all cores of the node (but not the hypercores), one would use one of the following job scripts, matching the toolchain used for compilation:

GCC + OpenMPI:

#!/bin/bash

#SBATCH --time=00:10:00
#SBATCH --nodes=2
#SBATCH --tasks-per-node=2
#SBATCH --partition=standard96:test

module load gcc
module load openmpi

export OMP_NUM_THREADS=$(( $SLURM_CPUS_ON_NODE / $SLURM_NTASKS_PER_NODE / 2 ))

mpirun ./hybrid_hello_world.bin

Intel oneAPI compilers + Intel MPI:

#!/bin/bash

#SBATCH --time=00:10:00
#SBATCH --nodes=2
#SBATCH --tasks-per-node=2
#SBATCH --partition=standard96:test

module load intel-oneapi-compilers
module load intel-oneapi-mpi

export OMP_NUM_THREADS=$(( $SLURM_CPUS_ON_NODE / $SLURM_NTASKS_PER_NODE / 2 ))

mpirun ./hybrid_hello_world.bin

Intel oneAPI compilers + OpenMPI:

#!/bin/bash

#SBATCH --time=00:10:00
#SBATCH --nodes=2
#SBATCH --tasks-per-node=2
#SBATCH --partition=standard96:test

module load intel-oneapi-compilers
module load openmpi

export OMP_NUM_THREADS=$(( $SLURM_CPUS_ON_NODE / $SLURM_NTASKS_PER_NODE / 2 ))

mpirun ./hybrid_hello_world.bin
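
The OMP_NUM_THREADS line in these scripts splits each node evenly among its MPI tasks. On systems where hyperthreading is enabled in Slurm, SLURM_CPUS_ON_NODE counts logical CPUs (i.e. including hypercores), so the additional division by 2 gives one OpenMP thread per physical core, which is what "all cores (but not hypercores)" above refers to. With the 96-core node assumed earlier (192 logical CPUs) and 2 tasks per node, the arithmetic works out as follows:

# hypothetical values: SLURM_CPUS_ON_NODE=192, SLURM_NTASKS_PER_NODE=2
# 192 / 2 tasks / 2 hyperthreads per core = 48 OpenMP threads per MPI task
echo $(( 192 / 2 / 2 ))   # prints 48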