OpenMP

OpenMP (Open Multi-Processing) is a shared-memory parallelization extension to C, C++, and Fortran that is built into supporting compilers. In contrast to MPI, OpenMP parallelizes across threads within a single process rather than across processes. This means that variables, file handles, and other program state are shared in an OpenMP program, with the downside that parallelization across multiple nodes is impossible. However, OpenMP can be combined with MPI to additionally parallelize across processes, which may run on other nodes, getting the best of both frameworks (see Hybrid MPI + OpenMP for more information).
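As a quick illustration of the shared-memory model, here is a minimal C sketch (file and variable names are only examples, not taken from this documentation). All threads of the single process see the same data array, the loop iterations are split among them, and the partial sums are combined with a reduction.

    /* sum_omp.c -- hypothetical example file.
       Compile, for example, with: gcc -fopenmp sum_omp.c -o sum_omp
       (see the compiler options table in the next section). */
    #include <stdio.h>
    #include <omp.h>

    int main(void) {
        double data[1000];
        for (int i = 0; i < 1000; i++)
            data[i] = 0.5 * i;

        double sum = 0.0;
        /* "data" is shared by all threads of this one process;
           each thread's partial sum is combined by the reduction clause. */
        #pragma omp parallel for reduction(+:sum)
        for (int i = 0; i < 1000; i++)
            sum += data[i];

        printf("sum = %f (computed by up to %d threads)\n",
               sum, omp_get_max_threads());
        return 0;
    }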

Compilers and Compiler Options

The compilers that support OpenMP for C, C++, and/or Fortran, along with the compiler options to enable it, are given in the table below.

Compiler                                              | Option   | SIMD Option   | GPU Offload Option
GCC                                                   | -fopenmp | -fopenmp-simd |
AMD Optimizing Compilers                              | -fopenmp | -fopenmp-simd |
Intel Compilers                                       | -qopenmp | -qopenmp-simd |
LLVM                                                  | -fopenmp | -fopenmp-simd |
Nvidia HPC Compilers (successor to the PGI compilers) | -mp      | -mp           | -mp=gpu
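To illustrate the difference between the full OpenMP option and the SIMD-only option, the following C sketch (function and file names are hypothetical) marks a loop with the simd directive. Compiled with, for example, GCC's -fopenmp-simd, the directive is honored for vectorization without starting any threads, while -fopenmp additionally enables the threading directives.

    /* scale_simd.c -- hypothetical example file.
       Compile, for example, with: gcc -O2 -fopenmp-simd scale_simd.c -o scale_simd */
    #include <stdio.h>

    /* Ask the compiler to vectorize this loop; no threads are created. */
    void scale(float *restrict out, const float *restrict in, float a, int n) {
        #pragma omp simd
        for (int i = 0; i < n; i++)
            out[i] = a * in[i];
    }

    int main(void) {
        float in[8] = {1, 2, 3, 4, 5, 6, 7, 8}, out[8];
        scale(out, in, 2.0f, 8);
        printf("out[7] = %f\n", out[7]);
        return 0;
    }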

Setting Number of Threads

OpenMP will use a number of threads equal to the value of the environment variable OMP_NUM_THREADS (or the number of available cores if it is not set). Set it with

export OMP_NUM_THREADS=VALUE

Since Slurm pins tasks to the cores requested, it is generally best to set it to some multiple or fraction of the number of cores in the task. On shared partitions (more than one job can run on a node at a time), the environment variable SLURM_CPUS_PER_TASK holds the number of cores per task in the job. On non-shared partitions (jobs take up whole nodes), the environment variable SLURM_CPUS_ON_NODE holds the number of hypercores on the node and SLURM_TASKS_PER_NODE holds the number of tasks per node. Common values to set it to on a node with hyper-threading would be

Number of Threads             | VALUE on shared partition       | VALUE on non-shared partition
one per core in task          | $SLURM_CPUS_PER_TASK            | $(( $SLURM_CPUS_ON_NODE / $SLURM_TASKS_PER_NODE / 2 ))
one per hypercore in task     | $(( 2 * $SLURM_CPUS_PER_TASK )) | $(( $SLURM_CPUS_ON_NODE / $SLURM_TASKS_PER_NODE ))
one per pair of cores in task | $(( $SLURM_CPUS_PER_TASK / 2 )) | $(( $SLURM_CPUS_ON_NODE / $SLURM_TASKS_PER_NODE / 4 ))

Note that the $(( MATH )) syntax can be used for arithmetic in POSIX shells such as Bash and Zsh.
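To check that the chosen value actually takes effect inside a job, a small C test program along the following lines could be run (names are only examples); it prints the thread count that OpenMP will use, which should match OMP_NUM_THREADS.

    /* check_threads.c -- hypothetical example file.
       Compile, for example, with: gcc -fopenmp check_threads.c -o check_threads */
    #include <stdio.h>
    #include <omp.h>

    int main(void) {
        /* omp_get_max_threads() reports how many threads a parallel region
           may use, which follows OMP_NUM_THREADS when it is set. */
        printf("max threads: %d\n", omp_get_max_threads());

        #pragma omp parallel
        {
            /* Only one thread prints the actual team size. */
            #pragma omp single
            printf("threads actually started: %d\n", omp_get_num_threads());
        }
        return 0;
    }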

Warning

OMP_NUM_THREADS is set to 1 by default on the whole HPC cluster in order to not overload the login nodes. If you want to take advantage of OpenMP in compute jobs, you must change it to a larger value.