OpenMP
OpenMP (Open Multi-Processing) is a shared-memory parallelization extension to C, C++, and Fortran built into supporting compilers. In contrast to MPI, OpenMP parallelizes across threads in a single process rather than across processes. This means that variables, file handles, and other program state are shared in an OpenMP program, with the downside that parallelization across multiple nodes is impossible. However, it is possible to use OpenMP together with MPI to also parallelize across processes, which can run on other nodes, getting the best of both frameworks (see Hybrid MPI + OpenMP for more information).
Compilers and Compiler Options
The compilers that support OpenMP for C, C++, and/or Fortran, and the compiler options to enable it, are given in the table below.
| Compiler | Option | SIMD Option | GPU Offload Option |
|---|---|---|---|
| GCC | -fopenmp | -fopenmp-simd | |
| AMD Optimizing Compilers | -fopenmp | -fopenmp-simd | |
| Intel Compilers | -qopenmp | -qopenmp-simd | |
| LLVM | -fopenmp | -fopenmp-simd | |
| Nvidia HPC Compilers (successor to the PGI compilers) | -mp | -mp | -mp=gpu |
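For example, a program can be built with OpenMP enabled by adding the option for the compiler in use. The source and output file names below are placeholders, and the compiler driver names (gcc, clang, icx, nvc) depend on which compilers are installed on the cluster:

```bash
# Placeholder source file example.c; pick the line matching your compiler:
gcc   -fopenmp example.c -o example   # GCC
clang -fopenmp example.c -o example   # LLVM
icx   -qopenmp example.c -o example   # Intel Compilers
nvc   -mp      example.c -o example   # Nvidia HPC Compilers
```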
Setting Number of Threads
OpenMP uses the number of threads given by the environment variable OMP_NUM_THREADS
(or the number of cores if it is not set).
Set it with
export OMP_NUM_THREADS=VALUE
Since Slurm pins tasks to the requested cores, it is generally best to set it to a multiple or fraction of the number of cores in the task.
On shared partitions (more than one job can run on a node at a time), the environment variable SLURM_CPUS_PER_TASK
holds the number of cores per task in the job.
On non-shared partitions (each job takes up a whole node), the environment variable SLURM_CPUS_ON_NODE
holds the number of hypercores on the node and SLURM_TASKS_PER_NODE
holds the number of tasks per node.
Common values to set it to on a node with hyper-threading would be
| Number of Threads | VALUE on shared partition | VALUE on non-shared partition |
|---|---|---|
| one per core in task | $SLURM_CPUS_PER_TASK | $(( $SLURM_CPUS_ON_NODE / $SLURM_TASKS_PER_NODE / 2 )) |
| one per hypercore in task | $(( 2 * $SLURM_CPUS_PER_TASK )) | $(( $SLURM_CPUS_ON_NODE / $SLURM_TASKS_PER_NODE )) |
| one per pair of cores in task | $(( $SLURM_CPUS_PER_TASK / 2 )) | $(( $SLURM_CPUS_ON_NODE / $SLURM_TASKS_PER_NODE / 4 )) |
Notice that you can use the $(( MATH ))
arithmetic expansion syntax to do integer math in POSIX-compatible shells such as Bash and Zsh.
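As a quick sanity check, you can evaluate one of the expressions from the table by hand; the node size and task count below are made-up values:

```bash
# Made-up example: non-shared node with 64 hypercores and 2 tasks per node,
# one thread per core in each task:
echo $(( 64 / 2 / 2 ))    # prints 16
```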
OMP_NUM_THREADS
is set to 1 by default on the whole HPC cluster in order to not overload the login nodes.
If you want to take advantage of OpenMP in compute jobs, you must set it to a larger value.
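Putting it together, a job script on a shared partition could set the number of threads from the Slurm variables as sketched below; the partition name, core count, and program name are placeholders:

```bash
#!/bin/bash
#SBATCH --partition=medium      # placeholder partition name (shared partition)
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=8       # placeholder core count

# one OpenMP thread per core in the task (see the table above)
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK

./example                       # placeholder OpenMP program
```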