# PoCL
Portable Computing Language (PoCL) is a widely used OpenCL platform, best known for exposing the host CPUs as an OpenCL device. PoCL also supports other devices, such as Nvidia GPUs via CUDA. Two variants are provided, one with support for Nvidia GPUs and one without. The module names are given in the table below.
| Devices | NHR Modules name |
|---|---|
| CPU | `pocl` |
| CPU, Nvidia GPU | `pocl/VERSION_cuda-CUDAMAJORVERSION` |
For example, `pocl/5.0` would be the non-GPU variant and `pocl/5.0_cuda-11` would be a GPU variant using CUDA 11.x.
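To see which PoCL variants and versions are actually installed, query the module system (a standard Environment Modules/Lmod command):

```sh
# List all installed PoCL modules, both GPU and non-GPU variants.
module avail pocl
```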
Due to limitations in how the NHR Modules software stack is built, having a module for another OpenCL platform loaded prevents this platform from being used.
To load a specific version, run

```sh
module load pocl/VERSION
```

and for the default (non-GPU) version, run

```sh
module load pocl
```
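For instance, loading the CUDA-enabled variant from the example above would look like this (the version shown is only an example; use one listed by `module avail pocl`):

```sh
# Load a CUDA-enabled PoCL variant (example version).
module load pocl/5.0_cuda-11
```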
## Controlling Runtime Behavior
PoCL uses a variety of environment variables to control its runtime behavior, which are described in the PoCL documentation.
Two important environment variables are `POCL_DEVICES` and `POCL_MAX_CPU_CU_COUNT`.
By default, PoCL provides access to the `cpu` device and all non-CPU devices it was compiled for. Setting `POCL_DEVICES` to a space-separated list of device names limits PoCL to providing access to only those kinds of devices. The relevant device names are given in the table below. For example, setting `POCL_DEVICES=cuda` would limit PoCL to Nvidia GPUs only, while `POCL_DEVICES="cpu cuda"` would limit it to the host CPUs (using threads) and Nvidia GPUs.
| Name for `POCL_DEVICES` | Description |
|---|---|
| `cpu` | All CPUs on the host, using threads |
| `cuda` | Nvidia GPUs, using CUDA |
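To verify which devices PoCL actually exposes after setting `POCL_DEVICES`, one option is a quick check like the following (this assumes the `clinfo` utility is available; any OpenCL program that enumerates platforms and devices would work as well):

```sh
# Restrict PoCL to the host CPUs and Nvidia GPUs.
export POCL_DEVICES="cpu cuda"

# List the OpenCL platforms and devices now visible.
clinfo
```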
At present, PoCL is unable to determine how many CPUs it should use from the limits set by Slurm. Instead, it tries to use one thread for every core it sees on the host, including hyperthread cores, even if the Slurm job was run with, say, `-c 1`. To override the number of CPUs that PoCL sees (and therefore the number of threads it uses for the `cpu` device), set the environment variable `POCL_MAX_CPU_CU_COUNT`.
This oversubscription is particularly bad for jobs that do not use all the cores of a shared node. In that case, it usually makes the most sense to first run

```sh
export POCL_MAX_CPU_CU_COUNT="$SLURM_CPUS_PER_TASK"
```

if one wants to use all hyperthreads, or

```sh
export POCL_MAX_CPU_CU_COUNT="$(( SLURM_CPUS_PER_TASK / 2 ))"
```

if one wants only one thread per physical core (not using hyperthreads).
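Putting these pieces together, a minimal job script might look like the following sketch (the requested resources and the `./my_opencl_app` binary are placeholders; adapt them to the actual job):

```sh
#!/bin/bash
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=8

# Load the default (non-GPU) PoCL variant.
module load pocl

# Expose only the cpu device through PoCL.
export POCL_DEVICES=cpu

# Match PoCL's CPU thread count to the Slurm allocation
# (one thread per allocated hyperthread).
export POCL_MAX_CPU_CU_COUNT="$SLURM_CPUS_PER_TASK"

# Run the OpenCL application (placeholder name).
./my_opencl_app
```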