PoCL

Portable Computing Language (PoCL) is a widely used OpenCL platform, particularly well known for providing the host CPUs as an OpenCL device. PoCL also supports other devices, such as Nvidia GPUs via CUDA. Two variants are provided: one with support for Nvidia GPUs and one without. The module names are given in the table below.

| Devices | NHR Modules name |
|---|---|
| CPU | pocl |
| CPU, Nvidia GPU | pocl/VERSION_cuda-CUDAMAJORVERSION |

For example, pocl/5.0 would be the non-GPU variant and pocl/5.0_cuda-11 would be a GPU variant using CUDA 11.x.

Warning

Due to limitations in how the NHR Modules software stack is built, loading a module for another OpenCL platform prevents the use of this platform.

To load a specific version, run

module load pocl/VERSION

and for the default version (non-GPU), run

module load pocl
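
To quickly verify that the PoCL platform is visible after loading the module, one can list the detected OpenCL platforms and devices, for example with the clinfo utility (assuming it is available on the system):

module load pocl
clinfo -l

Here clinfo -l prints a compact tree of the platforms and their devices, in which PoCL should appear as one of the platforms.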

Controlling Runtime Behavior

PoCL uses a variety of environment variables to control its runtime behavior, which are described in the PoCL Documentation. Two important ones are POCL_DEVICES and POCL_MAX_CPU_CU_COUNT.

By default, PoCL provides access to the cpu device and all non-CPU devices it was compiled for. Setting POCL_DEVICES to a space-separated list of devices limits PoCL to providing only those kinds of devices. The relevant device names are in the table below. For example, setting POCL_DEVICES=cuda limits PoCL to Nvidia GPUs only, while POCL_DEVICES="cpu cuda" limits it to the host CPUs (using threads) and Nvidia GPUs (see the sketch after the table).

| Name for POCL_DEVICES | Description |
|---|---|
| cpu | All CPUs on the host using threads |
| cuda | Nvidia GPUs using CUDA |
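
As a concrete sketch, one could restrict PoCL to Nvidia GPUs only in a job script like this (assuming the CUDA-enabled variant from the example above):

module load pocl/5.0_cuda-11
export POCL_DEVICES="cuda"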

At present, PoCL is unable to determine how many CPUs it should use from the limits set by Slurm. Instead, it tries to use one thread for every core it sees on the host, including hyperthread cores, even if the Slurm job was run with, say, -c 1. This is particularly bad when running a job that doesn't use all the cores on a shared node. To override the number of CPUs that PoCL sees (and therefore the number of threads it uses for the cpu device), set the environment variable POCL_MAX_CPU_CU_COUNT. It usually makes the most sense to either first run

export POCL_MAX_CPU_CU_COUNT="$SLURM_CPUS_PER_TASK"

if one wants to use all hyperthreads, or

export POCL_MAX_CPU_CU_COUNT="$(( SLURM_CPUS_PER_TASK / 2 ))"

if one wants only one thread per physical core, not using hyperthreads (the division by 2 assumes two hyperthreads per physical core).
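
Putting it all together, a minimal Slurm batch script for running an OpenCL application on the host CPUs with PoCL could look like the following sketch (the application name is a placeholder, and the division by 2 again assumes two hyperthreads per physical core):

#!/bin/bash
#SBATCH --cpus-per-task=8
#SBATCH --time=00:10:00

module load pocl

# Expose only the cpu device and cap the thread count at one thread
# per physical core of the Slurm allocation.
export POCL_DEVICES="cpu"
export POCL_MAX_CPU_CU_COUNT="$(( SLURM_CPUS_PER_TASK / 2 ))"

srun ./my_opencl_app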