MPI jobs with more than 40 (Emmy) or 96 (Lise) tasks per node fail

Problem

If you run a job on Emmy with more than 40 MPI tasks per node (or on Lise with more than 96 tasks per node), the job fails with the following errors in the output file:

gcn1066.335807hfi_userinit: assign_context command failed: Device or resource busy
gcn1066.335807PSM2 can't open hfi unit: -1 (err=23)
[99] MPI startup(): ofi fabric is not available and fallback fabric is not enabled

Solution

Set the environment variable PSM2_MULTI_EP to 0 with export PSM2_MULTI_EP=0 before the MPI program is started.
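
As a rough sketch of where the export could go, here is a minimal Slurm batch script. The node count, the task count of 80, and the binary name ./my_mpi_app are illustrative assumptions and not values taken from this page; the guard around srun is only there so that the script does nothing harmful when run outside a Slurm environment.

```shell
#!/bin/bash
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=80   # illustrative: above the 40-task threshold on Emmy

# Disable PSM2 multi-endpoint mode before the MPI program starts.
export PSM2_MULTI_EP=0

# ./my_mpi_app is a hypothetical application name.
# Launch only if srun is available (i.e. inside a Slurm environment).
if command -v srun >/dev/null 2>&1; then
    srun ./my_mpi_app
fi
```

The export has to appear before the launch line so that every MPI process inherits the variable.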

Note: This works only for applications that open a single OFI endpoint per process, which is the default setting.

Alternatively, set --ntasks-per-node 40 for MPI jobs to limit the number of MPI processes per node to 40 (replace 40 with 96 on Lise).
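
The task limit could be applied in a batch script along the following lines. This is a sketch under assumptions: the node count and the binary name ./my_mpi_app are hypothetical, and the echo line is only a convenience for checking the effective value in the job output.

```shell
#!/bin/bash
#SBATCH --nodes=2
#SBATCH --ntasks-per-node=40   # limit for Emmy; use 96 on Lise

# Slurm sets SLURM_NTASKS_PER_NODE when --ntasks-per-node is given.
# Fall back to 40 so the message is still meaningful outside a job.
tasks_per_node="${SLURM_NTASKS_PER_NODE:-40}"
echo "Running with ${tasks_per_node} MPI tasks per node"

# ./my_mpi_app is a hypothetical application name.
# Launch only if srun is available (i.e. inside a Slurm environment).
if command -v srun >/dev/null 2>&1; then
    srun ./my_mpi_app
fi
```

With the limit in place, PSM2_MULTI_EP does not need to be changed.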