Job arrays
Job arrays are the preferred way to submit many similar jobs, for instance if you need to run the same program on several input files or run it repeatedly with different settings or parameters. This type of workload is usually called “embarrassingly parallel” or trivially parallel.
Arrays are created with the -a start-finish sbatch parameter. For example, sbatch -a 0-19 will create 20 jobs indexed from 0 to 19. There are different ways to index the arrays, which are described below.
Job Array Indexing, Step Size and More
Slurm supports a number of ways to set up the indexing in job arrays.
- Range: -a 0-5
- Multiple values: -a 1,5,12
- Step size: -a 0-5:2 (same as -a 0,2,4)
- Combined: -a 0-5:2,20 (same as -a 0,2,4,20)
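For illustration, here are the same index specifications written as complete submission commands (jobarray.sh is just a placeholder for a batch script of your own):

sbatch -a 0-5 jobarray.sh        # runs tasks 0,1,2,3,4,5
sbatch -a 1,5,12 jobarray.sh     # runs tasks 1,5,12
sbatch -a 0-5:2 jobarray.sh      # runs tasks 0,2,4
sbatch -a 0-5:2,20 jobarray.sh   # runs tasks 0,2,4,20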
Additionally, you can limit the number of simultaneously running jobs by appending % and a number to the index specification:
- -a 0-11%4 runs only four jobs at once
- -a 0-11%1 runs all jobs sequentially
- -a 0-5:2,20%2 everything combined: runs IDs 0, 2, 4 and 20, but only two at a time
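As a sketch (again with jobarray.sh as a placeholder), submitting a throttled array and checking on it could look like this:

sbatch -a 0-11%4 jobarray.sh   # 12 tasks, but at most 4 of them running at any time
squeue -u $USER                # shows which array tasks are running and which are still pending
# On recent Slurm versions the throttle of an already submitted array can be changed with
# scontrol update JobId=<array job id> ArrayTaskThrottle=<new limit>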
You can read everything on array indexing in the sbatch man page.
Slurm Array Environment Variables
The behavior of your applications inside array jobs can be tied to Slurm environment variables, e.g. to tell the program which part of the array it should process. These variables have different values for each job in the array. Probably the most commonly used of these environment variables is $SLURM_ARRAY_TASK_ID, which holds the index of the current job in the array.
Other useful variables are:
- SLURM_ARRAY_TASK_COUNT: Total number of tasks in the job array.
- SLURM_ARRAY_TASK_MAX: Job array’s maximum ID (index) number.
- SLURM_ARRAY_TASK_MIN: Job array’s minimum ID (index) number.
- SLURM_ARRAY_TASK_STEP: Job array’s index step size.
- SLURM_ARRAY_JOB_ID: Job array’s master job ID number.
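A minimal sketch of an array script that just prints these variables (partition and runtime are only examples, adjust them to your cluster):

#!/bin/bash
#SBATCH -p medium
#SBATCH -t 01:00
#SBATCH -a 0-10:2
echo "master job id: $SLURM_ARRAY_JOB_ID"
echo "task id:       $SLURM_ARRAY_TASK_ID"
echo "task count:    $SLURM_ARRAY_TASK_COUNT"
echo "min/max/step:  $SLURM_ARRAY_TASK_MIN / $SLURM_ARRAY_TASK_MAX / $SLURM_ARRAY_TASK_STEP"

With -a 0-10:2 each of the six tasks prints its own task ID, while the count, minimum, maximum and step size are identical in every task.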
Example job array
The simplest example of using a job array is parallelizing a loop. The first script below runs the loop serially within a single job; the second script replaces it with a job array.
#!/bin/bash
# Variant 1: serial loop, all 100 iterations run one after another in a single job
#SBATCH -p medium
#SBATCH -t 10:00
#SBATCH -n 1
#SBATCH -c 4
module load python
for i in {1..100}; do
python myprogram.py $i
done
#!/bin/bash
# Variant 2: the same work split into a job array, one task per iteration
#SBATCH -p medium
#SBATCH -t 10:00
#SBATCH -n 1
#SBATCH -c 4
#SBATCH -a 1-100
module load python
python myprogram.py $SLURM_ARRAY_TASK_ID
The loop in the first script runs in serial on a single node. The job array in the second script unrolls the loop and, if enough resources are available, runs all 100 jobs in parallel, which is far more efficient.
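One practical detail not shown in the scripts above: each array task normally writes its output to its own file, named after the master job ID and the task index. If you want to choose the file names yourself, the sbatch output option accepts the placeholders %A (master job ID) and %a (task index), for example:

#SBATCH -o myprogram_%A_%a.out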
Example job array running over files
This is an example of a job array that creates a job for every file ending in .inp in the current working directory:
#!/bin/bash
#SBATCH -p medium
#SBATCH -t 01:00
#SBATCH -a 0-X
# Replace X with the number of your .inp files minus 1 (since bash arrays start counting from 0).
# You can count the files with: ls *.inp | wc -l
# nullglob makes the glob below expand to an empty list instead of the literal
# string "./*.inp" if no matching files exist
shopt -s nullglob
# create a bash array containing all matching files
arr=(./*.inp)
# put your command here. This just runs the fictional "big_computation" program
# with one of the files as input
./big_computation ${arr[$SLURM_ARRAY_TASK_ID]}
In this case, you have to determine the number of files beforehand and fill in the X. You can also do this automatically by removing the #SBATCH -a line and passing the array range on the command line when submitting the job:
sbatch -a 0-$(($(ls ./*.inp | wc -l)-1)) jobarray.sh
The part in parentheses uses ls to list all .inp files, counts them with wc -l and subtracts 1, since bash arrays start counting at 0.
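If you prefer not to parse ls output, the same range can be computed with a bash array at submission time, reusing the glob from inside the script:

files=(./*.inp)                                   # one array element per input file
sbatch -a 0-$(( ${#files[@]} - 1 )) jobarray.sh   # indices 0 .. number of files - 1

Here ${#files[@]} is the number of matching files, so the indices again run from 0 to the number of files minus 1.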