Slurm tutorial

Warning

This page is a work in progress and subject to change.

If you would like to share some feedback with us regarding this tutorial, you can write us an email and put [GWDG Academy] in the subject line.

Requirements

This tutorial assumes that you are familiar with everything covered in the bash tutorial. Feel free to work through that tutorial first, or skip straight to the exercises and see if you can solve all the steps.

Course information

The steps shown here accompany the HPC courses given at the GWDG Academy. Since the course is given for all three systems, we keep the instructions general. Please replace every mention of <course> with the value for your system, which can be NHR, KISSKI, or SCC. The exact replacements are listed near the exercise.

Chapter 01 - The file system

Exploring the options and checking the quota

Every user has access to at least three different storage locations, which are stored in these variables:

  • $HOME
  • $WORK
  • $PROJECT

Go to each location and find the full path of the current working directory.
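
A minimal sketch for the $HOME location (repeat the same for $WORK and $PROJECT):

cd $HOME
pwd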

In the $PROJECT directory, create a folder with the name of your user account. This folder can be accessed by everyone in the project, so you need to check its permissions and adjust them. For now, set the permissions so that the folder is not readable by the group. Check the other folders that are there and see if you can read them.
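
A minimal sketch of these steps, assuming the new folder is named after your user account stored in $USER:

cd $PROJECT
mkdir $USER        # folder named after your user account
ls -l              # check the permissions of the folders in the project
chmod g-r $USER    # remove read access for the group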

Once you have done that, check the quota of your storage by using the show-quota command.

Copy files from local system

Follow the instructions on the Data Transfer page to copy a file from your local computer to your folder in the $PROJECT directory. Choose a file which you could share, or just create a new file and upload that one. If you do not know what file to upload, you can create one like this:

date +"%H:%M %d.%m.%Y" > my-file.txt
echo "This file belongs to me" >> my-file.txt

Now you can open up the permissions of the folder again so that everyone in the project can access it, and make the file read-only for the group. Go around and see what the others have uploaded.
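
As a sketch, assuming the folder is named after your user account and the file is the my-file.txt from above:

cd $PROJECT
chmod g+rx $USER              # give the group read and enter access again
chmod g=r $USER/my-file.txt   # the group may read the file, but not write to it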

Chapter 02 - The module system

Module basics

In order to use any software on the HPC system you need to use the module system. A brief explanation can be found in the module basics page.

Here, you will check the $PATH variable. Do it first without any module loaded. Once you know what is stored in there follow these steps:

echo $PATH
module load gcc
echo $PATH

module unload gcc
echo $PATH

module load gcc/9.5.0
echo $PATH

Also, check module avail before loading a compiler and after. The output of the command might change depending on the compiler you have chosen. Try this out for the module gcc and the module intel-oneapi-compilers. What has changed?
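
As a sketch, the comparison for gcc could look like this (repeat it with intel-oneapi-compilers):

module avail
module load gcc
module avail
module unload gcc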

SPACK

If you cannot find the software you are looking for, one option is to use SPACK. Once you have loaded the module and sourced the SPACK setup file with source $SPACK_ROOT/share/spack/setup-env.sh, you have access to the full repository of packages. Check the list of SPACK packages to see if your favourite software is available.
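
You can also search the package repository directly from the command line. For example, to look for the ncdu package used below (assuming the module is loaded and the setup file is sourced as described above):

spack list ncdu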

As a simple example, try these steps:

module load spack
source $SPACK_ROOT/share/spack/setup-env.sh

spack install ncdu
spack load ncdu
ncdu

This might take a moment.

Chapter 03 - Slurm the scheduler

Here we need to replace the tag <course> with a partition name that depends on the system. Use this table to find the right substitution:

System    Partition (-p)
NHR       standard96s:shared
KISSKI    grete:interactive
SCC       scc-cpu

First command

The first command we can try is this one:

srun -p <course> -t 02:00 -n 1 hostname

What do you observe? Run this command again: did anything change? The general syntax is srun <options> <command to run>, and the command to run in this example is called hostname.

You can also already try this command:

srun -p <course> -t 02:00 -n 1 /opt/slurm/etc/scripts/misc/slurm_resources

Interactive session

You can also allocate a node, or a portion of a node, for an interactive session. This way, you get a terminal on a compute node and can try things out directly:

srun -p <course> -t 10:00 -n 1 --pty /bin/bash

The changes compared to the command above are the interactivity flag --pty and that the command we run is /bin/bash, which starts a bash shell on the node.

Once a node is allocated you can manually run the two commands from above. Run both hostname and /opt/slurm/etc/scripts/misc/slurm_resources.

Difference between -c and -n

Use the srun command and the two options -p <course> and -t 02:00 with the program /opt/slurm/etc/scripts/misc/slurm_resources. This time, adjust the options -c and -n (plus the node count where needed) in order to get these combinations (one possible set of commands is sketched after the list):

  • 10 tasks
  • 10 tasks distributed over 3 nodes
  • 3 nodes with 3 tasks each
  • 1 task with 5 cores
  • 2 tasks per node on 2 nodes with 4 cores per task
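
One possible set of commands is sketched below. In addition to -c and -n it uses -N (number of nodes) and --ntasks-per-node; other option combinations can produce the same distributions.

# 10 tasks
srun -p <course> -t 02:00 -n 10 /opt/slurm/etc/scripts/misc/slurm_resources
# 10 tasks distributed over 3 nodes
srun -p <course> -t 02:00 -n 10 -N 3 /opt/slurm/etc/scripts/misc/slurm_resources
# 3 nodes with 3 tasks each
srun -p <course> -t 02:00 -N 3 --ntasks-per-node=3 /opt/slurm/etc/scripts/misc/slurm_resources
# 1 task with 5 cores
srun -p <course> -t 02:00 -n 1 -c 5 /opt/slurm/etc/scripts/misc/slurm_resources
# 2 tasks per node on 2 nodes with 4 cores per task
srun -p <course> -t 02:00 -N 2 --ntasks-per-node=2 -c 4 /opt/slurm/etc/scripts/misc/slurm_resources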

Job scripts

Repeat the task from the last section, but this time write it as a job script (also called a batch script). The template could look like this:

#!/bin/bash
#SBATCH -p <course>
#SBATCH -t 02:00
#SBATCH --qos=2h
#SBATCH -o job_%J.out

hostname
srun /opt/slurm/etc/scripts/misc/slurm_resources

Add the required combination of -c and -n to the script. There are also two new options here: --qos=2h for a shorter queue time, and -o job_%J.out to redirect the output to a specific file, where the unique job id replaces %J.
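
To run it, save the script to a file (the name below is just an example) and submit it with sbatch:

sbatch my-job-script.sh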

Check the output files and compare these results to the section before. Also, try to run them all at the same time.

Slurm commands

While you are at it, use these commands to check what is going on:

  • squeue --me
  • sinfo -p <course>
  • scancel -u $USER

The last one will cancel all your jobs, so take care.
