Interactive Jobs

What is an interactive job? Why use interactive jobs?

An interactive job requests resources from a partition and immediately opens a session on the assigned nodes so you can work interactively. This is usually done on specially designated interactive or test partitions, which have little to no wait time (but also usually low maximum resource allocations and short maximum session lengths), so your session can start right away. You can also attempt this on normal partitions, but then of course you have to be present at the terminal when your job actually starts.

There are multiple use cases for interactive jobs:

  • Performing trial runs of a program that should not be done on a login node. Remember, login nodes are in principle just for logging in!
  • Testing a new setup, for example a new Conda configuration or a Snakemake workflow, in a realistic node environment. This saves you from waiting in the queue of a regular partition only for your job to fail because the wrong packages were loaded.
  • Testing a new submission script or SLURM configuration.
  • Running heavy installation or compilation jobs or Apptainer container builds.
  • Running small jobs that don’t have large resource requirements, thus reducing waiting time.
  • Doing quick benchmarks to determine the best partition to run further computations in (e.g. is the code just as fast on Emmy Phase 2 nodes as on Emmy Phase 3 nodes?).
  • Testing resource allocation, which can sometimes be tricky, particularly for GPUs. Start a GPU interactive job and check with nvidia-smi whether the session sees the number and type of GPUs you expected; do the same for other resources such as CPU count and RAM (see the example after this list). Do remember that interactive partitions usually have low resource maximums or use older hardware, so this testing is not perfect!
  • Running the rare interactive-only tools and programs.
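
For example, a quick resource check after starting a GPU interactive job could look like the following sketch (the partition name scc-gpu, the --gpus syntax, and the requested amounts are placeholders to adjust for your system):

srun -p scc-gpu --gpus=1 --pty -n 1 -c 8 -t 0:30:00 bash
# then, inside the job's shell:
nvidia-smi    # should show the number and type of GPUs you requested
nproc         # should match the number of allocated cores
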
Tip

The Jupyter-HPC service is also available for full graphical interactive JupyterHub, RStudio, IDE, and Desktop sessions.

How to start an interactive job

To start a (proper) interactive job:

srun -p jupyter --pty -n 1 -c 16 bash

This will block your terminal while the job starts, which should happen within a few minutes. If this is taking too long, or a message says the request was denied (for example due to using the wrong partition or exceeding its resource limits), you can cancel the request with Ctrl-C.

In the above command:

  • -p starts the job in the designated partition (here jupyter)
  • --pty runs the command in pseudo-terminal mode (critical for interactive shells)
  • -n 1 -c 16 are the usual SLURM resource allocation options (here one task with 16 cores)
  • bash is the command to run, giving you an interactive shell session
Tip

Don’t forget to specify the time limit with -t LIMIT if the job will be short, so that it is more likely to start sooner. LIMIT can be in the form MINUTES, HOURS:MINUTES:SECONDS, DAYS-HOURS, etc. (see the srun man page for all available formats). This is especially important on partitions not specialized for interactive and test jobs. If your time limit is at most 2 hours, you can also add --qos=2h to use the 2-hour QOS and further reduce the likely wait time.
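
For example, a one-hour interactive session on a regular partition (the partition name below is only a placeholder) could be requested with:

srun -p standard96 --qos=2h -t 1:00:00 --pty -n 1 -c 16 bash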

Info

If you want to run a GUI application in the interactive job, you will almost certainly need X11 forwarding. For that, you must have SSH-ed into the login node with X11 forwarding enabled and then add the --x11 option to the srun command that starts the interactive job. X11 is then forwarded from the interactive job all the way to your machine via the login node.
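
As a sketch of the full chain (LOGIN_NODE is a placeholder for the login node address, and xclock is just a convenient test program if it happens to be installed):

ssh -X u12345@LOGIN_NODE                                # from your own machine, with X11 forwarding
srun -p standard96:test --x11 --pty -n 1 -c 16 bash     # on the login node
xclock                                                  # on the compute node; should open a window on your machine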

Though, in some cases, it might make more sense to use the Jupyter-HPC service instead.

You should see something like the following after your command. Notice how the command-line prompt (unless you have customized it) changes once the job starts and you are logged into the node:

u12345@glogin5 ~ $ srun -p standard96:test --pty -n 1 -c 16 bash
srun: job 6892631 queued and waiting for resources
srun: job 6892631 has been allocated resources
u12345@gcn2020 ~ $ 

To stop an interactive session and return to the login node:

exit

Which partitions are interactive?

Any partition can be used interactively if it is empty enough, but some are specialized for this, with shorter wait times, and are thus better suited for interactive jobs. These interactive partitions can change as new partitions are added or retired. Check the list of partitions for the most current information. Partitions whose names match the following are specialized for shorter wait times:

  • *:interactive
  • *:test, which have shorter maximum job times
  • jupyter*, which are shared with the Jupyter-HPC service and are overprovisioned (e.g. your job may share cores with other jobs)

You can also just look for other partitions with nodes in the idle state (or mixed nodes, if your job doesn’t require a full node on a shared partition) using sinfo -p PARTITION. For example, if we check the scc-gpu partition:

[scc_agc_test_accounts] u12283@glogin6 ~ $ sinfo -p scc-gpu
PARTITION AVAIL  TIMELIMIT  NODES  STATE NODELIST
scc-gpu      up 2-00:00:00      1  inval ggpu194
scc-gpu      up 2-00:00:00      3   mix- ggpu[135,138,237]
scc-gpu      up 2-00:00:00      1   plnd ggpu145
scc-gpu      up 2-00:00:00      1  down* ggpu150
scc-gpu      up 2-00:00:00      1   comp ggpu199
scc-gpu      up 2-00:00:00      1  drain ggpu152
scc-gpu      up 2-00:00:00      1   resv ggpu140
scc-gpu      up 2-00:00:00     11    mix ggpu[139,141,147-149,153-155,195-196,212]
scc-gpu      up 2-00:00:00      6  alloc ggpu[136,142-144,146,211]
scc-gpu      up 2-00:00:00      4   idle ggpu[151,156,197-198]

we can see that there are 4 idle nodes and 11 mixed nodes. This means that an interactive job using a single node should start rather quickly, particularly if it only requires part of a node since then one of the mixed nodes might be able to run it too.

Pseudo-interactive jobs

If you have a job currently running on a given node, you can actually SSH into that node. This can be useful in some cases to debug and check on your program and workflows. For example, you can check on the live GPU load with nvidia-smi or monitor the CPU processes and the host memory allocation with btop. Some of these checks are easier and more informative when performed live rather than using after-job reports such as the job output files or sacct.

u12345@glogin5 ~ $ squeue --me
  JOBID    PARTITION         NAME     USER  ACCOUNT     STATE       TIME  NODES NODELIST(REASON)
6892631   standard96         bash   u12345  myaccount   RUNNING     11:33     1 gcn2020
u12345@glogin5 ~ $ ssh gcn2020
u12345@gcn2020 ~ $
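
Once on the node, you can then run the monitoring tools mentioned above live, for example (nvidia-smi is only useful on GPU nodes, and btop must be available on the node):

u12345@gcn2020 ~ $ nvidia-smi    # live GPU utilization of your running job
u12345@gcn2020 ~ $ btop          # live CPU and memory usage (press q to quit)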

If you try this on a node where you don’t currently have a running job, it will fail, since its resources have not been allocated to your user!

GPU Interactive Jobs

See: GPU Usage.

Interactive Jobs with Internet Access

See: Internet access within jobs.