Interactive Jobs

What is an interactive job? Why use interactive jobs?

An interactive job requests resources from a queue and, once they are granted, immediately opens a session on the assigned nodes so you can work interactively. This is usually done on specially designated interactive or test queues/partitions, which have little to no wait time (but also usually low maximum resource allocations and short maximum session times), so your session can start almost immediately. You can also attempt this on normal queues, but then you would have to be present at the terminal when your job actually starts.

There are multiple use cases for interactive queues:

  • Performing trial runs of a program that should not be run on a log-in node. Remember, log-in nodes are in principle just for logging in!
  • Testing a new setup, for example a new Conda configuration or a Snakemake workflow, in a realistic node environment. This prevents you from wasting time in a proper queue with long waiting times, only for your job to fail due to the wrong packages being loaded.
  • Testing a new submission script or SLURM configuration.
  • Running heavy installation or compilation jobs or Apptainer container builds.
  • Running small jobs that don’t have large resource requirements, thus saving on queuing time.
  • Testing resource allocation, for example for GPUs, which can sometimes be tricky. Start up a GPU interactive job and check whether your session can see the number and type of GPUs you expected with nvidia-smi. The same goes for other resources such as CPU count and RAM allocation. Do remember that interactive queues usually have low resource maximums or use older hardware, so this testing is not perfect!
  • Running the rare interactive-only tools and programs.
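
For the resource-allocation checks above, a minimal sketch of what you might run inside an interactive session (nvidia-smi exists only on GPU nodes, hence the guard):

```shell
# Sanity-check the resources actually visible to the session.
nproc                        # number of CPUs the session can use
free -h                      # host RAM in human-readable units
if command -v nvidia-smi >/dev/null 2>&1; then
    nvidia-smi -L            # prints one line per visible GPU
else
    echo "nvidia-smi not available (CPU-only node?)"
fi
```

Compare the output against what you requested in your srun/sbatch options before trusting the same configuration in a full-size job.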

Which queues/partitions are interactive?

This changes as new partitions are added or retired. Sometimes they will explicitly say they are “interactive”, or “int”. Test partitions can often be used as interactive queues, since they have short queueing times due to their short maximum job times. Check the current list of partitions. At the moment, interactive queues are jupyter (shared with the Jupyter-HPC service) and any test queues (mostly accessible from Emmy/NHR).

Pseudo-interactive jobs

If you have a job currently running on a given node, you can actually SSH into that node. This can be useful for debugging and checking on your programs and workflows. For example, you can check the live GPU load with nvidia-smi or monitor the CPU processes and host memory allocation with btop. Some of these checks are easier and more informative when performed live than through after-job reports such as the job output files or sacct.

u12345@glogin5 ~ $ squeue --me
  JOBID    PARTITION         NAME     USER  ACCOUNT     STATE       TIME  NODES NODELIST(REASON)
6892631   standard96         bash   u12345  myaccount   RUNNING     11:33     1 gcn2020
u12345@glogin5 ~ $ ssh gcn2020
u12345@gcn2020 ~ $

If you try this on a node where you don’t currently have a job running, it will fail, since naturally that resource has not been allocated to your user!
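
Once SSHed into the node, one-shot snapshots can complement the live tools mentioned above; a small sketch (the exact columns available depend on the ps version installed):

```shell
# Snapshot of your own processes on the node: PID, CPU%, MEM%, command.
ps -u "$USER" -o pid,pcpu,pmem,comm | head -n 10

# Host memory usage at this moment.
free -h
```

One-shot commands like these are also easy to drop into a monitoring script, whereas btop and nvidia-smi are better for watching a job evolve live.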

How to start an interactive job

To start a (proper) interactive job:

srun -p jupyter --pty -n 1 -c 16 bash

This will block your terminal while the job starts, which should happen within a few minutes. If this is taking too long, or the request is denied (because you used the wrong partition or exceeded the resource limits, for example), you can abort the request with Ctrl-c.

In the above command:

  • srun starts up a job in the partition/queue designated after -p.
  • --pty runs the command in pseudo-terminal mode.
  • -n 1 -c 16 are the usual SLURM resource allocation options (here, one task with 16 cores).
  • finally, bash is the command to run, giving you an interactive shell.

You should see something like the following after your command. Notice how the command-line prompt (if you haven’t customized it) changes once the job starts up and logs you into the node:

u12345@glogin5 ~ $ srun -p standard96:test --pty -n 1 -c 16 bash
srun: job 6892631 queued and waiting for resources
srun: job 6892631 has been allocated resources
u12345@gcn2020 ~ $ 
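
Once the session is up, you can confirm what was actually allocated. The SLURM_* variables below are set by SLURM inside any job; the ${VAR:-unset} fallback just keeps the commands safe to run outside a job:

```shell
# Inside the interactive session: confirm the allocation matches the request.
echo "job id:    ${SLURM_JOB_ID:-unset}"
echo "cpus/task: ${SLURM_CPUS_PER_TASK:-unset}"   # should match your -c value
nproc                                             # CPUs visible to the shell
```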

To stop an interactive session and return to the log-in node:

exit

GPU Interactive Jobs

See: GPU Usage.

Interactive Jobs with Internet Access

See: Internet access within jobs.