Workspaces
Workspaces work very differently from the other data stores. Rather than each project simply getting a directory in the respective filesystem for the lifetime of the project, directories with expiration dates are requested and created dynamically. Each project and project user can have multiple workspaces per data store, each with its own expiration date. After the expiration date, the expired workspace and the data within it are removed and eventually deleted. Workspaces enable users to manage the life cycle of their HPC data and prevent filesystems from filling up with inactive data. The concept has become quite popular and is used at a large number of HPC centers around the world. We use the well-known HPC Workspace tooling from Holger Berger to manage workspaces.
All Slurm jobs get their own temporary storage directories on the nodes themselves and the fastest shared filesystems available to the particular nodes, which are cleaned up when the job is finished. If you only need temporary storage for the lifetime of the job, those directories are better suited than workspaces. See Temporary Storage for more information.
Workspaces are meant for active data and are configured for high aggregate bandwidth (across all files and nodes at once) at the expense of robustness. The bandwidth for single files varies based on the underlying filesystem and how the files are stored. Common workflows are to use workspaces for temporary data, or to store data in a compact form in a Project or COLD data store and then copy and/or decompress it to a higher-performance workspace for a short period of heavy use (a sketch of this workflow follows the warning below). The characteristics of the Workspaces data stores are:
- Different filesystems optimized for different purposes (e.g. capacity vs. individual file bandwidth)
- Meant for active data (heavily used data with a relatively short lifetime)
- Has a generous quota for the whole project (the finite lifetimes of workspaces help protect against running out of space)
- Has NEITHER backups nor snapshots
Workspaces have NO BACKUPS. Their aggregate bandwidth comes at the price of robustness, often with less data redundancy to protect against data loss than other data stores. This means there is a non-negligible risk of data on them being completely lost if more than a few components/drives in the underlying storage fail at the same time.
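A minimal sketch of the copy-and-decompress workflow mentioned above, assuming a hypothetical project directory /projects/my-project and purely illustrative archive names (the workspace commands used here are covered in detail below):

# Allocate a workspace on ceph-ssd for 14 days and capture its path.
WS_PATH=$(ws_allocate -F ceph-ssd MyActiveData 14)

# Copy the compressed archive from the (hypothetical) project directory and
# unpack it in the workspace for fast access while jobs run against it.
cp /projects/my-project/dataset.tar.gz "$WS_PATH/"
tar -xzf "$WS_PATH/dataset.tar.gz" -C "$WS_PATH"

# ... run jobs using the data in $WS_PATH ...

# Pack the (hypothetical) results directory back into the project directory
# and release the workspace when it is no longer needed.
tar -czf /projects/my-project/results.tar.gz -C "$WS_PATH" results
ws_release -F ceph-ssd MyActiveData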
The data stores for workspaces available to each kind of project are:
Kind of Project | Name | Media | Capacity | Filesystem |
---|---|---|---|---|
SCC, NHR, REACT | Ceph SSD WS (ceph-ssd) | SSD | 607 TiB | CephFS |
SCC, NHR, REACT | Ceph HDD WS (ceph-hdd) | HDD with metadata on SSD | 21 PiB (shared) | CephFS |
NHR | Lustre RZG WS (lustre-rzg) | SSD | 509 TiB (shared) | Lustre |
Legend for the tags in the Capacity column:
(shared): The data store shares its capacity with other data stores. For example, the NHR workspaces on Lustre RZG WS are on the same filesystem as the NHR SCRATCH/WORK directories.
Workspace Basics
The six basic commands to handle workspaces and manage their lifecycles are:
Command | Description |
---|---|
ws_allocate | Create a workspace |
ws_extend | Extend a workspace’s expiration date |
ws_list | List all workspaces or available data stores for them |
ws_release | Release a workspace (files will be deleted after a grace period) |
ws_restore | Restore a previously released workspace (if in grace period) |
ws_register | Manage symbolic links to workspaces |
All six commands have help messages accessible via COMMAND -h and man pages accessible via man COMMAND.
None of the workspace commands except for ws_list and ws_register work inside user namespaces, such as those created by running containers of any sort (e.g. Apptainer) or manually with unshare. This includes the JupyterHPC service and HPC desktops.
Workspaces are created with the requested expiration time (each data store has a maximum allowed value).
The default expiration time if none is requested is 1 day.
Workspaces can have their expiration extended a limited number of times.
After a workspace expires, it is released.
Released workspaces can be restored for a limited grace period, after which the data is permanently deleted.
Note that released workspaces still count against your filesystem quota during the grace period.
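As a rough end-to-end sketch of this lifecycle (the workspace name, data store, and durations are only examples; each command is described in detail in the sections below):

~ $ ws_allocate -F ceph-ssd MyData 10   # create a workspace with a 10 day lifetime
~ $ ws_extend -F ceph-ssd MyData 30     # later: extend its lifetime to 30 days (uses one extension)
~ $ ws_release -F ceph-ssd MyData       # when done: release it (restorable during the grace period)
~ $ ws_restore -l                       # list released workspaces that can still be restored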
All workspace tools use the -F <name> option to control which data store to operate on, where the default depends on the kind of project.
The various limits, the default data store for each kind of project, the cluster islands each data store is meant to be used from, and each data store’s purpose/specialty are:
Name | Default | Islands | Purpose/Specialty | Time Limit | Extensions | Grace Period |
---|---|---|---|---|---|---|
ceph-ssd | NHR, SCC, REACT | all | all-rounder | 30 days | 2 (90 day max lifetime) | 30 days |
ceph-hdd | | all | large size | 60 days | 2 (180 day max lifetime) | 30 days |
lustre-rzg | | Grete, Emmy P3 | high bandwidth for medium and big files | 30 days | 2 (90 day max lifetime) | 30 days |
Only workspaces on data stores mounted on a particular node are visible and can be managed (allocate, release, list, etc.).
If the data store that is the default for your kind of project is not available on a particular node, the special DONT_USE data store, which does not support allocation, will be the default (you must then specify -F <name> in all cases).
See Cluster Storage Map for more information on which filesystems are available where.
Managing Workspaces
Allocating
Workspaces are created via
ws_allocate [OPTIONS] WORKSPACE_NAME DURATION
The duration is given in days and workspace names are limited to ASCII letters, numbers, dashes, dots, and underscores.
The most important options (run ws_allocate -h to see the full list) are:
-F [ --filesystem ] arg filesystem
-r [ --reminder ] arg reminder to be sent n days before expiration
-m [ --mailaddress ] arg mailaddress to send reminder to
-g [ --group ] group workspace
-G [ --groupname ] arg groupname
-c [ --comment ] arg comment
Use --reminder <days> --mailaddress <email> to be emailed a reminder the specified number of days before the workspace expires.
Use --group --groupname <group> to make the workspace readable and writable by the members of the specified group; however, this only works for members of the group that are also members of the same project. Members of projects other than the one the workspace was created under cannot access it, even if you share a common POSIX group and use the group option. Thus, usually the only value that makes sense is the group HPC_<project>, which can be conveniently generated via "HPC_${PROJECT_NAME}" in the shell.
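For example, to allocate a workspace on ceph-ssd for 10 days that members of your project’s HPC_<project> group can read and write (the workspace name is arbitrary, and this assumes PROJECT_NAME is set in your shell as suggested above):

~ $ ws_allocate -F ceph-ssd --group --groupname "HPC_${PROJECT_NAME}" SharedData 10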
If you run ws_allocate for a workspace that already exists, it just prints its path to stdout, which can be used if you forgot the path (you can also use ws_list).
Workspace names and their paths are not private.
Any user on the cluster can see which workspaces exist and who created them.
However, other usernames cannot access workspaces unless the workspace was created with --group --groupname <group> and they are both a member of the same project and of that group.
To create a workspace named MayData on ceph-ssd with a lifetime of 6 days that emails a reminder 2 days before expiration, you could run:
~ $ ws_allocate -F ceph-ssd -r 2 -m myemail@example.com MayData 6
Info: creating workspace.
/mnt/ceph-ssd/workspaces/ws/nhr_internal_ws_test/u17588-MayData
remaining extensions : 2
remaining time in days: 6
The path to the workspace is printed to stdout while additional information is printed to stderr. This makes it easy to capture the path and save it in a shell variable:
~ $ WS_PATH=$(ws_allocate -F ceph-ssd -r 2 -m myemail@example.com MayData 6)
Info: creating workspace.
remaining extensions : 2
remaining time in days: 6
~ $ echo $WS_PATH
/mnt/ceph-ssd/workspaces/ws/nhr_internal_ws_test/u17588-MayData
You can set defaults for the duration as well as the --reminder, --mailaddress, and --groupname options by creating a YAML config file at $HOME/.ws_user.conf formatted like this:
duration: 15
groupname: HPC_foo
reminder: 3
mail: myemail@example.com
Listing
Use the ws_list command to list your workspaces, the workspaces made available to you by other users in your project, and the available data stores. Use the -l option to see the data stores available to your username on the particular node you are currently using:
~ $ ws_list -l
available filesystems:
ceph-ssd
lustre-rzg
DONT_USE (default)
Note that the special unusable location DONT_USE is always listed as the default, even if the actual default data store for your kind of project is available.
Running ws_list by itself lists your workspaces that can be accessed from the node (not all data stores are available on all nodes). Add the -g option to additionally list the ones made available to you by other users. If you run ws_list -g after creating the workspace in the previous example, you would get:
~ $ ws_list -g
id: MayData
workspace directory : /mnt/ceph-ssd/workspaces/ws/nhr_internal_ws_test/u17588-MayData
remaining time : 5 days 23 hours
creation time : Thu Jun 5 15:01:14 2025
expiration date : Wed Jun 25 15:01:14 2025
filesystem name : ceph-ssd
available extensions : 2
Extending
The expiration time of a workspace can be extended with ws_extend up to the maximum time allowed for an allocation on the chosen data store. It is even possible to reduce the expiration time by requesting a value lower than the remaining duration. The number of times a workspace on a particular data store can be extended is also limited (two extensions on our cluster).
Workspaces are extended by running:
ws_extend -F DATA_STORE WORKSPACE_NAME DURATION
Don’t forget to specify the data store with -F <data-store>.
For example, to extend the workspace allocated in the previous example to 20 days, run:
~ $ ws_extend -F ceph-ssd MayData 20
Info: extending workspace.
Info: reused mail address myemail@example.com
/mnt/ceph-ssd/workspaces/ws/nhr_internal_ws_test/u17588-MayData
remaining extensions : 1
remaining time in days: 20
Releasing
A workspace can be released before its expiration time by running:
ws_release -F DATA_STORE [OPTIONS] WORKSPACE_NAME
The most important option here is --delete-data, which causes the workspace’s data to be deleted immediately (remember, the data stores for workspaces have NEITHER backups NOR snapshots, so the data is lost forever).
Otherwise, the workspace will be set aside and remain restorable for the duration of the grace period of the respective data store.
Workspaces released without the --delete-data option still count against your project’s quota until the grace period is over and they are automatically cleaned up.
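For example, to release the MayData workspace from the earlier examples while keeping its data restorable during the grace period (add --delete-data only if you are certain the data can be discarded immediately), you would run:

~ $ ws_release -F ceph-ssd MayData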
Restoring
A released or expired workspace can be restored within the grace period using the ws_restore command. Use ws_restore -l to list your restorable workspaces and to get their full IDs.
If the previously created example workspace was released, you would get:
~ $ ws_restore -l
ceph-ssd:
u17588-MayData-1749129222
unavailable since Thu Jun 5 15:13:42 2025
lustre-rzg:
DONT_USE:
Note that the full ID of a restorable workspace includes your username, the workspace name, and the unix timestamp from when it was released. In order to restore a workspace, you must first have another workspace available on the same data store to restore it into. Then, you would call the command like this:
ws_restore -F DATA_STORE WORKSPACE_ID_TO_RESTORE DESTINATION_WORKSPACE
and it will ask you to type back a set of randomly generated characters before restoring (restoration is interactive and is not meant to be scripted). The restored workspace is placed inside the destination workspace as a subdirectory named after its full ID.
Using the previous example, one could create a new workspace MayDataRestored and restore the workspace into it via:
~ $ WS_DIR=$(ws_allocate -F ceph-ssd MayDataRestored 30)
Info: creating workspace.
remaining extensions : 2
remaining time in days: 30
~ $ ws_restore -F ceph-ssd u17588-MayData-1749129222 MayDataRestored
to verify that you are human, please type 'tafutewisu': tafutewisu
you are human
Info: restore successful, database entry removed.
~ $ ls $WS_DIR
u17588-MayData-1749129222
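Since the restored data ends up in a subdirectory named after the full workspace ID, you may want to move its contents up into the destination workspace itself; a possible (purely illustrative) follow-up would be:

~ $ mv "$WS_DIR"/u17588-MayData-1749129222/* "$WS_DIR"/   # note: a bare * does not match hidden files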
Managing Links
Keeping track of the paths to each workspace can sometimes be difficult. You can use the ws_register DIR command to set up symbolic links (symlinks) to all of your workspaces in the directory DIR. After doing that, each of your workspaces has a symlink <dir>/<datastore>/<username>-<workspacename>.
~ $ mkdir ws-links
~ $ ws_register ws-links
keeping link ws-links/ceph-ssd/u17588-MayDataRestored
~ $ ls -lh ws-links/
total 0
drwxr-x--- 2 u17588 GWDG 4.0K Jun 5 15:47 DONT_USE
drwxr-x--- 2 u17588 GWDG 4.0K Jun 5 15:47 ceph-ssd
drwxr-x--- 2 u17588 GWDG 4.0K Jun 5 15:47 lustre-rzg
~ $ ls -lh ws-links/*
ws-links/DONT_USE:
total 0
ws-links/ceph-ssd:
total 0
lrwxrwxrwx 1 u17588 GWDG 71 Jun 5 15:47 u17588-MayDataRestored -> /mnt/ceph-ssd/workspaces/ws/nhr_internal_ws_test/u17588-MayDataRestored
ws-links/lustre-rzg:
total 0