SCRATCH/WORK
SCRATCH/WORK data stores are meant for active data and are configured for high performance at the expense of robustness (no backups). The characteristics of the SCRATCH/WORK data stores are:
- Optimized for good performance from the sub-clusters in the same computing center
- Optimized for high input/output bandwidth from many nodes and jobs at the same time
- Optimized for a moderate number of files
- Meant for active data (heavily used data with a short lifetime)
- Have a quota
- Have NO backups
The SCRATCH filesystems have NO BACKUPS. Their performance comes at the price of robustness, meaning they are quite fragile. This means there is a non-negligible risk of the data on them being completely lost if more than one component in the underlying storage fails at the same time.
There is one data store for the SCC and three for NHR, which are shown in the table below and detailed by Project/User kind in separate subsections.
Project/User Kind | Name | Media | Capacity | Filesystem |
---|---|---|---|---|
SCC | SCRATCH SCC | HDD with metadata on SSD | 2.1 PiB | BeeGFS |
NHR | SCRATCH MDC (SSD) | SSD | 110 TiB | Lustre |
NHR | SCRATCH MDC (HDD) (formerly “SCRATCH Emmy”) | HDD with metadata on SSD | 8.4 PiB | Lustre |
NHR | SCRATCH RZG (formerly “SCRATCH Grete”) | SSD | 509 TiB | Lustre |
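Since every SCRATCH/WORK data store has a quota, it is worth checking your usage before staging large amounts of data. A minimal sketch for the Lustre-based NHR data stores is shown below; the mount point is one of those listed further down this page, and your site may additionally provide its own quota-reporting tool.

```bash
# Sketch: query your user quota on one of the Lustre SCRATCH/WORK file systems
# using the generic Lustre client command (mount point taken from this page).
lfs quota -h -u $USER /mnt/lustre-emmy-hdd
```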
SCC
Projects get a SCRATCH SCC directory at `/scratch/projects/PROJECT`, which has a Project Map symlink with the name `dir.scratch-scc`.

Users get a SCRATCH SCC directory at `/scratch/users/USER`.
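A minimal sketch of using these directories is shown below; `PROJECT` and `USER` are placeholders for your actual project ID and username.

```bash
# Sketch: change into the project's SCRATCH SCC directory
# (PROJECT is a placeholder for your project ID).
cd /scratch/projects/PROJECT

# The per-user SCRATCH SCC directory works the same way
# (USER is a placeholder for your username).
cd /scratch/users/USER
```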
NHR
Each project gets 0-2 directories in each SCRATCH/WORK data store which are listed in the table below. New projects in the HPC Project Portal get the directories marked “new”. Legacy NHR/HLRN projects started before 2024/Q2 get the directories marked “legacy”. Legacy NHR/HLRN projects that have been migrated to the HPC Project Portal keep the directories marked “legacy” and get the directories marked “new” (they get both). See NHR/HLRN Project Migration for more information on migration.
Project Data Store | Paths | Project Map symlink |
---|---|---|
SCRATCH MDC (SSD) | `/mnt/lustre-emmy-ssd/projects/PROJECT` (new) | `dir.lustre-emmy-ssd` (new) |
SCRATCH MDC (HDD) | `/mnt/lustre-emmy-hdd/projects/PROJECT` (new)<br>`/scratch-emmy/projects/PROJECT` (legacy) | `dir.lustre-emmy-hdd` (new)<br>`dir.scratch-emmy` (legacy) |
SCRATCH RZG | `/mnt/lustre-grete/projects/PROJECT` (new)<br>`/scratch-grete/projects/PROJECT` (legacy) | `dir.lustre-grete` (new)<br>`dir.scratch-grete` (legacy) |
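Which of these directories actually exist depends on whether the project is new, legacy, or migrated. A minimal sketch for checking this from a login node is shown below; `PROJECT` is a placeholder and the paths are taken from the table above.

```bash
# Sketch: list which SCRATCH/WORK project directories exist for a project
# (PROJECT is a placeholder; paths are the "new" and "legacy" ones above).
for d in /mnt/lustre-emmy-ssd/projects/PROJECT \
         /mnt/lustre-emmy-hdd/projects/PROJECT \
         /scratch-emmy/projects/PROJECT \
         /mnt/lustre-grete/projects/PROJECT \
         /scratch-grete/projects/PROJECT; do
    [ -d "$d" ] && echo "available: $d"
done
```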
Users get two directories in each SCRATCH/WORK data store, except for legacy NHR/HLRN users, who do not get them in the SCRATCH MDC (SSD) data store. They take the form `SCRATCH/SUBDIR/USER`, with `SCRATCH/usr/USER` being for the user’s files and `SCRATCH/tmp/USER` for temporary files (see Temporary Storage for more information). The directories in each data store are listed in the table below. Members of projects in the HPC Project Portal get the directories marked “new”. Legacy NHR/HLRN users get the directories marked “legacy”.
User Data Store | Path |
---|---|
SCRATCH MDC (SSD) | `/mnt/lustre-emmy-ssd/SUBDIR/USER` (new) |
SCRATCH MDC (HDD) | `/mnt/lustre-emmy-hdd/SUBDIR/USER` (new)<br>`/scratch-emmy/SUBDIR/USER` (legacy) |
SCRATCH RZG | `/mnt/lustre-grete/SUBDIR/USER` (new)<br>`/scratch-grete/SUBDIR/USER` (legacy) |
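As a concrete sketch, the two per-user directories in SCRATCH RZG for a new-style account would look like the following; `$USER` expands to your username and the paths follow the table above.

```bash
# Sketch: the per-user directories in SCRATCH RZG (new-style paths from the
# table above); SUBDIR is "usr" for your files or "tmp" for temporary files.
ls -ld /mnt/lustre-grete/usr/$USER /mnt/lustre-grete/tmp/$USER
```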
One of the most important things to keep in mind is that the NHR cluster itself is split between two computing centers. While they are physically close to each other, the inter-center latency is higher (speed of light plus more network hops) and the inter-center bandwidth lower (fewer fibers) than for intra-center connections. Each computing center has its own SCRATCH/WORK data store(s) to provide maximum local performance. The best performance is obtained by using the SCRATCH/WORK data store(s) in the same computing center, particularly for IOPS.
The two centers are the MDC (Modular Data Center) and the RZG (Rechenzentrum Göttingen).
The name of the computing center is in the name of the data store (e.g. “SCRATCH RZG” is at RZG).
The sites for each sub-cluster are listed in the table below, along with which data store the symlink `/scratch` points to.
Sub-cluster | Site (Computing Center) | Data Store `/scratch` Points To | Symlink Target |
---|---|---|---|
Emmy Phase 1 | MDC | SCRATCH MDC (HDD) | /scratch-emmy |
Emmy Phase 2 | MDC | SCRATCH MDC (HDD) | /scratch-emmy |
Emmy Phase 3 | RZG | SCRATCH RZG | /scratch-grete |
Grete Phase 1 | RZG | SCRATCH RZG | /scratch-grete |
Grete Phase 2 | RZG | SCRATCH RZG | /scratch-grete |
Grete Phase 3 | RZG | SCRATCH RZG | /scratch-grete |
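A quick way to confirm this on whichever node you are logged into is to resolve the symlink directly, as sketched below.

```bash
# Sketch: show which data store /scratch points to on the current node.
readlink -f /scratch
```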
SCRATCH MDC and SCRATCH RZG used to be known as “SCRATCH Emmy” and “SCRATCH Grete” respectively because it used to be that all of Emmy was in the MDC and all of Grete was in the RZG, which is no longer the case. This historical legacy can still be seen in the names of their mount points.
Figure: which data stores can be accessed from each group of nodes and the relative performance of the link between each group of nodes and each data store.
The best performance can be reached with sequential IO of large files that is aligned to the full-stripe size of the underlying RAID6 (1 MiB), especially on SCRATCH MDC (HDD), which uses HDDs for data and SSDs for metadata.
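As a rough sketch of what such aligned, sequential IO looks like (file name and size are arbitrary examples, not a benchmark recipe):

```bash
# Sketch: write a large file sequentially in 1 MiB blocks so the IO is
# aligned to the RAID6 full-stripe size (path and size are arbitrary examples).
dd if=/dev/zero of=/scratch/tmp/$USER/stripe_test.bin bs=1M count=4096
```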
If you are accessing a large file (1+ GiB) from multiple nodes in parallel, please consider setting the striping of the file with the Lustre command `lfs setstripe`, using a sensible `stripe-count` (recommended: up to 32) and a `stripe-size` that is a multiple of the RAID6 full-stripe size (1 MiB) and matches the IO sizes of your job. This can be done for a specific file or for a whole directory, but changes apply only to new files, so applying a new striping to an existing file requires a file copy. An example of setting the stripe size and count is given below (run `man lfs-setstripe` for more information about the command).
```bash
lfs setstripe --stripe-size 1M --stripe-count 16 PATH
```
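A related sketch: striping can also be set on a directory so that all new files created inside it inherit the layout, and the resulting layout can be checked with `lfs getstripe` (`DIR` and `existing_file` are placeholders).

```bash
# Sketch: set a default striping on a directory so new files inherit it,
# then verify the layout (DIR is a placeholder for an existing directory).
lfs setstripe --stripe-size 1M --stripe-count 16 DIR
lfs getstripe DIR

# Sketch: re-stripe an existing file by copying it into the newly striped
# directory, since striping changes only apply to new files.
cp existing_file DIR/existing_file
```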