SCRATCH/WORK

SCRATCH/WORK data stores are meant for active data and are configured for high performance at the expense of robustness (no backups). The characteristics of the SCRATCH/WORK data stores are:

  • Optimized for good performance when accessed from the sub-clusters in the same computing center
  • Optimized for high input/output bandwidth from many nodes and jobs at the same time
  • Optimized for a moderate number of files
  • Meant for active data (heavily used data with a short lifetime)
  • Has a quota
  • Has NO backups
Warning

The SCRATCH filesystems have NO BACKUPS. Their performance comes at the price of robustness, which makes them quite fragile: there is a non-negligible risk of data on them being completely lost if more than one component in the underlying storage fails at the same time.
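Since every SCRATCH/WORK data store has a quota, it can be useful to check your usage before starting large runs. The following is only a minimal sketch using generic Lustre and BeeGFS tools; the mount points are examples taken from the tables below, and the cluster may provide its own, more convenient quota tools.

```bash
# Sketch: check quota/usage with generic filesystem tools (mount points are examples).
lfs quota -uh $USER /scratch-emmy        # Lustre-based SCRATCH/WORK stores (NHR)
beegfs-ctl --getquota --uid $(id -u)     # BeeGFS-based SCRATCH SCC, if beegfs-ctl is available
```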

There is one data store for the SCC and three for NHR, which are shown in the table below and detailed by Project/User kind in separate subsections.

| Project/User Kind | Name | Media | Capacity | Filesystem |
|---|---|---|---|---|
| SCC | SCRATCH SCC | HDD with metadata on SSD | 2.1 PiB | BeeGFS |
| NHR | SCRATCH MDC (SSD) | SSD | 110 TiB | Lustre |
| NHR | SCRATCH MDC (HDD) (formerly “SCRATCH Emmy”) | HDD with metadata on SSD | 8.4 PiB | Lustre |
| NHR | SCRATCH RZG (formerly “SCRATCH Grete”) | SSD | 110 TiB | Lustre |

SCC

Projects get a SCRATCH SCC directory at /scratch/projects/PROJECT, which has a Project Map symlink with the name dir.scratch-scc. Users get a SCRATCH SCC directory at /scratch/users/USER.
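As a quick orientation, these directories can be listed directly; in the sketch below, PROJECT is a placeholder for your actual project ID.

```bash
# Sketch: list the project and personal SCRATCH SCC directories.
ls -ld /scratch/projects/PROJECT   # replace PROJECT with your project ID
ls -ld /scratch/users/$USER        # your personal SCRATCH SCC directory
```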

NHR

Each project gets 0-2 directories in each SCRATCH/WORK data store which are listed in the table below. New projects in the HPC Project Portal get the directories marked “new”. Legacy NHR/HLRN projects started before 2024/Q2 get the directories marked “legacy”. Legacy NHR/HLRN projects that have been migrated to the HPC Project Portal keep the directories marked “legacy” and get the directories marked “new” (they get both). See NHR/HLRN Project Migration for more information on migration.

| Project Data Store | Paths | Project Map symlink |
|---|---|---|
| SCRATCH MDC (SSD) | /mnt/lustre-emmy-ssd/projects/PROJECT (new) | dir.lustre-emmy-ssd (new) |
| SCRATCH MDC (HDD) | /mnt/lustre-emmy-hdd/projects/PROJECT (new) | dir.lustre-emmy-hdd (new) |
| | /scratch-emmy/projects/PROJECT (legacy) | dir.scratch-emmy (legacy) |
| SCRATCH RZG | /mnt/lustre-grete/projects/PROJECT (new) | dir.lustre-grete (new) |
| | /scratch-grete/projects/PROJECT (legacy) | dir.scratch-grete (legacy) |
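As an illustration, the new-style project directories can be inspected via their full paths, and a Project Map symlink can be resolved to see where it points. In the sketch below, PROJECT is a placeholder, and the symlink is assumed to be reached from the project’s Project Map directory (see the Project Map documentation).

```bash
# Sketch: inspect the new-style NHR project directories (PROJECT is a placeholder).
ls -ld /mnt/lustre-emmy-ssd/projects/PROJECT
ls -ld /mnt/lustre-emmy-hdd/projects/PROJECT
ls -ld /mnt/lustre-grete/projects/PROJECT

# From the project's Project Map directory (assumed location), resolve a symlink:
readlink -f dir.lustre-emmy-hdd
```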

Users get two directories in each SCRATCH/WORK data store, except for legacy NHR/HLRN users, who do not get them in the SCRATCH MDC (SSD) data store. The directories take the form SCRATCH/SUBDIR/USER, with SCRATCH/usr/USER being for the user’s files and SCRATCH/tmp/USER for temporary files (see Temporary Storage for more information). The directories in each data store are listed in the table below. Members of projects in the HPC Project Portal get the directories marked “new”. Legacy NHR/HLRN users get the directories marked “legacy”.

| User Data Store | Path |
|---|---|
| SCRATCH MDC (SSD) | /mnt/lustre-emmy-ssd/SUBDIR/USER (new) |
| SCRATCH MDC (HDD) | /mnt/lustre-emmy-hdd/SUBDIR/USER (new) |
| | /scratch-emmy/SUBDIR/USER (legacy) |
| SCRATCH RZG | /mnt/lustre-grete/SUBDIR/USER (new) |
| | /scratch-grete/SUBDIR/USER (legacy) |
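For example, a job could keep longer-lived working data in its usr directory and throw-away data in its tmp directory on the same data store. The sketch below assumes the new-style paths on SCRATCH MDC (HDD) and a Slurm job; SLURM_JOB_ID is only used to get a unique directory name and is an assumption, not part of the storage layout.

```bash
# Sketch: per-user directories on SCRATCH MDC (HDD); adjust the mount point to your data store.
MY_SCRATCH=/mnt/lustre-emmy-hdd/usr/$USER                  # the user's own files
MY_TMP=/mnt/lustre-emmy-hdd/tmp/$USER/job_$SLURM_JOB_ID    # temporary files for this job
mkdir -p "$MY_TMP"
# ... run the job, writing temporary data to "$MY_TMP" ...
rm -rf "$MY_TMP"                                           # clean up when done
```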

One of the most important things to keep in mind is that the NHR cluster itself is split between two computing centers. While they are physically close to each other, the inter-center latency is higher (speed-of-light delay plus more network hops) and the inter-center bandwidth lower (fewer fibers) than for intra-center connections. Each computing center has its own SCRATCH/WORK data store(s) to provide maximum local performance. The best performance, particularly for IOPS, is obtained by using the SCRATCH/WORK data store(s) in the same computing center as your nodes.

The two centers are the MDC (Modular Data Center) and the RZG (Rechenzentrum Göttingen). The name of the computing center is in the name of the data store (e.g. “SCRATCH RZG” is at the RZG). The table below lists the site of each sub-cluster, which sites’ SCRATCH/WORK data stores it can access, and which data store the symlink /scratch points to.

| Sub-cluster | Site (Computing Center) | Can Access | Data Store /scratch Points To | /scratch Symlink Target |
|---|---|---|---|---|
| Emmy Phase 1 | MDC | just MDC | SCRATCH MDC (HDD) | /scratch-emmy |
| Emmy Phase 2 | MDC | just MDC | SCRATCH MDC (HDD) | /scratch-emmy |
| Emmy Phase 3 | RZG | both | SCRATCH RZG | /scratch-grete |
| Grete Phase 1 | RZG | both | SCRATCH RZG | /scratch-grete |
| Grete Phase 2 | RZG | both | SCRATCH RZG | /scratch-grete |
| Grete Phase 3 | RZG | both | SCRATCH RZG | /scratch-grete |
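You can verify this on the node you are logged in to by resolving the symlink, for example:

```bash
# Sketch: show which data store /scratch points to on the current node.
readlink /scratch    # prints /scratch-emmy on Emmy Phase 1/2 nodes, /scratch-grete on the RZG nodes
```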
Info

SCRATCH MDC and SCRATCH RZG used to be known as “SCRATCH Emmy” and “SCRATCH Grete” respectively, because all of Emmy used to be in the MDC and all of Grete in the RZG, which is no longer the case. This historical legacy can still be seen in the names of their mount points.

The diagram below shows which storage systems each group of nodes can access and the relative performance of the link between each node group and each data store.

NHR Storage Systems

Figure: connections between each NHR node group and the different storage systems, with the arrow style indicating the performance of each link (node names are shown in bold in each group). All frontend nodes (glogin[1-13]) have a very slow connection to PERM. All nodes have a slow-medium connection to GPFS (HOME/Project), VAST (HOME/Project/extra), Software, and the Project Map. The Emmy Phase 1 and 2 nodes (glogin[1-8] and g[cfs]nXXXX) have a very fast connection to SCRATCH MDC (formerly known as SCRATCH Emmy). The Grete and Emmy Phase 3 nodes (glogin[9-13], ggpuXX, ggpuXXX, cXXXX, and cmXXXX) have a very fast connection to SCRATCH RZG (formerly known as SCRATCH Grete) and a medium connection to SCRATCH MDC.

The best performance can be reached with sequential IO of large files that is aligned to the full-stripe size of the underlying RAID6 (1 MiB), especially on SCRATCH MDC (HDD), which uses HDDs for data and SSDs for metadata. If you are accessing a large file (1+ GiB) from multiple nodes in parallel, please consider setting the striping of the file with the Lustre command lfs setstripe, using a sensible stripe count (up to 32 is recommended) and a stripe size that is a multiple of the RAID6 full-stripe size (1 MiB) and matches the IO sizes of your job. This can be done for a specific file or a whole directory, but the settings only apply to newly created files, so applying a new striping to an existing file requires copying it. An example of setting the stripe size and count is given below (run man lfs-setstripe for more information about the command).

```bash
lfs setstripe --stripe-size 1M --stripe-count 16 PATH
```
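Since the new striping only applies to newly created files, one way to re-stripe an existing file, sketched below, is to create a new file with the desired layout, copy the data into it, and replace the original (FILE is a placeholder; lfs getstripe shows the resulting layout).

```bash
# Sketch: apply a new striping to an existing file by copying it (FILE is a placeholder).
lfs setstripe --stripe-size 1M --stripe-count 16 FILE.restriped   # create empty file with new layout
cp FILE FILE.restriped                                            # copy the data into the new layout
mv FILE.restriped FILE                                            # replace the original file
lfs getstripe FILE                                                # verify stripe count and size
```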