Special Filesystems

NHR provides tailored WORK file systems to improve the IO throughput of IO-intensive job workloads.

Default Lustre (WORK)

WORK is the default shared file system for all jobs and can be accessed using the $WORK variable. It is available to all users and consists of 8 Metadata Targets (MDTs) backed by NVMe SSDs, plus 28 Object Storage Targets (OSTs) on Lise and 100 OSTs on Emmy handling the data, both based on classical hard drives.

Access: $WORK
Size: 8 PiB (quota enforced)
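
As a minimal sketch, a job can keep its scratch data in a job-specific subdirectory of WORK and check its current usage against the quota with the standard Lustre tools (the directory name is only an illustration):

    # create a job-specific scratch directory on WORK
    mkdir -p $WORK/scratch_$SLURM_JOB_ID
    # show your current usage and quota on WORK
    lfs quota -u $USER $WORK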

Special File System Types

Lustre with striping (WORK)

Some workloads benefit from striping, where files are split transparently across a number of OSTs.

Large shared-file IO patterns in particular benefit from striping. Up to 28 OSTs on Lise and up to 100 OSTs on Emmy can be used; we recommend up to 8 OSTs on Lise and up to 32 OSTs on Emmy. A progressive file layout (PFL) is preconfigured, which sets the striping automatically based on the file size.

Access: create a new directory in $WORK and set lfs setstripe -c <stripe_count> <dir>
Size: 8 PiB (same as WORK)
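
A minimal sketch of setting up a striped directory with the recommended stripe count of 8 on Lise (the directory name is only an example; on Emmy a count of up to 32 can be used accordingly):

    # create a directory on WORK and stripe new files in it across 8 OSTs
    mkdir $WORK/striped_output
    lfs setstripe -c 8 $WORK/striped_output
    # verify the default layout that new files in this directory will inherit
    lfs getstripe -d $WORK/striped_output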

Local SSDs

Some compute nodes are equipped with local SSD storage: up to 2 TB on Lise and 480 GB or 1 TB (depending on the node) on Emmy.

Info

Data on local SSDs cannot be shared across nodes and will be deleted after the job finishes.

For unshared, node-local IO this is the best-performing storage option; a sketch of a batch script using it follows the table below.

Lise: SSD
Access: via partition standard96:ssd, using $LOCAL_TMPDIR
Type and size: Intel NVMe SSD DC P4511 (2 TB)

Lise: CAS
Access: via partitions large96 and huge96, using $LOCAL_TMPDIR
Type and size: Intel NVMe SSD DC P4511 (2 TB) using Intel Optane SSD DC P4801X (200 GB) as write-through cache

Emmy: SSD
Access: via partitions medium40, large40, standard96:ssd, large96, huge96, using $LOCAL_TMPDIR
Type and size: Intel S-ATA SSD DC S4500 (480 GB) or Intel NVMe SSD DC P4511 (1 TB)
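
The sketch below shows the typical pattern for using the node-local SSD from a batch job: stage the input in, work in $LOCAL_TMPDIR, and copy the results back to WORK before the job ends. The partition is one of those listed above; the application and file names are placeholders:

    #!/bin/bash
    #SBATCH --partition=standard96:ssd   # any partition with local SSDs, see the table above
    #SBATCH --nodes=1
    #SBATCH --time=01:00:00

    # stage input data from WORK to the node-local SSD
    cp $WORK/input.dat $LOCAL_TMPDIR/

    # run the application with its scratch and output files on the local SSD
    cd $LOCAL_TMPDIR
    srun $WORK/my_app input.dat

    # copy results back to WORK; data on the local SSD is deleted after the job
    cp $LOCAL_TMPDIR/results.dat $WORK/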

FastIO

WORK is extended with 4 (Lise) / 8 (Emmy) additional OSTs based on NVMe SSDs to accelerate heavy (random) IO demands. To accelerate specific IO demands further, striping across up to 4/8 of these OSTs is available.

Access:

Lise: create a new directory in $WORK and set lfs setstripe -p flash <dir>

Emmy: use either $SHARED_SSD_TMPDIR for a job-specific folder (analogous to $LOCAL_TMPDIR), or set the storage pool ddn_ssd for a new directory as described for Lise (where the SSD pool is called flash).

Size:

Lise: 55 TiB (quota enforced)

Emmy: 120 TiB (quota enforced)
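
A minimal sketch of both access methods, assuming the pool names given above (the directory name is only an example):

    # Lise: new files in this directory are placed on the NVMe SSD pool "flash"
    mkdir $WORK/fastio
    lfs setstripe -p flash $WORK/fastio

    # Emmy: same pattern, using the SSD pool "ddn_ssd"
    mkdir $WORK/fastio
    lfs setstripe -p ddn_ssd $WORK/fastio

    # check which pool and layout new files will inherit
    lfs getstripe -d $WORK/fastio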

Finding the right File System

If your jobs have significant IO demands, we recommend contacting your consultant via support@hlrn.de for advice on the right file system for your workload.

Local IO

If you have a significant amount of node-local IO that does not need to be accessed after the job ends and stays below 2 TB on Lise or 400 GB/1 TB (depending on the node) on Emmy, we recommend using $LOCAL_TMPDIR. Depending on your IO pattern, this may speed up IO by up to 100%.
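
For applications that honour the TMPDIR convention, it is often enough to point their temporary files at the local SSD; a small sketch (the application name is a placeholder):

    # write temporary files to the node-local SSD instead of WORK
    export TMPDIR=$LOCAL_TMPDIR
    srun $WORK/my_app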

Global IO

Global IO is defined as shared IO that can be accessed from multiple nodes at the same time and persists after the job ends.

Random IO in particular can be accelerated by up to 200% using FastIO on Lise.

Performance Comparison of the different File Systems and SSDs for Emmy with IO500

Please keep in mind that this comparison covers a single SSD for the node-local SSDs, 43 SSDs for Lustre SSD, and 1000 HDDs for Lustre HDD, using 32 IO processes per node. For the Lustre file systems, 64 nodes were used to reach near-maximum performance of the Lustre HDD pool. For the Home file system with its 120 HDDs, only 16 nodes with 10 processes per node were used, as more nodes or processes would overload this small file system, resulting in even lower performance.

A typical user job will see lower performance values, as it usually runs fewer IO processes. The numbers for the global file systems indicate the aggregate performance, which is shared by all users.

Benchmark | IME | Lustre SSD | Lustre HDD | Home | medium40 SSD | standard96 SSD
concurrent | 1,09 kIOPS | 0,67 kIOPS | 0,21 kIOPS | | 0,28 kIOPS | 0,87 kIOPS
concurrent ior-easy-write | 47,37 GiB/s | 28,01 GiB/s | 42,52 GiB/s | | 0,26 GiB/s | 0,78 GiB/s
concurrent ior-rnd1MB-read | 27,34 GiB/s | 5,32 GiB/s | 0,85 GiB/s | | 0,10 GiB/s | 0,47 GiB/s
concurrent mdworkbench-bench | 15,38 kIOPS | 18,37 kIOPS | 3,37 kIOPS | | 1,22 kIOPS | 7,34 kIOPS
find | 3,58 kIOPS | 710,20 kIOPS | 1049,80 kIOPS | 23,60 kIOPS | 1081,17 kIOPS | 2330,33 kIOPS
find-easy | 58,87 kIOPS | 25881,74 kIOPS | 23005,48 kIOPS | 24,28 kIOPS | 7077,75 kIOPS | 21241,00 kIOPS
find-hard | 0,40 kIOPS | 1051,23 kIOPS | 705,04 kIOPS | 820,49 kIOPS | 265,86 kIOPS | 402,25 kIOPS
ior-easy-read | 157,00 GiB/s | 38,17 GiB/s | 26,08 GiB/s | 9,48 GiB/s | 0,33 GiB/s | 1,84 GiB/s
ior-easy-write | 86,95 GiB/s | 23,58 GiB/s | 53,36 GiB/s | 5,78 GiB/s | 0,33 GiB/s | 0,92 GiB/s
ior-hard-read | 45,94 GiB/s | 25,95 GiB/s | 4,60 GiB/s | 0,96 GiB/s | 0,34 GiB/s | 1,42 GiB/s
ior-hard-write | 62,91 GiB/s | 0,70 GiB/s | 0,67 GiB/s | 0,10 GiB/s | 0,27 GiB/s | 0,69 GiB/s
ior-rnd1MB-read | 102,48 GiB/s | 13,19 GiB/s | 10,41 GiB/s | | 0,33 GiB/s | 1,61 GiB/s
ior-rnd1MB-write | 84,88 GiB/s | 7,21 GiB/s | 4,95 GiB/s | | 0,32 GiB/s | 0,80 GiB/s
ior-rnd4K-read | 2,18 GiB/s | 0,03 GiB/s | 0,21 GiB/s | | 0,30 GiB/s | 0,95 GiB/s
ior-rnd4K-write | 11,84 GiB/s | 0,03 GiB/s | 0,06 GiB/s | | 0,07 GiB/s | 0,12 GiB/s
mdtest-easy-delete | 41,66 kIOPS | 35,47 kIOPS | 38,09 kIOPS | 22,39 kIOPS | 100,81 kIOPS | 114,07 kIOPS
mdtest-easy-stat | 99,80 kIOPS | 108,61 kIOPS | 111,29 kIOPS | 59,12 kIOPS | 282,28 kIOPS | 434,48 kIOPS
mdtest-easy-write | 28,77 kIOPS | 60,82 kIOPS | 33,00 kIOPS | 25,43 kIOPS | 86,55 kIOPS | 72,95 kIOPS
mdtest-hard-delete | 2,84 kIOPS | 32,77 kIOPS | 30,62 kIOPS | 1,17 kIOPS | 22,99 kIOPS | 25,52 kIOPS
mdtest-hard-read | 49,50 kIOPS | 53,12 kIOPS | 52,02 kIOPS | 6,43 kIOPS | 38,49 kIOPS | 62,14 kIOPS
mdtest-hard-stat | 39,18 kIOPS | 97,22 kIOPS | 103,15 kIOPS | 5,46 kIOPS | 69,49 kIOPS | 79,14 kIOPS
mdtest-hard-write | 4,67 kIOPS | 43,99 kIOPS | 31,16 kIOPS | 1,56 kIOPS | 5,39 kIOPS | 15,39 kIOPS
mdworkbench-bench | 57,55 kIOPS | 92,10 kIOPS | 85,00 kIOPS | 21,27 kIOPS | 5,82 kIOPS | 8,92 kIOPS
mdworkbench-create | 24,53 kIOPS | 71,25 kIOPS | 37,51 kIOPS | 15,96 kIOPS | 116,62 kIOPS | 83,04 kIOPS
mdworkbench-delete | 50,09 kIOPS | 119,16 kIOPS | 61,66 kIOPS | 16,36 kIOPS | 18,70 kIOPS | 44,20 kIOPS
Bandwidth Score | 79,26 GiB/s | 11,30 GiB/s | 8,08 GiB/s | 1,52 GiB/s | 0,31 GiB/s | 1,13 GiB/s
IOPS Score | 18,00 kIOPS | 77,03 kIOPS | 72,36 kIOPS | 9,19 kIOPS | 73,83 kIOPS | 106,28 kIOPS
TOTAL Score | 37,77 | 29,50 | 24,19 | 3,74 | 4,82 | 10,98
Bandwidth Score | X | 41,53 GiB/s | 2,54 GiB/s | 2,72 GiB/s | 0,26 GiB/s | 0,84 GiB/s
IOPS Score | X | 15,78 kIOPS | 168,48 kIOPS | 152,49 kIOPS | 99,69 kIOPS | 155,02 kIOPS
TOTAL Score | X | 8,93 | 6,59 | 4,41 | 1,94 | 4,82