GPU Partitions

Nodes in these partitions provide GPUs for parallelizing calculations. See GPU Usage for more details on how to use GPU partitions, particularly those where GPUs are split into MIG slices.

Partitions

The table below lists the partitions by which users can use them, without hardware details. Note that some users are members of multiple classifications (e.g. all CIDBN users are also SCC users).

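| Users | Partition | OS | Shared | Max. walltime | Max. nodes per job | Core-hours per GPU* |
|---|---|---|---|---|---|---|
| NHR | grete | Rocky 8 | | 48 hr | 16 | 150 |
| NHR | grete:shared | Rocky 8 | yes | 48 hr | 1 | 150 |
| NHR | grete:preemptible | Rocky 8 | yes | 48 hr | 1 | 47 per slice |
| NHR | grete-h100 | Rocky 8 | | 48 hr | 16 | 262.5 |
| NHR | grete-h100:shared | Rocky 8 | yes | 48 hr | 16 | 262.5 |
| NHR, KISSKI, REACT | grete:interactive | Rocky 8 | yes | 48 hr | 1 | 47 per slice |
| KISSKI | kisski | Rocky 8 | | 48 hr | 16 | 150 |
| KISSKI | kisski-h100 | Rocky 8 | | 48 hr | 16 | 262.5 |
| REACT, SCC | react | Rocky 8 | yes | 48 hr | 16 | 150 |
| SCC | scc-gpu | Rocky 8 | yes | 48 hr | max | 24 |
| SCC | vis | Rocky 8 | yes | 48 hr | max | 150 |
| ALL | jupyter:gpu (jupyter) | Rocky 8 | yes | 24 hr | 1 | 47 |
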
Note

The partitions you are allowed to use depend on what kind of account you have. See the table at the bottom of this disambiguation page for more information.

Info

JupyterHub sessions run on the partitions marked with jupyter in the table above. These partitions are oversubscribed (multiple jobs share resources).

The hardware of the nodes in each partition is listed in the table below. Note that some partitions are heterogeneous, having nodes with different hardware. Additionally, many nodes are in more than one partition.

| Partition | Nodes | GPU + slices | VRAM each | CPU | RAM per node* | Cores |
|---|---|---|---|---|---|---|
| grete | 35 | 4 × Nvidia A100 | 40 GB | 2 × Zen3 EPYC 7513 | 512 GiB | 64 |
| grete | 14 | 4 × Nvidia A100 | 80 GB | 2 × Zen3 EPYC 7513 | 512 GiB | 64 |
| grete | 2 | 4 × Nvidia A100 | 80 GB | 2 × Zen3 EPYC 7513 | 1 TiB | 64 |
| grete:shared | 35 | 4 × Nvidia A100 | 40 GB | 2 × Zen3 EPYC 7513 | 512 GiB | 64 |
| grete:shared | 18 | 4 × Nvidia A100 | 80 GB | 2 × Zen3 EPYC 7513 | 512 GiB | 64 |
| grete:shared | 2 | 4 × Nvidia A100 | 80 GB | 2 × Zen3 EPYC 7513 | 1 TiB | 64 |
| grete:shared | 2 | 8 × Nvidia A100 | 80 GB | 2 × Zen2 EPYC 7662 | 1 TiB | 128 |
| grete:interactive | 3 | 4 × Nvidia A100 (2g.10gb and 3g.20gb) | 10/20 GB | 2 × Zen3 EPYC 7513 | 512 GiB | 64 |
| grete:preemptible | 3 | 4 × Nvidia A100 (2g.10gb and 3g.20gb) | 10/20 GB | 2 × Zen3 EPYC 7513 | 512 GiB | 64 |
| grete-h100 | 5 | 4 × Nvidia H100 | 94 GB | 2 × Xeon Platinum 8468 | 1 TiB | 96 |
| grete-h100:shared | 5 | 4 × Nvidia H100 | 94 GB | 2 × Xeon Platinum 8468 | 1 TiB | 96 |
| kisski | 34 | 4 × Nvidia A100 | 80 GB | 2 × Zen3 EPYC 7513 | 512 GiB | 64 |
| kisski-h100 | 15 | 4 × Nvidia H100 | 94 GB | 2 × Xeon Platinum 8468 | 1 TiB | 96 |
| react | 22 | 4 × Nvidia A100 | 80 GB | 2 × Zen3 EPYC 7513 | 512 GiB | 64 |
| scc-gpu | 23 | 4 × Nvidia A100 | 80 GB | 2 × Zen3 EPYC 7513 | 512 GiB | 64 |
| scc-gpu | 6 | 4 × Nvidia A100 | 80 GB | 2 × Zen3 EPYC 7513 | 1 TiB | 64 |
| scc-gpu | 2 | 4 × Nvidia A100 | 40 GB | 2 × Zen3 EPYC 7513 | 512 GiB | 64 |
| jupyter:gpu | 3 | 4 × Nvidia V100 | 32 GB | 2 × Skylake Xeon Gold 6148 | 768 GiB | 40 |
| gpu-int | 2 | 4 × Nvidia GTX980 | 4 GB | 2 × Broadwell E5-2650v4 | 128 GiB | 24 |
| vis | 3 | 4 × Nvidia GTX980 | 4 GB | 2 × Broadwell E5-2650v4 | 128 GiB | 24 |

*) The actually available memory per node is always less than installed in hardware. Some is reserved by the BIOS, and on top of that, Slurm reserves around 20 GiB for the operating system and background services. To be on the safe side, if you don’t reserve a full node, always deduct ~30 GiB, divide by the number of GPUs and round down to get a number per GPU you can safely request with --mem when submitting jobs.
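
The rule of thumb in this footnote can be written as a small calculation. The sketch below is only an illustration: the function name mem_per_gpu is hypothetical, the 30 GiB reserve is the approximate figure from the footnote, and the installed RAM and GPU count are taken from the hardware table.

```bash
#!/usr/bin/env bash
# Safe per-GPU memory request following the footnote's rule of thumb:
# (installed RAM in GiB - ~30 GiB reserve) / GPUs per node, rounded down.
# mem_per_gpu is a hypothetical helper name, not a Slurm command.
mem_per_gpu() {
    local ram_gib="$1"
    local gpus_per_node="$2"
    local reserve_gib="${3:-30}"   # ~30 GiB safety margin from the footnote
    # Shell integer arithmetic truncates, which rounds down for positive values.
    echo $(( (ram_gib - reserve_gib) / gpus_per_node ))
}

# Node with 512 GiB RAM and 4 GPUs
echo "--mem=$(mem_per_gpu 512 4)G"
# Node with 1 TiB (1024 GiB) RAM and 4 GPUs
echo "--mem=$(mem_per_gpu 1024 4)G"
```

Since --mem applies per node, a job using several GPUs on one node would multiply the per-GPU value by the number of GPUs it requests.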

How to pick the right partition for your job

If you have access to multiple partitions, it can be important to choose one that fits your use case. As a rule of thumb, if you need (or can scale your job to run on) multiple GPUs and you are not using a shared partition, always request a multiple of 4 GPUs, because you are billed for the whole node even if you request fewer GPUs via -G. For jobs that need fewer than 4 GPUs, use a shared partition and make sure not to request more than your fair share of RAM (see the note above). If you need your job to start quickly, e.g. to test whether your scripts work or to tweak hyperparameters interactively, use an interactive partition (most users have access to grete:interactive).
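
To see why whole-node billing matters on non-shared partitions, the following sketch rounds a GPU request up to full nodes. It assumes 4 GPUs per node, which holds for most but not all nodes in the hardware table, and the helper name billed_gpus is hypothetical. It does not model the exact accounting formula.

```bash
#!/usr/bin/env bash
# Number of GPUs billed on a non-shared partition, assuming whole nodes
# are billed and each node has 4 GPUs (an assumption; some nodes have 8).
# billed_gpus is a hypothetical helper name.
billed_gpus() {
    local requested="$1"
    local per_node="${2:-4}"
    # Round up to the next multiple of per_node.
    echo $(( (requested + per_node - 1) / per_node * per_node ))
}

for n in 1 4 6; do
    echo "$n requested -> $(billed_gpus "$n") billed"
done
```

Under these assumptions, a request for 1 or 6 GPUs leaves billed GPUs idle, which is the case where a shared partition or a request rounded to a full node is the better fit.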

The CPUs and GPUs

For partitions with heterogeneous hardware, you can give Slurm options to request the particular hardware you want. For CPUs, specify the kind of CPU with the -C/--constraint option. For GPUs, specify the name of the GPU with the -G/--gpus option (or --gpus-per-task), and request the larger-VRAM variant with a -C/--constraint option. See Slurm and GPU Usage for more information.
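
As an illustration of how these options combine, the sketch below assembles the partition, GPU, and constraint options into one string and prints it instead of submitting anything. The helper name gpu_request is hypothetical; the GPU names and the 80gb constraint are the values listed in the GPU table in this section, and which partitions you may use depends on your account.

```bash
#!/usr/bin/env bash
# Build Slurm options for a GPU request and print them (nothing is submitted).
# gpu_request is a hypothetical helper name, not part of Slurm.
gpu_request() {
    local partition="$1"
    local gpu_type="$2"
    local count="$3"
    local constraint="${4:-}"      # optional, e.g. 80gb for the larger-VRAM A100
    local opts="--partition=${partition} --gpus=${gpu_type}:${count}"
    if [ -n "$constraint" ]; then
        opts="${opts} --constraint=${constraint}"
    fi
    echo "$opts"
}

# Two 80 GB A100s on a shared partition
gpu_request grete:shared A100 2 80gb
# A full H100 node
gpu_request grete-h100 H100 4
# One 2g.10gb slice for interactive testing
gpu_request grete:interactive 2g.10gb 1
```

The printed options could then be passed to sbatch or srun together with your own job script. A CPU constraint such as zen3 would go into the same --constraint option.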

The GPUs, the options to request them, and some of their properties are given in the table below.

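| GPU | VRAM | FP32 cores | Tensor cores | -G option | -C option | Compute Cap. |
|---|---|---|---|---|---|---|
| Nvidia A100 | 40 GB | 6912 | 432 | A100 | | 80 |
| Nvidia A100 | 80 GB | 6912 | 432 | A100 | 80gb | 80 |
| Nvidia H100 | 94 GB | 8448 | 528 | H100 | 96gb | 90 |
| 2g.10gb slice of Nvidia A100 | 10 GB | 1728 | 108 | 2g.10gb | | 80 |
| 3g.20gb slice of Nvidia A100 | 20 GB | 2592 | 162 | 3g.20gb | | 80 |
| Nvidia V100 | 32 GB | 5120 | 640 | V100 | | 70 |
| Nvidia Quadro RTX 5000 | 16 GB | 3072 | 384 | RTX5000 | | 75 |
| Nvidia GeForce GTX 1080 | 8 GB | 2560 | | GTX1080 | | 61 |
| Nvidia GeForce GTX 980 | 4 GB | 2048 | | GTX980 | | 52 |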

The CPUs, the options to request them, and some of their properties are given in the table below.

| CPU | Cores | -C option | Architecture |
|---|---|---|---|
| AMD Zen3 EPYC 7513 | 32 | zen3 or milan | zen3 |
| AMD Zen2 EPYC 7662 | 64 | zen2 or rome | zen2 |
| Intel Sapphire Rapids Xeon Platinum 8468 | 48 | sapphirerapids | sapphirerapids |
| Intel Cascadelake Xeon Gold 6252 | 24 | cascadelake | cascadelake |
| Intel Cascadelake Xeon Gold 6242 | 16 | cascadelake | cascadelake |
| Intel Skylake Xeon Gold 6148 | 20 | skylake | skylake_avx512 |
| Intel Broadwell Xeon E5-2650 V4 | 12 | broadwell | broadwell |

Hardware Totals

The total nodes, cores, GPUs, RAM, and VRAM for each cluster and sub-cluster are given in the table below.

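| Cluster | Sub-cluster | Nodes | GPUs | VRAM (TiB) | Cores | RAM (TiB) |
|---|---|---|---|---|---|---|
| NHR | Grete Phase 1 | 3 | 12 | 0.375 | 120 | 2.1 |
| NHR | Grete Phase 2 | 103 | 420 | 27.1 | 6,720 | 47.6 |
| NHR | Grete Phase 3 | 16 | 64 | 6.0 | 1,536 | 15.7 |
| NHR | TOTAL | 122 | 496 | 33.5 | 8,376 | 65.4 |
| SCC | TOTAL | 32 | 128 | 2.4 | 2,048 | 19.5 |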