GPU Partitions

Nodes in these partitions provide GPUs for parallelizing calculations. See GPU Usage for more details on how to use GPU partitions, particularly those where GPUs are split into MiG slices.

Islands

The islands, with a brief overview of their hardware, are listed below.

| Island | GPUs | CPUs | Fabric |
|---|---|---|---|
| Grete Phase 1 | Nvidia V100 | Intel Skylake | Infiniband (100 Gb/s) |
| Grete Phase 2 | Nvidia A100 | AMD Zen 3, AMD Zen 2 | Infiniband (2 × 200 Gb/s) |
| Grete Phase 3 | Nvidia H100 | Intel Sapphire Rapids | Infiniband (2 × 200 Gb/s) |
| SCC Legacy | Nvidia V100, Nvidia Quadro RTX 5000 | Intel Cascade Lake | Omni-Path (2 × 100 Gb/s), Omni-Path (100 Gb/s) |

Info

See Logging In for the best login nodes for each island (other login nodes will often work, but they may have access to different storage systems and their hardware will be a poorer match).

See Cluster Storage Map for the storage systems accessible from each island and their relative performance characteristics.

See Software Stacks for the available and default software stacks for each island.

Legacy SCC users only have access to the SCC Legacy island unless they are also CIDBN, FG, or SOE users, in which case they also have access to those islands.

Partitions

The table below lists the partitions by which users can use them, without hardware details. See Types of User Accounts to determine which kind of user you are. Note that some users are members of multiple classifications (e.g. all CIDBN/FG/SOE users are also SCC users).

| Users | Island | Partition | OS | Shared | Max. walltime | Max. nodes per job | Core-hours per GPU* |
|---|---|---|---|---|---|---|---|
| NHR | Grete P3 | grete-h100 | Rocky 8 | | 48 hr | 16 | 262.5 |
| NHR | Grete P3 | grete-h100:shared | Rocky 8 | yes | 48 hr | 16 | 262.5 |
| NHR | Grete P2 | grete | Rocky 8 | | 48 hr | 16 | 150 |
| NHR | Grete P2 | grete:shared | Rocky 8 | yes | 48 hr | 1 | 150 |
| NHR | Grete P2 | grete:preemptible | Rocky 8 | yes | 48 hr | 1 | 47 per slice |
| NHR, KISSKI, REACT | Grete P2 | grete:interactive | Rocky 8 | yes | 48 hr | 1 | 47 per slice |
| NHR, KISSKI, REACT | Grete P1 | jupyter:gpu (jupyter) | Rocky 8 | yes | 24 hr | 1 | 47 |
| KISSKI | Grete P3 | kisski-h100 | Rocky 8 | | 48 hr | 16 | 262.5 |
| KISSKI | Grete P2 | kisski | Rocky 8 | | 48 hr | 16 | 150 |
| REACT | Grete P2 | react | Rocky 8 | yes | 48 hr | 16 | 150 |
| SCC | Grete P2 & P3 | scc-gpu | Rocky 8 | yes | 48 hr | max | 24 |
| SCC | SCC Legacy | jupyter (jupyter) | Rocky 8 | yes | 24 hr | 1 | |

Info

JupyterHub sessions run on the partitions marked with jupyter in the table above. These partitions are oversubscribed (multiple jobs share resources). Additionally, the jupyter partition is composed of both GPU nodes and CPU nodes (CPU nodes are available to more than just SCC users).

The hardware for the different nodes in each partition is listed in the table below. Note that some partitions are heterogeneous, having nodes with different hardware. Additionally, many nodes are in more than one partition.

| Partition | Nodes | GPU + slices | VRAM each | CPU | RAM per node | Cores |
|---|---|---|---|---|---|---|
| grete | 35 | 4 × Nvidia A100 | 40 GiB | 2 × Zen3 EPYC 7513 | 512 GiB | 64 |
| | 14 | 4 × Nvidia A100 | 80 GiB | 2 × Zen3 EPYC 7513 | 512 GiB | 64 |
| | 2 | 4 × Nvidia A100 | 80 GiB | 2 × Zen3 EPYC 7513 | 1 TiB | 64 |
| grete:shared | 35 | 4 × Nvidia A100 | 40 GiB | 2 × Zen3 EPYC 7513 | 512 GiB | 64 |
| | 18 | 4 × Nvidia A100 | 80 GiB | 2 × Zen3 EPYC 7513 | 512 GiB | 64 |
| | 2 | 4 × Nvidia A100 | 80 GiB | 2 × Zen3 EPYC 7513 | 1 TiB | 64 |
| | 2 | 8 × Nvidia A100 | 80 GiB | 2 × Zen2 EPYC 7662 | 1 TiB | 128 |
| grete:interactive | 3 | 4 × Nvidia A100 (1g.10gb, 1g.20gb, 2g.10gb) | 10/20 GiB | 2 × Zen3 EPYC 7513 | 512 GiB | 64 |
| grete:preemptible | 3 | 4 × Nvidia A100 (1g.10gb, 1g.20gb, 2g.10gb) | 10/20 GiB | 2 × Zen3 EPYC 7513 | 512 GiB | 64 |
| grete-h100 | 5 | 4 × Nvidia H100 | 94 GiB | 2 × Xeon Platinum 8468 | 1 TiB | 96 |
| grete-h100:shared | 5 | 4 × Nvidia H100 | 94 GiB | 2 × Xeon Platinum 8468 | 1 TiB | 96 |
| kisski | 34 | 4 × Nvidia A100 | 80 GiB | 2 × Zen3 EPYC 7513 | 512 GiB | 64 |
| kisski-h100 | 15 | 4 × Nvidia H100 | 94 GiB | 2 × Xeon Platinum 8468 | 1 TiB | 96 |
| react | 22 | 4 × Nvidia A100 | 80 GiB | 2 × Zen3 EPYC 7513 | 512 GiB | 64 |
| scc-gpu | 1 | 4 × Nvidia H100 | 94 GiB | 2 × Xeon Platinum 8468 | 1 TiB | 96 |
| | 23 | 4 × Nvidia A100 | 80 GiB | 2 × Zen3 EPYC 7513 | 512 GiB | 64 |
| | 6 | 4 × Nvidia A100 | 80 GiB | 2 × Zen3 EPYC 7513 | 1 TiB | 64 |
| | 2 | 4 × Nvidia A100 | 40 GiB | 2 × Zen3 EPYC 7513 | 512 GiB | 64 |
| jupyter:gpu | 3 | 4 × Nvidia V100 | 32 GiB | 2 × Skylake Xeon Gold 6148 | 768 GiB | 40 |
| jupyter | 2 | 8 × Nvidia V100 | 32 GiB | 2 × Cascadelake 6252 | 384 GiB | 48 |
| | 5 | 4 × Nvidia RTX 5000 | 16 GiB | 2 × Cascadelake 6242 | 192 GiB | 32 |

Info

The memory actually available per node is always less than what is installed in hardware. Some is reserved by the BIOS, and on top of that, Slurm reserves around 20 GiB for the operating system and background services. To be on the safe side, if you don’t reserve a full node, deduct ~30 GiB from the installed RAM, divide by the number of GPUs, and round down to get the amount of memory per GPU you can safely request with --mem when submitting jobs.
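
For example, here is a minimal sketch of this calculation for a single-GPU job on a 4-GPU node with 512 GiB of installed RAM: (512 − 30) GiB ÷ 4 ≈ 120 GiB per GPU, rounded down. The runtime and program name are placeholders; adjust them to your job.

```bash
#!/bin/bash
#SBATCH --partition=grete:shared
#SBATCH -G A100:1             # one A100 GPU on a shared node
#SBATCH --mem=120G            # (512 GiB - ~30 GiB) / 4 GPUs, rounded down
#SBATCH --time=02:00:00

srun ./my_gpu_program         # placeholder for your actual command
```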

How to pick the right partition for your job

If you have access to multiple partitions, it can be important to choose one that fits your use case. As a rule of thumb: if you need (or can scale your job to run on) multiple GPUs and you are not using a shared partition, always request GPUs in multiples of 4 (i.e. whole nodes), as you will be billed for the whole node regardless of how few GPUs you requested via -G. For jobs that need fewer than 4 GPUs, use a shared partition and make sure not to request more than your fair share of RAM (see the note above). If you need your job to start quickly, e.g. for testing whether your scripts work or for interactively tweaking hyperparameters, use an interactive partition (most users have access to grete:interactive).
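
As an illustration of the whole-node rule, a multi-node job on the exclusive grete partition might look like the following sketch (the runtime and program name are placeholders):

```bash
#!/bin/bash
#SBATCH --partition=grete
#SBATCH --nodes=2             # exclusive partition: you are billed for whole nodes
#SBATCH -G 8                  # 2 nodes x 4 A100s = 8 GPUs, a multiple of 4
#SBATCH --time=12:00:00

srun ./my_multi_gpu_program   # placeholder for your actual command
```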

The CPUs and GPUs

For partitions that have heterogeneous hardware, you can give Slurm options to request the particular hardware you want. For CPUs, specify the kind of CPU you want by passing a -C/--constraint option to Slurm. For GPUs, specify the name of the GPU with the -G/--gpus option (or --gpus-per-task), and select larger-VRAM variants with a -C/--constraint option. See Slurm and GPU Usage for more information.

The GPUs, the options to request them, and some of their properties are given in the table below.

| GPU | VRAM | FP32 cores | Tensor cores | -G option | -C option | Compute Cap. |
|---|---|---|---|---|---|---|
| Nvidia H100 | 94 GiB | 8448 | 528 | H100 | 96gb | 9.0 |
| Nvidia A100 | 40 GiB | 6912 | 432 | A100 | | 8.0 |
| Nvidia A100 | 80 GiB | 6912 | 432 | A100 | 80gb | 8.0 |
| 1g.10gb slice of Nvidia A100 | 10 GiB | 864 | 54 | 1g.10gb | | 8.0 |
| 1g.20gb slice of Nvidia A100 | 20 GiB | 864 | 54 | 1g.20gb | | 8.0 |
| 2g.10gb slice of Nvidia A100 | 10 GiB | 1728 | 108 | 2g.10gb | | 8.0 |
| Nvidia V100 | 32 GiB | 5120 | 640 | V100 | | 7.0 |
| Nvidia Quadro RTX 5000 | 16 GiB | 3072 | 384 | RTX5000 | | 7.5 |
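
For example, the following sketches use the -G and -C values from the table above (the script name is a placeholder, and the commands assume you have access to the partitions shown):

```bash
# Interactive session on a single A100 MiG slice (name from the -G column):
srun -p grete:interactive -G 1g.10gb:1 --pty bash

# Batch job on an 80 GiB A100 (GPU name via -G, VRAM variant via -C):
sbatch -p grete:shared -G A100:1 -C 80gb my_job_script.sh
```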

The CPUs, the options to request them, and some of their properties are given in the table below.

| CPU | Cores | -C option | Architecture |
|---|---|---|---|
| AMD Zen3 EPYC 7513 | 32 | zen3 or milan | zen3 |
| AMD Zen2 EPYC 7662 | 64 | zen2 or rome | zen2 |
| Intel Sapphire Rapids Xeon Platinum 8468 | 48 | sapphirerapids | sapphirerapids |
| Intel Cascadelake Xeon Gold 6252 | 24 | cascadelake | cascadelake |
| Intel Cascadelake Xeon Gold 6242 | 16 | cascadelake | cascadelake |
| Intel Skylake Xeon Gold 6148 | 20 | skylake | skylake_avx512 |
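
Similarly, on the heterogeneous scc-gpu partition, a constraint from the -C column above can pin a job to the A100/Zen 3 nodes (a sketch; the script name is a placeholder):

```bash
# Restrict the job to nodes with Zen 3 CPUs (the A100 nodes in this partition):
sbatch -p scc-gpu -G A100:1 -C zen3 my_job_script.sh
```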

Hardware Totals

The total nodes, cores, GPUs, RAM, and VRAM for each island are given in the table below.

| Island | Nodes | GPUs | VRAM (TiB) | Cores | RAM (TiB) |
|---|---|---|---|---|---|
| Grete Phase 1 | 3 | 12 | 0.375 | 120 | 2.1 |
| Grete Phase 2 | 103 | 420 | 27.1 | 6,720 | 47.6 |
| Grete Phase 3 | 21 | 84 | 7.9 | 2,016 | 21 |
| SCC Legacy | 7 | 36 | 0.81 | 176 | 1.7 |
| TOTAL | 134 | 552 | 36.2 | 9,032 | 72.4 |