Storage Systems

In contrast to a personal computer, which often presents the user with a unified view of the file system, the different storage systems on an HPC cluster are exposed to the users. This allows users who are familiar with these systems to achieve optimal performance and efficiency. You can check the cluster storage map for a visual overview.

Specific locations on the different storage systems are reserved for individual users or projects. These are called data stores. To see the available data stores, you can use the show-quota command on one of our login nodes. It also displays the quota values that limit the amount of data you can store in particular locations.
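
For example, running it on a login node lists your available data stores and the quota limits that apply to them (the exact output format may vary):

    # Show the available data stores and their quota limits.
    show-quota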

By default, users get only one personal data store, their $HOME directory. It is very limited in size and only meant for configuration files, helper scripts, and small software installations.

Job-specific Directories

Each compute job gets a number of job-specific directories that allow for high-performance file operations during the job. If possible, you should use these directories. They are cleaned up after the job finishes, so results have to be copied to a different location at the end of your job.
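
A minimal sketch of this pattern in a Slurm batch script, assuming a hypothetical $TMPDIR-style job-specific directory (the actual variable names are cluster-specific and listed in our technical documentation; my_program, input.dat, and the results directory are illustrative):

    #!/bin/bash
    #SBATCH --time=01:00:00

    # $TMPDIR stands in here for a job-specific directory; check the
    # technical documentation for the real variable names.
    cp "$HOME/input.dat" "$TMPDIR/"        # stage input onto the fast storage
    cd "$TMPDIR"
    ./my_program input.dat > output.dat    # run inside the job-specific directory

    # Job-specific directories are cleaned up after the job finishes,
    # so copy results to a permanent location before the job ends.
    cp output.dat "$PROJECT/results/"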

Workspaces

Users can request additional personal data stores using the workspaces system. A workspace allows users, for example, to run a series of interdependent jobs that need access to a shared file system for parallel I/O operations. After a job series has finished, the results should be moved to a more permanent location and the workspace should be released. Workspaces that are not released by the user are cleaned up automatically once their expiration date is reached.
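
Many HPC sites implement workspaces with the HPC workspace tools (ws_allocate, ws_list, ws_find, ws_release); assuming that convention applies here, a typical workspace lifecycle looks roughly like this (names and durations are illustrative):

    # Request a workspace named "mysim" for 30 days.
    ws_allocate mysim 30

    # List your workspaces and their expiration dates; ws_find prints the path.
    ws_list
    cd "$(ws_find mysim)"

    # ... run the job series, writing into the workspace ...

    # Move the results to a permanent location, then release the workspace.
    mv results "$PROJECT/"
    ws_release mysim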

Note

The file systems with the highest parallel performance (Lustre, BeeGFS) are typically only mounted on individual cluster islands. If you need fast access to files from multiple cluster islands simultaneously, you can use a Ceph SSD workspace instead.

Project Specific Data Stores

Most of the more permanent data stores are project-specific. They are shared with the other users of the same project.

You should coordinate with the other members of your project on how to use these storage locations. For example, you could create a sub-directory for each work package, have directories for individual users, or all work in a single directory.
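
A minimal sketch of such a layout under a project data store (the directory names and the use of $PROJECT here are illustrative):

    # One sub-directory per work package and one per user, as agreed within the project.
    mkdir -p "$PROJECT/wp1-preprocessing" "$PROJECT/wp2-simulation"
    mkdir -p "$PROJECT/users/alice" "$PROJECT/users/bob"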

Ultimately, all files in a project belong to the project’s PIs and their delegates, who can also request permission and ownership changes from our staff.

Note

These project-specific data stores are not available for NHR test accounts. Such users can instead request a workspace on the ceph-hdd storage system, which has a longer maximum lifetime than the other workspace locations. Please apply for a full NHR project if you need a permanent location for larger amounts of data.

Data Store for Software Installation

The PROJECT data store, which can be located using the $PROJECT variable, can be used for larger software installations such as big conda environments.
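
For example, a conda environment can be placed there explicitly via its prefix (the sub-directory name is illustrative):

    # Create the environment inside the PROJECT data store instead of $HOME.
    conda create --prefix "$PROJECT/conda-envs/myenv" python=3.12

    # Activate it later by its full path.
    conda activate "$PROJECT/conda-envs/myenv"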

Data Store for Cold Data

Our ceph-hdd storage system is well suited for large amounts of data that are no longer actively used by compute jobs. For SCC projects, this is again the $PROJECT data store; for NHR projects, it is $COLD. It is not available for KISSKI.
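
A minimal sketch of archiving finished results to the cold-data store, assuming an NHR project with the $COLD variable (the paths are illustrative):

    # Copy results that are no longer needed by running jobs to the cold-data
    # store and remove the transferred source files afterwards.
    rsync -a --remove-source-files results/ "$COLD/archive/results/"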

Data Store for AI/Machine Learning

The VAST filesystem is designed to work well with AI workloads. The $PROJECT data stores for NHR and KISSKI projects are located here.

Other Options

If the above options are not suitable for your project and you need, for example, a permanent storage location on a particular file system (Lustre, BeeGFS, VAST), please contact our support with a detailed explanation of your use case.

Technical Documentation

We provide extensive technical documentation on our storage systems suitable for more experienced HPC users.