Data Stores
General Tips
Use the right data store for the right job. Each one has a different purpose, with different engineering tradeoffs (e.g. performance vs. robustness), capacities, etc. For IO-heavy work, using a SCRATCH/WORK data store or temporary storage as a staging area (copy data into it, do the operations there, then copy or move the results to a different data store) can often greatly improve performance.
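The staging pattern described above can be sketched in Python as follows. This is a minimal illustration, not a recommended implementation: the function name `run_staged`, the `scratch_root` argument, and the placeholder "operation" (upper-casing a text file) are all assumptions made for the example, and the actual paths of the data stores are not taken from this page.

```python
import shutil
import tempfile
from pathlib import Path


def run_staged(input_file: Path, result_dir: Path, scratch_root: Path) -> Path:
    """Copy the input to a scratch area, process it there, copy the result back."""
    # Private working directory inside the (hypothetical) scratch root.
    with tempfile.TemporaryDirectory(dir=scratch_root) as workdir:
        work = Path(workdir)

        # Stage in: one copy from the slower store to the fast one.
        staged_input = work / input_file.name
        shutil.copy2(input_file, staged_input)

        # Placeholder operation: upper-case the text. Replace with real work.
        staged_output = work / (input_file.stem + ".out")
        staged_output.write_text(staged_input.read_text().upper())

        # Stage out: one copy of the final result to its destination.
        result_dir.mkdir(parents=True, exist_ok=True)
        final = result_dir / staged_output.name
        shutil.copy2(staged_output, final)

    # The working directory has been removed on leaving the with-block.
    return final
```

All intermediate reads and writes happen inside the staging directory, so the other data store only sees one read of the input and one write of the result.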
Remember that the data stores are shared with other users, and bad IO patterns by a single user can hurt performance for everyone. A general recommendation for distributed network filesystems is to keep the number of file metadata operations (opening, closing, stat-ing, truncating, etc.) and checks for file existence or changes as low as possible. These operations often become a bottleneck for the IO of your job and, if bad enough, can reduce performance for other users. For example, if jobs each request hundreds of thousands of metadata operations such as open, close, and stat, this can make the filesystem slow or unresponsive for everyone, even when the metadata is stored on SSDs. See the Optimizing Performance page for more information on how to get good performance on the data stores.
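As one illustration of reducing existence checks, the sketch below lists a directory once and compares names in memory, instead of checking each expected file separately (which typically costs one metadata request per file). The function name `missing_outputs` is hypothetical, and how much this helps depends on the filesystem and the directory size, so treat it as a sketch rather than a rule.

```python
import os


def missing_outputs(directory: str, expected_names: list[str]) -> list[str]:
    """Return the expected file names that are absent, using a single directory listing."""
    # One pass over the directory collects every entry name.
    with os.scandir(directory) as entries:
        present = {entry.name for entry in entries}
    # Membership tests happen in memory and do not touch the filesystem again.
    return [name for name in expected_names if name not in present]
```

A job that needs to know which of its output files already exist could call this once per directory rather than testing every path individually.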
Data Lifetime after Project/User Expiration
In general, we store all data for an extra year after the end of a project or user account. Unless extended, the standard term of a project is 1 year. The standard term for a user account is the lifetime of the project it is a member of (for legacy NHR/HLRN accounts, the lifetime of the last project). Note that migrating a legacy NHR/HLRN project removes all legacy NHR/HLRN users from it (see the NHR/HLRN Project Migration page for more information).
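The arithmetic behind these terms can be sketched as follows. This is only a rough estimate under the general rule stated above (a standard 1-year term plus 1 extra year of storage); the function names and the example date are made up for illustration, and extensions or special cases are not modelled.

```python
from datetime import date


def add_years(d: date, years: int) -> date:
    """Shift a date by whole years, mapping 29 February to 28 February when needed."""
    try:
        return d.replace(year=d.year + years)
    except ValueError:
        # 29 February does not exist in the target year.
        return d.replace(year=d.year + years, day=28)


def retention_end(project_start: date, term_years: int = 1, extra_years: int = 1) -> date:
    """Estimate until when data is kept: project end plus the extra storage period."""
    project_end = add_years(project_start, term_years)
    return add_years(project_end, extra_years)


# Hypothetical project starting on 2024-01-01 with the standard term.
print(retention_end(date(2024, 1, 1)))  # 2026-01-01
```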
Project Data Stores
Every project gets one or more data stores in which to place data, depending on the kind of project. In some cases, there will be more than one directory in the data store, but all share the same quota (see the Quota page for more information).
All projects in the HPC Project Portal have a Project Map with convenient symlinks to all the project’s data stores.
The project-specific usernames of these projects have a symlink ~/.project to this directory.
See the Project Map and Project Management pages for more information.
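To see where the symlinks in a Project Map point, a small script along these lines may help. It is a sketch only: the function name `list_project_map` is hypothetical, and the names of the symlinks inside the directory are not specified on this page, so the code simply reports whatever symlinks it finds.

```python
import os
from pathlib import Path


def list_project_map(project_map: Path) -> dict[str, str]:
    """Map each symlink name in the directory to the path it points to."""
    targets = {}
    for entry in sorted(project_map.iterdir()):
        if entry.is_symlink():
            # readlink returns the stored target without following it further.
            targets[entry.name] = os.readlink(entry)
    return targets
```

For a project-specific username, this could be called as `list_project_map(Path("~/.project").expanduser())`, assuming the symlink exists for that account.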
Projects get the categories of data stores in the table below, which are then described further in the subsections for each category.
Data Store | SCC | NHR | KISSKI | REACT |
---|---|---|---|---|
Project | yes | yes | yes | |
SCRATCH/WORK | yes | yes | | |
ARCHIVE/PERM | yes | | | |
Requestable Storage | special request |
User Data Stores
Every user gets a HOME directory and potentially additional data stores to place configuration, data, etc. depending on the kind of user and the project they are a member of. Users get the categories of data stores in the table below, which are then described further in the subsections for each category.
Data Store | SCC | NHR | KISSKI | REACT |
---|---|---|---|---|
HOME | yes | yes | yes | yes |
SCRATCH/WORK | yes | yes | | |
ARCHIVE/PERM | yes | legacy only | | |
Requestable Storage | special request | special request |
Data Store Categories
Each category is discussed on its own page, with links listed below.