Data Stores

General Tips

Use the right data store for the right job. Each one has a different purpose with different engineering tradeoffs (e.g. performance vs. robustness), capacities, etc. Using a SCRATCH/WORK data store or temporary storage as a staging area to do a lot of IO (copy data into it, do operations, copy/move results to a different data store) can often greatly improve performance.
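The staging pattern described above can be sketched as follows. This is a minimal illustration rather than a site-provided tool: the function name `run_staged`, the `scratch_root` argument, and the `work` callback are hypothetical, and where your SCRATCH/WORK or temporary storage lives depends on your system.

```python
import shutil
import tempfile
from pathlib import Path


def run_staged(src_dir, dest_dir, scratch_root, work):
    """Copy inputs into a staging area, run `work` there, copy results out.

    `scratch_root` is assumed to be a directory on a fast data store
    (for example SCRATCH/WORK or node-local temporary storage).
    `work(inputs, outputs)` is a caller-supplied function doing the heavy IO.
    """
    # Private staging directory; removed automatically when the block exits.
    with tempfile.TemporaryDirectory(dir=scratch_root) as tmp:
        stage = Path(tmp)
        inputs = stage / "inputs"
        outputs = stage / "outputs"

        shutil.copytree(src_dir, inputs)  # one bulk copy in
        outputs.mkdir()

        work(inputs, outputs)  # IO-heavy step touches only the staging area

        # One bulk copy out to the destination data store.
        shutil.copytree(outputs, dest_dir, dirs_exist_ok=True)
```

A call might look like `run_staged(Path("input_data"), Path("results"), Path("/path/to/scratch"), my_analysis)`, where every path and `my_analysis` are placeholders. The slower data store then sees only the two bulk copies, while the many small reads and writes stay on the staging area.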

It is important to remember that the data stores are shared with other users, and bad IO patterns by a single user can hurt performance for everyone. A general recommendation for distributed network filesystems is to keep the number of file metadata operations (opening, closing, stat-ing, truncating, etc.) and checks for file existence or changes as low as possible. These operations often become a bottleneck for the IO of your job and, if bad enough, can reduce performance for other users. For example, if jobs each request hundreds of thousands of metadata operations such as open, close, and stat, this can make the filesystem slow or unresponsive for everyone, even when the metadata is stored on SSDs. See the Optimizing Performance page for more information on how to get good performance on the data stores.
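As one illustration of keeping existence checks low, the sketch below lists a directory once and compares names in memory instead of probing each expected file separately. The function name `missing_outputs` is hypothetical, and whether this helps in practice depends on the filesystem and on how many entries the directory holds.

```python
import os


def missing_outputs(directory, expected_names):
    """Return the expected file names that are not present in `directory`.

    Uses a single directory listing rather than one existence check
    (and therefore one metadata request) per expected name.
    """
    with os.scandir(directory) as entries:
        present = {entry.name for entry in entries}
    # Membership tests happen in memory, with no further filesystem calls.
    return [name for name in expected_names if name not in present]
```

For a job that expects thousands of output files in one directory, this replaces thousands of separate checks with a single listing.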

Data Lifetime after Project/User Expiration

In general, we store all data for an extra year after the end of a project or user account. If not extended, the standard term of a project is 1 year. The standard term for a user account is the lifetime of the project it is a member of (the lifetime of the last project for legacy NHR/HLRN accounts). Note that migrating a legacy NHR/HLRN project removes all legacy NHR/HLRN users from it (see the NHR/HLRN Project Migration page for more information).

Project Data Stores

Every project gets one or more data stores in which to place data, depending on the kind of project. In some cases, there will be more than one directory in a data store, but all of them share the same quota (see the Quota page for more information).

All projects in the HPC Project Portal have a Project Map with convenient symlinks to all the project’s data stores. The project-specific usernames of these projects have a symlink ~/.project to this directory. See the Project Map and Project Management pages for more information.
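To see where the Project Map symlinks point, something like the following sketch could be used. It assumes only what is stated above, namely that `~/.project` is a symlink to a directory of symlinks. The function name `list_project_map` is hypothetical, and the entry names you will find are not specified here.

```python
import os


def list_project_map(map_link="~/.project"):
    """Return {entry name: resolved target path} for the Project Map directory.

    `map_link` defaults to the ~/.project symlink mentioned in the text.
    """
    # Resolve the symlink to the actual Project Map directory.
    map_dir = os.path.realpath(os.path.expanduser(map_link))
    stores = {}
    with os.scandir(map_dir) as entries:
        for entry in entries:
            # Follow each entry (normally a symlink) to its final location.
            stores[entry.name] = os.path.realpath(entry.path)
    return stores
```

Printing the returned dictionary shows each entry in the Project Map next to the location it resolves to.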

Projects get the categories of data stores in the table below, which are then described further in the subsections for each category.

Data Store | SCC | NHR | KISSKI
Project | yes | yes
SCRATCH/WORK | yes | yes
ARCHIVE/PERM | yes
Requestable Storage | special request | special request

User Data Stores

Every user gets a HOME directory and potentially additional data stores to place configuration, data, etc. depending on the kind of user and the project they are a member of. Users get the categories of data stores in the table below, which are then described further in the subsections for each category.

Data Store | SCC | NHR | KISSKI
HOME | yes | yes | yes
SCRATCH/WORK | yes | yes
ARCHIVE/PERM | yes | legacy only
Requestable Storage | special request | special request

Data Store Categories

Each category is discussed on its own page, with links listed below.