Data Retention Policy
Overview
User data stored on our HPC filesystems (see Storage Systems) cannot be retained indefinitely due to limited capacity and the need to periodically update/upgrade/replace storage systems. Consequently, data from expired projects and/or user accounts will eventually have to be deleted, after a grace period of varying length has passed.
Info
Note that the policy here only applies to data on HPC filesystems. It does not apply to the GWDG Unix HOME and GWDG Tape Archive (AHOME). They have their own separate data retention policies.
Active Projects and Usernames
The filesystem data of active projects and usernames is kept to the extent possible, with a few exceptions:
- Workspaces are temporary storage with each workspace having its own expiration date and a limited number of extensions. After expiration, workspace data can be restored for up to one month, after which it is permanently erased.
- Job Temporary Storage is deleted after the compute job has finished.
- When a storage system is retired, data that has not been migrated to another storage system is lost. In some cases, data will be migrated automatically. In other cases, it is the responsibility of users and project members to migrate their data. Storage system retirement and what will happen with the data is announced via email in advance. There will usually be multiple announcements, the first being sent as long in advance as feasible, with subsequent reminders as the deadline draws closer. Keep an eye on our mailing lists.
- When a storage system runs out of space, an announcement will be made telling everyone to reduce usage. As a last resort, in extreme situations, we may have to force the largest users to reduce consumption by moving files to another storage system, compressing data in-place, or deleting files that have not been accessed for a long time. This will never be done without advance warning.
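If you are asked to reduce usage, it can help to first find out which of your files have not been accessed for a long time. The following is a minimal sketch, not an official tool: the function name `stale_files` and the 365-day threshold are hypothetical choices for illustration, and the policy above does not define how long "a long time" is. Access times may also be unreliable on filesystems that do not update them on every read.

```python
import os
import stat
import time
from pathlib import Path


def stale_files(root, min_age_days, now=None):
    """Yield (path, size_in_bytes) for regular files under `root` whose
    last access time is more than `min_age_days` days in the past.

    Hypothetical helper for illustration; the threshold is an assumption.
    """
    now = time.time() if now is None else now
    cutoff = now - min_age_days * 86400  # seconds per day
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = Path(dirpath) / name
            try:
                st = path.lstat()  # lstat: do not follow symlinks
            except OSError:
                continue  # file vanished or is unreadable; skip it
            if stat.S_ISREG(st.st_mode) and st.st_atime < cutoff:
                yield path, st.st_size


if __name__ == "__main__":
    # Example: report files in the current directory tree that have not
    # been accessed for a year (assumed threshold), largest first.
    candidates = sorted(stale_files(".", 365), key=lambda item: -item[1])
    for path, size in candidates[:20]:
        print(f"{size:>14d}  {path}")
```

The script only lists candidates. Deciding whether to move, compress, or delete a file remains up to you.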
Expired Projects and Deactivated Usernames
Data is not deleted immediately after a project expires or a username is deactivated. After all, expired projects could be renewed or inactive usernames reactivated.
Existing workspaces from inactive projects are treated the same as those from active ones. They expire at their set date and follow the same grace period of one month after expiration until their data is erased. See Workspaces for more information.
Regular filesystem data is kept for at least one year after a project expires or a username is deactivated. At the start of each quarter, we will do a filesystem cleanup during which directories of projects and usernames expired for at least one year are moved to a cleanup directory, marking the data as eligible for deletion. Data in the cleanup directory of the Tape Archive is retained at best effort and is removed only when the archive runs low on space, starting with data that has been expired the longest. For other filesystems, the cleanup directory will be cleared at the start of the next quarter, but data may be removed earlier if necessary for smooth and performant operation of the filesystem.
This means project and user data remains accessible under the original location for at least four and up to five quarters after expiration, then resides in the cleanup directory for up to one additional quarter. Data in the cleanup directory is not directly accessible, but it can be recovered by contacting our support.
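The timeline above can be worked out for a concrete expiration date. The following is a minimal sketch under stated assumptions: the cleanup is taken to run exactly on the first day of each calendar quarter (1 January, 1 April, 1 July, 1 October), and a project that expired exactly one year earlier is assumed to be included. The function names are hypothetical. The sketch does not cover the Tape Archive, whose cleanup directory has no fixed clearing date, or the case where data is removed from the cleanup directory earlier.

```python
from datetime import date, timedelta


def quarter_start_on_or_after(d: date) -> date:
    """Return the first quarter start (1 Jan/Apr/Jul/Oct) on or after `d`."""
    for month in (1, 4, 7, 10):
        candidate = date(d.year, month, 1)
        if candidate >= d:
            return candidate
    return date(d.year + 1, 1, 1)


def one_year_after(d: date) -> date:
    """Return the same calendar day one year later (29 Feb maps to 1 Mar)."""
    try:
        return d.replace(year=d.year + 1)
    except ValueError:
        return date(d.year + 1, 3, 1)


def retention_timeline(expired_on: date) -> tuple[date, date]:
    """Return (moved_to_cleanup, cleanup_cleared) for an expiration date.

    Assumes cleanups happen exactly at quarter starts; real dates may differ.
    """
    # First quarterly cleanup at which the data has been expired >= 1 year.
    moved = quarter_start_on_or_after(one_year_after(expired_on))
    # The cleanup directory is cleared at the start of the following quarter.
    cleared = quarter_start_on_or_after(moved + timedelta(days=1))
    return moved, cleared


if __name__ == "__main__":
    moved, cleared = retention_timeline(date(2024, 5, 15))
    print("moved to cleanup directory:", moved)
    print("cleanup directory cleared: ", cleared)
```

Under these assumptions, data of a project that expired on 2024-05-15 would be moved to the cleanup directory on 2025-07-01 and cleared on 2025-10-01.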
Note
When data is migrated automatically to a new storage system, data of active and recently expired/deactivated projects and usernames is migrated, but data in the cleanup directory is not.
Long-Term Data Archival
The HPC storage system cannot provide long-term archival, such as the 10 years that many journals require and that is common scientific best practice. While we retain data on the Tape Archive for as long as we can after a project expires, funding is granted for limited periods and continued funding for subsequent periods is uncertain. We are therefore not able to guarantee 5-year retention, let alone the often required 10 years.
Depending on your home institution, there are different options for long-term data archival.
Members of
- Georg-August-University of Göttingen
- Universitätsmedizin Göttingen (UMG)
- Max-Planck-Gesellschaft (MPG)
- Deutsches Primatenzentrum (DPZ)
- Gesellschaft für wissenschaftliche Datenverarbeitung mbH Göttingen (GWDG)
have access to the GWDG Archive (AHOME) and GWDG Unix HOME and can transfer data there. This is possible even without an SCC project, e.g. for members who only have NHR or KISSKI projects.
Members of other institutions will need to download their data to their home institution.