Data Migration Guide

Having a separate username for each project has upsides, such as separate data store quotas and never having to worry about submitting jobs with the wrong Slurm account. A major downside is that files must sometimes be copied or moved between usernames. Common scenarios are:

  • Copying configuration files in one’s HOME directory that took a lot of effort to create (e.g. .bashrc, .config/emacs/init.el, etc.)
  • Moving files from a legacy NHR/HLRN username to the new project-specific username after project migration (see NHR/HLRN Project Migration) since all legacy NHR/HLRN usernames are dropped from the project
  • Moving files from one’s username on an expired project to one’s username on a successor project

The most efficient way to move or copy the files is via ACLs (Access Control Lists). Let SRCUSERNAME and DESTUSERNAME be the usernames the files come from and go to, respectively; SRCDIR and DESTDIR be their data store directories (e.g. /scratch-emmy/usr/SRCUSERNAME and /mnt/lustre-emmy-hdd/usr/DESTUSERNAME); and DIR_TO_MIGRATE be the name of the directory to be migrated. You can either copy or move the data. Copying can be done entirely on your own, but is very IO-heavy for large data. Moving is more efficient as long as the source and destination are on the same filesystem (otherwise it is as expensive as copying), but requires a support request for the last step.

Copy Data

Copying data is suitable when the data is at most medium-sized (both in space and in number of files/directories). For large and very large data, you should move the data instead, if possible.
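To judge which category a directory falls into, it can help to measure it first. The following is a minimal sketch, assuming GNU find and coreutils are available on the login nodes; count_entries is a hypothetical helper name, not part of any cluster tooling. Both commands walk the whole tree, so they also take a while on very large directories.

```shell
# Hypothetical helper: number of files and directories below a directory.
# Prints one dot per entry (GNU find -printf) and counts the dots, so
# file names containing newlines are still counted once.
count_entries() {
    find "$1" -mindepth 1 -printf '.' | wc -c
}

# Usage, as SRCUSERNAME (replace the placeholders with real paths):
#   du -sh SRCDIR/DIR_TO_MIGRATE          # total size on disk
#   count_entries SRCDIR/DIR_TO_MIGRATE   # number of files/directories
```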

First, log in to a login node with the source username (e.g. ssh SRCUSERNAME@glogin.hpc.gwdg.de) and do the following:

# Make it so the destination username can even reach the directory
setfacl -m u:DESTUSERNAME:x SRCDIR

# Make the whole directory readable for the destination username, and then make
# it possible for the destination username to enter directories (x permission)
# which has to be done with find
setfacl -R -m u:DESTUSERNAME:r SRCDIR/DIR_TO_MIGRATE
find SRCDIR/DIR_TO_MIGRATE -type d -print0 \
    | xargs --null --no-run-if-empty setfacl -m u:DESTUSERNAME:rx
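Whether the ACLs cover everything can be checked later from the destination username, before starting the copy. The sketch below assumes GNU find, whose -readable and -executable tests ask the kernel whether the current user has access; list_inaccessible is a hypothetical helper name.

```shell
# Hypothetical helper: list everything below a directory that the current
# user cannot read, plus directories the current user cannot enter.
# Relies on the GNU find tests -readable and -executable.
list_inaccessible() {
    find "$1" \( ! -readable -o \( -type d ! -executable \) \) -print
}

# Usage, as DESTUSERNAME (no output means nothing is blocked):
#   list_inaccessible SRCDIR/DIR_TO_MIGRATE
```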

Then, log in to a login node with the destination username (e.g. ssh DESTUSERNAME@glogin.hpc.gwdg.de) and do the following to copy the data.

cp -r SRCDIR/DIR_TO_MIGRATE DESTDIR/
Warning

Even for medium-sized data (in size or number of files/directories), the copy can take a very long time. We recommend running the copy in a terminal multiplexer session such as tmux or screen, so that it keeps running if the connection drops (see the Terminal Multiplexer page for more information).
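Before removing the ACLs, it can be worth confirming that the copy is complete. One possible check, sketched here with a hypothetical helper named verify_copy, is a recursive diff. Note that diff reads every file on both sides, so this is roughly as IO-heavy as the copy itself.

```shell
# Hypothetical helper: succeeds only if both trees contain the same files
# with identical contents. Differences are reported one line per file.
verify_copy() {
    diff -r -q -- "$1" "$2"
}

# Usage, as DESTUSERNAME, while the ACLs are still in place:
#   verify_copy SRCDIR/DIR_TO_MIGRATE DESTDIR/DIR_TO_MIGRATE && echo "copy OK"
```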

Then, back with the source username, remove the created ACLs with

setfacl -x u:DESTUSERNAME SRCDIR
setfacl -R -x u:DESTUSERNAME SRCDIR/DIR_TO_MIGRATE

Move Data

This is the best approach if the data is very large, as it is quite cheap from an IO perspective as long as the source and destination are on the same filesystem. It is also a good fit when the source username no longer needs the files/directories. Unfortunately, this method does require starting a support request for the last step.
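Whether two directories are on the same filesystem can be estimated by comparing their device numbers. This is a minimal sketch, assuming GNU coreutils stat; same_filesystem is a hypothetical helper name, and a matching device number should be read as a strong hint rather than a guarantee that mv will be a cheap rename.

```shell
# Hypothetical helper: succeeds if both paths report the same device
# number, i.e. appear to live on the same mounted filesystem.
# Returns 2 if either path cannot be examined.
same_filesystem() {
    local src_dev dest_dev
    src_dev=$(stat -c '%d' -- "$1") || return 2
    dest_dev=$(stat -c '%d' -- "$2") || return 2
    [ "$src_dev" = "$dest_dev" ]
}

# Usage (both directories must be visible to the user running the check):
#   same_filesystem SRCDIR DESTDIR && echo "same filesystem"
```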

First, log in to a login node with the destination username (e.g. ssh DESTUSERNAME@glogin.hpc.gwdg.de) and do the following to set up the move (giving SRCUSERNAME the permissions to enter the directory and create/delete files in it):

setfacl -m u:SRCUSERNAME:wx DESTDIR

Then, log in to a login node with the source username (e.g. ssh SRCUSERNAME@glogin.hpc.gwdg.de) and do the following to move the data:

mv SRCDIR/DIR_TO_MIGRATE DESTDIR/

Then, back with the destination username, remove the ACL that let SRCUSERNAME do the operation with

setfacl -x u:SRCUSERNAME DESTDIR

At this point, the data is moved but it has the wrong owner and group (those of SRCUSERNAME). Start a support request (see Getting Started for the correct email address) requesting that the directory’s ownership be changed from your SRCUSERNAME to your DESTUSERNAME.
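To see how much is affected, and to confirm later that the ownership change has been applied, the entries still owned by the source username can be counted. The sketch below assumes GNU find; count_owned_by is a hypothetical helper name.

```shell
# Hypothetical helper: count entries (including the directory itself) that
# are owned by the given user. The user may be a name or a numeric UID.
count_owned_by() {
    find "$2" -user "$1" -printf '.' | wc -c
}

# Usage, from either username that can read the moved directory:
#   count_owned_by SRCUSERNAME DESTDIR/DIR_TO_MIGRATE
```

Once the support request has been processed, the count for SRCUSERNAME should drop to zero.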