Data Migration Guide
Having a separate username for each project has upsides, such as separate data store quotas and never having to worry about submitting jobs with the wrong Slurm account. A major downside is that files must sometimes be copied or moved between usernames. Common scenarios are:
- Copying scripts or configuration files in your HOME directory that took effort to create (e.g. .bashrc, .config/emacs/init.el, etc.)
- Moving files from your legacy username to a new project-specific username
- Moving files from your username of an expired project to the new username of a successor project
- “Graduating” from using the SCC to a full NHR project
In all cases, pay attention to which data stores are available on each island of the cluster. Data can only be transferred between two data stores from a node that can access both. See Cluster Storage Map for information on which data stores are available where. If there is no node that can access both, you might have to use an intermediate data store.
This topic requires at least a basic understanding of POSIX permissions and groups. Refer to the following links for further information:
https://en.wikipedia.org/wiki/File-system_permissions#Notation_of_traditional_Unix_permissions
https://en.wikipedia.org/wiki/Chmod
https://en.wikipedia.org/wiki/Unix_file_types#Representations
Or take a look at our self-paced tutorial for beginners.
Only the username owning a file/directory can change its group, permissions or ACLs. Even other usernames attached to the same AcademicID are unable to, because the operating system does not know that the different usernames are aliases for the same person.
Many directories have both a logical path like /user/your_name/u12345 and a real path that points to the actual location on the filesystem.
Please always operate on the real paths, which are directories you can actually modify, unlike the symbolic links below /user or /projects, which cannot be modified by users.
You can find out the real path with the following command:
realpath /path/to/directory
Alternatively, as a quick way to transparently resolve logical paths, you can add a trailing / at the end of the path when using it in a command.
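Both ways of resolving a logical path can be tried out safely in a scratch directory. The following is a minimal sketch, assuming GNU coreutils; the directory names are made up for the demonstration and do not correspond to real cluster paths.

```shell
# Scratch demo: a "logical" path that is a symlink to the "real" directory.
demo=$(mktemp -d)
mkdir -p "$demo/real/home_u12345"
ln -s "$demo/real/home_u12345" "$demo/logical"

# realpath walks through all symlinks and prints the real location.
resolved=$(realpath "$demo/logical")
echo "logical path : $demo/logical"
echo "real path    : $resolved"

# With a trailing slash, the path refers to the directory behind the link,
# so -d (is a directory) is true while -L (is a symlink) is false.
if [ -d "$demo/logical/" ] && [ ! -L "$demo/logical/" ]; then
    trailing_slash="resolved"
else
    trailing_slash="symlink"
fi
echo "logical/ is treated as: $trailing_slash"

rm -rf "$demo"
```

On a Linux system the last line should report that the path with the trailing slash is resolved to the directory rather than treated as the link itself.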
Also see our general tips on managing permissions. In particular, read the advanced commands section at the end if any of the commands documented on this page take a long time to complete.
Strategy
In theory, you have two options for getting your data from the source to its destination:
- “Push”: Grant write permission to the destination to your username owning the source data and copy/move the data when logged in as the source username.
- “Pull”: Grant access to the source data to your username that owns the destination directory (or is a member of the destination project), then copy/move the data logged in as the destination username.
In practice, we strongly recommend always using the “pull” method and copying the data rather than moving it. The reason is simple: regular (unprivileged) users are unable to change the ownership of files/directories to another username. Copied files are owned by the user that created the copy, while moved files retain the original owner.
With any strategy other than pull+copy, your destination username will be unable to change the group or permissions of the migrated files/directories. This is especially problematic once the old username becomes inactive (due to legacy users being disabled or the old project expiring), because you will no longer be able to log in as that username.
Hint: Legacy usernames, SCC as well as HLRN/NHR, will be disabled at the end of 2025/start of January 2026.
A good way to still effectively “move” your data is to use rsync with the --remove-source-files option.
This will delete each source file directly after it has been copied and the integrity of the copy verified.
It replicates the original directory structure at the destination. The only downside is that it also leaves behind an empty “skeleton” of the directory structure at the source.
You can use something like
find <source_path> -type d -delete
to remove all nested empty directories. Directories that still contain files are left intact, so you can check what was forgotten or failed to copy before deleting anything else.
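The cleanup step can be rehearsed on a scratch tree before running it on real data. The sketch below simulates what a source directory might look like after the transfer (empty directories plus one file that was not transferred); the directory and file names are invented. It uses a variation of the command above with the GNU find test -empty, which skips directories that still contain files instead of attempting to delete them and printing an error for each.

```shell
# Scratch stand-in for <source_path> after a transfer such as
#   rsync -a --remove-source-files <source_path>/ <destination_path>/
# (both paths are placeholders): only directories remain, plus one file
# that was NOT transferred.
src=$(mktemp -d)
mkdir -p "$src/done/a/b" "$src/done/c" "$src/pending/run1"
touch "$src/pending/run1/output.dat"   # simulated leftover file

# Remove empty directories bottom-up; -delete implies -depth, so a parent
# that becomes empty is removed as well. -mindepth 1 keeps $src itself.
find "$src" -mindepth 1 -type d -empty -delete

# Whatever is still listed here needs a second look before deleting it.
leftovers=$(cd "$src" && find . -mindepth 1 | sort)
echo "$leftovers"

rm -rf "$src"
```

In this simulated case only the pending branch and its leftover file should remain listed, while the done branch disappears completely.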
The rest of this page will focus on setting up the correct permissions, so you are able to execute the above transfer as the (implied) last step.
Determine Method to Use
There are various methods to migrate data, depending on the owner as well as the source and destination data stores. In this table, you can find the easiest methods we recommend for each kind of migration, in descending order:
Source | Destination | Method |
---|---|---|
project-specific username | project-specific username with same AcademicID | Shared AcademicID Group |
project-specific username | AcademicID username (legacy SCC) | Shared AcademicID Group |
project-specific username | your legacy HLRN/NHR username | Get Added to Shared AcademicID Group |
project-specific username | any other username | ACL |
legacy SCC username | project-specific username with the same AcademicID | Shared AcademicID Group |
legacy SCC username | any other username | ACL |
legacy HLRN/NHR username | your project-specific usernames | Get Added to Legacy Group |
legacy HLRN/NHR username | your project-specific usernames | Get Added to Shared AcademicID Group |
legacy HLRN/NHR username | your legacy SCC username | Get Added to Shared AcademicID Group |
legacy HLRN/NHR username | any other username | ACL |
project | any other project | Between Projects |
The ACL method works on most data stores (some don’t support ACLs) and is the most powerful, but also the most complex. It is often applicable, but we recommend using the other methods when possible. ACLs do not work on the following data stores:
- all ARCHIVE/PERM data stores
- all Tape Archives
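If you are unsure whether a particular directory supports ACLs, one way to find out is to try setting one on a throwaway file. The following is a rough sketch, not an official tool: acl_supported is a hypothetical helper name, and it assumes the setfacl and GNU mktemp commands are available on the node.

```shell
# Hypothetical helper: returns 0 if an ACL entry can be set on a scratch
# file inside the given directory, non-zero otherwise.
acl_supported() {
    dir=$1
    command -v setfacl >/dev/null 2>&1 || return 1
    probe=$(mktemp "$dir/.acl_probe.XXXXXX" 2>/dev/null) || return 1
    # Try to grant the current user (by numeric UID) read access via an ACL.
    setfacl -m "u:$(id -u):r" "$probe" >/dev/null 2>&1
    status=$?
    rm -f "$probe"
    return "$status"
}

if acl_supported "${TMPDIR:-/tmp}"; then
    echo "ACLs can be set here"
else
    echo "ACLs cannot be set here (or setfacl is not installed)"
fi
```

Replace the argument with a directory you own on the data store in question. The probe file is removed again in both cases.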
Get Added to Legacy Group
Legacy HLRN/NHR usernames have an accompanying POSIX group of the same name, just like legacy projects.
Files and directories in the various data stores of your legacy username or project belong to these groups by default.
This means that in most cases you can completely skip the step of changing the group or permissions as documented for the other methods on this page, since your new, project-specific u12345 username can just be added to the legacy group by our support team.
Use the groups command to list all groups that have your current username as a member, when logged in as your target username.
If those include your legacy user/project group, your new username should already have access to the old one’s data.
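The membership check can also be scripted. This is a small sketch in which in_group is a hypothetical helper name and nibjdoe is only a placeholder for your own legacy user or project group; it relies on id -nG, which prints the same group names as the groups command.

```shell
# Hypothetical helper: succeeds if the current username is a member of the
# group given as the first argument.
in_group() {
    # One group name per line, then look for an exact, literal match.
    id -nG | tr ' ' '\n' | grep -qxF -- "$1"
}

legacy_group="nibjdoe"   # placeholder: your legacy user/project group
if in_group "$legacy_group"; then
    echo "member of $legacy_group: legacy data should already be reachable"
else
    echo "not a member of $legacy_group: ask the support team to add you"
fi
```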
A notable exception are the “top-level” personal home or scratch directories, which by default are not read/write/executable by the group (while subdirectories often are).
For those, a quick and simple (non-recursive)
chmod g+rX <path>
is usually enough. For example, John Doe’s legacy HLRN username is nibjdoe and his project-specific username u12345 is a member of the group nibjdoe.
John would only need to run
chmod g+rX /scratch-grete/usr/nibjdoe
to grant u12345 full read access. Use rwX for read/write access.
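The effect of the non-recursive chmod can be observed on a scratch directory. This sketch assumes GNU stat and uses a temporary directory as a stand-in for a top-level personal directory that starts out closed to the group.

```shell
# Scratch stand-in for a top-level personal directory that the group
# cannot enter (mode 700).
top=$(mktemp -d)
chmod 700 "$top"
before=$(stat -c '%A' "$top")

# Non-recursive: only the top-level directory itself is changed.
chmod g+rX "$top"
after=$(stat -c '%A' "$top")

echo "before: $before"
echo "after : $after"

rm -rf "$top"
```

The group gains read and traverse permission on the directory itself, while everything below it keeps its existing permissions.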
Get Added to AcademicID Group
Legacy HLRN/NHR usernames have different AcademicIDs than the AcademicID used for project-specific usernames in the HPC Project Portal and legacy SCC usernames.
Thus, the Shared AcademicID Group method cannot be directly used.
However, our support team can add the legacy HLRN/NHR username to your shared AcademicID POSIX group (HPC_u_<academicuser>, where academicuser is the username of the AcademicID) if you write a support ticket.
Make sure to use the email address associated with the accounts to send the ticket, or one of them if different email addresses are associated with each.
This proves you actually own the accounts in question (you may be asked for additional information to prove ownership if they are associated with different email addresses).
Make sure to clearly state both your legacy HLRN/NHR username and the AcademicID whose POSIX group it should be added to.
Once your legacy username has been added to the HPC_u_<academicuser> group, proceed to the Shared AcademicID Group method.
Using Shared AcademicID Group
Every AcademicID with at least one project-specific username in the HPC Project Portal has a shared POSIX group of the form HPC_u_<academicuser> (where academicuser is the username of the AcademicID).
All of that AcademicID’s project-specific usernames as well as the primary username of the AcademicID itself are members of this group.
For example, if John Doe with AcademicID username jdoe is a member of two projects and has two project-specific usernames u12345 and u56789, he will have a group called HPC_u_jdoe with 3 members: jdoe, u12345, and u56789.
This shared POSIX group is provided to facilitate easy data migration between usernames without the risk of giving access to others by accident.
To grant access to a file/directory to the other usernames in the shared AcademicID POSIX group, you would do the following as the username that owns the directories/files (your other usernames lack the permissions):
chgrp [OPTIONS] HPC_u_<academicuser> <path>
chmod [OPTIONS] g+<perms> <path>
If the <path> is a directory, you should generally add the -R option to make the command apply the group/permissions recursively to subdirectories and files.
<perms> should be rX for readonly access and rwX for read-write access, where the capital X gives execute permissions only to files/directories that are already executable by the owner.
Please do NOT use a lower-case x in <perms> when recursively changing directory permissions!
That would give execute permissions to all files, even those that should not be executable.
Having random files be executable without good reason is confusing in the best case and a potential security risk and risk to your data in the worst.
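The difference the capital X makes is easiest to see on a small scratch tree. The sketch below assumes GNU chmod and stat; run.sh and data.txt are invented file names standing in for an executable script and a plain data file.

```shell
# Scratch tree: one subdirectory, one owner-executable script, one data file.
tree=$(mktemp -d)
mkdir "$tree/sub"
touch "$tree/sub/run.sh" "$tree/sub/data.txt"
chmod 700 "$tree" "$tree/sub" "$tree/sub/run.sh"   # executable by the owner
chmod 600 "$tree/sub/data.txt"                      # plain data file

# Recursive change with a capital X.
chmod -R g+rX "$tree"

dir_mode=$(stat -c '%a' "$tree/sub")
script_mode=$(stat -c '%a' "$tree/sub/run.sh")
data_mode=$(stat -c '%a' "$tree/sub/data.txt")
echo "sub/          $dir_mode"
echo "sub/run.sh    $script_mode"
echo "sub/data.txt  $data_mode"

rm -rf "$tree"
```

The directory and the script end up with mode 750, while the data file gets 640: readable by the group, but still not executable.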
It is important to remember, your other usernames can’t access <dir>/<file> unless they can also access <dir>, so always be mindful of the parent directory/ies.
Since symlinks are used for many data stores, make sure to end <path> with a / when operating on directories or use $(realpath <path>) to get the fully resolved destination after walking through all symlinks.
Otherwise, the commands will fail, trying to operate on the symlink and not the destination.
For example, /user/jdoe/u12345 would be a symlink to u12345’s HOME directory, so if you wanted to share that with your other usernames in the same HPC_u_jdoe group, you would run one of the following examples:
chgrp -R HPC_u_jdoe /user/jdoe/u12345/
chgrp -R HPC_u_jdoe $(realpath /user/jdoe/u12345)
Or you could of course just use the real path in the first place.
To give the destination username read-only access to the source, do the following:
- Log in with the username of the source
- Change the group of the source to HPC_u_<academicuser>
- Add g+rX permissions to the source directory (recursively)
- If you are sharing a subdirectory in your data store, you will need to change the group of the parent directory/directories and add the permission g+X (non-recursively)
For example, suppose John Doe wants to give access to the .config subdirectory of his HOME directory of his legacy SCC username to his other usernames so the configuration files can be copied over.
John would do this by logging in with jdoe and running
[jdoe@gwdu101 ~]$ chgrp HPC_u_jdoe ~/
[jdoe@gwdu101 ~]$ chgrp -R HPC_u_jdoe ~/.config
[jdoe@gwdu101 ~]$ chmod g+rX ~/
[jdoe@gwdu101 ~]$ chmod -R g+rX ~/.config
Then, John could access the files from his u12345 username like
[scc_cool_project] u12345@gwdu101 ~ $ cp -R /usr/users/jdoe/.config/emacs ~/.config/
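When the shared directory sits several levels below the top of the data store, every directory on the way down needs the non-recursive group change and g+X. The following sketch lists those directories and prints (but does not run) the matching commands. parents_between is a hypothetical helper name, and the projects/sim/config subdirectory is invented for illustration.

```shell
# Hypothetical helper: print every directory from <base> down to the parent
# of <target>, one per line. <target> must be located below <base>.
parents_between() {
    base=${1%/}
    target=${2%/}
    case "$target" in
        "$base"/*) ;;
        *) echo "error: $target is not below $base" >&2; return 1 ;;
    esac
    dir=$(dirname "$target")
    list=""
    # Walk upwards and prepend, so the result is ordered top-down.
    while [ "$dir" != "$base" ] && [ "$dir" != "/" ]; do
        list="$dir
$list"
        dir=$(dirname "$dir")
    done
    printf '%s\n%s' "$base" "$list"
}

# Dry run: only print the non-recursive commands for each parent directory.
parents_between /usr/users/jdoe /usr/users/jdoe/projects/sim/config |
while IFS= read -r d; do
    echo "chgrp HPC_u_jdoe $d"
    echo "chmod g+X $d"
done
```

Printing the commands first makes it possible to review them before running them as the username that owns the directories.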
If John wants to keep using the shared directory to create new files with the source username, but grant access to the other usernames in HPC_u_<academicuser> by default, he could also set the SGID bit on the shared directory, so that any newly created files automatically belong to the correct group:
find <path> -type d -exec chmod g+s {} \;
See the advanced commands section of the Managing Permissions page if you have a very large number of files/directories and the above commands are taking a long time.
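The SGID behaviour can be checked on a scratch tree before touching real data. This is a minimal sketch assuming GNU find and stat on Linux; the results/run1 and results/run2 directory names are invented. It ends -exec with + instead of \; which is a variation on the command above that hands many directories to each chmod call.

```shell
# Scratch stand-in for the shared directory (<path> in the text above).
shared=$(mktemp -d)
mkdir -p "$shared/results/run1"
chmod -R u=rwX,g=rX,o= "$shared"

# Set the SGID bit on every existing directory. Ending -exec with '+'
# passes many directories to each chmod call instead of one per call.
find "$shared" -type d -exec chmod g+s {} +

# A directory created afterwards inherits the group and the SGID bit.
mkdir "$shared/results/run2"
old_dir_mode=$(stat -c '%A' "$shared/results/run1")
new_dir_mode=$(stat -c '%A' "$shared/results/run2")
echo "existing directory: $old_dir_mode"
echo "new directory     : $new_dir_mode"

rm -rf "$shared"
```

An s in the group position of the permission string indicates the SGID bit. The remaining bits of the new directory depend on your umask.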
Using ACLs
Data can be migrated using ACLs (Access Control Lists) on most data stores (some don’t support them), but it is more complex. ACLs can be more powerful than regular POSIX permissions, but are not immediately visible and can easily lead to confusion or mistakes. ACLs should be avoided unless you can’t use the easier Shared AcademicID Group or Get Added to Shared AcademicID Group methods.
The basic idea with ACLs is that you can give additional r/w/x permissions to specific users or groups without changing the file/directory owner, group, and main permissions. You can think of them as giving files/directories secondary users and groups with their own separate permissions.
You must use the username that owns the files/directories to add ACLs to them.
ACLs are added with the setfacl command like
setfacl [OPTIONS] -m <kind>:<name>:<perms> <path>
and removed like
setfacl [OPTIONS] -x <kind>:<name> <path>
where <kind> is u for a user and g for a group, <name> is the username or group name, and <perms> is the permissions.
For <perms>, use r for read access, w for write access, and capital X to grant execute permissions to files/directories already executable by the owner.
Add the -R option to apply the ACL recursively to subdirectories and files.
Please do NOT use a lower-case x in <perms> as that gives execute permissions to all files, even those that should not be executable.
Having random files be executable without good reason is confusing in the best case and a potential security risk and risk to your data in the worst.
It is important to remember, your other usernames can’t access <dir>/<file> unless they can also access <dir>, so always be mindful of the parent directory/ies.
Since symlinks are used for many data stores, make sure with directories to end <path> with a / or use $(realpath <path>) to get the fully resolved destination after walking through all symlinks.
Otherwise, the commands will fail, trying to operate on the symlink and not the destination.
For example, /user/jdoe/u12345 would be a symlink to u12345’s HOME directory, so if you wanted to share that with username u31913, you would either run
setfacl -m u:u31913:rX /user/jdoe/u12345/
or
setfacl -m u:u31913:rX $(realpath /user/jdoe/u12345)
You can see if a file/directory has ACLs when using ls -l by looking for a + sign at the end of the permissions column.
ACLs can be displayed using getfacl.
The following example demonstrates making two files bar and baz in subdirectory foo, adding an ACL to bar, showing the permissions with ls -l, and then reading the ACLs on bar:
[gzadmfnord@glogin4 ~]$ mkdir foo
[gzadmfnord@glogin4 ~]$ cd foo
[gzadmfnord@glogin4 foo]$ touch bar baz
[gzadmfnord@glogin4 foo]$ setfacl -m u:fnordsi1:r bar
[gzadmfnord@glogin4 foo]$ ls -l
total 0
-rw-r-----+ 1 gzadmfnord gzadmfnord 0 May 21 15:56 bar
-rw-r----- 1 gzadmfnord gzadmfnord 0 May 21 15:56 baz
[gzadmfnord@glogin4 foo]$ getfacl bar
# file: bar
# owner: gzadmfnord
# group: gzadmfnord
user::rw-
user:fnordsi1:r--
group::r--
mask::r--
other::---
To give the destination username read-only access to the source, do the following:
- Log in with the username of the source
- Add an rX ACL for the destination username to the source directory (recursively)
- If you are sharing a subdirectory in your data store, you will need to add an X ACL for the destination username to the parent directory/directories (non-recursively)
For example, suppose John Doe wants to give access to the .config subdirectory of his HOME directory of his legacy HLRN/NHR username nibjdoe to his project-specific username u12345 so the configuration files can be copied over.
John would do this by logging in with nibjdoe and running
[nibjdoe@glogin3 ~]$ setfacl -m u:u12345:rX ~/
[nibjdoe@glogin3 ~]$ setfacl -R -m u:u12345:rX ~/.config
Then, John could access the files from his u12345 username like
[nib30193] u12345@gwdu101 ~ $ cp -R /mnt/vast-nhr/home/nibjdoe/.config/emacs ~/.config/
If John wants to keep using the shared directory to create new files with the source username but by default grant access to u12345, he could set a default ACL on the shared directory, so any newly created files and directories will automatically have the same ACL:
setfacl -R -d -m u:u12345:rX <path>
where the -d option specifies that the default ACL should be changed and -m adds or modifies the entry.
If you want to remove a default ACL, you also need to include the -d option.
Between Projects
Have A Username in Both Projects
With your (legacy) username that is a member of both projects, you can just copy the data from one to the other with rsync or cp, as long as it isn’t too large.
If the data is very large, this will be very slow and may harm filesystem performance for everyone. In this case, please write a support ticket so the best way to copy or move the data can be found (the admins have additional more efficient ways to transfer data in many cases).
Have Different Usernames in Both Projects
If the data is small, it can be transferred via an intermediate hop.
If the source project is A and the destination project is B and your usernames in both are userA and userB respectively, this would be done by:
- Copy/move the data from the project datastore of A to a user datastore of userA.
- Share the data from the datastore of userA with username userB using the respective method in the table above.
- Using username userB, copy the data from the datastore of userA to the destination datastore of project B.
Otherwise, please write a support ticket so a suitable way to migrate the data can be found.
Make sure to indicate the source and destination path, their projects, and your usernames in each.