Data Transfers

We support file transfer tools such as rsync and scp, which use the SSH protocol to establish and encrypt the connection. For this reason, a working SSH connection is a prerequisite for most methods described here. Each of the following sections deals with a specific “direction” for establishing the transfer connection. Independent of this direction, i.e. the machine on which you start the rsync (or similar) command, data can always be transferred from or to the target host.

Data Transfers Connecting from the Outside World

It is highly recommended to specify shorthands for target hosts in your SSH config file, as laid out here. These shorthands can also be used with rsync (recommended) or scp, making them much easier and more comfortable to use.
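A minimal ~/.ssh/config entry for such a shorthand might look like the following sketch. The alias Emmy-p3, the username u12345 and the key path are placeholders; adapt them to your own account:

```
# ~/.ssh/config -- example shorthand for a login node
Host Emmy-p3
    HostName glogin-p3.hpc.gwdg.de
    User u12345
    IdentityFile ~/.ssh/private_key_for_gwdg
```

With this in place, ssh Emmy-p3 and the rsync example below work without spelling out the full hostname, username and key path every time.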

rsync -av /home/john/data_files Emmy-p3:/mnt/lustre-grete/usr/u12345/
Info

Also see the “Tips and Tricks” section below for a quick description of rsync’s command line arguments/flags.

If necessary, the location of the private key file can also be specified explicitly when calling scp or rsync on the user’s local machine.

Using scp, the option -i <path_to_privatekey> can be added:

scp -i <path_to_privatekey> <user>@glogin.hpc.gwdg.de:<remote_source> <local_target>
scp -i <path_to_privatekey> <local_source> <user>@glogin.hpc.gwdg.de:<remote_target>

With rsync, it is a bit trickier: you pass a nested ssh command via -e 'ssh -i <path_to_privatekey>', like this:

rsync -av -e 'ssh -i <path_to_privatekey>' <user>@glogin.hpc.gwdg.de:<remote_source> <local_target>
rsync -av -e 'ssh -i <path_to_privatekey>' <local_source> <user>@glogin.hpc.gwdg.de:<remote_target>

<local_source> and <remote_source> can be either single files or entire directories.
<local/remote_target> should be a directory; we recommend always adding a trailing slash /. This avoids accidentally overwriting a file of the same name, works if the target is a symlink, and is generally more robust.

For rsync, whether or not the source has a trailing slash determines whether the directory itself (including its contents) or just its contents is copied.
For scp, if the source is a directory, you have to use the -r switch to recursively copy the directory and its contents.
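The trailing-slash behavior can be demonstrated entirely locally; the /tmp/rsync_demo paths below are just for illustration:

```shell
# Demonstrate rsync's trailing-slash semantics with local directories.
rm -rf /tmp/rsync_demo
mkdir -p /tmp/rsync_demo/src
touch /tmp/rsync_demo/src/file.txt

# No trailing slash on the source: the directory itself is copied into the target.
rsync -a /tmp/rsync_demo/src /tmp/rsync_demo/a/
# Result: /tmp/rsync_demo/a/src/file.txt

# Trailing slash on the source: only the contents are copied.
rsync -a /tmp/rsync_demo/src/ /tmp/rsync_demo/b/
# Result: /tmp/rsync_demo/b/file.txt
```

The same distinction applies when one side is a remote host, as in the examples above.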

Data Transfers Connecting to the Outside World

Connections to external machines anywhere in the world can be established interactively from the login nodes using the standard port 22. An SSH key pair may or may not be required to connect to external hosts, and additional rules imposed by the external host or institution may apply.

Warning

We do not allow private SSH keys on the cluster! For security reasons, private key files should never leave your local machine!

In order to still be able to use a private key residing on your local machine to establish connections from the cluster to external hosts, you can use an SSH agent. The agent will act as a proxy that forwards requests to access your private key to your local machine and sends back the result.

Here is an example of using an SSH agent:

john@doe-laptop ~ $ eval $(ssh-agent)
Agent pid 345678
john@doe-laptop ~ $ ssh-add ~/.ssh/private_key_for_zib
Identity added: .ssh/private_key_for_zib (john@doe-laptop)
john@doe-laptop ~ $ ssh -A u12345@glogin-p3.hpc.gwdg.de -i ~/.ssh/private_key_for_gwdg
Last login: Thu May  1 11:44:21 2025 from 12.34.56.78
Loading software stack: gwdg-lmod
Found project directory, setting $PROJECT_DIR to '/projects/extern/nhr/nhr_ni/nhr_ni_test/dir.project'
Found scratch directory, setting $WORK to '/mnt/lustre-grete/usr/u12345'
Found scratch directory, setting $TMPDIR to '/mnt/lustre-grete/tmp/u12345'
 __          ________ _      _____ ____  __  __ ______   _______ ____
 \ \        / /  ____| |    / ____/ __ \|  \/  |  ____| |__   __/ __ \
  \ \  /\  / /| |__  | |   | |   | |  | | \  / | |__       | | | |  | |
   \ \/  \/ / |  __| | |   | |   | |  | | |\/| |  __|      | | | |  | |
    \  /\  /  | |____| |___| |___| |__| | |  | | |____     | | | |__| |
  _  \/ _\/  _|______|______\_____\____/|_|  |_|______|____|_|__\____/
 | \ | | |  | |  __ \     ____    / ____\ \        / /  __ \ / ____|
 |  \| | |__| | |__) |   / __ \  | |  __ \ \  /\  / /| |  | | |  __
 | . ` |  __  |  _  /   / / _` | | | |_ | \ \/  \/ / | |  | | | |_ |
 | |\  | |  | | | \ \  | | (_| | | |__| |  \  /\  /  | |__| | |__| |
 |_| \_|_|  |_|_|  \_\  \ \__,_|  \_____|   \/  \/   |_____/ \_____|
                         \____/

 Documentation:  https://docs.hpc.gwdg.de
 Support:        nhr-support@gwdg.de

PARTITION    NODES (BUSY/IDLE)     LOGIN NODES
medium96s          95 /  296     glogin-p3.hpc.gwdg.de
standard96        745 /  245     glogin-p2.hpc.gwdg.de
Your current login node is part of glogin-p3
[nhr_ni_test] u12345@glogin11 ~ $ echo $SSH_AUTH_SOCK
/tmp/ssh-jxWgVZrgW5/agent.2462718
[nhr_ni_test] u12345@glogin11 ~ $ ssh nimjdoe@blogin.nhr.zib.de
Warning: Permanently added 'blogin.nhr.zib.de,130.73.234.2' (ECDSA) to the list of known hosts.

********************************************************************************
*                                                                              *
*               Welcome to NHR@ZIB system "Lise" on node blogin2               *
*               (Rocky Linux 9.5, Environment Modules 5.4.0)                   *
*                                                                              *
*  Manual   ->  https://user.nhr.zib.de                                        *
*  Support  ->  mailto:support@nhr.zib.de                                      *
*                                                                              *
********************************************************************************

Module NHRZIBenv loaded.
Module sw.clx.el9 loaded.
Module slurm (current version 24.11.5) loaded.
blogin2:~ $ exit
[nhr_ni_test] u12345@glogin11 ~ $ rsync -avP data_gwdg nimjdoe@blogin.nhr.zib.de:/scratch/usr/nimjdoe/
[...]
Note

When setting the agent up, it is strongly recommended to add the key with ssh-add -c <path-to-private-key>, if your system supports it. This way, whenever the remote machine needs your key, you will get a confirmation dialog on your local machine asking whether you want to allow it or not.

Without -c, you have no way of noticing suspicious requests to use your key from the remote machine, although those are highly unlikely.

Some desktop environments are known to have problems (for example, some versions of gnome-keyring do not play nicely with ssh-agent confirmations), so please test whether it works for you and leave out the -c option if it does not. You may have to install additional packages, like ssh-askpass on Ubuntu/Debian-based distributions.

Should you ever get the confirmation dialog at a time you did not initiate an SSH connection on the remote machine, someone on the remote machine is trying to use your key. As our admins will never try to steal your key, this probably means the login node, or at least your session, was compromised by an attacker. Deny the request and contact our support immediately, letting us know about the potentially compromised node (don’t forget to include details like the node name, your username, the exact commands you ran, etc.).

Data transfer in the context of a batch job is restricted due to the limited network access of the compute nodes. If possible, avoid connections to the outside world within jobs; otherwise, send a message to our support in case you need further help.

Data Transfers within our HPC cluster

You can generally use any login node to copy or move files between different filesystems, data stores and directories. We recommend using a tmux or screen session to do that, see the bottom of this page for details.

Note

Please do not start more than one or two larger copy or move operations in parallel, as this can quickly eat up all available network bandwidth and make at least the login node you are using (and potentially other nodes sharing the same network links) very slow for all users!

See our Data Migration Guide and Data Sharing pages for details regarding ownership and permissions to access directories across user accounts or from multiple users.

An exception to this is the legacy SCC home filesystem (Stornext), which is only connected to the legacy SCC login and compute nodes. Please use those nodes to transfer data between /usr/users/, /home/uniXX/, or /home/mpgXX/ and the other filesystems.

If you have terabytes of data that need to be transferred, please contact us so we can provide a custom solution.

Data Transfer via ownCloud

It is possible to transfer data between GWDG’s ownCloud service and the HPC systems.

In general, using rclone with WebDAV provides three different methods to access your data on ownCloud. The first must never be used: you are not allowed to store the password of your GWDG account on the HPC frontend nodes to access your personal ownCloud space. This is highly unsafe. Never do this!

The second option is better, but still not recommended: you can create a special device password in ownCloud that will only work for ownCloud. However, it still grants full access to all your documents in ownCloud. If you want to use it anyway: create a dedicated password for ownCloud here, then encrypt your rclone configuration file by executing rclone config and pressing s to select Set configuration password. Your configuration is now encrypted, and every time you start rclone you will have to supply the password. Keep in mind that this method provides full access to your ownCloud account and all data within.

For security reasons, more fine-grained access can be authorized using the following, recommended method: To share data from ownCloud to HPC, only provide access to specific folders on ownCloud. For this purpose, create a public link to the folder, which should be secured with a password. The steps are as follows:

  1. Create a public link to the folder

    • Navigate ownCloud until the folder is visible in the list
    • Click on ..., then Details, select the Sharing tab, Public Links, then Create Public Link
    • Set link access depending on use case. Options are:
      • Download / View
      • Download / View / Upload
      • Download / View / Upload / Edit
    • Set a password and expiration date (highly recommended). This password will be referred to as <folder_password> in this document
  2. Extract the folder ID from the public link; it is required for the next steps. The last portion of the link is the folder ID, i.e., https://owncloud.gwdg.de/index.php/s/<folder_id>

  3. On the HPC system, load the rclone module: module load rclone

  4. Run these commands on the HPC to download from or upload to the folder in ownCloud:

    Download

    rclone copy --webdav-url=https://owncloud.gwdg.de/public.php/webdav --webdav-user="<folder_id>" --webdav-pass="$(rclone obscure '<folder_password>')" --webdav-vendor=owncloud :webdav: <local_dir>

    Upload

    rclone copy --webdav-url=https://owncloud.gwdg.de/public.php/webdav --webdav-user="<folder_id>" --webdav-pass="$(rclone obscure '<folder_password>')" --webdav-vendor=owncloud <local_dir> :webdav:

    Where <folder_id> is the ID extracted from the public link, <folder_password> is the password that was set when creating the public link, and <local_dir> is the local folder to synchronize with the folder in ownCloud.

  5. When it’s not required anymore, remove the public link from the folder in ownCloud.

Tips and Tricks

Longer Data Transfers

Copying data can take a long time if you move large amounts at once. When using rsync, use the option -P (short for --partial --progress), which allows you to resume interrupted transfers at the point where they stopped and shows helpful information like the transfer speed and percentage completed for each file. If you run transfer operations on our login nodes and do not want to (or cannot) keep the terminal session open for the whole duration, you can use a terminal multiplexer like tmux or screen. If your network connection is interrupted from time to time, it is strongly recommended to always run larger transfers in a tmux or screen session.

A guide on how to use both can be found on the terminal multiplexer page. Make sure you reconnect to the same login node to resume the session later!
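As an illustrative sketch of that workflow (the session name transfer is arbitrary, and the source path and target host are placeholders; see the terminal multiplexer page for a proper guide):

```
# On the login node: start a named tmux session and launch the transfer inside it.
tmux new -s transfer
rsync -avP /mnt/lustre-grete/usr/u12345/data/ <user>@<target_host>:<remote_target>/
# Detach with Ctrl-b d; the transfer keeps running on the login node.

# Later, reconnect to the SAME login node and reattach:
tmux attach -t transfer
```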

rsync Command Line Arguments

rsync is a very powerful and versatile tool, and accordingly it has a lot of options and switches to fine-tune its behavior. The most basic ones, useful in almost all circumstances, are -a or --archive and -v or --verbose. The former is short for -rlptgoD (so you don’t have to remember those), a set of options that recursively copy directories and preserve most metadata (permissions, ownership, timestamps, symlinks and more). Notably, that does not include ACLs, extended attributes, or times of last access. --verbose prints, as you might expect, the names or paths of each file and directory as they are copied.

Other often useful options include:
-P, as mentioned above, is useful or even critical for transfers of large files over slow or potentially unstable connections, but can slow down transfers of many small files.
-z or --compress compresses data in transit, which can speed up transmission of certain types of data over slow connections, but can also (sometimes severely) slow down transmission of incompressible data or transfers over fast connections.

For more options and information, please refer to the rsync manpage.

3-way Transfer (Remote to Remote)

While we generally do not recommend this method for many reasons, most of all because it is inefficient and often very slow, there are cases where it is much easier to transfer small amounts of data this way between two remote hosts that you can both reach from your local machine. Instead of establishing a direct connection between the two remotes, all transferred data is channeled through your local machine. With scp, you can achieve this by using the -3 switch.
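For example (usernames, hosts and paths are placeholders to fill in):

```
# Copy from host1 to host2, routing all data through your local machine.
scp -3 <user1>@<host1>:<remote_source> <user2>@<host2>:<remote_target>
```

Note that both hosts must be reachable from your local machine, and the transfer is limited by the slower of the two connections plus your local bandwidth.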