Data Transfers

Each of the systems in Göttingen is equipped with several file systems. Their properties and intended use are described here.

Disk quotas based on group ownership are enforced on each site’s global (non-local) file systems.

We support file transfer tools such as scp and rsync, which use the SSH protocol to establish the connection and to encrypt the data transfer. A working SSH connection is therefore a prerequisite for data transfer. Each of the following sections deals with a specific direction in which the transfer connection is established. Independent of the connection direction, data can always be transferred from or to the connected target host.

Data Transfer Connecting from the Outside World

External connections to the systems in Göttingen require an SSH key pair for authentication. More details can be found here. The location of the private key file can be specified when calling scp or rsync on the user’s local machine. Examples covering both data transfer directions are shown below.

Using scp, the option -i <fullpath_of_privatekeyfile> can be added:

$ scp -i <fullpath_of_privatekeyfile> <username>@glogin.hpc.gwdg.de:<remote_source> <local_target>
$ scp -i <fullpath_of_privatekeyfile> <local_source> <username>@glogin.hpc.gwdg.de:<remote_target>

With rsync, the nested option -e 'ssh -i <fullpath_of_privatekeyfile>' can be added:

$ rsync -e 'ssh -i <fullpath_of_privatekeyfile>' <username>@glogin.hpc.gwdg.de:<remote_source> <local_target>
$ rsync -e 'ssh -i <fullpath_of_privatekeyfile>' <local_source> <username>@glogin.hpc.gwdg.de:<remote_target>

Alternatively, the additional options shown above for specifying the location of the private key file can be omitted. In this case it is necessary to have a corresponding SSH configuration on the user’s local machine as described here. To verify this, the corresponding SSH connection must be working without specifying the private key file on the command line.
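
As a minimal sketch, such an SSH configuration entry in ~/.ssh/config on the local machine could look like this (the host alias glogin and the key file name are examples, not fixed names):

    # ~/.ssh/config on the user's local machine; alias and key file name are examples
    Host glogin
        HostName glogin.hpc.gwdg.de
        User <username>
        IdentityFile ~/.ssh/id_ed25519_hpc

With such an entry in place, scp <local_source> glogin:<remote_target> and the corresponding rsync calls work without specifying the private key file on the command line.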

Data Transfer Connecting to the Outside World

Connections to external machines located anywhere in the world can be established interactively from the login nodes. In this case, the SSH key pair mentioned above for external connections to the login nodes is not required. However, additional rules imposed by the external host or institution may apply.
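
For example, pushing results from a login node to an external machine could look like the following sketch (the host external.example.org and the paths are placeholders; the external machine decides which authentication it accepts):

    # run interactively on a login node; host name and paths are placeholders
    rsync -avP <local_source> <external_username>@external.example.org:<remote_target>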

Data transfer in the context of a batch job is restricted due to limited network access of the compute nodes. Please send a message to the support mailing list in case you need further help.

Data Transfer between Emmy and the SCC

If you have previously been working on the SCC in Göttingen at the GWDG, you can follow these steps if you need to transfer data to/from the Emmy system:

  1. On an Emmy frontend node (glogin.hpc.gwdg.de or glogin[1-9].hpc.gwdg.de), generate a new SSH key (also documented at the SCC); a minimal example is sketched after this list.

  2. Add the SSH key to your Academic Cloud account at the bottom of the security tab.

  3. From an Emmy frontend node, transfer the files using rsync (see the SCC documentation and the rsync man page) to/from the SCC transfer node transfer-scc.gwdg.de. Note that glogin9.hpc.gwdg.de has access to both the Emmy and Grete scratches (including the ones for KISSKI), while glogin.hpc.gwdg.de and glogin[1-8].hpc.gwdg.de only have access to the Emmy scratch; all of them have access to $HOME. Some examples are given below:

    • Copy a single file FOO from SCC $HOME into your current directory on Emmy
      rsync -e 'ssh -i <fullpath_of_privatekeyfile>' GWDGUSERNAME@transfer-scc.gwdg.de:/usr/users/GWDGUSERNAME/FOO .
    • Copy a single file FOO in your current directory on Emmy to $HOME on the SCC
      rsync -e 'ssh -i <fullpath_of_privatekeyfile>' FOO GWDGUSERNAME@transfer-scc.gwdg.de:/usr/users/GWDGUSERNAME/
    • Copy a directory in your SCC /scratch to your current directory on Emmy
      rsync -e 'ssh -i <fullpath_of_privatekeyfile>' -r GWDGUSERNAME@transfer-scc.gwdg.de:/scratch/projects/workshops/forest/synthetic_trees .
    • Copy a directory in your current directory on Emmy to /scratch on the SCC
      rsync -e 'ssh -i <fullpath_of_privatekeyfile>' -r synthetic_trees GWDGUSERNAME@transfer-scc.gwdg.de:/scratch/projects/workshops/forest/
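
For step 1, a minimal sketch could look like this (the key type, file name, and comment are only examples):

    # run on an Emmy frontend node; file name and comment are examples
    ssh-keygen -t ed25519 -f ~/.ssh/id_ed25519_scc_transfer -C "emmy-scc-transfer"
    # print the public key so it can be added to the Academic Cloud security tab (step 2)
    cat ~/.ssh/id_ed25519_scc_transfer.pub

The private key file created this way is what <fullpath_of_privatekeyfile> refers to in the rsync examples above.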

If you have terabytes of data that need to be transferred, please contact us so that we can provide a custom solution for this.

Data Transfer via ownCloud

It is possible to transfer data between GWDG’s ownCloud service and the HPC systems.

Using Rclone with WebDAV generally provides three different methods to access your data on ownCloud. The first one must not be used: you are not allowed to store the password of your GWDG account on the HPC frontend nodes to access your personal ownCloud space. This is highly unsafe. Never do this!

The second option is better but still not recommended: you can create a special device password in ownCloud which will only work for ownCloud. However, this still gives full access to all your documents in ownCloud. If you want to do it anyway: create a dedicated password for ownCloud here, then encrypt your Rclone configuration file by executing rclone config and hitting s to select Set configuration password. Your configuration is now encrypted, and every time you start Rclone you will have to supply the password. Keep in mind that this still provides full access to your ownCloud account and all data within.
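
As a rough sketch of this second option, assuming a WebDAV remote named owncloud has already been set up via rclone config and the configuration file has been encrypted as described above:

    # list the top-level folders of the personal ownCloud space; rclone asks for the
    # configuration password because the configuration file is encrypted
    rclone lsd owncloud:
    # copy a local directory into ownCloud (both names are placeholders)
    rclone copy <local_dir> owncloud:<remote_dir>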

For security reasons, more fine-grained access can be authorized using the following, recommended method: to share data between ownCloud and the HPC systems, only provide access to specific folders on ownCloud. For this purpose, create a public link to the folder and secure it with a password. The steps are as follows:

  1. Create a public link to the folder

    • Navigate ownCloud until the folder is visible in the list
    • Click on ..., then Details, select the Sharing tab, Public Links, then Create Public Link
    • Set link access depending on use case. Options are:
      • Download / View
      • Download / View / Upload
      • Download / View / Upload / Edit
    • Set a password and an expiration date (highly recommended). This password will be referred to as <folder_password> in this document
  2. Extract the folder ID from the public link, which is required for the next steps. The last portion of the link is the folder ID, i.e., https://owncloud.gwdg.de/index.php/s/<folder_id>

  3. On the HPC system, load the rclone module: module load rclone

  4. Run these commands on the HPC to download from or upload to the folder in ownCloud:

    Download

    rclone copy --webdav-url=https://owncloud.gwdg.de/public.php/webdav --webdav-user="<folder_id>" --webdav-pass="$(rclone obscure '<folder_password>')" --webdav-vendor=owncloud :webdav: <local_dir>

    Upload

    rclone copy --webdav-url=https://owncloud.gwdg.de/public.php/webdav --webdav-user="<folder_id>" --webdav-pass="$(rclone obscure '<folder_password>')" --webdav-vendor=owncloud <local_dir> :webdav:

    Where <folder_id> is the ID extracted from the public link, <folder_password> is the password that was set when creating the public link, and <local_dir> is the local folder to synchronize with the folder in ownCloud.

  5. When it’s not required anymore, remove the public link from the folder in ownCloud.

Longer data transfers

Copying data can take a long time if you move large amounts at once. When using rsync, add the option -P (short for --partial --progress), which allows you to resume interrupted transfers at the point where they stopped and shows helpful information such as the transfer speed and the percentage completed for each file. If you run transfer operations on our login nodes and do not want to or cannot keep the terminal session open for the whole duration, you can use either tmux or screen.

A guide on how to use both can be found on the terminal multiplexer page. Make sure you reconnect to the same login node to resume the session later.
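
A minimal sketch of a long-running transfer inside tmux (the session name transfer is an example):

    # on a login node: start a named tmux session
    tmux new -s transfer
    # inside the session: run the long transfer, e.g. one of the rsync commands above
    rsync -avP <source> <target>
    # detach with Ctrl-b d; later, reconnect to the same login node and re-attach
    tmux attach -t transfer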