Objectives

You will learn:

  • how to transfer small amounts of data to and from the system
  • how to transfer large amounts of data
  • how to bypass the lander node when copying data

Transfer tools

The basic toolkit consists of scp and sftp.

  1. scp – secure copy
  2. sftp – secure file transfer protocol

These tools are available in Linux and macOS terminals, and on Windows via MobaXterm, Git Bash, or the Windows Subsystem for Linux (Windows 10 Bash).

Note for MobaXterm users on Windows: MobaXterm automatically opens an sftp session alongside your ssh session, which you can use to transfer data between your laptop or desktop computer and the cluster. Make sure that “Display SFTP browser” is activated in the “Advanced SSH settings” tab of the session settings window. Although you can transfer multiple files in one step by selecting them in the browser, it may still be worth creating a compressed tar archive (e.g., with the 7-Zip program) to reduce transfer time or to handle a very large number of files. The command-line versions of scp, sftp, and tar can of course also be used on Windows, as noted above.

Preliminaries


NOTE

Both scp and sftp use ssh (secure shell), so before you proceed, verify that you can ssh into the lander node, and directly into mahuika (or maui) with a command such as ssh mahuika, by following the instructions at jumping across the lander node.


scp

The syntax for the scp command is:

 scp [options] source_user@source_host:source_path dest_user@dest_host:dest_path
  • The source file is specified by the first argument, source_user@source_host:source_path, where:

      argument      description
      source_user   source username
      source_host   hostname or ssh alias (e.g. mahuika) of the host on which the source file resides
      source_path   /path/to/filename of the source file

  • The destination file is specified by the second argument, dest_user@dest_host:dest_path, where:

      argument      description
      dest_user     destination username
      dest_host     hostname or ssh alias of the host to which the source file will be copied
      dest_path     /path/to/filename of the destination file

NOTE

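  • For a file on the local host, omit the user@host: part and give only the path
  • The directory of a local file can be omitted if it is your current working directory
  • The user@ part can be omitted if your username is the same on both hosts or is set in your ssh config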

Copying files to the cluster

Here the source is your workstation and the destination is mahuika.

command                                             explanation
scp filename mahuika:/home/username/                copy file to your home directory
scp filename mahuika:~/                             (same as previous)
scp filename mahuika:/path/to/storage/location/     copy file to another remote path
scp filename mahuika:/path/to/storage/new_filename  copy file to a renamed remote file
scp filename mahuika:/nesi/project/nesi123456/      copy file to project folder
scp -r foldername mahuika:/path/to/storage/         copy folder to path

Note: the -r option makes scp copy recursively, that is, a folder together with everything in it. See man scp for more options.

Copying files from the cluster

Here the source is mahuika and the destination is your workstation.

Note:

  • the destination . stands for the current local directory
  • see man scp for more options

command                                                        explanation
scp mahuika:/home/username/filename .                          copy file from remote home directory to current local directory
scp mahuika:~/filename .                                       (same as previous)
scp mahuika:/path/to/storage/filename /another/path/on/local/  copy file from remote path to another local path
scp mahuika:/nesi/project/nesi123456/filename .                copy file from remote project folder to current local directory
scp mahuika:/nesi/project/nesi123456/{a,b,c} .                 copy multiple remote files to current local directory
scp -r mahuika:/nesi/project/nesi123456 .                      copy entire project folder to current local directory

(No spaces between the commas and filenames! The braces are expanded by your local shell, and a space would split them into separate arguments.)
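To see what the shell actually hands to scp, you can put echo in front of the command: it prints the expanded argument list instead of running the copy. A minimal sketch, assuming bash as your local shell and reusing the placeholder project nesi123456 from the table:

```shell
#!/usr/bin/env bash
# Brace expansion happens in your local shell, before scp ever runs.
# echo prints the arguments scp would receive instead of copying anything.
echo scp mahuika:/nesi/project/nesi123456/{a,b,c} .
# prints: scp mahuika:/nesi/project/nesi123456/a mahuika:/nesi/project/nesi123456/b mahuika:/nesi/project/nesi123456/c .

# With spaces after the commas, "{a," "b," and "c}" become separate words,
# so the shell leaves the braces alone and scp would get broken paths.
echo scp mahuika:/nesi/project/nesi123456/{a, b, c} .
# prints: scp mahuika:/nesi/project/nesi123456/{a, b, c} .
```

So with the braces written correctly, scp receives one full remote path per file.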

tar

Each file transfer carries some overhead. If you are transferring many small files, consider using tar to bundle and compress them into a single archive, and transfer just that one file.

To transfer a directory tree in either direction, go to the directory that contains it on the source, and to the directory where you want it to appear on the destination. Then, at the shell:

command                              explanation                                 where issued
tar cvfz archive.tar.gz foldername   create and compress archive of foldername   on source
scp archive.tar.gz mahuika:path      copy to mahuika                             on local
scp mahuika:path/archive.tar.gz .    copy from mahuika                           on local
tar tvfz archive.tar.gz              list contents of archive                    wherever archive exists
tar xvfz archive.tar.gz              extract archive after scp                   on destination

Everything is transferred as a single file, then uncompressed and unpacked into the same directory structure it had on the source. Run tar from the parent directory, as shown, rather than archiving ./ from inside the folder: otherwise the archive is written inside the tree being archived, and tar tries to include it in itself.

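See man tar for options to exclude files from the archive, and for other options. The c option means “create”, x means “extract”, t means “list”, v means “verbose”, z means compress or decompress with gzip, and f gives the name of the archive file.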
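The whole round trip can be tried out safely on one machine before using it for real. The sketch below is hypothetical in its details: two scratch directories from mktemp stand in for your workstation and mahuika, the folder name results is made up, and a plain cp takes the place of the scp step.

```shell
#!/usr/bin/env bash
set -euo pipefail
# Two scratch directories play the roles of the source and destination machines.
src=$(mktemp -d)
dest=$(mktemp -d)

# Build a small tree of many small files, the case where tar pays off.
mkdir -p "$src/results/run1" "$src/results/run2"
for i in 1 2 3 4 5; do
  echo "sample $i" > "$src/results/run1/out_$i.txt"
  echo "sample $i" > "$src/results/run2/out_$i.txt"
done

# On the source: archive the folder from its parent directory.
cd "$src"
tar cvfz archive.tar.gz results

# The single transfer; cp stands in for: scp archive.tar.gz mahuika:path
cp archive.tar.gz "$dest/"

# On the destination: list the contents, then extract.
cd "$dest"
tar tvfz archive.tar.gz
tar xvfz archive.tar.gz

# The unpacked tree should be identical to the original.
diff -r "$src/results" "$dest/results" && echo "directory tree copied intact"
```

Ten files move as one archive here; with thousands of small files the saving in per-file overhead becomes much larger.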

sftp

For an interactive file-transfer session, sftp is a more full-featured tool than scp. From your local host,

sftp mahuika

should open a prompt

sftp>

From here you can inspect the remote file system as you transfer, using the following commands:

command   explanation
cd        navigate on remote
lcd       navigate on local
ls        list files on remote (lls lists files on local)
mkdir     create directories on remote
rmdir     remove directories on remote
put       copy from local to remote
get       copy from remote to local
help      see all commands

The two operative verbs in sftp are ‘get’ and ‘put’, as in:

  • get from remote to local
  • put from local to remote

Speeding up

You can use wildcards (globbing) to put or get many files at once:

command     explanation
put foo*    put all files in cwd starting with foo from local to remote
put *.c     put all files in cwd ending with .c from local to remote
get foo*    get all files in cwd starting with foo from remote to local
get *.py    get all files in cwd ending with .py from remote to local

Here cwd means the current working directory on the side the files come from: local for put, remote for get. As you put and get, use ls -lt to check the remote directory and lls -lt to check the local one.
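Because put and get select files by pattern, it can help to check what a pattern matches before sending anything. Inside sftp, lls foo* previews the local matches and ls foo* the remote ones; the sketch below shows the same kind of matching in an ordinary local bash shell, using hypothetical file names in a scratch directory.

```shell
#!/usr/bin/env bash
set -euo pipefail
# Work in a throwaway directory with a few hypothetical files.
cd "$(mktemp -d)"
touch foo1.dat foo2.dat bar.c main.c notes.py

# Print which files each pattern from the table would select.
echo foo*    # foo1.dat foo2.dat
echo *.c     # bar.c main.c
echo *.py    # notes.py
```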


NOTE

  • Use man sftp to see the options for get and put:
    • the -a option resumes a partial transfer
    • the -r option transfers a directory recursively
    • the -p option preserves file permissions and timestamps
  • Use ls -lt for a long listing of the remote directory, sorted by modification time.

rsync

An even more powerful file-transfer tool is rsync. It can resume an interrupted transfer (rerunning the same command copies only what is missing or has changed), optionally compress data in transit, preserve symbolic links, devices, attributes, permissions, ownership, and so on, and much more. See its man page for options and examples.

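command                               explanation
rsync -avz host:src/bar /data/tmp     pull (get) from remote source
rsync -avz /path/to/file host:dest    push (put) to remote destination

Here -a (archive mode) copies recursively and preserves symbolic links, permissions and timestamps, -v lists files as they are transferred, and -z compresses data during the transfer.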