Skip to content

Using Tar over SSH to move many files

Linux tools such as scp and rsync are commonly used to transfer files between a remote server and a local machine. However, these tools are not very practical for moving many small files.

A simple, much faster solution is to write a (compressed) tar package containing your data directly to the target system. This is accomplished by piping the output of tar over an ssh connection. Writing the archive directly to the destination helps also to conserve disk space on the source system.

For general information about tar, see Packing and compression tools.

Examples

All following commands are executed on a local workstation, which is assumed to have the required tools (tar, ssh and, optionally, gzip/bzip2) installed. Puhti is used as the example remote server.

The general syntax is:

tar c <directory_to_transfer> | ssh <username>@puhti.csc.fi 'cat > <target_path_on_puhti>'

For example, the command to copy a directory myfiles from your local machine to the directory /scratch/project_2001234 on Puhti would be:

tar c myfiles | ssh <username>@puhti.csc.fi 'cat > /scratch/project_2001234/myfiles.tar'

To extract the tar archive at the same time, replace the cat command as:

tar c myfiles | ssh <username>@puhti.csc.fi 'tar x -C /scratch/project_2001234'

Conversely, you could also copy data located on the remote server to your local machine using a command like:

ssh <username>@puhti.csc.fi 'tar c -C <parent_directory_on_puhti> <directory_to_transfer>' > <archive_on_local_machine>

For example, the command to copy a folder myfiles located in /scratch/project_2001234 to your local machine would be:

ssh <username>@puhti.csc.fi 'tar c -C /scratch/project_2001234 myfiles' > myfiles.tar

To extract the archive at the same time:

ssh <username>@puhti.csc.fi 'tar c -C /scratch/project_2001234 myfiles' | tar x

Enable compression

The above commands did not use compression. Replace tar c (tar x) with tar cz or tar cj (tar xz or tar xj) to enable gzip or bzip2 compression (decompression). Note that some files do not benefit from compression (e.g. binary files or files that are already compressed), in which case it is faster to transfer without compressing.

Avoid verbose output

Adding v option (e.g. tar czv) will produce verbose output, i.e. list the processed files in the console. This slows down the transfer process when moving many small files, so it is recommended to avoid using it.

Performance comparison

Below is a rough comparison of the time taken to transfer a directory with many small files to Puhti. Each file is 100 KiB in size and no compression is applied.

Number of files tar + ssh scp
100 0.5 s 2.4 s
1000 1.3 s 20.5 s
10000 9.4 s 201.2 s

More information