Table of contents of user guide
Command Line Interface and automated key management
SD Connect command line tool, sd-lock-util, as well as a-put and a-get commands support SD Connect compatible data upload and download with automatic encryption and decryption. After upload, the data can be downloaded through the SD Connect web interface and SD Desktop too. Note that files have been uploaded before February 2025 and were manually encrypted using your encryption key and will still need to be decrypted manually after download.
- Background information
- Command line tools and automated key management
- Command line tools and manual encryption
- Tutorials
Background information
SD Connect is part of CSC's Sensitive Data Services, offering a free and secure data processing environment for academic research projects at Finnish universities and research institutes. SD Connect enhances the Allas object storage system by adding an automatic encryption layer, enabling secure storage of sensitive data. Data stored in SD Connect can also be accessed through SD Desktop service. While SD Connect is typically accessed via the SD Connect Web interface, command-line tools may offer a more efficient way to manage data in certain situations.
This document provides instructions on how you can install SD Connect command line tools on your local environment (Linux, Mac) and how you can use them to upload and download to and from SD Connect.
Note
Allas itself does not differentiate between data uploaded via SD Connect (user interface or command line tools) and data uploaded to Allas using different methods. Data buckets may contain a mix of SD Connect data, other encrypted data, and regular data. It is the user's responsibility to manage data types within the buckets. However, it is recommended to store SD Connect data in separate buckets and folders to avoid mixing different data types.
Command line tools and automated key management
Step 1: Installing a-tools and sd-lock-util on your local environment
To upload and automatically encrypt sensitive data to SD Connect using command line, you need to install the allas-cli-utils and sd-lock-util to your laptop or local environment (Mac or Linux). The installation may require root access and for this reason, you might need support from your organization’s IT unit.
Here you can find step-by-step instructions for installing a-commands
and sd-lock-util
command.
Note
If you need to upload non-sensitive data (such as scripts, containers, or software for use in SD Desktop), note that these tools are also available on CSC's supercomputers (Puhti, Mahti, and LUMI). However, these systems are intended to non-sensitive data only. Sensitive data must be uploaded to SD Connect through the appropriate channels.
Step 2: Opening connection to SD Connect
To open SD Connect compatible Allas connection you must add option --sdc
to the configuration command. In CSC supercomputers the connection is opened with commands:
In local installations the connection is typically opened with commands like
export PATH=/some-local-path/allas-cli-utils:$PATH
source /some-local-path/allas-cli-utils/allas_conf -u your-csc-account --sdc
- The set up process asks first your CSC password (Haka or Virtu passwords can't be used here). After that you will select the CSC project to be used. This is identical to the normal login process for Allas.
- In the case of SD Connect, the process has an extra step where it asks you to give the SD Connect API token.
To retrieve the temporary SD Connect API token:
- Login to the SD Connect web interface. If you have multiple CSC projects, make sure you have selected the same SD Connect project in both the command line and the web interface (top left corner).
- In the top right corner of the web interface, click on Support, then select Create API Token from the dropdown menu.
- In the new dialog, enter a name for your temporary token. Avoid using special characters in the token name.
-
Click on Create Token. The token will be displayed only once. Once you see the token, copy it (click the icon to the left of the token). Important: make sure to store it securely, as it will not be retrievable later.
-
The token will be valid for 24 hours and will be automatically deleted after this period. Paste the token into the command line and press Enter to use it.
The SD Connect compatible Allas connection is now valid for next eight hours. And you can use commands like a-list
and a-delete
to manage both normal Allas objects and SD Connect objects.
Step 3: Data upload and automated encryption
Data can be uploaded to SD Connect by using command a-put
with option --sdc
.
For example to upload file my-secret-table.csv
to location 2000123-sens/dataset2
in Allas use command:
This will produce SD Connect object: 2000123-sens/dataset2/my-secret-table.csv.c4gh
All other a-put options and features can be used too. For example directories are
stored as tar
files, if --asis
option is not used.
Command:
Will produce SD connect object: 2000123-sens/dataset2/my-secret-directory.tar.c4gh
For massive data uploads, you can use sd-lock-util lock command. For example you could upload local
directory dataset3
to bucket 2000123-sens
with command:
sd-lock-util does not store the directory as a tar-archive file. Instead, all files in the directory will be stored as individual objects, named according to the location in the directory.
You can use option --prefix
to do define a specific location inside the target bucket:
Note
Do not use special characters or spaces in the bucket name.
Note
Since SD Connect was updated in October 2024, it is no longer straightforward to determine which encryption method was used for an encrypted .c4gh file stored in Allas/SD Connect. If you are now using a new encryption method to upload files to an existing CSC project, please ensure you add a note to your folders indicating that the encryption protocol has changed. You can either share this information with your colleagues or clearly include it in the folder name. As a good practice, we advise creating a new folder and avoiding mixing files encrypted with different methods.
Step 4: Data download and automated decryption
Data can be downloaded from SD Connect with command a-get
. If SD Connect connection is enabled, a-get will automatically try to decrypt objects with suffix .c4gh
.
So for example command:
Will produce local file: my-secret-table.csv
And similarly command:
Will produce local directory: my-secret-directory
For large downloads you can use sd-lock-util unlock
command. To download an entire bucket you can use command:
Like in the case of upload, option --prefix
can be used to select a subset from the bucket.
For example, to download from bucket 2000123-sens
just the object names starting with case-study2
you can use command:
Note that the automatic decryption with a-get or sd-lock-util works only for the files that have been stored using the new SD Connect that was taken in use in October 2024.
For the older SD Connect files and other Crypt4gh encrypted files you still must use a-get and
provide the matching secret key with option --sk
Unfortunately there is no easy way to know, which encryption method has been used in a .c4gh file stored in SD Connect.
Command line tools and manual decryption
In this Chapter we discuss how to decrypt Crypt4GH encrypted files that are not compatible with current SD Connect version. In these cases automatic decryption does not work. Instead the data needs first to be downloaded to your local computer after which the decryption is done with crypt4gh command or Crypt4GH graphical user interface.
Typical cases where this manual decryption is needed are files that have been stored to SD Connect using the old protocol and files that are exported from SD Desktop.
In these cases it is mandatory that you have access to the secret key (often called as private key) that matches the public key that was used for encrypting the data.
Only the download and decryption of files uploaded with CLI and own encryption key pair is discussed in this section. To encrypt and upload files via command line, please check this tutorial illustrating how to use the crypt4gh tool to upload files in Allas (visible from SD Connect).
2.1 Preparation
You can use any Allas compatible tool to download encrypted files from Allas. Commonly used command line tools include:
In addition to Allas compatible tool, you need Crypt4GH Encryption Utility. Crypt4GH is a written in Python. Python 3.6+ is required. If you need help installing Python, please follow these instructions.
-
Install the Crypt4GH encryption CLI tool. You can install Crypt4GH directly with pip tool:
or, if you prefer the latest sources from GitHub:
or even:
-
The usual
-h
flag shows you the different options that the tool accepts:$ crypt4gh -h Utility for the cryptographic GA4GH standard, reading from stdin and outputting to stdout. Usage: crypt4gh [-hv] [--log <file>] encrypt [--sk <path>] --recipient_pk <path> [--recipient_pk <path>]... [--range <start-end>] [--header <path>] crypt4gh [-hv] [--log <file>] decrypt [--sk <path>] [--sender_pk <path>] [--range <start-end>] crypt4gh [-hv] [--log <file>] rearrange [--sk <path>] --range <start-end> crypt4gh [-hv] [--log <file>] reencrypt [--sk <path>] --recipient_pk <path> [--recipient_pk <path>]... [--trim] [--header-only] Options: -h, --help Prints this help and exit -v, --version Prints the version and exits --log <file> Path to the logger file (in YML format) --sk <keyfile> Curve25519-based Private key When encrypting, if neither the private key nor C4GH_SECRET_KEY are specified, we generate a new key --recipient_pk <path> Recipient's Curve25519-based Public key --sender_pk <path> Peer's Curve25519-based Public key to verify provenance (akin to signature) --range <start-end> Byte-range either as <start-end> or just <start> (Start included, End excluded) -t, --trim Keep only header packets that you can decrypt --header <path> Where to write the header (default: stdout) --header-only Whether the input data consists only of a header (default: false) Environment variables: C4GH_LOG If defined, it will be used as the default logger C4GH_SECRET_KEY If defined, it will be used as the default secret key (ie --sk ${C4GH_SECRET_KEY})
You may notice that crypt4gh uses
--sk
option for the private key. This might seem odd but apparently, crypt4gh uses term secure key for private key, hencesk
, and consequentlypk
refers to public key instead of the private key.
2.2 Download and decrypt a file
To decrypt a file you will need a secret key which corresponds to one of the public keys used in encryption phase. Let's assume in our example that you are decrypting a file dog.jpg
that you have encrypted in SD Desktop with key groupA-pub
after which you exported the file to bucket 2000123-export
.
To retrieve the file to your local computer you a can do both download and decryption with a-put
command.
The command above asks the password of key file after which it downloads the data and decrypts it.
Alternatively you could use for example rclone
to download the data:
After that use crypt4gh decrypt
command for decryption:
The crypt4gh
command uses only standard input (stdin) and standard output (stdout) so you must use shell redirections: <
denotes an input file and >
and denotes an output file, hence <dog.jpg.c4gh
reads in an encrypted file called dog.jpg.c4gh
and >dog.jpg
writes out a decrypted file named dog.jpg
.
The command will ask the user to enter the password (passphrase) of your secret key. For security reasons the password is not displayed when you type it.
If you need to decrypt a large number of Crypt4GH encrypted files, you can check a tutorial that describes how all the files in a directory can be decrypted
Note
In case you are decrypting the file in SD Desktop and the CSC Sensitive Data public key has been used in encryption, decryption will be done automatically, and you do not need to specify any decryption keys. If you need to decrypt a large number of files, please check the tutorial Decrypting all files in a directory.
Additional information about data encryption.
Tutorials
- Tools for client side encryption for Allas
- Decrypting all files in a directory
- Using Allas storage service to receive sensitive research data