SD Connect (Sensitive Data Connect)
Before you start
SD Connect is a user interface for Allas, CSC's cloud storage solution, that facilitates working with sensitive data. By default, a project can store up to 10 TiB of data. The storage space remains available as long as the CSC project is active. CSC does not make backups of the data in SD Connect, so you need to make your own backups of important datasets.
SD Connect and SD Desktop have not yet been security audited. For that reason, users may not use them to process any personal data granted by Findata for the purposes of the Act on the Secondary Use of Health and Social Data (552/2019).
To access SD Connect you need:
- a CSC account
- a CSC project
- service access to Allas (CSC's cloud storage solution)
Login to SD Connect is currently possible only with Haka (a user identity federation system) and CSC credentials at:
The interface is compatible with all modern web browsers.
Once you log in to SD Connect, you arrive at the default front page: Browser.
On this page you can:
- view all the buckets available in your CSC project, in which you can store encrypted sensitive data. Buckets can be created, downloaded, deleted or shared using the appropriate icons;
- list and select your CSC project from the drop-down menu (top left corner) to view the buckets belonging to a specific CSC project;
- open any bucket (double click) and view its content (uploaded files or folders). Any file can be downloaded or shared using the download link. From this view, you can also download the entire bucket, delete files or upload new files and folders.
On the User information page you can:
- in Currently Consumes, view statistics about the resource usage of the selected CSC project: billing unit consumption and total project storage usage (default storage 10 TiB);
- in Project usage, view the SD Connect Project Identifier, an ID associated with your CSC project. This ID is required when you want to share containers with other CSC projects using the SD Connect user interface. It does not contain sensitive information, so it can be shared with your colleagues or collaborators via email;
- access the Sharing API tokens, through which you can generate a temporary token (necessary for uploading data programmatically with the Swift client; for more information, see below).
On the Shared page:
- in Shared to the project, you can view the buckets that other CSC projects (belonging to your colleagues or collaborators) have shared with you. Next to the bucket name, under Bucket Owner, the page displays the ID of the CSC project to which the bucket belongs (also called the SD Account). With a double click you can open the bucket and view its content (if you have read access) or add files to it (if you have editing rights).
All the buckets listed here are owned by other users, who can decide when to revoke your access. You will not be able to access the files from SD Desktop until you make a copy of the bucket.
- in Shared with the project, you can view the buckets that you have shared with other CSC projects. In this case you own the shared buckets and you decide when to revoke access.
Encryption with CSC encryption key - User Interface
With the following workflow, you can use a graphical user interface (Crypt4sds GUI) developed by CSC to encrypt and import a copy of your data to SD Desktop.
As this is a simplified workflow, it is designed to allow easy and safe encryption and automated decryption only within the Sensitive Data Services. This workflow does not let you include your own encryption keys, so you will not be able to decrypt this copy of the data yourself. If you are interested in using your own encryption key pair, check the following paragraph.
Step 1: You can download the user interface specific to your operating system from the GitHub repository:
Step 2: Verify that the program has been digitally signed by CSC - IT Center for Science. After downloading and unzipping the file, you can find the Crypt4GH application in your download folder.
When you open the application you might encounter an error message. In this case, click on More info and verify that the publisher is CSC-IT Center for Science (or in Finnish CSC-Tieteen tietotekniikan keskus Oy) and then click on Run anyway.
Step 3: Encrypt the files
With the Crypt4GH GUI it is possible to encrypt only one file at a time. If you need to encrypt large datasets, check the instructions below on how to encrypt files programmatically with the Crypt4GH CLI.
Open the Encryption tool.
Next, press the Select File button. This opens a file browser that you can use to select the file that will be encrypted. When the file is selected, press the Encrypt button. This encrypts the selected file.
Encryption creates a new encrypted file whose name is the original file name with the extension .c4gh appended. For example, encrypting the file my_data1.csv produces a new, encrypted file named my_data1.csv.c4gh. Currently, the Crypt4GH application does not provide a progress bar. If the file or zipped folder contains a large dataset, the encryption can take up to a few minutes.
The encrypted file is now ready to be uploaded to SD Connect.
Data upload using SD Connect User Interface
To upload encrypted data to SD Connect, you can simply drag and drop files or folders (less than 100 GB) onto the browser page. Once the upload has started, a progress bar shows its status. For larger files or datasets, you can upload the data programmatically using the clients described below.
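To decide in advance whether a dataset is small enough for drag and drop, you can compare its size with the limit mentioned above. This is only a rough sketch: the function and variable names are made up for this illustration, the limit is interpreted as 100 * 10^9 bytes, and du reports disk usage, which can differ slightly from the exact file size.

```shell
# 100 GB limit from the text above, taken here as 100 * 10^9 bytes and
# converted to the 1024-byte units that "du -k" reports.
DRAG_AND_DROP_LIMIT_KB=$((100 * 1000 * 1000 * 1000 / 1024))

# Return success if the file or folder is below the limit.
# Usage: fits_drag_and_drop <path>
fits_drag_and_drop() {
    # du -sk prints the total size in 1024-byte units, followed by the path.
    size_kb=$(du -sk "$1" | awk '{print $1}')
    [ "$size_kb" -lt "$DRAG_AND_DROP_LIMIT_KB" ]
}

# Example call (my_data is a placeholder path):
# fits_drag_and_drop my_data && echo "small enough for the web interface"
```

If the check fails, the programmatic upload tools are the better choice.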
If you have not created a bucket yet, the user interface automatically creates a bucket named upload-nnn (where nnn is replaced with a 13-digit number based on the creation time). Note that it is not possible to rename buckets.
If you create a new bucket, use the following suggestions to name it:
- Bucket names must be unique across all existing buckets in all projects in SD Connect and Allas. If you can't create a new bucket, it's possible that some other project is already using the name you would like to use. To avoid this kind of situation, it is good practice to include some project-specific identifier (e.g. project ID number or acronym) in the bucket name.
- Avoid using spaces and special characters in bucket names. Preferred characters are Latin letters (a-z), numbers (0-9), dash (-), underscore (_) and dot (.). SD Connect can cope with other characters too, but they may cause problems in some other interfaces.
- All bucket names are public, so please do not include any confidential information in them.
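The naming suggestions above can be turned into a quick pre-check. This is a minimal sketch: the function name is hypothetical, and it accepts uppercase letters in addition to the preferred characters listed above, because the example bucket name used later in this guide (1234_SD_my_data) contains them. It cannot tell whether a name is already taken by another project.

```shell
# Return success if the bucket name is non-empty and uses only
# letters, digits, dash, underscore and dot.
is_safe_bucket_name() {
    case "$1" in
        ''|*[!A-Za-z0-9._-]*) return 1 ;;   # empty, or contains another character
        *) return 0 ;;
    esac
}

# Try a few candidate names.
for name in "1234_SD_my_data" "my bucket" "results#1"; do
    if is_safe_bucket_name "$name"; then
        echo "ok:    $name"
    else
        echo "avoid: $name"
    fi
done
```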
Encryption with CSC encryption key - Command Line Interface
Files that have been encrypted with the CSC Sensitive Data Services public key can be decrypted only after they have been imported to SD Desktop, that is, only by using the CSC Sensitive Data Services. If you wish to encrypt the data in order to transfer it to other services, you need to plan the encryption in advance and use your own encryption key pair. For more information, check the Data Sharing section below and the Data encryption for data sharing paragraph.
For general information about using Crypt4GH at CSC, check the crypt4gh Git site.
Step 1: Install the latest version of Crypt4GH encryption tool
Python 3.6+ is required to use the crypt4gh encryption utility. To install Python: https://www.python.org/downloads/release/python-3810/
If you have a working Python installation and you have permission to add libraries to it, you can install Crypt4GH with the command:
pip install crypt4gh
Step 2: Download the CSC Sensitive Data Services public key
Download the CSC Sensitive Data Services public key from the link here, or copy and paste the three lines from the box below into a new file. The file should be saved in text-only format. Here we assume that the key file is named csc-sd-services.pub.
-----BEGIN CRYPT4GH PUBLIC KEY-----
dmku3fKA/wrOpWntUTkkoQvknjZDisdmSwU4oFk/on0=
-----END CRYPT4GH PUBLIC KEY-----
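If you prefer to create the key file from the command line, the sketch below writes the three lines to csc-sd-services.pub and runs a basic sanity check. The function names write_csc_key and looks_like_c4gh_pubkey are made up for this illustration. The check only looks at the line count and at the header and footer lines, so it does not prove that the key is authentic.

```shell
# Write the CSC Sensitive Data Services public key (copied from the box above)
# into a text file. The default file name is the one assumed in this guide.
write_csc_key() {
    cat > "${1:-csc-sd-services.pub}" <<'EOF'
-----BEGIN CRYPT4GH PUBLIC KEY-----
dmku3fKA/wrOpWntUTkkoQvknjZDisdmSwU4oFk/on0=
-----END CRYPT4GH PUBLIC KEY-----
EOF
}

# Return success if the file has exactly three lines with the expected
# header and footer.
looks_like_c4gh_pubkey() {
    [ "$(wc -l < "$1" | tr -d ' ')" -eq 3 ] &&
    [ "$(sed -n 1p "$1")" = "-----BEGIN CRYPT4GH PUBLIC KEY-----" ] &&
    [ "$(sed -n 3p "$1")" = "-----END CRYPT4GH PUBLIC KEY-----" ]
}

write_csc_key csc-sd-services.pub
looks_like_c4gh_pubkey csc-sd-services.pub && echo "key file looks OK"
```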
Step 3: Encrypt a file
Crypt4GH is able to use several public keys for encryption. This can be very handy in cases where the encrypted data needs to be used by several users or services. Unfortunately, SD Connect is not yet compatible with encryption with multiple keys. Because of that, if you plan to upload the data to SD Connect, you must encrypt it using only the CSC Sensitive Data Services public key. In this case the syntax of the encryption command is:
crypt4gh encrypt --recipient_pk public-key < input > output
crypt4gh encrypt --recipient_pk csc-sd-services.pub < my_data1.csv > my_data1.csv.c4gh
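Because the command encrypts one file per invocation, a dataset with many files can be handled with a small shell loop. The following is a minimal sketch, assuming the key file csc-sd-services.pub from Step 2 and a directory of files to encrypt. The function name encrypt_dir is hypothetical, and it only processes regular files directly inside the given directory (no subdirectories).

```shell
# Encrypt every regular file in a directory with the given public key.
# Usage: encrypt_dir <directory> [public-key-file]
encrypt_dir() {
    dir=$1
    key=${2:-csc-sd-services.pub}
    for f in "$dir"/*; do
        [ -f "$f" ] || continue                  # skip subdirectories and unmatched globs
        case "$f" in *.c4gh) continue ;; esac    # skip files that are already encrypted
        # Same command as above, applied to one file at a time.
        crypt4gh encrypt --recipient_pk "$key" < "$f" > "$f.c4gh" || return 1
        echo "encrypted: $f -> $f.c4gh"
    done
}

# Example call (my_data is a placeholder directory name):
# encrypt_dir my_data csc-sd-services.pub
```

The original files are left in place, so remember to upload only the .c4gh files.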
Data encryption and upload with Allas help tool: a-put
The Allas client utilities are a set of command line tools that can be installed and used on Linux and macOS machines. If you have these tools, you can use the data upload command a-put with the command line option --sdx to upload data to Allas/SD Connect so that the uploaded files are automatically encrypted with the CSC Sensitive Data Services public key before the upload. The public key is included in the tool, so you don't need to download your own copy of the key.
You can upload a single file with a command like:
a-put --sdx my_data1.csv
You can also upload complete directories and define a specific target bucket. For example, the command below encrypts all the files in the directory my_data and uploads them to SD Connect, into the bucket 1234_SD_my_data.
a-put --sdx my_data -b 1234_SD_my_data
Programmatic data upload and download with SD Connect
To upload encrypted data to SD Connect programmatically, you need to use your CSC credentials (CSC username and password).
SD Connect is a user interface for the CSC Allas object storage. In practice this means that any data that you can access in Allas can also be imported to SD Desktop with the SD Connect Downloader.
Thus you can use any of the Allas-compatible clients to upload your data to SD Connect programmatically. However, as SD Connect is based on the Swift protocol, it is recommended that you use upload tools that are based on the Swift protocol:
- rclone (with normal Allas configuration)
- swift command line client
- Horizon web interface at https://pouta.csc.fi
- Cyberduck, a graphical data transfer tool for Windows and Mac
Note that if you use these tools, you must encrypt your sensitive data before you upload it to SD Connect.
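Since these tools do not encrypt anything themselves, it can help to check that a file really is in Crypt4GH format before sending it. The sketch below relies on the fact that Crypt4GH files begin with the eight ASCII bytes crypt4gh, and then uploads the file with rclone copy. The function names are hypothetical, and the remote name allas is an assumption that depends on how your rclone configuration for Allas is set up.

```shell
# Return success if the file starts with the Crypt4GH magic bytes "crypt4gh".
is_crypt4gh_file() {
    [ -f "$1" ] && [ "$(head -c 8 "$1")" = "crypt4gh" ]
}

# Upload one encrypted file to a bucket, refusing files that are not Crypt4GH.
# Usage: upload_encrypted <file> <bucket>
# "allas" is an assumed rclone remote name; adjust it to your configuration.
upload_encrypted() {
    if ! is_crypt4gh_file "$1"; then
        echo "refusing to upload unencrypted file: $1" >&2
        return 1
    fi
    rclone copy "$1" "allas:$2"
}

# Example call:
# upload_encrypted my_data1.csv.c4gh 1234_SD_my_data
```

Passing this check only shows that the file is in Crypt4GH format. It does not confirm which public key was used for the encryption.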
Data Sharing with SD Connect user interface
For more information about encryption with private keys check: Data encryption for data sharing.
The SD Connect user interface provides a simple way of sharing containers between different projects.
To share a container with another CSC project (and thus with your colleagues or collaborators), you need to:
- know in advance the SD Account of the CSC project you want to share the container with (see the description of the User information page above for where to find it);
- on the browser page, click the share button on the container's row in the container listing.
Clicking the button takes you to the Share the container view, in which you specify the project or projects the container is going to be shared with and the rights to grant:
- select Grant read permission if you want your colleagues to be able to see the files and folders inside the container and download them;
- also select Grant write permissions if you want your colleagues to be able to add files and folders to the shared container. If you select only this option, your colleague or collaborator will be able to add files to the container but will not be able to see its content;
- in Project Identifiers to share with, add the SD Connect Project Identifier of the project you want to share the container with.
Next, click Share.
At this point the user interface redirects you to the Shared page, and the container is listed under Shared from project. There you can stop the sharing by clicking Revoke container access.
| Topic | Problem | Solution |
| --- | --- | --- |
| Decryption | I cannot decrypt the data I downloaded from CSC services. | You can decrypt the data only if you have used your own public key for the encryption. If you used the CSC Sensitive Data Services public key for the encryption, the data can be decrypted only in SD Desktop. In that case, the decryption is automatic. If you used your collaborator’s public key to encrypt the data, only they can decrypt the data with their private key. |
| Encryption | Encryption takes a long time. | For large files and datasets, the encryption can take up to a few minutes. |
| Folder encryption | I cannot select the folder I want to encrypt with the Crypt4GH graphical user interface. | It is not possible to encrypt an entire folder, only single files. |
| Data upload | I am trying to upload a big file/folder with the user interface and the upload is stuck. | To upload files or folders that are larger than 200 GB, the data should be uploaded programmatically. |
| Data upload | Low upload speed (programmatically) | Average upload speed can range from 100 to 200 MiB/s. Specific scripts can be used to optimize the upload of large files. |
| Bucket | I am not able to create a new bucket. | 1) Check in the MyCSC portal that your current project has service access for Allas. 2) Try to use a bucket name that is unique and doesn’t contain special characters. 3) Select the correct project in the SD Connect user interface. |
| Bucket | I cannot find my bucket. | Check if the bucket is stored under a different project. If someone has shared the bucket with you, you can find it under the ‘Shared to’ section and copy it. If it is not listed there, they could have revoked the sharing. |
| Bucket | I cannot upload data into my bucket. | Check that your project still has storage space left. |
| Shared bucket | I cannot upload data into a shared bucket. | Your colleague didn’t add editing rights when they shared the bucket. |
| Shared bucket | I cannot see the content of a shared bucket. | Your colleague didn’t add reading rights when they shared the bucket. |