Moving data between IDA and CSC computing environment
IDA is a general storage service for research data. It is part of the Fairdata.fi research data management environment and is not directly linked to the CSC computing environment. Use of the IDA service requires that the stored data are described as a research dataset with the Fairdata Qvain tool so that others can discover them. Even though CSC produces and hosts the IDA service and the IDA storage space is linked to a CSC project, the storage space is granted by the user's home organization (a Finnish higher education institution or state research institute). IDA users can access the storage space both from their own computers and from the servers hosted by CSC. More information about applying for IDA storage space can be found on IDA's website.
IDA can be used through a web browser interface as well as with the command line client tool ida, which is available on the computing servers hosted by CSC (Puhti and Mahti). The IDA client is also available for download from GitHub.
The storage of files in IDA can be managed using the web and command line client interfaces; however, the contents of the stored files can't be modified directly. Instead, a stored file must be first retrieved from IDA to either CSC supercomputers or some other computer in order to analyse or modify the data. In that sense IDA resembles very much the Allas storage environment. However, IDA and Allas are designed to serve different use cases:
- Allas is a low-level, high-capacity storage service for utilising research data at CSC and other computing environments.
- IDA is designed for storing and sharing well-defined and stable datasets that are not used and modified on a daily basis.
In a typical research project the raw data is first stored in Allas. When the research work has produced a more refined dataset from the original data, it can be stored to IDA so that metadata and persistent identifiers can be associated with the data via additional services.
Each IDA project has two storage areas: a staging area and a frozen area. The staging area is intended for collecting and organizing data in preparation for longer term storage and publication. Data files that will no longer change can be moved to the frozen area, where they are stored in an immutable state.
Files in the frozen area are visible to other Fairdata services and can be included in datasets using the Qvain metadata tool. Files in the staging area are not visible to other services and cannot be included in datasets.
Configuring and using IDA in CSC supercomputers
The IDA client and configuration tools are activated with the command:
module load ida
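In batch scripts, where the module may not have been loaded yet, it can help to check that the ida command is available before using it. The following is only a minimal sketch: the function name ensure_ida is hypothetical and not part of the IDA tools.

```shell
# Hypothetical helper (not part of the IDA client): fail early with a hint
# if the ida command cannot be found in the current shell.
ensure_ida() {
    if ! command -v ida >/dev/null 2>&1; then
        echo "ida not found: run 'module load ida' first" >&2
        return 1
    fi
}
```

A script could then begin with `ensure_ida || exit 1` before issuing any ida commands.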
When you start using the IDA client on CSC supercomputers for the first time, you must set up your IDA connection by running the following command:
ida_configure
Once you have configured the connection, you can start using the ida command line client, which transfers data between the supercomputer and IDA. Data can be uploaded to and downloaded from the IDA staging area. For the frozen area, only download is possible. Note that some key features of IDA, such as moving data from the staging area to the frozen area, are available only through the IDA web interface.
The basic syntax of the ida commands is:
ida task -options target_in_ida target_in_puhti
To check the content of your staging area in IDA, use the command:
ida info /
Adding the option -f to the ida command makes the command reference the frozen area instead of the staging area. For example, the following command gives you information about the file test2, located in the root of the frozen area:
[kkayttaj@puhti-login12 ~] ida info -f /test2
project: 2000136
pathname: /test2
area: frozen
type: file
pid: 5bc456a74ba89743214993f23695474
size: 113926178937
encoding: application/octet-stream
modified: 2018-10-15T08:17:53Z
frozen: 2018-10-15T08:58:15Z
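The output above is plain "key: value" text, so a single field can be picked out with standard shell tools. The following is a minimal sketch that assumes the output keeps the format shown above; the function name ida_info_field is hypothetical and not part of the IDA client.

```shell
# Hypothetical helper (not part of the IDA client): read "key: value" lines
# from stdin and print the value of the first line whose key matches $1.
ida_info_field() {
    awk -v key="$1" '
        {
            line = $0
            sub(/^[ \t]+/, "", line)            # drop leading whitespace
            prefix = key ":"
            if (index(line, prefix) == 1) {     # line starts with "key:"
                value = substr(line, length(prefix) + 1)
                sub(/^[ \t]+/, "", value)       # drop padding before the value
                print value
                exit
            }
        }'
}
```

For example, `ida info -f /test2 | ida_info_field size` would print only the size in bytes. Matching on the line prefix rather than splitting on colons keeps timestamps such as the modified field intact.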
Uploading and downloading files and directories between Puhti and IDA is done with the commands:
ida upload target_in_ida local_file
ida download target_in_ida local_file
For example in Puhti, the command:
ida upload /test123/data1 test_data
uploads the local file test_data to the IDA staging area, where it is stored as /test123/data1.
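When several local files need to go to the same directory in the staging area, the upload command can be wrapped in a loop. The following is a minimal sketch, not a feature of the IDA client: the function name upload_files and the example paths are hypothetical, and it relies only on the "ida upload target_in_ida local_file" syntax described above.

```shell
# Hypothetical helper: upload each given local file into one directory of
# the IDA staging area, keeping the local file name.
# Stops at the first upload that fails.
upload_files() {
    ida_dir=${1%/}      # target directory in IDA, trailing slash removed
    shift
    for local_file in "$@"; do
        if [ ! -f "$local_file" ]; then
            echo "skipping $local_file: not a regular file" >&2
            continue
        fi
        ida upload "$ida_dir/$(basename "$local_file")" "$local_file" || return 1
    done
}
```

A call such as `upload_files /test123 results/*.csv` would then issue one ida upload command per matching file.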
If you download a directory, the downloaded files are packed into a single zip archive file, so you should give the local target file the name extension .zip. For example:
ida download /project1 project1_data.zip
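Downloading a directory and unpacking the resulting archive can be combined into one step. The following is a minimal sketch under a few assumptions: the function names zip_target and download_and_unpack are hypothetical, and the unzip tool is assumed to be available on the machine where the download is done.

```shell
# Hypothetical helper: make sure a local download target ends in ".zip".
zip_target() {
    case "$1" in
        *.zip) printf '%s\n' "$1" ;;
        *)     printf '%s.zip\n' "$1" ;;
    esac
}

# Hypothetical helper: download a directory from IDA as a zip archive and
# unpack it into a local directory named after the archive.
#   $1 = directory in IDA, $2 = local name (with or without .zip)
download_and_unpack() {
    local_zip=$(zip_target "$2")
    ida download "$1" "$local_zip" &&
        unzip -q "$local_zip" -d "${local_zip%.zip}"
}
```

For example, `download_and_unpack /project1 project1_data` would fetch the directory as project1_data.zip and extract it into the directory project1_data. The archive is unpacked only if the download command succeeds.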
More information about using and configuring the IDA client, with additional examples, can be found at https://github.com/CSCfi/ida2-command-line-tools