a-commands, easy and safe
The Allas object storage system can be used in multiple ways and for many purposes. In many cases, using Allas efficiently requires that the user know the features of both the object storage system and the software or protocol used to manage the data in Allas.
For users who simply want to use Allas for storing data that is in the CSC computing environment, CSC provides a set of commands for managing and moving data between the CSC computing environment and Allas:
a-command | help text | Function |
---|---|---|
a-put | help | Upload a file or directory to Allas |
a-check | help | Check if all the objects, that a-put should have created, are found in Allas |
a-list | help | List buckets and objects in Allas |
a-publish | help | Upload a file to Allas into a bucket that allows public access over the internet |
a-flip | help | Upload a file temporarily to Allas into a bucket that allows public access over the internet |
a-get | help | Download a stored dataset (object) from Allas |
a-find | help | Search and locate data uploaded with a-put |
a-delete | help | Delete an object in Allas |
a-info | help | Display information about an object in Allas |
a-access | help | Control access permissions of a bucket in Allas |
a-stream | help | Stream the content of an object to standard output |
a-encrypt | help | Make an encrypted copy of an object uploaded in Allas |
In addition to the above commands, there are separate tools for other purposes:
- allas-conf : Set up and open a connection to Allas.
- allas-backup : Create a backup copy of a local dataset in a backup repository in Allas.
- allas-mount : Mount a bucket in Allas to be used as a read-only directory in the local environment.
- allas-health-check : Check the integrity of objects larger than 5 GB in Allas.
- allas-dir-to-bucket : Copy a local file or directory to Allas. Parallel upload processes are used for files larger than 5 GB.
If you use the a-commands outside the supercomputers, check the allas-cli-utils documentation for how to install these tools.
Below, we briefly discuss the most frequently used features of a-commands. New features are added to a-commands every now and then, and they may not be covered in the examples below. Use the help option --help to check the command-specific information. For example:
a-put --help
Example: Saving data from a scratch directory to Allas
Opening a connection
In order to use these tools in Puhti and Mahti, first load the allas module and then open a connection to Allas with allas-conf:
module load allas
allas-conf
By default, allas-conf lists your projects that have access to Allas, but if you know the name of the project, you can also give it as an argument:
allas-conf project_201234
If you are running big, multistep processes (e.g. batch jobs), your data management pipeline may take more than eight hours. In those cases, you can add the option -k to the allas-conf command:
allas-conf -k
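Copying data between a Puhti scratch directory and Allas
Copying data from the directory /scratch/project_201234/dataset_3 to Allas:
cd /scratch/project_201234
a-put dataset_3
The directory is packed with tar and stored in the bucket 201234-puhti-SCRATCH as the object dataset_3.tar. To list your buckets, and then the objects in this bucket:
a-list
a-list 201234-puhti-SCRATCH
To download the dataset from Allas:
a-get 201234-puhti-SCRATCH/dataset_3.tar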
a-commands in more detail
a-put uploads data to Allas
a-put is used to upload data from the disk environment of Mahti and Puhti to the Allas storage environment. The basic syntax of the command is:
a-put directory_or_file
By default, this tool performs the following operations:
- Ensure that there is a working connection to the Allas storage service and define the project that will be used to store the data.
- In the case of a directory, collect the content of the directory into a single file using the tar command.
- Upload the packed data to Allas using the rclone command and the Swift protocol.
By default, a-put uses the standard bucket and object names that depend on the username, project and location of the data uploaded:
- a) Data from /scratch in Puhti is uploaded to the bucket projectNumber-puhti-SCRATCH
- b) Data from /scratch in Mahti is uploaded to the bucket projectNumber-mahti-SCRATCH
- c) Data from /projappl in Puhti is uploaded to the bucket projectNumber-puhti-PROJAPPL
- d) Data from /projappl in Mahti is uploaded to the bucket projectNumber-mahti-PROJAPPL
- e) Data from $LOCAL_SCRATCH in Puhti is uploaded to the bucket projectNumber-puhti-LOCAL_SCRATCH
- f) In other cases, the data is uploaded to username-projectNumber-MISC
For example, for the user kkayttaj, a member of the project 12345, data located in the HOME directory is uploaded to the bucket kkayttaj-12345-MISC.
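The naming rules above can be summarized as a small shell function. This is a minimal sketch, not part of a-commands: the function name default_bucket and its argument order (absolute path, host, project number, username) are hypothetical, and it only approximates the documented rules. a-put itself remains the authority on where the data ends up.

```shell
# Hypothetical helper, not part of a-commands.
# Usage: default_bucket <absolute_path> <puhti|mahti> <project_number> <username>
default_bucket() {
  case "$1" in
    /scratch/*)  echo "$3-$2-SCRATCH" ;;    # rules a) and b)
    /projappl/*) echo "$3-$2-PROJAPPL" ;;   # rules c) and d)
    *)
      # Rule e) applies only in Puhti, for paths under $LOCAL_SCRATCH
      if [ "$2" = puhti ] && [ -n "${LOCAL_SCRATCH:-}" ]; then
        case "$1" in
          "$LOCAL_SCRATCH"/*) echo "$3-puhti-LOCAL_SCRATCH"; return 0 ;;
        esac
      fi
      echo "$4-$3-MISC"                     # rule f): all other locations
      ;;
  esac
}
```

For example, default_bucket "$HOME/notes.txt" puhti 12345 kkayttaj would print kkayttaj-12345-MISC for a home directory outside /scratch and /projappl.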
If you wish to use a bucket other than the standard one, you can define the bucket name with the option -b or --bucket.
The uploaded dataset is stored as one object. By default, the object name depends on the file name and location. A possible subdirectory path in Puhti or Mahti is included in the object name. For example, a file called test_1.txt in /scratch/project_2012345 in Puhti can be stored using the commands:
cd /scratch/project_2012345
a-put test_1.txt
In this case, the file is stored in the bucket 2012345-puhti-SCRATCH as the object test_1.txt.
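If you have another file called test_1.txt located in /scratch/project_2012345/kkayttaj/project2/, you can store it using either the commands
cd /scratch/project_2012345/kkayttaj/project2/
a-put test_1.txt
or
cd /scratch/project_2012345/kkayttaj
a-put project2/test_1.txt
In both cases, the subdirectory path is included in the object name, and the file is stored in the bucket 2012345-puhti-SCRATCH as the object kkayttaj/project2/test_1.txt.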
In addition to the actual data object, another object containing metadata is created. This metadata object has the same name as the main object with the extension _ameta. This metadata file is used by the other a-commands, and normally, it is not displayed to the user, but if you examine the buckets using tools like swift or rclone, you will see these metadata objects as well.
If you wish to use a name that differs from the default object name, you can define it with the option -o or --object:
cd /scratch/project_2012345
a-put project2/test_1.txt -b newbucket1 -o case1.txt -c
The command above uploads the file test_1.txt to Allas in the bucket newbucket1 as the object case1.txt.zst. As the option -c is used, the data is stored in zstd compressed format.
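The following hypothetical function shows how the default object names combine. It predicts the data object and the _ameta metadata object that a-put would create. It assumes two things: a directory gets the .tar extension (as with dataset_3.tar in the quick example earlier), and -c appends .zst to whatever name would otherwise be used. The function name and arguments are illustrative only.

```shell
# Hypothetical helper: predicts the data object and metadata object names.
# Usage: predict_objects <object_base_name> <file|dir> <1 if -c is used, else 0>
predict_objects() {
  name=$1
  # Directories are packed into a single tar file before upload
  if [ "$2" = dir ]; then name="$name.tar"; fi
  # With -c the data is stored in zstd-compressed format
  if [ "$3" = 1 ]; then name="$name.zst"; fi
  echo "$name"          # main data object
  echo "${name}_ameta"  # metadata object used by the other a-commands
}
```

For instance, predict_objects case1.txt file 1 corresponds to the -o case1.txt -c command above.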
You can give several file or directory names to a-put and use * as a wildcard character when naming the data to be uploaded. Note that in these cases each item (file or directory) is stored as a separate object. For example, say that we have a directory called job123 that contains the files input1.txt, input2.txt and program.py. In addition, there are the directories output_dir_1 and output_dir_2.
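The command
a-put job123/output_dir_1 job123/input1.txt
stores the directory output_dir_1 and the file input1.txt as two separate objects. Similarly, the command
a-put job123/output_dir*
stores the directories output_dir_1 and output_dir_2 as separate objects.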
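The shell expands the wildcard before a-put runs. Each match therefore arrives as a separate argument, which is why it becomes a separate object. The sketch below recreates the example layout in a temporary directory and prints what the shell would pass on. It does not call a-put, and the variable names are illustrative.

```shell
# Recreate the example layout in a temporary directory
demo_dir=$(mktemp -d)
mkdir -p "$demo_dir/job123/output_dir_1" "$demo_dir/job123/output_dir_2"
touch "$demo_dir/job123/input1.txt" "$demo_dir/job123/input2.txt" "$demo_dir/job123/program.py"

# The shell expands output_dir* into one word per matching directory
matches=""
for item in "$demo_dir"/job123/output_dir*; do
  rel=${item#"$demo_dir"/}   # path as it would be typed inside the demo directory
  echo "a-put would store separately: $rel"
  matches="${matches:+$matches }$rel"
done

rm -r "$demo_dir"   # clean up the temporary layout
```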
During upload, datasets that are larger than 5 GB are split and stored as several objects. This is done automatically to a bucket that is named by adding the extension _segments to the original bucket name. For example, if you upload a large file to the bucket kkayttaj-12345-MISC, the actual data is stored as several pieces in the bucket kkayttaj-12345-MISC_segments. The target bucket (kkayttaj-12345-MISC) contains just a front object with information on which segments make up the stored dataset. Operations performed on the front object are automatically reflected to the segments. Normally, users don't need to operate on the segments buckets at all, and objects inside these buckets should not be deleted or modified.
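The size rule can be sketched as follows. The function storage_layout is hypothetical, and the threshold is an assumption: "5 GB" is interpreted here as 5 * 1024^3 bytes, which may differ slightly from the exact limit that a-put uses.

```shell
# Assumption: the 5 GB limit is taken as 5 * 1024^3 bytes
SEGMENT_LIMIT=$((5 * 1024 * 1024 * 1024))

# Hypothetical helper: prints where the data of an upload would be stored.
# Usage: storage_layout <bucket_name> <size_in_bytes>
storage_layout() {
  if [ "$2" -gt "$SEGMENT_LIMIT" ]; then
    echo "front object in: $1"
    echo "segments in: ${1}_segments"
  else
    echo "single object in: $1"
  fi
}
```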
a-check
This command goes through the Allas object names that a corresponding a-put command would create, and then checks whether objects with those names already exist in Allas. The main purpose of this command is to provide a tool for checking whether a large a-put command was successfully executed. a-check accepts the same command line options as a-put.
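For example, if a dataset is uploaded with the command:
a-put job123/*
the upload can be checked with the command:
a-check job123/*
If some of the objects are missing, a-check writes the names of the missing items to a file named like
missing_job123_67889
(the number at the end is just a random number).
This file of missing items can be used with the a-put option --input-list to continue the failed upload process: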
a-put --input-list missing_job123_67889
Note that a-check does not check whether the actual contents of the objects are correct. It checks only the object names, and a matching object may originate from some other source.
In addition to checking whether an upload was successful, a-check can be used as a "dry-run" test for a-put to see which objects will be created or replaced before running the actual a-put command.
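Because a-check compares names only, its core idea can be sketched with standard tools: sort both name lists and keep the names that appear only in the expected list. This is a conceptual illustration, not the implementation of a-check. The function missing_items and its list files are hypothetical.

```shell
# Conceptual sketch of a name-only comparison (not a-check's actual code).
# Usage: missing_items <expected_names_file> <existing_names_file>
missing_items() {
  tmp_expected=$(mktemp)
  tmp_existing=$(mktemp)
  sort -u "$1" > "$tmp_expected"
  sort -u "$2" > "$tmp_existing"
  # comm -23 prints lines found only in the first (expected) list
  comm -23 "$tmp_expected" "$tmp_existing"
  rm -f "$tmp_expected" "$tmp_existing"
}
```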
a-list
a-list is used to show the names of buckets and objects stored in Allas. a-list is designed to be used for objects uploaded with a-put, but it also shows objects that have been uploaded with other tools. However, to keep the object listings shorter, it doesn't show the _ameta metadata objects created by a-put.
a-list examples
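List all buckets belonging to a project:
a-list
List the objects in a bucket:
a-list bucket_name
List the objects in a bucket whose names begin with a given string:
a-list bucket_name/beginning_of_the_object
The option -l produces a longer listing with more information about each item:
a-list -l
The option -d makes a-list interpret /-characters in object names as pseudofolder separators:
a-list -d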
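The following sketch uses standard text tools to illustrate two listing behaviors described above: hiding the _ameta metadata objects and collapsing /-separated names into top-level pseudofolders. It is a hypothetical approximation that reads object names from standard input. The exact output format of a-list may differ.

```shell
# Hypothetical approximation, not a-list's code.
# Reads object names from standard input and prints top-level entries.
top_level_entries() {
  grep -v '_ameta$' |                                       # drop metadata objects
    awk -F/ '{ if (NF > 1) print $1 "/"; else print $0 }' |  # keep first path component
    LC_ALL=C sort -u                                         # one line per entry
}
```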
a-publish
a-publish copies a file to Allas in a bucket that can be publicly accessed. Thus, anyone with the address (URL) of the uploaded data object can read and download the data with a web browser or tools like wget and curl.
a-publish works similarly to a-put, with some differences:
1) a-publish can upload only files, not directories.
2) The access control of the target bucket is set so that it is available to any user in read-only mode.
The basic syntax:
a-publish file_name
The public URL of a data object will be:
https://a3s.fi/username-projectNumber-pub/object_name
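The URL pattern above is simple enough to build in a script, for example to collect links after a batch of uploads. The helper public_url below is hypothetical. It assumes the default public bucket and an object name that needs no URL encoding.

```shell
# Hypothetical helper: builds the public URL of an object in the default pub bucket.
# Usage: public_url <username> <project_number> <object_name>
public_url() {
  printf 'https://a3s.fi/%s-%s-pub/%s\n' "$1" "$2" "$3"
}
```

The resulting link can then be fetched by anyone, e.g. with curl -O "$(public_url kkayttaj 1234567 presentation.pdf)".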
An object uploaded with a-publish can be removed from Allas using the command a-delete.
A sample session with a-publish, uploading the document presentation.pdf to the default public bucket in Allas:
a-publish presentation.pdf
Files to be uploaded: presentation.pdf
Bucket: kkayttaj-1234567-pub
Processing: presentation.pdf
Checking total size of presentation.pdf. Please wait.
Uploading data to allas.
Transferred: 4.188M / 4.188 MBytes, 100%, 7.700 MBytes/s, ETA 0s
Errors: 0
Checks: 0 / 0, -
Transferred: 1 / 1, 100%
Elapsed time: 500ms
Confirming upload...
presentation.pdf OK
Adding metadata for uploaded presentation.pdf
presentation.pdf uploaded to kkayttaj-1234567-pub
Publick link: https://a3s.fi/kkayttaj-1234567-pub/presentation.pdf
Upload ready
a-flip
a-flip is a tool to make individual files temporarily available over the internet. It is intended for situations where you want to make a copy of a file visible on the internet for a short while, e.g. for copying it to another platform or sharing it with a co-worker.
a-flip copies a file to Allas into a bucket that can be publicly accessed. Thus, anyone with the address (URL) of the uploaded data object can read and download the data with a web browser or tools like wget and curl. a-flip works similarly to a-publish with some differences:
- Only the predefined bucket name (username-projectNumber-flip) can be used.
- Upon execution, it checks the content of the flip bucket and deletes objects that are older than two days.
The basic syntax:
a-flip file_name
The public URL of the uploaded file will be:
https://a3s.fi/username-projectNumber-flip/file_name
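The two-day clean-up rule amounts to comparing each object's upload time with the current time. The sketch below is a conceptual illustration, not the code of a-flip. The function name and its input format (one "epoch_seconds object_name" pair per line) are assumptions.

```shell
# Conceptual sketch of the a-flip clean-up rule (not its actual code).
# Usage: stale_flip_objects <now_in_epoch_seconds> < list_of_"epoch name"_lines
stale_flip_objects() {
  now=$1
  max_age=$((2 * 24 * 3600))   # two days in seconds
  while read -r epoch name; do
    if [ $((now - epoch)) -gt "$max_age" ]; then
      echo "$name"             # older than two days: would be deleted
    fi
  done
}
```

In practice the current time would come from date +%s.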
a-find
The a-find command lists and locates data that has been uploaded to Allas using a-put.
The basic syntax:
a-find query_term
The query term is compared to the names and original paths of the files that have been uploaded to Allas, and matching objects are reported (but not downloaded).
The query term is processed as a regular expression, where some characters, e.g. the period (.), have a special meaning. The same regular expression syntax is used with e.g. the grep, awk and sed commands. The most commonly occurring special characters are:
- Period (.) is used to define any single character.
- ^ marks the beginning of a line.
- $ marks the end of a line.
- [ ] matches any character inside the brackets. For example, [abc] would match a, b or c.
- [^ ] matches any character except the characters inside the brackets. For example, [^abc] matches any character other than a, b or c.
- * matches zero or more of the preceding character or expression.
- \{n,m\} matches n to m occurrences of the preceding character or expression.
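Because a-find uses the same regular expression syntax as grep, you can try out a pattern locally with grep before searching Allas. The sample names below are made up for illustration.

```shell
# Made-up object names for testing a pattern locally
sample_names='dataset_3.tar
dataset_12.tar
dataset_123.tar
mydataset_45.tar'

# ^ anchors the start, [0-9]\{1,2\} matches one or two digits,
# \. is a literal period and $ anchors the end
pattern='^dataset_[0-9]\{1,2\}\.tar$'
matched=$(printf '%s\n' "$sample_names" | grep "$pattern")
echo "$matched"
```

When passing such a pattern to a-find, enclose it in single quotes so that the shell does not interpret characters like *, $ or \ itself.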
Options:
- -a, --all By default, only the standard buckets used by a-put are searched. The option --all includes all the buckets of the project in the search.
- -f, --files List the names of matching files inside the objects in addition to the object names.
- -p, --project project_ID Search matches in the buckets of the defined project instead of the currently configured project.
- -b, --bucket bucket_name By default, all default buckets used by a-put are searched. The option --bucket allows you to specify a single bucket for the search. Use this option also in cases where you have stored data in a bucket with a non-standard name.
- -s, --silent Print only object names and the number of hits. If the -f option is used, print the object name and the matching file names on one row.
a-info shows information about an uploaded dataset
The command a-info allows you to get information about a dataset that has been uploaded to Allas using a-put.
a-info bucket/object_name
a-info
a-get retrieves stored data
This tool is used to download data that has been uploaded to the Allas service using the a-put command.
The basic syntax:
a-get object_name
The download location can be changed with the --target_dir and --target_file options described below.
Options:
- -p, --project project_ID Retrieve data from the buckets of the defined project instead of the currently configured project.
- -f, --file file_name Retrieve only a specific file or directory from the stored dataset. Note: Define the full path of the file or directory within the stored object.
- -d, --target_dir If this option is defined, a new target directory is created and the data is retrieved there.
- -t, --target_file Define a file name for the object to be downloaded.
- -l, --original_location Retrieve the data to the original location in the directory structure.
- --asis Download the object without unpacking tar files and uncompressing zst compressed data.
- --s3cmd Use the S3 protocol and the s3cmd command for data retrieval instead of the Swift protocol and rclone.
At the moment, a-get can download only one object at a time. If you need to download a large number of objects, you need to use a loop. For example, to download all the objects in the bucket bucket_123, you could use the commands:
#make a list of objects
a-list bucket_123 > object_list_bucket123
#read the list line by line (safe for names with spaces) and download each object
while IFS= read -r ob
do
a-get "$ob"
done < object_list_bucket123
#remove the object list
rm object_list_bucket123
a-delete
a-delete is used to remove data that has been uploaded to the Allas service using the a-put command. The basic syntax of the command is:
a-delete object_name
By default, a-delete asks the user to confirm the removal of an object. This check can be skipped with the option -f.
If you want to remove a bucket, you can use the option --rmb. By default, a-delete --rmb removes only empty buckets. If you want to delete a non-empty bucket, you need to add the option --FORCE to the command.
a-access
By default, only project members can read and write the data in a bucket. Members of the project can grant read and write access to the bucket and the objects it contains to other Allas projects, or make the bucket publicly accessible over the internet.
a-access is a tool to control the access permissions (Swift protocol) of a bucket in Allas.
Syntax
a-access +/-type project_id bucket
- +r, +read Grant read access to the bucket for the project.
- +w, +write Grant write access to the bucket for the project.
- +rw, +read-write Grant read and write access to the bucket for the project.
- -r, -read Remove read access from the bucket.
- -w, -write Remove write access from the bucket.
- -rw, -read-write Remove read and write access to the bucket from the project.
- +p, +public Give public read-only access to the bucket.
- -p, -public Remove public read-only access from the bucket.
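The flag list above can also be read as a lookup table. The hypothetical helper below restates it in executable form, which may be handy for validating arguments in your own wrapper scripts. It does not change any permissions and does not call a-access.

```shell
# Hypothetical helper: describes an a-access modification flag.
# Returns a non-zero status for flags that are not in the documented list.
describe_access_flag() {
  case "$1" in
    +r|+read)        echo "grant read access" ;;
    +w|+write)       echo "grant write access" ;;
    +rw|+read-write) echo "grant read and write access" ;;
    -r|-read)        echo "remove read access" ;;
    -w|-write)       echo "remove write access" ;;
    -rw|-read-write) echo "remove read and write access" ;;
    +p|+public)      echo "give public read-only access" ;;
    -p|-public)      echo "remove public read-only access" ;;
    *)               echo "unknown flag: $1" >&2; return 1 ;;
  esac
}
```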
For example, to allow members of the project project_2001234 to have read-only access to the bucket my_data_bucket, you can use the command:
a-access +r project_2001234 my_data_bucket
Note that bucket listing tools don't show the bucket names of other projects, not even in cases where the project has read and/or write permissions to the bucket. For example, in this case a user belonging to the project project_2001234 doesn't see my_data_bucket in the bucket list produced by the command:
a-list
However, the user can still access the bucket by naming it explicitly, for example:
a-list my_data_bucket
a-access manages the access permissions only at the project and bucket level. Use the swift post command for more sophisticated access control.
If you run a-access command for a bucket without any modification options, it will print out the current settings of the bucket.
Configuring your a-commands
Users can modify the default settings of a-commands by creating a configuration file named .a_tools_conf in their home directory. In this file, you can set default values for many of the functions that are defined with a-put command options.
For example, if you are working mostly with files that would benefit from compression, you might like to use the --compress option with a-put. If you want this to be the default setting, you can create a .a_tools_conf file that contains the setting:
compression=1
With this setting, the command
a-put my_data.b
stores the data in compressed format. To upload a file without compression, use the option --nc:
a-put --nc my_data.b
You can check the most commonly used settings in this sample .a_tools_conf file. Copy the sample file to your home directory, uncomment the variables you wish to use, and define their values.
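If you prefer to set up the configuration file from a script, the sketch below appends the compression=1 setting only when it is not already present, so running it twice does not duplicate the line. The function name is hypothetical. It does not detect other compression lines that may already be in the file, so check the file yourself if you have edited it before.

```shell
# Hypothetical helper: enables compression by default for a-put.
# Usage: enable_default_compression [config_file]   (default: ~/.a_tools_conf)
enable_default_compression() {
  conf=${1:-$HOME/.a_tools_conf}
  touch "$conf"                          # create the file if it does not exist
  if ! grep -qx 'compression=1' "$conf"; then
    echo 'compression=1' >> "$conf"      # append the setting once
  fi
}
```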