Using Allas over S3 with the Python boto3 library
boto3 is a Python library for working with S3 storage and other AWS services. boto3
works with Allas over the S3 protocol.
A typical workflow for analyzing Allas data with Python:
* Save the input data to Allas, possibly using other Allas tools.
* Download the data from Allas to the local computer (including supercomputers) with boto3.
* Analyze the local copy of the data.
* Write the results to local disk.
* Upload the new files to Allas with boto3.
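The steps above can be sketched as a single function. This is only a minimal illustration under stated assumptions: process_object and count_lines are hypothetical names, the line count stands in for a real analysis, and s3_resource is assumed to be a boto3 resource created with boto3.resource('s3', endpoint_url='https://a3s.fi') as described later on this page.

```python
import os
import tempfile


def count_lines(path):
    """Placeholder analysis step (hypothetical): count lines in a text file."""
    with open(path, encoding="utf-8") as f:
        return sum(1 for _ in f)


def process_object(s3_resource, bucket_name, input_key, output_key):
    """Download one object, analyze the local copy, upload the result."""
    with tempfile.TemporaryDirectory() as workdir:
        local_input = os.path.join(workdir, "input.txt")
        local_output = os.path.join(workdir, "result.txt")

        # Download the data from Allas to local disk.
        s3_resource.Object(bucket_name, input_key).download_file(local_input)

        # Analyze the local copy.
        n_lines = count_lines(local_input)

        # Write the result to local disk.
        with open(local_output, "w", encoding="utf-8") as f:
            f.write(f"{input_key}: {n_lines} lines\n")

        # Upload the new file to Allas.
        s3_resource.Object(bucket_name, output_key).upload_file(local_output)
    return n_lines


# Usage, assuming s3_resource exists and the object names are placeholders:
# process_object(s3_resource, "examplebucket", "input.txt", "result.txt")
```

The temporary directory is removed when the function returns, so only the uploaded result remains.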
Some Python libraries also support reading and writing directly with S3, for example AWS SDK for Pandas and GDAL-based Python libraries for spatial data analysis.
This page shows how to:
- Install boto3
- Set up S3 credentials
- Create a boto3 resource
- List buckets and objects
- Create a bucket
- Upload and download an object
- Remove buckets and objects
Note that the S3 and Swift APIs should not be mixed.
Installation
boto3 is available for Python 3.8 and higher and can be installed with pip or conda.
pip install boto3
boto3 in CSC supercomputers
Some existing Python modules might have boto3 pre-installed, for example geoconda.
For other modules, boto3 can be added with pip.
Configuring S3 credentials
If you have not used Allas with S3 before, first create S3 credentials. The credentials are saved to the ~/.aws/credentials file, so they need to be set only once on a new computer or when changing projects. The credentials file can also be copied from one computer to another.
On CSC supercomputers, the allas module can be used with allas-conf --mode s3cmd to configure the credentials.
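Before using boto3, it can be useful to check that the credentials file is in place. The sketch below assumes the file uses the standard AWS shared credentials layout (INI sections such as [default] containing aws_access_key_id and aws_secret_access_key). The function name list_credential_profiles is hypothetical.

```python
import configparser
import os


def list_credential_profiles(path="~/.aws/credentials"):
    """Return the names of profiles that contain both S3 key entries."""
    # Interpolation is disabled so that '%' characters in keys are harmless.
    parser = configparser.ConfigParser(interpolation=None)
    # read() skips files that do not exist, so a missing file gives [].
    parser.read(os.path.expanduser(path))
    return [
        name
        for name in parser.sections()
        if parser.has_option(name, "aws_access_key_id")
        and parser.has_option(name, "aws_secret_access_key")
    ]


# Usage: an empty list suggests that the credentials are not configured yet.
# print(list_credential_profiles())
```

The function only reports profile names and never prints the keys themselves.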
boto3 usage
Create a boto3 resource
A boto3 resource must be created first for all of the following steps.
import boto3
s3_resource = boto3.resource('s3', endpoint_url='https://a3s.fi')
Create a bucket
Create a new bucket using the following script:
s3_resource.create_bucket(Bucket="examplebucket")
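Bucket creation fails if the name is not acceptable. Assuming Allas follows the general S3 naming rules (3 to 63 characters; lowercase letters, digits, dots and hyphens; starting and ending with a letter or digit), a name could be checked locally before the request is sent. The check below is simplified and does not cover every rule, and both function names are hypothetical.

```python
import re

# Simplified pattern based on the general S3 bucket naming rules
# (assumption: Allas accepts the same names).
_BUCKET_NAME = re.compile(r"[a-z0-9][a-z0-9.-]{1,61}[a-z0-9]")


def looks_like_valid_bucket_name(name):
    """Return True if the name passes the simplified naming check."""
    return bool(_BUCKET_NAME.fullmatch(name)) and ".." not in name


def create_bucket_checked(s3_resource, name):
    """Create a bucket only if the name passes the local check."""
    if not looks_like_valid_bucket_name(name):
        raise ValueError(f"Unsuitable bucket name: {name!r}")
    return s3_resource.create_bucket(Bucket=name)


# Usage, assuming s3_resource was created as shown earlier:
# create_bucket_checked(s3_resource, "examplebucket")
```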
List buckets and objects
List all buckets belonging to a project:
for bucket in s3_resource.buckets.all():
    print(bucket.name)
List all objects in a bucket:
my_bucket = s3_resource.Bucket('examplebucket')
for my_bucket_object in my_bucket.objects.all():
    print(my_bucket_object.key)
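For buckets with many objects, listing everything may be more than is needed. As a minimal sketch, the objects collection of a boto3 bucket can be filtered by key prefix with objects.filter(Prefix=...). The function name list_keys and the prefix results/ are hypothetical.

```python
def list_keys(s3_resource, bucket_name, prefix=""):
    """Return (key, size in bytes) for objects whose key starts with prefix."""
    bucket = s3_resource.Bucket(bucket_name)
    # An empty prefix matches every object in the bucket.
    return [(obj.key, obj.size) for obj in bucket.objects.filter(Prefix=prefix)]


# Usage, assuming s3_resource was created as shown earlier:
# for key, size in list_keys(s3_resource, "examplebucket", prefix="results/"):
#     print(key, size)
```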
Download an object
Download an object from Allas to a local file:
s3_resource.Object('examplebucket', 'object_name_in_allas.txt').download_file('local_file.txt')
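When an analysis needs many objects, the single download can be repeated in a loop. The sketch below is one possible approach, not a documented Allas recipe: download_prefix is a hypothetical name, and slashes in object keys are assumed to map to local subdirectories. It does not guard against unusual keys such as ones containing "..".

```python
import os


def download_prefix(s3_resource, bucket_name, target_dir, prefix=""):
    """Download all objects under a key prefix into target_dir."""
    bucket = s3_resource.Bucket(bucket_name)
    downloaded = []
    for obj in bucket.objects.filter(Prefix=prefix):
        if obj.key.endswith("/"):
            continue  # skip placeholder objects that only mark a "folder"
        # Map the slashes in the key to local subdirectories.
        local_path = os.path.join(target_dir, *obj.key.split("/"))
        os.makedirs(os.path.dirname(local_path), exist_ok=True)
        bucket.download_file(obj.key, local_path)
        downloaded.append(local_path)
    return downloaded


# Usage, assuming s3_resource was created as shown earlier:
# download_prefix(s3_resource, "examplebucket", "allas_data", prefix="input/")
```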
Upload an object
Upload a small local file called local_file.txt to the bucket examplebucket:
s3_resource.Object('examplebucket', 'object_name_in_allas.txt').upload_file('local_file.txt')
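Results often consist of several files. As a minimal sketch, a whole local directory could be uploaded by walking it and calling upload_file for each file. The name upload_directory is hypothetical, and using the relative path (with forward slashes) as the object key is an assumption about the desired naming.

```python
import os


def upload_directory(s3_resource, bucket_name, source_dir, prefix=""):
    """Upload every file under source_dir and return the object keys used."""
    bucket = s3_resource.Bucket(bucket_name)
    uploaded = []
    for root, _dirs, files in os.walk(source_dir):
        for filename in sorted(files):
            local_path = os.path.join(root, filename)
            # Use the path relative to source_dir as the object key.
            relative = os.path.relpath(local_path, source_dir)
            key = prefix + relative.replace(os.sep, "/")
            bucket.upload_file(local_path, key)
            uploaded.append(key)
    return uploaded


# Usage, assuming s3_resource was created as shown earlier:
# upload_directory(s3_resource, "examplebucket", "results", prefix="results/")
```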
Remove buckets and objects
Delete all objects from a bucket:
my_bucket = s3_resource.Bucket('examplebucket')
my_bucket.objects.all().delete()
Delete a bucket. The bucket must be empty:
s3_resource.Bucket('examplebucket').delete()
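Because deletion cannot be undone, it may help to combine the two steps behind a dry-run switch. This is only a sketch: empty_and_delete_bucket and its dry_run parameter are hypothetical and not part of boto3.

```python
def empty_and_delete_bucket(s3_resource, bucket_name, dry_run=True):
    """Return the keys in the bucket; delete them and the bucket unless dry_run."""
    bucket = s3_resource.Bucket(bucket_name)
    keys = [obj.key for obj in bucket.objects.all()]
    if not dry_run:
        # Remove all objects first, because only an empty bucket can be deleted.
        bucket.objects.all().delete()
        bucket.delete()
    return keys


# Usage, assuming s3_resource was created as shown earlier:
# print(empty_and_delete_bucket(s3_resource, "examplebucket"))  # only lists
# empty_and_delete_bucket(s3_resource, "examplebucket", dry_run=False)
```

With the default dry_run=True nothing is removed, so the returned keys can be reviewed before running the call again with dry_run=False.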