Using Allas in batch jobs
The Allas initiation command allas-conf opens an Allas connection that is valid for eight hours.
In interactive use this eight-hour limit is not a problem, as allas-conf can be executed again to extend the validity of the connection.
Batch jobs are different: a job can run for several days, and in some cases it may wait more than eight hours before it even starts. In these cases, open the Allas connection with the command:
allas-conf -k
The option -k indicates that the password entered for allas-conf will be stored in the environment variable $OS_PASSWORD. With this variable defined, you no longer need to enter the password when you re-execute allas-conf with the -k option and the Allas project name.
You can define the project name either explicitly:
allas-conf -k project_2012345
or through the environment variable $OS_PROJECT_NAME:
allas-conf -k $OS_PROJECT_NAME
Note that if you mistype your password when using the -k option, you must use the unset command to reset the OS_PASSWORD variable before you can try again:
unset OS_PASSWORD
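Because a mistyped or missing password only shows up later, it can help to check the environment before submitting a job. The following is a minimal sketch, not part of allas-cli-utils: the function name allas_env_ready is hypothetical, and it assumes that a successful allas-conf -k leaves both OS_PASSWORD and OS_PROJECT_NAME set in the current shell. It only checks that the variables are non-empty; it cannot tell whether the password is correct.

```shell
# Hypothetical helper (not part of allas-cli-utils).
# Returns 0 only if both variables used by "allas-conf -k" are non-empty.
# Assumption: allas-conf -k exports OS_PASSWORD and OS_PROJECT_NAME.
allas_env_ready() {
    allas_env_status=0
    if [ -z "${OS_PASSWORD:-}" ]; then
        # Report the variable name only; never print the password itself.
        echo "OS_PASSWORD is not set, run: allas-conf -k" >&2
        allas_env_status=1
    fi
    if [ -z "${OS_PROJECT_NAME:-}" ]; then
        echo "OS_PROJECT_NAME is not set, run: allas-conf -k" >&2
        allas_env_status=1
    fi
    return "$allas_env_status"
}
```

For example, allas_env_ready && sbatch my_long_job.sh would submit the job only when both variables are present.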
In batch jobs, you must also add the option -f to the allas-conf command, to skip certain internal checks that are not compatible with batch jobs.
Further, allas-conf is just an alias for a source command that reads the Allas configuration script allas_conf.
This aliased command is not available in batch jobs, so instead of allas-conf, you must use the command:
Puhti:
source /appl/opt/csc-cli-utils/allas-cli-utils/allas_conf -f -k $OS_PROJECT_NAME
Systems where allas-cli-utils is installed under /appl/opt/csc-tools:
source /appl/opt/csc-tools/allas-cli-utils/allas_conf -f -k $OS_PROJECT_NAME
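If the same batch script should run on systems with either installation path, the script location can be looked up at run time. The sketch below is only an illustration: first_readable is a hypothetical helper, not an Allas tool, and the only paths assumed are the two quoted above.

```shell
# Hypothetical helper: print the first readable file among the arguments.
# Returns 1 if none of the given paths is readable.
first_readable() {
    for candidate in "$@"; do
        if [ -r "$candidate" ]; then
            printf '%s\n' "$candidate"
            return 0
        fi
    done
    return 1
}

# Possible use inside a batch job script (left as comments, because the
# paths exist only on the systems discussed in the text):
#
#   allas_conf_script=$(first_readable \
#       /appl/opt/csc-cli-utils/allas-cli-utils/allas_conf \
#       /appl/opt/csc-tools/allas-cli-utils/allas_conf) || exit 1
#   source "$allas_conf_script" -f -k "$OS_PROJECT_NAME"
```

Stopping the job when neither path is found avoids running a long analysis that cannot upload its results at the end.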
Thus, after opening an Allas connection with the commands
module load allas
allas-conf -k
you can use the source command above in the batch job script to make sure that the connection is valid when it is needed.
The a-commands (a-put, a-get, a-list, a-delete) include this feature, so you do not need to add the configuration commands to the batch job script, but you must still remember to run allas-conf -k before submitting the job:
module load allas
allas-conf -k
sbatch my_long_job.sh
The batch job script my_long_job.sh could look like:
#!/bin/bash
#SBATCH --job-name=my_allas_job
#SBATCH --account=project_2012345
#SBATCH --time=48:00:00
#SBATCH --mem-per-cpu=1G
#SBATCH --partition=small
#SBATCH --output=allas_output_%j.txt
#SBATCH --error=allas_errors_%j.txt
#download data
a-get 178-data-bucket/dataset34/data2.txt.zst
#do the analysis
my_analysis_command -in dataset34/data2.txt -outdir results34
#upload results
a-put -b 178-data-bucket results34
If you use rclone or swift instead of the a-commands, you need to add the source commands to your script. In this case, the batch job script for Puhti could look like:
#!/bin/bash
#SBATCH --job-name=my_allas_job
#SBATCH --account=project_2012345
#SBATCH --time=48:00:00
#SBATCH --mem-per-cpu=1G
#SBATCH --partition=small
#SBATCH --output=allas_output_%j.txt
#SBATCH --error=allas_errors_%j.txt
#make sure connection to Allas is open
source /appl/opt/csc-cli-utils/allas-cli-utils/allas_conf -f -k $OS_PROJECT_NAME
#download input data
rclone copy allas:178-data-bucket/dataset34/data2.txt ./dataset34/
#do the actual analysis
my_analysis_command -in dataset34/data2.txt -outdir results34
#make sure connection to Allas is open
source /appl/opt/csc-cli-utils/allas-cli-utils/allas_conf -f -k $OS_PROJECT_NAME
#upload results to allas
rclone copyto results34 allas:178-data-bucket/
If allas-cli-utils is installed under /appl/opt/csc-tools on your system, use /appl/opt/csc-tools/allas-cli-utils/allas_conf instead of /appl/opt/csc-cli-utils/allas-cli-utils/allas_conf in all places where you need to make sure the connection is open.
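To avoid forgetting the refresh before an individual transfer, the two steps can be wrapped together. This is a sketch under stated assumptions: refresh_allas and with_allas are hypothetical names, the path is the Puhti one used in the example script, and it is assumed (not verified here) that allas_conf behaves the same when it is sourced from inside a shell function as when it is sourced at the top level of the script.

```shell
# Hypothetical wrapper functions for a bash batch job script.

# Re-open the Allas connection. Assumption: Puhti installation path;
# replace it with the csc-tools path where that applies.
refresh_allas() {
    source /appl/opt/csc-cli-utils/allas-cli-utils/allas_conf -f -k "$OS_PROJECT_NAME"
}

# Refresh the connection, then run the given command with its arguments.
# The command is not run at all if the refresh fails.
with_allas() {
    refresh_allas || return 1
    "$@"
}

# Possible use in the script (commented out here, as it needs Allas access):
#   with_allas rclone copy allas:178-data-bucket/dataset34/data2.txt ./dataset34/
```

The design choice is that every transfer goes through with_allas, so the connection is refreshed immediately before the command that needs it, however long the preceding analysis step took.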