Bioapplications puhti
This tutorial page provides step-by-step instructions on how to run Singularity containers of selected bio-applications in Puhti supercomputer. The selected examples shown below would serve as recipe for building your own customised singularity jobs.
DeepVariant pipeline
DeepVariant pipeline (Poplin et al., Nature biotechnology, 2018) is used to perform variant calling on WGS and WES data sets. More information about the deepvariant programmes can be found here
One needs to get DeepVariant docker image, models and test data in order to run the pipeline. Additionally, other prerequisites for running deepvariant method includes 1) obtaining a reference genome in FASTA format and its corresponding index file (.fai). 2) An aligned reads file in BAM format and its corresponding index file (.bai).
Convert docker image into Singularity image on your local machine
One way to build a Singularity image is to download the deepvariant docker image to local registry and then convert it to singularity one to avoid any possible erros with docker images pulled directly with singularity build command from google registry. Please note that one has to do these image conversions in your local machine or virtual machine in cPouta as Puhti does not grant root access to users.
Pull DeepVariant image and push it into local registry as below:
sudo docker pull gcr.io/deepvariant-docker/deepvariant:0.8.0
sudo docker tag gcr.io/deepvariant-docker/deepvariant:0.8.0 localhost:5000/deepvariant:latest
sudo docker run -d -p 5000:5000 --restart=always --name registry registry:2
sudo docker push localhost:5000/deepvariant:latest
then create a definition file (deffile) as follows:
Bootstrap: docker
Registry: http://localhost:5000
Namespace:
From:deepvariant:latest
sudo SINGULARITY_NOHTTPS=1 singularity build deepvariant.simg deffile
wget https://a3s.fi/pilot_projects/Deepvariant_singularity.zip
Prepare a batch script to run deepvariant pipeline on Puhti (file: deepvariant_puhti.sh)
#!/bin/bash
#SBATCH --time=00:05:00
#SBATCH --partition=test
#SBATCH --account=project_xxx
export TMPDIR=$PWD
singularity -s exec -B $PWD:/data \
deepvariant.simg \
/opt/deepvariant/bin/run_deepvariant \
--model_type=WGS \
--ref=/data/testdata/ucsc.hg19.chr20.unittest.fasta \
--reads=/data/testdata/NA12878_S1.chr20.10_10p1mb.bam \
--regions "chr20:10,000,000-10,010,000" \
--output_vcf=output.vcf.gz \
--output_gvcf=output.g.vcf.gz
Submit job using sbatch command
sbatch deepvariant_puhti.sh
Please note that one can use gpu version of deepvariant with the following sbatch script
#!/bin/bash
#SBATCH --time=00:05:00
#SBATCH --partition=gputest
#SBATCH --gres=gpu:v100:1
#SBATCH --account=project_xxx
export TMPDIR=$PWD
singularity -s exec --nv -B $PWD:/data \
deepvariant_gpu.simg \
/opt/deepvariant/bin/run_deepvariant \
--model_type=WGS \
--ref=/data/testdata/ucsc.hg19.chr20.unittest.fasta \
--reads=/data/testdata/NA12878_S1.chr20.10_10p1mb.bam \
--regions "chr20:10,000,000-10,010,000" \
--output_vcf=output.vcf.gz \
--output_gvcf=output.g.vcf.gz
Deepvariant as an interactive job in Puhti
One can run singularity containers also in the interactive mode. For example, deepvariant can be run as follows:
Download and unzip the deepvariant folder with data and singularity images in the login node
wget https://a3s.fi/pilot_projects/Deepvariant_singularity.zip
unzip Deepvariant_singularity.zip
sinteractive -i
cd Deepvariant_singularity
export TMPDIR=$PWD
singularity shell -B $PWD:/data deepvariant.simg
export PATH=/opt/deepvariant/bin:$PATH
cd /data/testdata/
run_deepvariant --model_type=WGS --ref=ucsc.hg19.chr20.unittest.fasta --reads=NA12878_S1.chr20.10_10p1mb.bam --regions "chr20:10,000,000-10,010,000" --output_vcf=output.vcf.gz --output_gvcf=output.g.vcf.gz
exit
exit