QIIME
Description
QIIME (Quantitative Insights Into Microbial Ecology) is a package for comparison and analysis of microbial communities, primarily based on high-throughput amplicon sequencing data (such as SSU rRNA) generated on a variety of platforms, but also supporting analysis of other types of data (such as shotgun metagenomic data). QIIME takes users from their raw sequencing output through initial analyses such as OTU picking, taxonomic assignment, and construction of phylogenetic trees from representative sequences of OTUs, and through downstream statistical analysis, visualization, and production of publication-quality graphics.
In 2017, a completely rewritten version, QIIME 2, was released, and development of the original QIIME has since stopped. QIIME 2 is strongly recommended for most uses.
License
Free to use and open source under the BSD 3-Clause License.
Available
- Puhti: 1.9.1, 2022.8
Usage
On Puhti, to use QIIME 1:
module load qiime1
To use QIIME 2:
module load qiime2
After that you can start QIIME 2 with the command:
qiime
Please check the QIIME 2 home page for more instructions.
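As a quick sanity check, and to illustrate the typical first steps, the following sketch verifies the installation and imports example data. The manifest file and artifact names (manifest.tsv, demux.qza) are placeholder assumptions, not files provided by the module:
# print version and plugin information to verify the installation
qiime info
# import demultiplexed single-end reads into a QIIME 2 artifact
qiime tools import \
  --type 'SampleData[SequencesWithQuality]' \
  --input-path manifest.tsv \
  --input-format SingleEndFastqManifestPhred33V2 \
  --output-path demux.qza
# summarize read quality, e.g. to choose a truncation length for denoising
qiime demux summarize \
  --i-data demux.qza \
  --o-visualization demux.qzv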
Note that many QIIME tasks involve heavy computing, so they should be executed as batch jobs. QIIME needs access to a node-local file system for handling temporary data. Such a directory is available on the NVMe nodes of Puhti, so you must include a request for NVMe space in your batch job file.
The easiest way to start using QIIME is to launch an interactive batch job with the sinteractive command:
sinteractive -i
csc-workspaces
cd /scratch/<project>
module load qiime2
Interactive batch jobs include the node-local temporary disk that QIIME requires.
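For example, within the interactive session you can check where temporary files will go and inspect an existing artifact (demux.qza is an assumed example file in your scratch directory):
# TMPDIR should point to the node-local scratch area
echo $TMPDIR
# show the UUID, type and format of a QIIME 2 artifact
qiime tools peek demux.qza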
In normal batch jobs, you must reserve an NVMe disk area yourself; it is then used as the $TMPDIR area.
For example, to reserve 100 GB of local disk space:
#SBATCH --gres=nvme:100
export TMPDIR="$LOCAL_SCRATCH"
A complete example batch job file for a QIIME 2 run:
#!/bin/bash
#SBATCH --job-name=qiime_denoise
#SBATCH --account=<project>
#SBATCH --time=01:00:00
#SBATCH --ntasks=1
#SBATCH --nodes=1
#SBATCH --output=qiime_out_8
#SBATCH --error=qiime_err_8
#SBATCH --cpus-per-task=8
#SBATCH --mem=16G
#SBATCH --partition=small
#SBATCH --gres=nvme:100
# set up QIIME 2
module load qiime2
export TMPDIR="$LOCAL_SCRATCH"
# run task. Don't use srun in submission as it resets TMPDIR
qiime dada2 denoise-single \
--i-demultiplexed-seqs demux.qza \
--p-trim-left 0 \
--p-trunc-len 120 \
--o-representative-sequences rep-seqs-dada2.qza \
--o-table table-dada2.qza \
--o-denoising-stats stats-dada2.qza \
--p-n-threads $SLURM_CPUS_PER_TASK
The maximum running time is set to one hour (--time=01:00:00). As QIIME 2 uses thread-based parallelization, the job requests one task (--ntasks=1) with all cores in the same node (--nodes=1). This one task uses eight cores as parallel threads (--cpus-per-task=8) that can use in total up to 16 GB of memory (--mem=16G). Note that the number of cores to be used needs to be defined in the actual qiime command, too. That is done with the QIIME option --p-n-threads. In this case we use the $SLURM_CPUS_PER_TASK variable, which contains the cpus-per-task value (we could as well use --p-n-threads 8, but then we would have to remember to change the value if the number of reserved CPUs is changed).
The job is submitted to the batch job system with the sbatch command. For example, if the batch job file is named qiime_job.sh, the submission command is:
sbatch qiime_job.sh
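Once submitted, the job can be monitored with standard Slurm commands, for example:
squeue -u $USER   # list your pending and running jobs
seff <jobid>      # summarize resource usage after the job has finished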