Skip to content

Kraken

Description

Kraken is a sequence classifier that assigns taxonomic labels to DNA sequences. Kraken examines the k-mers within a query sequence and uses the information within those k-mers to query a database. That database maps k-mers to the lowest common ancestor of all genomes known to contain a given k-mer.

License

Free to use and open source under MIT License.

Version

  • Puhti: 2.1.2

Usage

Kraken in included in the biokit module. To set it up, run command:

module load biokit

Now you Kraken2 starts with commad kraken2. For example:

kraken2 --help

There are several Kraken2 reference databases available in Puhti. By default Kraken2 uses the standard database that is based on taxonomic information and complete genomes in RefSeq for the bacterial, archaeal, and viral domains, along with the human genome and a collection of known vectors (UniVec_Core).

Available databases in Puhti are:

name Mem. request description
standard 40 GB NCBI taxonomic information, as well as the complete genomes in RefSeq for the bacterial, archaeal, and viral domains, along with the human genome and a collection of known vectors (UniVec_Core).
krak_microb 44 GB RefSeq bacterial, archea, viral, fungi and protozoa
16S_Greengenes_k2db 1 GB Greengenes 16S data
16S_RDP_k2db 1 GB RDP 16S data
16S_SILVA132_k2db 1 GB Silva 132 16S data
16S_SILVA138_k2db 1 GB Silva 138 16S data
minikraken_8GB_20200312 1 GB  

Using Kraken2 with a large reference database will require plenty on memory. For example jobs with the standard Karken2 database require 40 GB of memory. Thus Kraken should in practice always be executed as a batch job. Below is a sample Karaken job using 4 cores 40 GB of memory and 6 hours or runtime:

#!/bin/bash -l
#SBATCH --job-name=kraken2
#SBATCH --output=output_%j.txt
#SBATCH --error=errors_%j.txt
#SBATCH --time=06:00:00
#SBATCH --partition=small
#SBATCH --ntasks=1
#SBATCH --nodes=1  
#SBATCH --cpus-per-task=4
#SBATCH --account=project_123456
#SBATCH --mem=40000
#

module load biokit
kraken2 -db standard --threads $SLURM_CPUS_PER_TASK input.fasta --output results.txt

You can submit the batch job file to the batch job system with command:

sbatch batch_job_file.bash
See the Puhti user guide for more information about running batch jobs.

More information