Skip to content

Kraken

Kraken is a sequence classifier that assigns taxonomic labels to DNA sequences. Kraken examines the k-mers within a query sequence and uses the information within those k-mers to query a database. That database maps k-mers to the lowest common ancestor of all genomes known to contain a given k-mer.

License

Free to use and open source under MIT License.

Available

  • Puhti: 2.1.2

Usage

Kraken in included in the biokit module. To set it up, run the command:

module load biokit

This loads the Kraken2 package which can be started with the command kraken2. For example:

kraken2 --help

There are several Kraken2 reference databases available on Puhti. By default, Kraken2 uses the standard database that is based on taxonomic information and complete genomes in RefSeq for the bacterial, archaeal, and viral domains, along with the human genome and a collection of known vectors (UniVec_Core).

Available databases in Puhti are:

Name Mem. request Description
standard 40 GB NCBI taxonomic information, as well as the complete genomes in RefSeq for the bacterial, archaeal, and viral domains, along with the human genome and a collection of known vectors (UniVec_Core).
krak_microb 44 GB RefSeq bacterial, archea, viral, fungi and protozoa
16S_Greengenes_k2db 1 GB Greengenes 16S data
16S_RDP_k2db 1 GB RDP 16S data
16S_SILVA132_k2db 1 GB Silva 132 16S data
16S_SILVA138_k2db 1 GB Silva 138 16S data
minikraken_8GB_20200312 1 GB  

Using Kraken2 with a large reference database will require plenty on memory. For example, jobs with the standard Kraken2 database require 40 GB of memory. Thus, Kraken should in practice always be executed as a batch job. Below is a sample Kraken job using 4 cores 40 GB of memory and 6 hours of runtime:

#!/bin/bash -l
#SBATCH --job-name=kraken2
#SBATCH --output=output_%j.txt
#SBATCH --error=errors_%j.txt
#SBATCH --time=06:00:00
#SBATCH --partition=small
#SBATCH --ntasks=1
#SBATCH --nodes=1  
#SBATCH --cpus-per-task=4
#SBATCH --account=project_123456
#SBATCH --mem=40000

module load biokit
kraken2 -db standard --threads $SLURM_CPUS_PER_TASK input.fasta --output results.txt

You can submit the batch job file to the batch job system with the command:

sbatch batch_job_file.bash

See the Puhti user guide for more information about running batch jobs.

More information