Prokka

Prokka is a software tool to annotate bacterial, archaeal and viral genomes.

License

Free to use and open source under GNU GPLv3.

Available

  • Puhti: 1.4.6, 1.14.6

Usage

On Puhti, Prokka should be executed as a batch job. An interactive batch job for testing Prokka can be started with the command:

sinteractive -i -m 8G

To activate the Prokka environment, run the command:

module load prokka

After that, you can launch Prokka with the command prokka. By default, Prokka tries to use 8 computing cores, but the interactive batch job started above has just one core available. Therefore, you should always define the number of cores that Prokka will use with the option --cpus.

For example:

prokka --cpus 1 contigs.fasta
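If the same command is used both interactively and in batch jobs, the core count can be taken from the Slurm environment instead of being typed by hand. The following is a minimal sketch, not an official recipe: the helper name pick_cpus is hypothetical, and it assumes that SLURM_CPUS_PER_TASK is only set when the job requested --cpus-per-task, so it falls back to one core otherwise.

```shell
#!/bin/bash
# Hypothetical helper: print the number of cores Slurm allocated per task,
# or 1 when the variable is unset (e.g. in a plain interactive session).
pick_cpus() {
    echo "${SLURM_CPUS_PER_TASK:-1}"
}

# Run Prokka only if it is on the PATH (after "module load prokka")
# and the input file from the example above exists.
if command -v prokka >/dev/null 2>&1 && [ -f contigs.fasta ]; then
    prokka --cpus "$(pick_cpus)" contigs.fasta
fi
```

This way Prokka never asks for more cores than the job has reserved, whichever way it is started.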

Larger analyses should be executed as a batch job utilizing several cores. A sample batch job script (batch_job_file.bash) is provided below:

#!/bin/bash -l
#SBATCH --job-name=prokka
#SBATCH --output=output_%j.txt
#SBATCH --error=errors_%j.txt
#SBATCH --time=24:00:00
#SBATCH --ntasks=1
#SBATCH --nodes=1
#SBATCH --cpus-per-task=8
#SBATCH --mem=16000
#SBATCH --account=your_project_name

# Set up Prokka
module load prokka

# Run Prokka with the number of cores reserved for the job
prokka --cpus "$SLURM_CPUS_PER_TASK" --outdir results_case1 --prefix mygenome contigs_case1.fa

In the batch job example above, one Prokka task (--ntasks=1) is executed. The job reserves 8 cores (--cpus-per-task=8) with a total of 16 GB of memory (--mem=16000, given in megabytes). The maximum duration of the job is 24 hours (--time=24:00:00). All the cores are assigned from one computing node (--nodes=1). In addition to the resource reservations, you have to define the billing project for your batch job. This is done by replacing your_project_name with the name of your project. You can use the command csc-projects to see which CSC projects you have access to.
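The same kind of batch job can annotate several assemblies one after another. The sketch below shows one possible replacement for the prokka line of the script. It is an illustration only: the directory assemblies/, the .fa suffix, the helper sample_name and the results_<name> naming are assumptions, not part of the Puhti setup.

```shell
#!/bin/bash
# Hypothetical helper: derive a sample name from an assembly file name,
# e.g. assemblies/sampleA.fa -> sampleA
sample_name() {
    basename "$1" .fa
}

# Annotate every .fa file in the (assumed) assemblies/ directory in turn.
for assembly in assemblies/*.fa; do
    [ -e "$assembly" ] || continue    # skip if the pattern matched no files
    name=$(sample_name "$assembly")
    # Each genome gets its own output directory and file prefix.
    prokka --cpus "${SLURM_CPUS_PER_TASK:-1}" \
           --outdir "results_${name}" --prefix "$name" "$assembly"
done
```

Because the genomes are processed sequentially, the time reservation (--time) has to cover all of them together.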

You can submit the batch job file to the batch job system with the command:

sbatch batch_job_file.bash
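To follow the job after submission, the job ID can be captured at submission time. The following is a minimal sketch under stated assumptions: it submits with sbatch --parsable, which prints the job ID optionally followed by a semicolon and the cluster name, and the helper name job_id_from_parsable is hypothetical. The file names come from the --output and --error lines of the sample script.

```shell
#!/bin/bash
# Hypothetical helper: keep only the job ID from "sbatch --parsable" output,
# which is either "<jobid>" or "<jobid>;<cluster>".
job_id_from_parsable() {
    echo "${1%%;*}"
}

# Submit only when sbatch is available and the batch script exists.
if command -v sbatch >/dev/null 2>&1 && [ -f batch_job_file.bash ]; then
    job_id=$(job_id_from_parsable "$(sbatch --parsable batch_job_file.bash)")
    squeue -j "$job_id"    # shows whether the job is pending or running
    echo "Standard output: output_${job_id}.txt"
    echo "Error messages:  errors_${job_id}.txt"
fi
```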

See the Puhti user guide for more information about running batch jobs.

More information