minimap2

Description

Minimap2 is a fast general-purpose alignment program to map DNA or long mRNA sequences against a large reference database. It can be used for:

mapping of accurate short reads (preferably longer than 100 bases)
mapping 1kb genomic reads at error rate 15% (e.g. PacBio or Oxford Nanopore genomic reads)
mapping full-length noisy Direct RNA or cDNA reads
mapping and comparing assembly contigs or closely related full chromosomes of hundreds of megabases in length.


License

Free to use and open source under MIT License

Available

Puhti: 2.24
Chipster graphical user interface

Usage

In Puhti, minimap2 can be taken into use as part of the biokit module collection:

module load biokit

The biokit module sets up a set of commonly used bioinformatics tools, including Minimap2. (Note, however, that some bioinformatics tools in Puhti have separate setup commands.) Once biokit is loaded, Minimap2 is started with the command:

minimap2

Without any options, minimap2 takes a reference database and a query sequence file as input and produces an approximate mapping, without base-level alignment (i.e. no CIGAR), in the PAF format:

minimap2 ref.fa query.fq > approx-mapping.paf
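PAF is a tab-separated text format, so the result can be inspected with standard command line tools. The sketch below is an illustration only: the function name paf_summary and the two input records are made up for this example and are not minimap2 output. It relies on the standard PAF column order (column 1 query name, 5 strand, 6 target name, 10 matching bases, 11 alignment block length, 12 mapping quality). Without base-level alignment, columns 10 and 11 are estimates, so the identity printed here is approximate.

```shell
# Summarise PAF records read from stdin as:
#   query, target, strand, approximate identity, mapping quality
# Records with mapping quality below the first argument (default 0) are skipped.
paf_summary() {
    awk -F '\t' -v min="${1:-0}" '
        $11 > 0 && $12 + 0 >= min + 0 {
            # column 10 = matching bases, column 11 = alignment block length
            printf "%s\t%s\t%s\t%.3f\t%d\n", $1, $6, $5, $10 / $11, $12
        }'
}

# Two made-up PAF records (12 mandatory columns each), for illustration only.
{
    printf 'read1\t1000\t10\t990\t+\tchr1\t50000\t2000\t2980\t900\t1000\t60\n'
    printf 'read2\t800\t0\t400\t-\tchr2\t40000\t100\t500\t300\t400\t5\n'
} | paf_summary 20
```

With a real result file the function would be used as paf_summary 20 < approx-mapping.paf.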

If you wish to get the output in SAM format, use the option -a.
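SAM output is also tab-separated text, with header lines that start with @. As a quick sanity check, mapped and unmapped alignment records can be counted from the FLAG field (column 2), where bit 0x4 marks an unmapped read. The sketch below is an illustration only: the function name sam_mapped_counts and the sample records are made up for this example, and it counts records, not unique reads.

```shell
# Count mapped and unmapped alignment records in SAM text read from stdin.
sam_mapped_counts() {
    awk -F '\t' '
        /^@/ { next }                              # skip header lines
        int($2 / 4) % 2 == 1 { unmapped++; next }  # FLAG bit 0x4 set: unmapped
        { mapped++ }
        END { printf "mapped=%d unmapped=%d\n", mapped, unmapped }'
}

# Made-up SAM content (one header line, three records), for illustration only.
{
    printf '@SQ\tSN:chr1\tLN:50000\n'
    printf 'read1\t0\tchr1\t2001\t60\t10M\t*\t0\t0\tACGTACGTAC\t*\n'
    printf 'read2\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\t*\n'
    printf 'read3\t16\tchr1\t100\t30\t4M\t*\t0\t0\tACGT\t*\n'
} | sam_mapped_counts
```

With a real alignment file the function would be used as sam_mapped_counts < aln.sam.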

Minimap2 needs to be tuned to the data type for optimal performance and accuracy. With the option -x you can select one of the use-case-specific parameter sets (presets) that are pre-defined and recommended by the minimap2 developers.

Map long noisy genomic reads (map-pb and map-ont).

PacBio subreads (map-pb):

minimap2 -ax map-pb  ref.fa pacbio-reads.fq > aln.sam

Oxford Nanopore reads (map-ont):

minimap2 -ax map-ont ref.fa ont-reads.fq > aln.sam

Map long mRNA/cDNA reads (splice)

PacBio Iso-seq/traditional cDNA

minimap2 -ax splice -uf ref.fa iso-seq.fq > aln.sam

Nanopore 2D cDNA-seq

minimap2 -ax splice ref.fa nanopore-cdna.fa > aln.sam

Nanopore Direct RNA-seq

minimap2 -ax splice -uf -k14 ref.fa direct-rna.fq > aln.sam

Mapping against SIRV control

minimap2 -ax splice --splice-flank=no SIRV.fa SIRV-seq.fa

Find overlaps between long reads (ava-pb and ava-ont)

PacBio read overlap

minimap2 -x ava-pb  reads.fq reads.fq > ovlp.paf

Oxford Nanopore read overlap

minimap2 -x ava-ont reads.fq reads.fq > ovlp.paf

Map short accurate genomic reads (sr)

Note that minimap2 does not work well with short spliced reads.

single-end alignment

minimap2 -ax sr ref.fa reads-se.fq > aln.sam

paired-end alignment

minimap2 -ax sr ref.fa read1.fq read2.fq > aln.sam

paired-end alignment (interleaved reads in a single file)

minimap2 -ax sr ref.fa reads-interleaved.fq > aln.sam

Full genome/assembly alignment (asm5)

assembly to assembly

minimap2 -ax asm5 ref.fa asm.fa > aln.sam
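The preset names used in the examples above can be collected into a small lookup helper for scripts that handle several data types. This is a sketch under stated assumptions: the function name minimap2_preset and the data-type keywords are hypothetical and chosen for this example, while the preset names are the ones used with -x above. The demo line prints a command instead of running it.

```shell
# Hypothetical helper: print the minimap2 preset for a data-type keyword.
# The keywords on the left are made up; the presets are those shown above.
minimap2_preset() {
    case "$1" in
        pacbio)           echo "map-pb" ;;
        nanopore)         echo "map-ont" ;;
        rna)              echo "splice" ;;
        pacbio-overlap)   echo "ava-pb" ;;
        nanopore-overlap) echo "ava-ont" ;;
        short)            echo "sr" ;;
        assembly)         echo "asm5" ;;
        *) echo "unknown data type: $1" >&2; return 1 ;;
    esac
}

# Print (rather than run) a mapping command for Oxford Nanopore genomic reads.
preset=$(minimap2_preset nanopore) &&
    echo "minimap2 -ax $preset ref.fa ont-reads.fq > aln.sam"
```

Returning a non-zero status for an unknown keyword lets a calling script stop before it starts a job with the wrong parameters.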

Example batch script for Puhti

In Puhti, minimap2 jobs should be run as batch jobs. Below is a sample batch job file for mapping PacBio Iso-Seq reads with minimap2 in Puhti.

#!/bin/bash -l
#SBATCH --job-name=minimap2
#SBATCH --output=output_%j.txt
#SBATCH --error=errors_%j.txt
#SBATCH --time=04:00:00
#SBATCH --partition=small
#SBATCH --ntasks=1
#SBATCH --nodes=1  
#SBATCH --cpus-per-task=8
#SBATCH --account=<project>
#SBATCH --mem=16000
#

module load biokit
minimap2 -t $SLURM_CPUS_PER_TASK -ax splice -uf ref.fa iso-seq.fq > aln.sam

In the batch job example above, one task (--ntasks=1) is executed. The minimap2 job uses 8 cores (--cpus-per-task=8) with a total of 16 GB of memory (--mem=16000). The maximum duration of the job is four hours (--time=04:00:00). All the cores are assigned from one computing node (--nodes=1). The minimap2 option -t $SLURM_CPUS_PER_TASK sets the number of threads to match the reserved cores. In addition to the resource reservations, you have to define the billing project for your batch job. This is done by replacing <project> on the --account line with the name of your project. (You can use the command csc-workspaces to see what projects you have in Puhti.)
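The variable SLURM_CPUS_PER_TASK is set by Slurm only when --cpus-per-task is given, so the same command run outside a batch job would pass an empty value to -t. One possible safeguard, sketched below, is to fall back to a single thread. The function name minimap2_threads is hypothetical, and the demo line prints the command instead of running it.

```shell
# Thread count for minimap2 -t: the Slurm reservation if set, otherwise 1.
minimap2_threads() {
    echo "${SLURM_CPUS_PER_TASK:-1}"
}

# Print (rather than run) the resulting command line.
echo "minimap2 -t $(minimap2_threads) -ax splice -uf ref.fa iso-seq.fq > aln.sam"
```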

You can submit the batch job file to the batch job system with the command:

sbatch batch_job_file.bash

See the Puhti user guide for more information about running batch jobs.

Support

servicedesk@csc.fi

Manual

More information about Minimap2 can be found on the Minimap2 home page.