Minimap2
Minimap2 is a fast general-purpose alignment program to map DNA or long mRNA sequences against a large reference database. It can be used for:
- mapping of accurate short reads (preferably longer than 100 bases)
- mapping genomic reads of 1 kb or longer at an error rate of about 15% (e.g. PacBio or Oxford Nanopore genomic reads)
- mapping full-length noisy Direct RNA or cDNA reads
- mapping and comparing assembly contigs or closely related full chromosomes of hundreds of megabases in length
License
Free to use and open source under the MIT License.
Available
- Puhti: 2.24
- Chipster graphical user interface
Usage
On Puhti, Minimap2 can be used as part of the biokit module collection:
module load biokit
The biokit module sets up a set of commonly used bioinformatics tools, including Minimap2. Note, however, that some other bioinformatics tools on Puhti have separate setup commands. Once the biokit module is loaded, Minimap2 is started with the command:
minimap2
Without any options, minimap2 takes a reference database and a query sequence file as input and produces approximate mappings, without base-level alignment (i.e. no CIGAR), in the PAF format:
minimap2 ref.fa query.fq > approx-mapping.paf
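PAF is plain tab-separated text, so it can be inspected with standard tools. As a minimal sketch (the function name paf_filter is hypothetical, not part of Minimap2), the awk-based shell function below keeps mappings at or above a chosen mapping quality (column 12) and prints the ratio of column 10 (residue matches) to column 11 (alignment block length). Without base-level alignment these two columns are likely only rough estimates, so treat the ratio as indicative.

```shell
# PAF mandatory columns (tab-separated):
#  1 query name    2 query length   3 query start   4 query end   5 strand
#  6 target name   7 target length  8 target start  9 target end
# 10 residue matches   11 alignment block length   12 mapping quality

# paf_filter FILE [MIN_MAPQ]   (hypothetical helper)
# Prints query name, target name, mapping quality and matches/block length
# for every mapping whose mapping quality is at least MIN_MAPQ (default 0).
paf_filter() {
    awk -F'\t' -v min="${2:-0}" '
        $12 >= min { printf "%s\t%s\t%d\t%.3f\n", $1, $6, $12, $10 / $11 }
    ' "$1"
}
```

For example, paf_filter approx-mapping.paf 60 would list only the mappings with mapping quality 60 or higher.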
If you want the output in SAM format, use option -a. With this option, Minimap2 also computes base-level alignments (CIGAR).
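A quick sanity check of a SAM file is to count records by the FLAG field (column 2) defined in the SAM specification: bit 0x4 marks an unmapped read, 0x100 a secondary alignment and 0x800 a supplementary alignment. The hypothetical helper sam_flag_summary below is a sketch that tests these bits with integer arithmetic, so it should work with any POSIX awk, not only GNU awk.

```shell
# sam_flag_summary FILE   (hypothetical helper)
# Counts primary mapped, unmapped, secondary and supplementary records
# by testing bits of the SAM FLAG field:
#   0x4 = unmapped, 0x100 = secondary, 0x800 = supplementary
sam_flag_summary() {
    awk -F'\t' '
        /^@/ { next }                          # skip header lines
        {
            flag = $2
            if (int(flag / 2048) % 2)      supp++
            else if (int(flag / 256) % 2)  sec++
            else if (int(flag / 4) % 2)    unmapped++
            else                           mapped++
        }
        END {
            printf "primary_mapped\t%d\nunmapped\t%d\nsecondary\t%d\nsupplementary\t%d\n",
                mapped, unmapped, sec, supp
        }' "$1"
}
```

For example, sam_flag_summary aln.sam prints one count per category.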
For different data types, Minimap2 needs to be tuned for optimal performance and accuracy.
With option -x, you can select case-specific parameter sets (presets) that are predefined and recommended by the Minimap2 developers.
Map long noisy genomic reads (map-pb and map-ont)
- PacBio subreads (map-pb):
minimap2 -ax map-pb ref.fa pacbio-reads.fq > aln.sam
- Oxford Nanopore reads (map-ont):
minimap2 -ax map-ont ref.fa ont-reads.fq > aln.sam
Map long mRNA/cDNA reads (splice)
- PacBio Iso-seq/traditional cDNA
minimap2 -ax splice -uf ref.fa iso-seq.fq > aln.sam
- Nanopore 2D cDNA-seq
minimap2 -ax splice ref.fa nanopore-cdna.fa > aln.sam
- Nanopore Direct RNA-seq
minimap2 -ax splice -uf -k14 ref.fa direct-rna.fq > aln.sam
- mapping against SIRV control
minimap2 -ax splice --splice-flank=no SIRV.fa SIRV-seq.fa
Find overlaps between long reads (ava-pb and ava-ont)
- PacBio read overlap
minimap2 -x ava-pb reads.fq reads.fq > ovlp.paf
- Oxford Nanopore read overlap
minimap2 -x ava-ont reads.fq reads.fq > ovlp.paf
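The overlap output is also PAF. As a sketch, the hypothetical function overlaps_per_read below counts how many overlap records each read takes part in, either as query (column 1) or as target (column 6), which may help spot reads with unusually many overlaps. Reads without any overlaps do not appear in its output at all.

```shell
# overlaps_per_read FILE   (hypothetical helper)
# For each read name, counts the PAF records in which it appears as query
# (column 1) or target (column 6); a record is counted once per read.
# Output: read name and count, sorted by decreasing count, then by name.
overlaps_per_read() {
    awk -F'\t' '
        { n[$1]++; if ($6 != $1) n[$6]++ }
        END { for (r in n) print r "\t" n[r] }
    ' "$1" | sort -k2,2nr -k1,1
}
```

For example, overlaps_per_read ovlp.paf | head lists the reads with the most overlaps.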
Map short accurate genomic reads (sr)
Note that Minimap2 does not work well with short spliced reads.
- single-end alignment
minimap2 -ax sr ref.fa reads-se.fq > aln.sam
- paired-end alignment
minimap2 -ax sr ref.fa read1.fq read2.fq > aln.sam
- paired-end alignment (both mates interleaved in a single file)
minimap2 -ax sr ref.fa reads-interleaved.fq > aln.sam
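If your mates are in two separate files but you want a single interleaved file, the sketch below writes one four-line record from the first file followed by the matching record from the second. It assumes uncompressed FASTQ with exactly four lines per record and mates in the same order in both files; interleave_fastq is a hypothetical name, and the function uses only POSIX awk.

```shell
# interleave_fastq R1 R2   (hypothetical helper)
# Writes an interleaved FASTQ to stdout: record 1 of R1, record 1 of R2,
# record 2 of R1, ... Assumes four-line records with mates in the same order.
interleave_fastq() {
    awk -v mate="$2" '
        { print }                               # copy the current R1 line
        NR % 4 == 0 {                           # an R1 record is complete:
            for (i = 0; i < 4; i++) {           # copy the next R2 record
                if ((getline line < mate) <= 0) exit 1   # R2 ran short
                print line
            }
        }' "$1"
}
```

For example, interleave_fastq read1.fq read2.fq > reads-interleaved.fq. Extra records at the end of the second file are silently ignored, so the two files should contain the same number of reads.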
Full genome/assembly alignment (asm5)
- assembly to assembly
minimap2 -ax asm5 ref.fa asm.fa > aln.sam
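The presets used in the examples above can be collected into a small lookup function. This is only a convenience sketch: minimap2_preset and the data-type labels on the left are invented for illustration, while the preset names on the right are the ones listed in this guide.

```shell
# minimap2_preset DATA_TYPE   (hypothetical helper; labels are illustrative)
# Prints the -x preset used in this guide for the given kind of data.
minimap2_preset() {
    case "$1" in
        pacbio)            echo "map-pb" ;;
        nanopore)          echo "map-ont" ;;
        cdna|direct-rna)   echo "splice" ;;
        pacbio-overlap)    echo "ava-pb" ;;
        nanopore-overlap)  echo "ava-ont" ;;
        short)             echo "sr" ;;
        assembly)          echo "asm5" ;;
        *) echo "unknown data type: $1" >&2; return 1 ;;
    esac
}
```

For example: minimap2 -ax "$(minimap2_preset nanopore)" ref.fa ont-reads.fq > aln.sam. The lookup does not add the extra options shown above (such as -uf or -k14 for spliced data), and the overlap presets are used with -x alone, producing PAF.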
Example batch script for Puhti
On Puhti, Minimap2 jobs should be run as batch jobs. Below is a sample batch job file for running a Minimap2 spliced alignment of PacBio Iso-Seq reads on Puhti.
#!/bin/bash -l
#SBATCH --job-name=minimap2
#SBATCH --output=output_%j.txt
#SBATCH --error=errors_%j.txt
#SBATCH --time=04:00:00
#SBATCH --partition=small
#SBATCH --ntasks=1
#SBATCH --nodes=1
#SBATCH --cpus-per-task=8
#SBATCH --account=<project>
#SBATCH --mem=16000
module load biokit
minimap2 -t $SLURM_CPUS_PER_TASK -ax splice -uf ref.fa iso-seq.fq > aln.sam
In the batch job example above, one task (--ntasks=1) is executed. The Minimap2 job uses 8 cores (--cpus-per-task=8) with a total of 16 GB of memory (--mem=16000, given in megabytes). The option -t $SLURM_CPUS_PER_TASK makes Minimap2 use as many threads as there are reserved cores. The maximum duration of the job is four hours (--time=04:00:00). All the cores are assigned from one computing node (--nodes=1). In addition to the resource reservations, you have to define the billing project for your batch job. This is done by replacing <project> with the name of your project. You can use the command csc-projects to see what projects you have on Puhti.
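If you need to align several read files in the same batch job, one option is to loop over them in the batch script and pass the reserved core count to every run. The function below is a sketch with hypothetical names (align_all, DRY_RUN); it reuses the spliced-alignment command from the example script and falls back to one thread when SLURM_CPUS_PER_TASK is not set, for example when testing outside a batch job. Setting DRY_RUN=1 prints the commands instead of running them.

```shell
# align_all REF READS...   (hypothetical helper)
# Runs the spliced Iso-Seq alignment from the example batch script for each
# read file in turn. Output names replace a trailing ".fq" with ".sam";
# other names simply get ".sam" appended.
align_all() {
    local ref=$1
    shift
    local threads=${SLURM_CPUS_PER_TASK:-1}   # reserved cores, or 1 outside Slurm
    local reads out
    for reads in "$@"; do
        out="${reads%.fq}.sam"
        if [ "${DRY_RUN:-0}" = 1 ]; then
            echo "minimap2 -t $threads -ax splice -uf $ref $reads > $out"
        else
            minimap2 -t "$threads" -ax splice -uf "$ref" "$reads" > "$out"
        fi
    done
}
```

In the batch script, the minimap2 line could then be replaced with, for example, align_all ref.fa sample1.fq sample2.fq. The files are processed one after another, so the --time reservation must cover all of them.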
You can submit the batch job file to the batch job system with the command:
sbatch batch_job_file.bash
See the Puhti user guide for more information about running batch jobs.
Support
More information
- More information about Minimap2 can be found on the Minimap2 home page.