Skip to content

Seqtk

Description

Seqtk is a fast and lightweight tool for processing sequences in the FASTA or FASTQ format. It seamlessly parses both FASTA and FASTQ files which can also be optionally compressed by gzip.

License

Free to use and open source under MIT License.

Available

  • Puhti: 1.3-r106

Usage

Seqtk in included in the biokit module:

module load biokit

Seqtk command syntax is:

seqtk <command> <arguments>
The available seqtk commands are:

Command Function
seq common transformation of FASTA/Q
comp get the nucleotide composition of FASTA/Q
sample subsample sequences
subseq extract subsequences from FASTA/Q
fqchk fastq QC (base/quality summary)
mergepe interleave two PE FASTA/Q files
trimfq trim FASTQ using the Phred algorithm
hety regional heterozygosity
gc identify high- or low-GC regions
mutfa point mutate FASTA at specified positions
mergefa merge two FASTA/Q files
famask apply a X-coded FASTA to a source FASTA
dropse drop unpaired from interleaved PE FASTA/Q
rename rename sequence names
randbase choose a random base from hets
cutN cut sequence at long N
listhet extract the position of each het

Examples:

Convert FASTQ to FASTA:

seqtk seq -a in.fq.gz > out.fa
Extract sequences with names in file name.lst, one sequence name per line:
seqtk subseq in.fq name.lst > out.fq
Extract sequences in regions contained in file reg.bed:
seqtk subseq in.fa reg.bed > out.fa

Manual