HMMER
Description
Hidden Markov Models (HMM) are mathematical tools that can be used to describe and analyze related or similar sequence areas. HMM-models can be derived from multiple sequence alignments so that they contain position specific information about the probabilities of having certain nucleotides or amino acids in each position of an alignment.
The HMMER package contains tools to create and modify sequence alignmnet based HMM-models, use them to do database searches and extend sequence alignments.
Database searches with HMM profiles can require very long computing times in normal computers.
License
Free to use and open source under GNU GPLv3.
Available
Version on CSC's Servers
- Puhti: 3.2.1, 3.3.2
Usage
To use default version HMMER in Puhti, load the biokit module:
module load biokit
If you want to use a some other version, load the HMMER module, e.g.
module load hmmr/3.2.1
After this the command line options of each hmmer command can be checked with option -h
. For example:
hmmsearch -h
Pfam database
In Puhti you can use Pfam_A database with HMMER commands. You can also create your own HMM databases. For example, comparing a protein sequence against a Pfam-A HMM-database could be performed with following commands.
First, open an interactive batch job session and load biokit:
sinteractive -m 4G -c 4
module load biokit
hmmpfam
and hmmserach
commands by using several
processors. The number of processors, e.g. 4, to be used is indicated with option --cpu 4
but the number is better replaced with an environment variable which already has it i.e.
$SLURM_CPUS_PER_TASK
so it's always in sync with the batch script request:
hmmscan --cpu $SLURM_CPUS_PER_TASK $PFAMDB/pfam_a.hmm protein.fasta > result.txt
In Puhti, HMMER jobs should be run as interactive batch jobs or normal batch jobs. Here is an example batch job file using 4 processor cores:
#!/bin/bash
#SBATCH --job-name=hmmer_job
#SBATCH --output=output_%j.txt
#SBATCH --error=errors_%j.txt
#SBATCH --time=04:00:00
#SBATCH --partition=small
#SBATCH --ntasks=1
#SBATCH --nodes=1
#SBATCH --cpus-per-task=4
#SBATCH --account=project_123456
#SBATCH --mem=8000
#
module load biokit
hmmscan --cpu $SLURM_CPUS_PER_TASK $PFAMDB/pfam_a.hmm protein.fasta > result.txt
The job is submitted with command (where batch_job_file is the name of your batch job file):
sbatch batch_job_file