HMMER
Hidden Markov Models (HMMs) are mathematical tools that can be used to describe and analyze related or similar sequence regions. HMMs can be derived from multiple sequence alignments so that they contain position-specific information about the probabilities of having certain nucleotides or amino acids at each position of an alignment.
The HMMER package contains tools to create and modify HMMs based on sequence alignments, to use them for database searches, and to extend sequence alignments.
Database searches with HMM profiles can require very long computing times on ordinary computers.
License
Free to use and open source under GNU GPLv3.
Available
- Puhti: 3.2.1, 3.3.2, 3.4
Usage
To use the default version of HMMER on Puhti, load the biokit module:
module load biokit
If you want to use some other version, load the particular version of the HMMER module. For example:
module load hmmer/3.2.1
After this, the command line options of each HMMER command can be checked with the option -h. For example:
hmmsearch -h
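Once the module is loaded, building a profile from your own alignment and searching with it could look roughly like the sketch below. It is only an illustration: the toy alignment, the helper function write_toy_alignment and the file names toy.sto, toy.hmm and toy_result.txt are made up for this example, and protein.fasta stands for your own query file. The HMMER steps are skipped if the tools are not on your PATH.

```shell
# Write a tiny toy alignment in Stockholm format.
# The sequences are invented and serve only as an illustration.
write_toy_alignment() {
    cat > "$1" <<'EOF'
# STOCKHOLM 1.0
seq1 ACDEFGHIKLMNPQRSTVWY
seq2 ACDEFGHIKLMNPQRSTVWY
seq3 ACDEYGHIKLMNPQRSTVWF
//
EOF
}

write_toy_alignment toy.sto

# Run the HMMER steps only when the tools are available (module loaded).
if command -v hmmbuild >/dev/null 2>&1; then
    hmmbuild toy.hmm toy.sto    # build a profile HMM from the alignment
    hmmpress -f toy.hmm         # index the profile so hmmscan can use it
    # Search only if the (hypothetical) query file exists.
    if [ -f protein.fasta ]; then
        hmmscan toy.hmm protein.fasta > toy_result.txt
    fi
fi
```

In real use you would replace toy.sto with an alignment of your own sequence family.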
Pfam database
On Puhti, you can use the Pfam-A database with HMMER commands. You can also create your own HMM databases. For example, a protein sequence can be compared against the Pfam-A HMM database with the following commands.
First, open an interactive batch job session and load biokit:
sinteractive -m 4G -c 4
module load biokit
You can speed up the hmmscan and hmmsearch commands by using several processor cores. The number of cores, e.g. 4, is set with the option --cpu 4, but it is better to replace the number with the environment variable $SLURM_CPUS_PER_TASK, which already holds it, so that the value always stays in sync with the batch job request:
hmmscan --cpu $SLURM_CPUS_PER_TASK $PFAMDB/pfam_a.hmm protein.fasta > result.txt
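If the same command is sometimes run outside a Slurm allocation, $SLURM_CPUS_PER_TASK may be unset. One possible way to handle this, shown here as a sketch rather than a recommended practice, is to fall back to a single core. The helper name hmmer_cpus is hypothetical, and the search itself runs only if hmmscan and the query file protein.fasta are available.

```shell
# Print the core count reserved by Slurm, or 1 if the variable is unset or empty.
hmmer_cpus() {
    echo "${SLURM_CPUS_PER_TASK:-1}"
}

# Run the search only when hmmscan is on PATH and the query file exists.
if command -v hmmscan >/dev/null 2>&1 && [ -f protein.fasta ]; then
    hmmscan --cpu "$(hmmer_cpus)" "$PFAMDB/pfam_a.hmm" protein.fasta > result.txt
fi
```

Quoting the variables keeps the command intact even if a path contains spaces.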
On Puhti, HMMER jobs should be run as interactive batch jobs or normal batch jobs. Here is an example batch job file that uses 4 processor cores:
#!/bin/bash
#SBATCH --job-name=hmmer_job
#SBATCH --output=output_%j.txt
#SBATCH --error=errors_%j.txt
#SBATCH --time=04:00:00
#SBATCH --partition=small
#SBATCH --ntasks=1
#SBATCH --nodes=1
#SBATCH --cpus-per-task=4
#SBATCH --account=project_123456
#SBATCH --mem=8000
module load biokit
hmmscan --cpu $SLURM_CPUS_PER_TASK $PFAMDB/pfam_a.hmm protein.fasta > result.txt
The job is submitted with the following command, where batch_job_file is the name of your batch job file:
sbatch batch_job_file
For more information on running batch jobs, see the Computing User Guide.