Trinity
Description
Trinity is used for de novo reconstruction of transcriptomes from RNA-seq data. Trinity combines three independent software modules, Inchworm, Chrysalis, and Butterfly, which are applied sequentially to process large volumes of RNA-seq reads. Trinity partitions the sequence data into many individual de Bruijn graphs, each representing the transcriptional complexity at a given gene or locus, and then processes each graph independently to extract full-length splicing isoforms and to tease apart transcripts derived from paralogous genes.
The Trinity module at CSC also includes the TransDecoder and Trinotate tools for analyzing the results of a Trinity run.
License
Free to use and open source under [Broad Institute License](https://github.com/genome-vendor/trinity/blob/master/LICENSE).
Available
Version on CSC's Servers
Puhti: 2.15.1, 2.14.0, 2.13.2, 2.11.0, 2.8.5
Using Trinity
In Puhti, Trinity is set up with the commands:
module load biokit
module load trinity/2.13.2
Trinity should be used interactively in a compute node or, preferably, through the batch job system. Below is an example batch job file for Trinity.
#!/bin/bash
#SBATCH --job-name=trinity
#SBATCH --output=output_%j.txt
#SBATCH --error=errors_%j.txt
#SBATCH --time=48:00:00
#SBATCH --ntasks=1
#SBATCH --nodes=1
#SBATCH --cpus-per-task=6
#SBATCH --mem=24000
#SBATCH --account=project_1234567
#
#
module load trinity/2.13.2
Trinity --seqType fq --max_memory 22G --left reads.left.fq --right \
reads.right.fq --SS_lib_type RF --CPU $SLURM_CPUS_PER_TASK \
--output trinity_run_out --grid_exec sbatch_commandlist
The example job reserves 6 cores (--cpus-per-task=6) and 24000 MB of memory (--mem=24000), that is, about 4 GB per core: 6 * 4 GB = 24 GB.
In Puhti, you must use the batch job option --account= to define the project to be used. Replace project_1234567, used in the example, with your own project. You can check your projects with the command: csc-workspaces
In the actual Trinity command, the number of computing cores to be used (--CPU) is set using the environment variable $SLURM_CPUS_PER_TASK. This variable contains the value set with the --cpus-per-task Slurm option.
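The relationship between the Slurm reservation and the Trinity options can be sketched in shell. This is only an illustration and not part of the CSC setup: the function name trinity_max_memory and the 2000 MB headroom are assumptions chosen to reproduce the 22G used together with --mem=24000 in the example, and the sketch relies on Slurm exporting the --mem value in megabytes as SLURM_MEM_PER_NODE.

```shell
# Hypothetical helper: turn a memory reservation in MB into a --max_memory value,
# leaving some headroom (2000 MB by default, an assumed figure) below the reservation.
trinity_max_memory() {
    mem_mb="$1"
    headroom_mb="${2:-2000}"
    echo "$(( (mem_mb - headroom_mb) / 1000 ))G"
}

# Inside a batch job Slurm sets these variables; the fallbacks mirror the example job.
cpus="${SLURM_CPUS_PER_TASK:-6}"
max_mem="$(trinity_max_memory "${SLURM_MEM_PER_NODE:-24000}")"
echo "Trinity would be run with: --CPU ${cpus} --max_memory ${max_mem}"
```

Deriving both values from the reservation means that only the #SBATCH lines need to change when the job is resized.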
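In Puhti, you can also use distributed computing to speed up the Trinity job. The --grid_exec option of Trinity names a helper command that submits Trinity's parallel commands to the batch system. It is added to the Trinity command, for example: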
--grid_exec sbatch_commandlist
--grid_exec sbatch_commandlist_trinity
When the batch job file is ready, it can be submitted to the batch queue system with the command:
sbatch batch_job_file
Please check the Trinity site for hints on estimating the required resources.
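After submission, sbatch prints a line of the form "Submitted batch job <id>". The sketch below shows one way to pick up that ID and work out which files the example job will write, since %j in the --output and --error options is replaced by the job ID. The helper name job_id_from_sbatch and the job ID used here are made up for illustration.

```shell
# Hypothetical helper: pull the numeric job ID out of the line printed by sbatch,
# which has the form "Submitted batch job <id>".
job_id_from_sbatch() {
    echo "$1" | awk '/^Submitted batch job/ {print $4}'
}

# In a real session the line would come from: submit_msg="$(sbatch batch_job_file)"
submit_msg="Submitted batch job 1234567"   # made-up ID for illustration
job_id="$(job_id_from_sbatch "$submit_msg")"

echo "Check the queue with: squeue -j ${job_id}"
echo "Standard output is written to: output_${job_id}.txt"
echo "Errors are written to: errors_${job_id}.txt"
```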
Using autoTrinotate
You can analyse the results of your Trinity job with autoTrinotate. You need two files resulting from a successful Trinity assembly:
1. Fasta formatted nucleotide sequence file containing the final contigs created by Trinity (Trinity.fasta)
2. Gene-to-trans map for the input fasta file (Trinity.fasta.gene_to_trans_map)
Note that, depending on the Trinity version, these names may have a prefix as defined with the --output option (e.g. trinity_run_out.Trinity.fasta).
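Because the file names depend on the Trinity version, it can help to locate the two files before starting the annotation. The following is a minimal sketch: the helper first_existing is hypothetical, trinity_run_out is the --output value from the example job, and the candidate inside the output directory is an assumption about where some Trinity versions place the file.

```shell
# Hypothetical helper: print the first existing file among the candidates given.
first_existing() {
    for candidate in "$@"; do
        if [ -f "$candidate" ]; then
            echo "$candidate"
            return 0
        fi
    done
    return 1
}

# Assumed output name from the example job; adjust to your own --output value.
out="trinity_run_out"

transcripts="$(first_existing "${out}.Trinity.fasta" \
    "${out}/Trinity.fasta" "Trinity.fasta")" \
    || echo "No Trinity fasta file found" >&2
gene_map="$(first_existing "${out}.Trinity.fasta.gene_to_trans_map" \
    "${out}/Trinity.fasta.gene_to_trans_map" "Trinity.fasta.gene_to_trans_map")" \
    || echo "No gene-to-trans map found" >&2
```

The variables transcripts and gene_map can then be passed to the --transcripts and --gene_to_trans_map options of autoTrinotate.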
Copy a template sqlite database for your analysis:
cp $TRINOTATE_HOME/databases/Trinotate.sqlite mydb.sqlite
You can then launch autoTrinotate with the command:
$TRINOTATE_HOME/auto/autoTrinotate.pl --Trinotate_sqlite mydb.sqlite --transcripts Trinity.fasta --gene_to_trans_map Trinity.fasta.gene_to_trans_map --conf $TRINOTATE_HOME/auto/conf.txt --CPU $SLURM_CPUS_PER_TASK
Note
autoTrinotate analysis can require a lot of resources, so you should run the command in an interactive session started with sinteractive or as a batch job.
autoTrinotate produces an SQLite database file that can be further analyzed with the command:
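As a sketch of the batch job alternative, the commands below write a job file that wraps the autoTrinotate command shown above. The file name trinotate_job.sh and the job name are hypothetical, and the resource values are copied from the Trinity example job as placeholders, not as tested recommendations for Trinotate.

```shell
# Write a hypothetical batch job file for autoTrinotate. The resource values are
# placeholders taken from the Trinity example job, not recommendations.
# The quoted 'EOF' keeps $TRINOTATE_HOME and $SLURM_CPUS_PER_TASK unexpanded
# so that they are resolved when the job runs.
cat > trinotate_job.sh <<'EOF'
#!/bin/bash
#SBATCH --job-name=trinotate
#SBATCH --output=output_%j.txt
#SBATCH --error=errors_%j.txt
#SBATCH --time=48:00:00
#SBATCH --ntasks=1
#SBATCH --nodes=1
#SBATCH --cpus-per-task=6
#SBATCH --mem=24000
#SBATCH --account=project_1234567

module load biokit
module load trinity/2.13.2

$TRINOTATE_HOME/auto/autoTrinotate.pl --Trinotate_sqlite mydb.sqlite \
    --transcripts Trinity.fasta \
    --gene_to_trans_map Trinity.fasta.gene_to_trans_map \
    --conf $TRINOTATE_HOME/auto/conf.txt --CPU $SLURM_CPUS_PER_TASK
EOF

# Submit with: sbatch trinotate_job.sh
```

As with the Trinity job, replace project_1234567 with your own project before submitting.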
$TRINOTATE_HOME/Trinotate