MetaGT: A pipeline for de novo assembly of metatranscriptomes with the aid of metagenomic data


Assembly and quantification of metatranscriptomes using metagenomic data.

MetaGT is a bioinformatics analysis pipeline for improving and quantifying metatranscriptome assemblies with the help of metagenomic data. The pipeline supports Illumina sequencing data as well as complete metagenome and metatranscriptome assemblies. It aligns the metatranscriptome assembly to the metagenome assembly and then extracts the CDSs that are covered by transcripts.
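The idea of keeping only the metagenome CDSs that are covered by aligned transcripts can be illustrated with a minimal interval sketch. It assumes that transcript alignments and CDSs have already been reduced to half-open (start, end) intervals per contig, and it uses a hypothetical 90% coverage threshold; the actual criteria applied by `extract_covered_cds.py` are not described on this page.

```python
from typing import Dict, List, Tuple

Interval = Tuple[int, int]  # half-open [start, end) on a contig


def merge_intervals(intervals: List[Interval]) -> List[Interval]:
    """Merge overlapping or touching intervals into a sorted, disjoint list."""
    merged: List[Interval] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def covered_fraction(cds: Interval, blocks: List[Interval]) -> float:
    """Fraction of CDS bases lying inside the (disjoint) alignment blocks."""
    start, end = cds
    covered = 0
    for b_start, b_end in blocks:
        covered += max(0, min(end, b_end) - max(start, b_start))
    return covered / (end - start)


def select_covered_cds(
    cds_by_contig: Dict[str, List[Tuple[str, int, int]]],
    alignments_by_contig: Dict[str, List[Interval]],
    min_fraction: float = 0.9,  # hypothetical threshold, not a MetaGT default
) -> List[str]:
    """Return IDs of CDSs whose transcript coverage meets the threshold."""
    selected: List[str] = []
    for contig, cds_list in cds_by_contig.items():
        blocks = merge_intervals(alignments_by_contig.get(contig, []))
        for cds_id, start, end in cds_list:
            if covered_fraction((start, end), blocks) >= min_fraction:
                selected.append(cds_id)
    return selected


if __name__ == "__main__":
    cds = {"contig1": [("cds_a", 0, 300), ("cds_b", 400, 700)]}
    aln = {"contig1": [(0, 200), (150, 300), (400, 500)]}
    print(select_covered_cds(cds, aln))  # ['cds_a']
```

Overlapping alignments are merged first so that bases covered by several transcripts are not counted twice.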

The pipeline is built using Nextflow, a workflow tool for running tasks across multiple compute infrastructures in a very portable manner. It comes with Docker containers, making installation trivial and results highly reproducible. The Nextflow DSL2 implementation of this pipeline uses one container per process, which makes it much easier to maintain and update software dependencies.

Pipeline Summary

Optionally, if raw reads are used:

  • Sequencing quality control (FastQC)
  • Metagenome or metatranscriptome assembly (metaSPAdes, rnaSPAdes)

By default, the pipeline currently performs the following:

  • Metagenome annotation (Prokka)
  • Alignment of the metatranscriptome to the metagenome (minimap2)
  • Annotation of unaligned transcripts (TransDecoder)
  • Clustering of covered CDSs and CDSs from unaligned transcripts (MMseqs2)
  • Quantification of transcript abundances (kallisto)

Code Snippets

"""
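# Record pipeline, Nextflow and tool versions, one file per tool
echo $workflow.manifest.version > v_pipeline.txt
echo $workflow.nextflow.version > v_nextflow.txt
abricate --version > v_abricate.txt
echo \$(mmseqs 2>&1) > v_varscan.txt
# Merge the version files into a YAML report for MultiQC
scrape_software_versions.py &> software_versions_mqc.yaml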
"""
"""
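# Index the alignment BAM
samtools index $bam

# Extract metagenome CDSs covered by the aligned transcripts
extract_covered_cds.py --threads $task.cpus --gff $gff --bam $bam --genome $genome --output ${prefix}_covered_cds
# Write transcripts missing from the used-contigs list to unaligned.transcripts.fasta
extract_unused.py ${prefix}_covered_cds.used_contigs.list $transcriptome unaligned.transcripts.fasta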
"""
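Judging by its arguments and output name, `extract_unused.py` keeps the transcripts that are not in the used-contigs list and writes them to `unaligned.transcripts.fasta`. A minimal sketch of that kind of filter, assuming one transcript ID per line in the list and that a record's ID is the first whitespace-separated token of its FASTA header; the script's real parsing rules may differ.

```python
from typing import Iterable, Iterator, List, Optional, Set, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA text lines."""
    header: Optional[str] = None
    seq_parts: List[str] = []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(seq_parts)
            header, seq_parts = line[1:], []
        elif header is not None:
            seq_parts.append(line.strip())
    if header is not None:
        yield header, "".join(seq_parts)


def unused_records(fasta_lines: Iterable[str], used_ids: Set[str]) -> List[Tuple[str, str]]:
    """Keep records whose ID (first header token) is not in used_ids."""
    kept: List[Tuple[str, str]] = []
    for header, seq in read_fasta(fasta_lines):
        tokens = header.split()
        record_id = tokens[0] if tokens else ""
        if record_id not in used_ids:
            kept.append((header, seq))
    return kept


if __name__ == "__main__":
    fasta = [">t1 len=6\n", "ACGT\n", "GG\n", ">t2\n", "TTTT\n"]
    for header, seq in unused_records(fasta, {"t1"}):
        print(f">{header}\n{seq}")
```

With real files, an open file handle can be passed in place of the list, since both yield text lines.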
"""

# Build a kallisto index from the transcript sequences
kallisto index -i index $fasta
"""
"""

# Quantify abundances; kallisto writes abundance.tsv to the output directory
kallisto quant -i $index $input_reads -t $task.cpus -o ./
cp ./abundance.tsv abudance.tsv
"""
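`kallisto quant` writes its per-target estimates to `abundance.tsv`, a tab-separated table with the columns `target_id`, `length`, `eff_length`, `est_counts` and `tpm`. A small sketch of loading that table with the standard library, for example to list the most abundant transcripts; the `top_by_tpm` helper and the toy values are illustrative only.

```python
import csv
import io
from typing import Dict, List, TextIO, Tuple


def load_abundance(handle: TextIO) -> Dict[str, Tuple[float, float]]:
    """Map target_id -> (est_counts, tpm) from a kallisto abundance.tsv stream."""
    reader = csv.DictReader(handle, delimiter="\t")
    return {
        row["target_id"]: (float(row["est_counts"]), float(row["tpm"]))
        for row in reader
    }


def top_by_tpm(abundance: Dict[str, Tuple[float, float]], n: int) -> List[str]:
    """Return the n target IDs with the highest TPM (hypothetical helper)."""
    return sorted(abundance, key=lambda t: abundance[t][1], reverse=True)[:n]


if __name__ == "__main__":
    # Toy table: TPM is proportional to est_counts / eff_length, scaled to 1e6
    example = (
        "target_id\tlength\teff_length\test_counts\ttpm\n"
        "tx1\t1000\t800\t60\t600000\n"
        "tx2\t600\t400\t20\t400000\n"
    )
    table = load_abundance(io.StringIO(example))
    print(top_by_tpm(table, 1))  # ['tx1']
```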
"""

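# Align the metatranscriptome assembly to the metagenome assembly (SAM output with MD tags)
minimap2 -t $task.cpus -aY --MD $genome $transcriptome > ${prefix}.align.sam

# Coordinate-sort the alignments into BAM
samtools sort ${prefix}.align.sam -o ${prefix}.align.sorted.bam

# Write a renamed copy of the transcriptome with the change_name.py helper
change_name.py $transcriptome ${meta_t.id}.all_transcripts.fasta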

"""
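With `-a`, minimap2 writes SAM, so whether a transcript was placed on the metagenome can be read from the FLAG field: bit 0x4 marks an unmapped record, while 0x100 and 0x800 mark secondary and supplementary alignments (`-Y` makes minimap2 soft-clip the supplementary ones). The sketch below splits query names into mapped and unmapped sets from SAM text; it illustrates the flag logic only and is not how MetaGT itself decides which transcripts count as unaligned.

```python
from typing import Iterable, Set, Tuple

UNMAPPED = 0x4
SECONDARY = 0x100
SUPPLEMENTARY = 0x800


def split_mapped(sam_lines: Iterable[str]) -> Tuple[Set[str], Set[str]]:
    """Return (mapped, unmapped) query names from SAM text lines."""
    mapped: Set[str] = set()
    unmapped: Set[str] = set()
    for line in sam_lines:
        if line.startswith("@") or not line.strip():
            continue  # skip header and blank lines
        fields = line.rstrip("\n").split("\t")
        qname, flag = fields[0], int(fields[1])
        if flag & (SECONDARY | SUPPLEMENTARY):
            continue  # judge each query by its primary record only
        if flag & UNMAPPED:
            unmapped.add(qname)
        else:
            mapped.add(qname)
    return mapped, unmapped


if __name__ == "__main__":
    sam = [
        "@SQ\tSN:contig1\tLN:1000\n",
        "t1\t0\tcontig1\t1\t60\t4M\t*\t0\t0\tACGT\t*\n",
        "t1\t2048\tcontig1\t500\t60\t2S2M\t*\t0\t0\tACGT\t*\n",
        "t2\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\t*\n",
    ]
    print(split_mapped(sam))  # ({'t1'}, {'t2'})
```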
"""
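# Pool the covered CDSs and the CDSs predicted on unaligned transcripts
cat $cov_transcripts $cds_from_unaligned > all.fasta
# Cluster the pooled CDSs with MMseqs2 linear-time clustering
mmseqs easy-linclust all.fasta res tmp --min-seq-id ${params.cluster_idy} --cluster-mode 0 --seq-id-mode 2 --threads $task.cpus --cov-mode 1
# Keep one representative sequence per cluster
mv res_rep_seq.fasta ${prefix}.rep_seq.fasta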
"""
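Besides `res_rep_seq.fasta`, which the clustering snippet keeps, `mmseqs easy-linclust all.fasta res tmp` also writes `res_cluster.tsv`, a two-column table of representative and member IDs. If cluster membership were needed, for instance to see which CDSs from unaligned transcripts collapsed onto a covered CDS, a sketch for grouping that table could look like this.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def group_clusters(tsv_lines: Iterable[str]) -> Dict[str, List[str]]:
    """Map each representative ID to the member IDs listed for it."""
    clusters: Dict[str, List[str]] = defaultdict(list)
    for line in tsv_lines:
        if not line.strip():
            continue  # ignore blank lines
        rep, member = line.rstrip("\n").split("\t")[:2]
        clusters[rep].append(member)
    return dict(clusters)


if __name__ == "__main__":
    rows = ["cds1\tcds1\n", "cds1\tunaligned_cds7\n", "cds2\tcds2\n"]
    for rep, members in group_clusters(rows).items():
        print(rep, len(members))
```

In a real run the lines would come from `open("res_cluster.tsv")`.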
"""
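# Link the input assembly under the sample prefix unless that file already exists
[ ! -f ${prefix}.fasta ] && ln -s $fasta ${prefix}.fasta
# Annotate the metagenome assembly in Prokka's metagenome mode
prokka ${prefix}.fasta --outdir ./ --force --prefix ${prefix} --metagenome --cpus $task.cpus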

"""
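Prokka writes its annotation as GFF3, with the assembly sequences appended after a `##FASTA` line; this is presumably the kind of file passed to `extract_covered_cds.py --gff`. A sketch of pulling CDS coordinates out of such a file, assuming the nine standard GFF3 columns and an `ID=` attribute; the `Cds` record and `parse_cds` function are hypothetical names.

```python
from typing import Iterable, List, NamedTuple


class Cds(NamedTuple):
    seqid: str
    start: int  # 1-based, inclusive (GFF3 convention)
    end: int    # 1-based, inclusive
    strand: str
    cds_id: str


def parse_cds(gff_lines: Iterable[str]) -> List[Cds]:
    """Collect CDS features from GFF3 text, stopping at the ##FASTA section."""
    records: List[Cds] = []
    for line in gff_lines:
        if line.startswith("##FASTA"):
            break  # everything after this line is sequence, not features
        if line.startswith("#") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[2] != "CDS":
            continue
        attrs = dict(
            item.split("=", 1) for item in cols[8].split(";") if "=" in item
        )
        records.append(
            Cds(cols[0], int(cols[3]), int(cols[4]), cols[6], attrs.get("ID", ""))
        )
    return records


if __name__ == "__main__":
    gff = [
        "##gff-version 3\n",
        "contig1\tProdigal\tCDS\t10\t99\t.\t+\t0\tID=PROKKA_00001;product=hypothetical protein\n",
        "contig1\tbarrnap\trRNA\t200\t300\t.\t-\t.\tID=PROKKA_00002\n",
        "##FASTA\n",
        ">contig1\n",
        "ACGT\n",
    ]
    print(parse_cds(gff))
```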
"""
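# Predict long open reading frames in the transcripts
TransDecoder.LongOrfs -t $fasta --output_dir ./
# Keep the nucleotide sequences of the predicted ORFs as CDSs
mv ./longest_orfs.cds ${prefix}_cds_from_all_transcripts.fasta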

"""
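`TransDecoder.LongOrfs` scans transcripts for long open reading frames and writes their nucleotide sequences to `longest_orfs.cds`. The core idea can be shown with a deliberately simplified sketch that searches only the forward strand for complete ATG-to-stop ORFs above a minimum protein length. The real tool also examines the reverse strand and reports partial ORFs; the default of 100 amino acids below mirrors TransDecoder's documented minimum but is used here only as an illustrative parameter.

```python
from typing import List, Tuple

STOP_CODONS = {"TAA", "TAG", "TGA"}


def forward_orfs(seq: str, min_aa: int = 100) -> List[Tuple[int, int]]:
    """Return (start, end) of forward-strand ATG..stop ORFs, longest first.

    Coordinates are 0-based with an exclusive end that includes the stop codon.
    Only the first ATG after the previous stop in each frame is used, so
    nested ORFs sharing a stop codon are not reported.
    """
    seq = seq.upper()
    orfs: List[Tuple[int, int]] = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                # protein length in amino acids, excluding the stop codon
                if (i - start) // 3 >= min_aa:
                    orfs.append((start, i + 3))
                start = None
    return sorted(orfs, key=lambda o: o[1] - o[0], reverse=True)


if __name__ == "__main__":
    transcript = "CCATGAAATTTTAAGG"
    for start, end in forward_orfs(transcript, min_aa=2):
        print(start, end, transcript[start:end])  # 2 14 ATGAAATTTTAA
```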
"""

# Run FastQC on the read files returned by parse_yaml.py
fastqc $options.args --threads $task.cpus `parse_yaml.py $reads` -o ./
# Record the FastQC version number
fastqc --version | sed -e "s/FastQC v//g" > ${software}.version.txt
"""
"""
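# Link the paired-end reads under the sample prefix unless those files already exist
[ ! -f ${prefix}_1.fastq.gz ] && ln -s ${reads[0]} ${prefix}_1.fastq.gz
[ ! -f ${prefix}_2.fastq.gz ] && ln -s ${reads[1]} ${prefix}_2.fastq.gz
# Run FastQC on both mates and record the FastQC version number
fastqc $options.args --threads $task.cpus ${prefix}_1.fastq.gz ${prefix}_2.fastq.gz
fastqc --version | sed -e "s/FastQC v//g" > ${software}.version.txt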
"""
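FastQC's per-base sequence quality module summarises the Phred scores encoded in each FASTQ quality string; for current Illumina data the encoding is Phred+33, so the score is the character's ASCII code minus 33. A small sketch that computes the mean quality at each read position from raw FASTQ lines, assuming four-line records and Phred+33; it illustrates the calculation and is not FastQC's own implementation.

```python
from typing import Iterable, List


def mean_quality_per_position(fastq_lines: Iterable[str], offset: int = 33) -> List[float]:
    """Mean Phred score at each read position, assuming 4-line FASTQ records."""
    totals: List[int] = []
    counts: List[int] = []
    for index, line in enumerate(fastq_lines):
        if index % 4 != 3:
            continue  # only the 4th line of each record holds qualities
        for pos, char in enumerate(line.rstrip("\n")):
            if pos == len(totals):
                totals.append(0)
                counts.append(0)
            totals[pos] += ord(char) - offset
            counts[pos] += 1
    return [t / c for t, c in zip(totals, counts)]


if __name__ == "__main__":
    reads = ["@r1\n", "ACG\n", "+\n", "II5\n", "@r2\n", "AC\n", "+\n", "5I\n"]
    print(mean_quality_per_position(reads))  # [30.0, 40.0, 20.0]
```

For the gzipped inputs used above, `gzip.open(path, "rt")` yields text lines that can be passed straight in.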
"""
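# Run the SPAdes command configured for this process (e.g. metaSPAdes or rnaSPAdes)
$command \\
    $options.args \\
    --threads $task.cpus \\
    $custom_hmms \\
    $input_reads \\
    -o ./
# Prefix the log and, below, whichever assembly outputs were produced
mv spades.log ${prefix}.spades.log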

if [ -f scaffolds.fasta ]; then
    mv scaffolds.fasta ${prefix}.scaffolds.fa
fi
if [ -f contigs.fasta ]; then
    mv contigs.fasta ${prefix}.contigs.fa
fi
if [ -f transcripts.fasta ]; then
    mv transcripts.fasta ${prefix}.transcripts.fa
fi
if [ -f assembly_graph_with_scaffolds.gfa ]; then
    mv assembly_graph_with_scaffolds.gfa ${prefix}.assembly.gfa
fi

if [ -f gene_clusters.fasta ]; then
    mv gene_clusters.fasta ${prefix}.gene_clusters.fa
fi

# Record the SPAdes version number
echo \$(spades.py --version 2>&1) | sed 's/^.*SPAdes genome assembler v//; s/ .*\$//' > ${software}.version.txt
"""
Nextflow, from line 41 of spades/main.nf

Created: 1yr ago
Updated: 1yr ago
Maintainers: public
URL: https://github.com/ablab/metaGT
Name: metagt-a-pipeline-for-de-novo-assembly-of-metatran
Version: Version 1
Copyright: Public Domain
License: MIT License