Shotgun metagenomic sequencing processing pipeline


VDB Shotgun Pipeline

Prerequisites

Important Notes:

  • Set the location of your profile to the environment variable $SNAKEMAKE_PROFILE (eg export SNAKEMAKE_PROFILE=/path/to/your/profile/ )

  • For the purposes of the examples, we added the --dry-run flag for the user to preview the rules to be executed. Remove this step to execute the commands.

  • All database paths are configured in config/config.yaml. Change the paths to reflect where the databases can be found on your machine. For a uniform way to fetch and build all the databases, see https://github.com/vdblab/resources
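
The database section of config/config.yaml looks roughly like the following. This is an illustrative sketch only: the key names besides metaphlan_db (which appears in the StrainPhlAn usage below) are hypothetical, so check the shipped config/config.yaml for the real ones.

```yaml
# Hypothetical sketch of the database paths in config/config.yaml.
# Point each entry at the corresponding database fetched/built via
# https://github.com/vdblab/resources. Key names other than metaphlan_db
# are placeholders, not the pipeline's actual keys.
metaphlan_db: /data/brinkvd/resources/dbs/metaphlan/mpa_vJan21_CHOCOPhlAnSGB_202103/
kraken2_db: /path/to/dbs/kraken2/        # placeholder key and path
humann_db: /path/to/dbs/humann/          # placeholder key and path
card_db: /path/to/dbs/card/              # placeholder key and path (RGI)
```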

Main Pipeline

Usage

snakemake \
 --directory tmpout/ \
 --config \
 sample=473 \
 R1=[/data/brinkvd/data/shotgun/test/473/473_IGO_12587_1_S132_L003_R1_001.fastq.gz] \
 R2=[/data/brinkvd/data/shotgun/test/473/473_IGO_12587_1_S132_L003_R2_001.fastq.gz] \
 nshards=4 \
 stage=all \
 --dry-run

Outputs

  • MultiQC-ready reports

  • Microbe relative abundances (MetaPhlAn3, Kraken2)

  • Metabolic pathway relative abundances (HUMAnN3)

  • Metagenome assembled genomes (MetaSPAdes)

  • AMR profiles with Abricate and RGI

  • MAGs with MetaWRAP (Metabat2, CONCOCT, Maxbin2)

  • Gene prediction and annotation (MetaErg)

  • Secondary metabolite gene clusters (antiSMASH)

  • Antimicrobial resistance and virulence genes (ABRicate, AMRFinderPlus)

  • Carbohydrate active enzyme (CAZyme) annotation (dbCAN3)

Workflow

The rule DAG for a single sample looks like this:

Main Shotgun Pipeline DAG

Different modules of the workflow can be run independently using the stage config entry.
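
For example, the stages shown in the sections below could be previewed back-to-back, each in its own working directory, with a loop like this (the commands are echoed rather than executed here; remove the echo and add the R1/R2 config entries from the sections below to run them):

```shell
# Sketch: run each module independently by varying the `stage` config entry.
# `echo` prints each snakemake invocation instead of running it.
for stage in preprocess biobakery kraken assembly annotate binning rgi; do
  echo snakemake \
    --directory "tmp${stage}/" \
    --config \
    sample=473 \
    stage="${stage}" \
    --dry-run
done
```

Note that some stages need extra config entries, as shown in their sections below (e.g. dedup_platform for preprocess and kraken, and assembly= for annotate and binning).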

MultiQC

Just run MultiQC on a directory; there is no need to use Snakemake:

cp -r tmppre/reports tmpreports
cp tmpassembly/quast/quast_473/report.tsv ./tmpreports/
ver="v1.12"
docker run -v "$PWD":"$PWD" -w "$PWD" ewels/multiqc:${ver} multiqc \
 --config vdb_shotgun/multiqc_config.yaml --force \
 --title "a multiqc report for some test data" \
 -b "generated by ${ver}" --filename multiqc_report.html \
 tmpreports/ --interactive

Preprocessing

Shotgun Preprocessing Pipeline DAG

snakemake \
 --directory tmppreprocess/ \
 --config \
 sample=473 \
 R1=[/data/brinkvd/data/shotgun/test/473/473_IGO_12587_1_S132_L003_R1_001.fastq.gz] \
 R2=[/data/brinkvd/data/shotgun/test/473/473_IGO_12587_1_S132_L003_R2_001.fastq.gz] \
 nshards=4 \
 dedup_platform=NovaSeq \
 stage=preprocess \
 --dry-run

Tools used

Biobakery

Shotgun Biobakery Profiling Pipeline DAG

snakemake \
 --directory tmpbiobakery/ \
 --config \
 sample=473 \
 R1=[/data/brinkvd/data/shotgun/test/473/473_IGO_12587_1_S132_L003_R1_001.fastq.gz] \
 R2=[/data/brinkvd/data/shotgun/test/473/473_IGO_12587_1_S132_L003_R2_001.fastq.gz] \
 stage=biobakery \
 --dry-run

Tools used

Kraken2/Bracken

Shotgun Kraken/Bracken Pipeline DAG

snakemake \
 --directory tmpkraken/ \
 --config \
 sample=473 \
 R1=[/data/brinkvd/data/shotgun/test/473/473_IGO_12587_1_S132_L003_R1_001.fastq.gz] \
 R2=[/data/brinkvd/data/shotgun/test/473/473_IGO_12587_1_S132_L003_R2_001.fastq.gz] \
 dedup_platform=NovaSeq \
 stage=kraken \
 --dry-run

Tools used

Assembly

Shotgun Assembly Pipeline DAG

snakemake \
 --directory tmpassembly/ \
 --config \
 sample=473 \
 R1=[/data/brinkvd/data/shotgun/test/473/473_IGO_12587_1_S132_L003_R1_001.fastq.gz] \
 R2=[/data/brinkvd/data/shotgun/test/473/473_IGO_12587_1_S132_L003_R2_001.fastq.gz] \
 stage=assembly \
 --dry-run

Tools used

Annotation

Shotgun Assembly Annotation DAG

snakemake \
 --directory tmpannotate/ \
 --config \
 sample=473 \
 R1=[/data/brinkvd/data/shotgun/test/473/473_IGO_12587_1_S132_L003_R1_001.fastq.gz] \
 R2=[/data/brinkvd/data/shotgun/test/473/473_IGO_12587_1_S132_L003_R2_001.fastq.gz] \
 assembly=tmpassembly/473.contigs.fasta \
 stage=annotate \
 --dry-run

Tools used

Binning

Shotgun Assembly Binning Pipeline DAG

snakemake \
 --directory tmpbinning/ \
 --config \
 sample=473 \
 R1=[/data/brinkvd/data/shotgun/test/473/473_IGO_12587_1_S132_L003_R1_001.fastq.gz] \
 R2=[/data/brinkvd/data/shotgun/test/473/473_IGO_12587_1_S132_L003_R2_001.fastq.gz] \
 assembly=tmpassembly/473.contigs.fasta \
 stage=binning \
 --dry-run

RGI

Shotgun RGI Pipeline DAG

snakemake \
 --directory tmprgi/ \
 --config \
 sample=473 \
 R1=[/data/brinkvd/data/shotgun/test/473/473_IGO_12587_1_S132_L003_R1_001.fastq.gz] \
 R2=[/data/brinkvd/data/shotgun/test/473/473_IGO_12587_1_S132_L003_R2_001.fastq.gz] \
 stage=rgi \
 --dry-run

Tools used

Strainphlan Pipeline

This pipeline runs StrainPhlAn for each specified species. StrainPhlAn requires two inputs: sample-level marker pickle files, and species-level markers extracted from the main database. These are stored in a central subdirectory of the MetaPhlAn database directory to aid re-running. If you provide the .sam.bz2 file for a sample that has already been processed into a .pkl file, the pregenerated result is reused.

This workflow accepts as input a list of samples' MetaPhlAn sam.bz2 alignment files and a list of species of interest. The config argument strainphlan_markers_dir serves as a central place for storing both the species- and the sample-level marker files; these are specific to a version of the MetaPhlAn database, so we recommend placing that directory within the MetaPhlAn database directory.
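
The reuse check described above amounts to something like the following sketch. The file layout under strainphlan_markers_dir and the sample2markers.py step are illustrative assumptions; the actual rule logic lives in workflow/strainphlan.smk.

```shell
# Illustrative sketch of the sample-level marker reuse described above.
# The .pkl location under strainphlan_markers_dir is a hypothetical layout.
markers_dir="/data/brinkvd/resources/dbs/metaphlan/mpa_vJan21_CHOCOPhlAnSGB_202103/marker_outputs"
sam="path/to/sample1.sam.bz2"
pkl="${markers_dir}/$(basename "${sam%.sam.bz2}").pkl"
if [ -f "$pkl" ]; then
  echo "reusing pregenerated ${pkl}"
else
  echo "generating ${pkl} from ${sam}"  # e.g. via StrainPhlAn's sample2markers.py
fi
```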

Usage

snakemake \
 --snakefile workflow/strainphlan.smk \
 --directory tmpstrain/ \
 --config \
 sams=[path/to/sample1.sam.bz2,path/to/sample2.sam.bz2] \
 strainphlan_markers_dir=/data/brinkvd/resources/dbs/metaphlan/mpa_vJan21_CHOCOPhlAnSGB_202103/marker_outputs/ \
 metaphlan_db=/data/brinkvd/resources/dbs/metaphlan/mpa_vJan21_CHOCOPhlAnSGB_202103/ \
 marker_in_n_samples=2 \
 --dry-run

Outputs

For each input species:

  • Multiple sequence alignment of strains detected in samples

  • Phylogenetic tree of strains detected in samples

Workflow

The rule DAG for two example input species looks like this:

StrainPhlAn Shotgun Pipeline DAG

Testing and Development

Please see development.md .

Code Snippets

  • script: "../scripts/parse_antismash_gbk.py" (lines 147-148)

  • script: "../scripts/merge_logs.py" (lines 466-467)

  • script: "../scripts/plot_RGI_heatmap.R" (lines 82-83 of rules/RGI.smk)


Created: 1yr ago
Updated: 1yr ago
Maintainers: public
URL: https://github.com/vdblab/vdblab-shotgun
Name: vdblab-shotgun
Version: 0.4.3
Copyright: Public Domain
License: MIT License
