Anvi'o Analysis Workflow Using Snakemake


Snakemake rules for running anvi'o

This Snakefile will run a basic anvi'o analysis of a set of assemblies and a set of samples, using the snakemake workflow manager.

Files

Snakefile

The file containing the snakemake rules

config.yaml.template

An example file containing the specific sample and assembly names that will be used in the workflow, as well as additional variable specifications.

Must be renamed 'config.yaml' for use.

cluster.json

A file specifying compute cluster job resource parameters for particular rules in the Snakefile.

requirements.txt

A file specifying the conda requirements for the Snakefile's conda environment.

launch.sh

Example command for launching the workflow through a Torque (qsub) job manager.

anvio_install.sh

The commands I used to set up my anvi'o conda environment on CentOS 6.

dag.svg

Visualization of the workflow steps run when executing the example config.yaml

data [directory]

Placeholder example data layout. For your own data, replace the files in data/assemblies with your assembled contigs and the files in data/samples with the per-sample, paired-end, gzipped FASTQ files. Files must be named [assembly].fa for assemblies and [sample].R1.fq.gz / [sample].R2.fq.gz for the two reads of each sample.
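
The naming rule above can be checked before launching the workflow. The following is a minimal sketch, not part of the repository: the function names and the default `data` directory argument are hypothetical, and the assembly and sample names would come from your own config.yaml.

```python
from pathlib import Path


def expected_inputs(assemblies, samples, data_dir="data"):
    """List the input files implied by the naming convention.

    Hypothetical helper: assemblies live in <data_dir>/assemblies as
    [assembly].fa, and reads live in <data_dir>/samples as
    [sample].R1.fq.gz and [sample].R2.fq.gz.
    """
    root = Path(data_dir)
    paths = [root / "assemblies" / f"{name}.fa" for name in assemblies]
    for sample in samples:
        for read in (1, 2):
            paths.append(root / "samples" / f"{sample}.R{read}.fq.gz")
    return paths


def missing_inputs(assemblies, samples, data_dir="data"):
    """Return the expected input files that are not present on disk."""
    return [
        path
        for path in expected_inputs(assemblies, samples, data_dir)
        if not path.exists()
    ]
```

Calling `missing_inputs(...)` with the names listed in your config should return an empty list when every file is in place; any path it returns is either missing or misnamed.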

Code Snippets

shell:
    "tar -czf {output} {input}"
shell:
    "bowtie2-build {input} data/assemblies/{wildcards.assembly}"
shell:
    "bowtie2 -x {params.idx_base} -p {params.threads} --no-unal " \
              "-q -1 {input.R1} -2 {input.R2} | " \
              "samtools view -bS - > {output}"
shell:
    "samtools sort -O bam -o {output} {input}"
shell:
    "samtools index {input}"
shell:
    "anvi-gen-contigs-database -f {input} -o {output}"
shell:
    """
    anvi-run-hmms -c {input} --num-threads {params.threads}
    """
shell:
    "anvi-get-dna-sequences-for-gene-calls -c {input} -o {output}"
shell:
    """
    export CENTRIFUGE_BASE={params.centrifuge_base}
    centrifuge -f --threads {params.threads} \
    -x {params.centrifuge_models} \
    {input.fa} \
    -S {output.hits} \
    --report-file {output.report}

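    # Link by absolute path so the links still resolve from inside the anvi'o directory
    ln -s "$(readlink -f {output.hits})" data/anvio/{wildcards.assembly}/centrifuge_hits.tsv
    ln -s "$(readlink -f {output.report})" data/anvio/{wildcards.assembly}/centrifuge_report.tsv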

    cd data/anvio/{wildcards.assembly}

    anvi-import-taxonomy -c {wildcards.assembly}.db \
    -i centrifuge_report.tsv centrifuge_hits.tsv \
    -p centrifuge

    rm centrifuge_hits.tsv
    rm centrifuge_report.tsv

    cd ../../../
    """
shell:
    """
    anvi-profile -i {input.sorted} \
    -c {input.db} \
    --overwrite-output-destinations \
    -o data/sorted_reads/{wildcards.assembly}.{wildcards.sample}.bam-ANVIO_PROFILE
    """
shell:
    """
    anvi-merge {input.profiles} \
    -o data/anvio/{wildcards.assembly}/SAMPLES_MERGED \
    -c {input.db} \
    -W
    """
shell:
    """
    anvi-summarize -p {input.prof} \
    -c {input.db} \
    -o data/anvio/{wildcards.assembly}/{wildcards.assembly}_SAMPLES-SUMMARY \
    -C CONCOCT
    """
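
The last three snippets are tied together by their output paths: anvi-profile writes one directory per assembly and sample, anvi-merge combines those per assembly, and anvi-summarize reads the merged result. The sketch below only illustrates that path layout. The function names are hypothetical, and the `PROFILE.db` file name is an assumption based on anvi'o's usual output layout, since the rule inputs are not shown in the snippets.

```python
def profile_dir(assembly, sample):
    """Directory written by the anvi-profile snippet for one assembly/sample pair."""
    return f"data/sorted_reads/{assembly}.{sample}.bam-ANVIO_PROFILE"


def merge_inputs(assembly, samples):
    """Profile databases that would be passed to anvi-merge for one assembly.

    Assumption: each profile directory contains a file named PROFILE.db.
    """
    return [f"{profile_dir(assembly, sample)}/PROFILE.db" for sample in samples]


def merged_dir(assembly):
    """Output directory of the anvi-merge snippet."""
    return f"data/anvio/{assembly}/SAMPLES_MERGED"


def summary_dir(assembly):
    """Output directory of the anvi-summarize snippet."""
    return f"data/anvio/{assembly}/{assembly}_SAMPLES-SUMMARY"
```

For example, `merge_inputs("asm", ["s1", "s2"])` lists one profile database per sample, which matches the one-profile-per-sample, one-merge-per-assembly structure of the rules.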

Created: 1yr ago
Updated: 1yr ago
Maintainers: public
URL: https://github.com/tanaes/snakemake_anvio
Name: snakemake_anvio
Version: 1
Copyright: Public Domain
License: None