A command-line bioinformatics workflow, created with the Snakemake workflow management tool.


Update (March 2021): this pipeline is no longer supported, as QIIME is now available in version 2.0 and QIIME 1.9 is no longer maintained.

Synopsis

This workflow describes the series of steps executed to get from raw fastq files, resulting from the amplicon sequencing of one or more samples, to an OTU table summarizing the taxonomic assignments for the analysed sample(s). It runs on the Linux command line, using the Snakemake workflow management system.

Workflow

  1. QC of input files

  2. Trim input files

  3. QC of trimmed files

  4. Join forward and reverse sequences

  5. QC of joined sequences

  6. Cluster sequences

  7. Pick representative sequences

  8. Detect and remove chimeric representative sequences/clusters

  9. Taxonomic classification

  10. Create OTU table
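
Each step writes its output into its own numbered directory. The directory names below are the ones visible in the code snippets further down (the remaining steps follow the same stepN_ naming pattern):

    step0_initial_data_quality/    FastQC reports for the raw input files
    step1_trimmomatic/             trimmed reads, with FastQC reports in quality/
    step2_join/                    joined and unjoined reads, with FastQC reports in quality/
    step4_pick_otu/                uclust OTU clusters
    step7_classify/                taxonomic assignments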

Setup

  1. Create a directory with input data. This should be paired-end Illumina sequences.

    • This can be a directory of symbolic links to data elsewhere on a file system.
  2. In the config.yaml file (a sketch follows this list):

    • Verify the working directory; this is the location of the pipeline output.

    • Verify the input directory; this can be an absolute path or a path relative to the working directory.

    • Verify the reference fasta and taxonomy files; these can be absolute paths or paths relative to the working directory.

    • Verify the input sequence file extension.

    • Verify that the input_file_forward_postfix parameter corresponds to the naming of your raw files. Change it, if necessary.

  3. Optional: change tool parameters/paths in the config.yaml file.
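
For illustration, a hypothetical config.yaml sketch is shown below. Only input_file_forward_postfix, threads, and the trimmomatic, pick_otus and assign_taxonomy sections are parameter names actually visible in this workflow's code snippets; the remaining keys, and all paths and values, are placeholders showing the expected shape of the file:

# Hypothetical config.yaml sketch -- see the caveats above.
working_dir: /path/to/output                       # placeholder key: pipeline output location
input_dir: data/input                              # placeholder key: absolute, or relative to the working directory
reference_fasta: refs/reference_seqs.fasta         # placeholder key and path
reference_taxonomy: refs/reference_taxonomy.txt    # placeholder key and path
input_file_extension: fastq.gz                     # placeholder key
input_file_forward_postfix: _R1                    # must match the naming of your raw files
threads: 4
trimmomatic:
    ILLUMINACLIP: "null.fa:2:30:10"                # placeholder value
    HEADCROP: 0
    LEADING: 3
    SLIDINGWINDOW: "4:15"
    TRAILING: 3
    AVGQUAL: 20
    MINLEN: 36
    CROP: 250
pick_otus:
    s: 0.97                                        # uclust similarity threshold (illustrative value)
assign_taxonomy:
    c: 0.8                                         # RDP confidence threshold (illustrative value)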

For details on the workflow tools, their versions, the arguments used, and the order of execution, see the Snakefile.

Execute

To check if the workflow will run correctly without executing the steps:

$ snakemake -np --configfile config.yaml

To execute the workflow:

$ snakemake --configfile config.yaml

Note: if you are not in the same directory as the Snakefile, you will need the extra parameter --snakefile with the path to the Snakefile.
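
For example, to launch the workflow from another directory (the paths below are illustrative):

$ snakemake --snakefile /path/to/snakemake-workflows/amplicon_workflow/Snakefile --configfile /path/to/snakemake-workflows/amplicon_workflow/config.yaml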

Installation

This workflow runs on Linux. To install this workflow, either locally or on a cluster, you will need the following requirements installed.

Requirements
  • Python 3.5+

  • PyYAML 5.4+

  • Snakemake 3.7.1

  • FastQC 0.11.5

  • Trimmomatic 0.36

  • Qiime 1.9

Download the latest release of this project:

https://github.com/AAFC-MBB/snakemake-amplicon-metagenomics/releases

OR

Check out this project (requires git):

$ git clone https://github.com/AAFC-MBB/snakemake-workflows.git

Tests

The automated test is located in snakemake-workflows/amplicon_workflow/test/ . To run the test, first download the test data to the snakemake-workflows/amplicon_workflow/test/data/ directory (see the README in that directory for instructions). Before you execute the test, please note that the test runs the Snakefile with the test data and therefore uses the same output directory as a regular snakemake command (default is snakemake-workflows/amplicon_workflow/data ). Therefore, if you already have input or intermediate workflow execution data in your data directory and would like to keep it, back it up first (for example, as shown below).
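
A minimal way to do that, assuming the default output location (the backup name is illustrative):

$ mv snakemake-workflows/amplicon_workflow/data snakemake-workflows/amplicon_workflow/data.backup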

Execute the tests:

$ ./test.sh -clean -run

Info

For more information about Snakemake, visit their website: https://bitbucket.org/snakemake/snakemake/wiki/Home

Authors

Oksana Korol

Christine Lowe

Licensing

See the License file.

Code Snippets

From lines 97-102 of master/Snakefile (FastQC of the raw input files):
shell:
    """
    initial_data_quality_cmd="fastqc {input} --outdir=step0_initial_data_quality" ;\
    echo "Executed command:\n" $initial_data_quality_cmd ;\
    $initial_data_quality_cmd 
    """

From lines 121-135 of master/Snakefile (trimming with Trimmomatic):
shell:
    """
    touch null.fa
    java -jar {input.exec} PE -threads {threads} -phred33 {input.forward} {input.reverse} \
    {output.forward_paired} {output.forward_unpaired} {output.reverse_paired} {output.reverse_unpaired} \
    ILLUMINACLIP:{config[trimmomatic][ILLUMINACLIP]} \
    HEADCROP:{config[trimmomatic][HEADCROP]} \
    LEADING:{config[trimmomatic][LEADING]} \
    SLIDINGWINDOW:{config[trimmomatic][SLIDINGWINDOW]} \
    TRAILING:{config[trimmomatic][TRAILING]} \
    AVGQUAL:{config[trimmomatic][AVGQUAL]} \
    MINLEN:{config[trimmomatic][MINLEN]} \
    CROP:{config[trimmomatic][CROP]}
    rm null.fa
    """

From lines 149-157 of master/Snakefile (FastQC of the trimmed files):
shell:
    """
    trimm_quality_cmd="fastqc {input.forward} --outdir=step1_trimmomatic/quality" ;\
    echo "Executed command:\n" $trimm_quality_cmd ;\
    $trimm_quality_cmd
    trimm_quality_cmd="fastqc {input.reverse} --outdir=step1_trimmomatic/quality" ;\
    echo "Executed command:\n" $trimm_quality_cmd ;\
    $trimm_quality_cmd
    """

From lines 175-183 of master/Snakefile (joining forward and reverse sequences):
shell:
    """
    join_cmd="join_paired_ends.py -f {input.forward_paired} -r {input.reverse_paired} -o step2_join/ -m fastq-join" ;\
    echo "Executed command:\n" $join_cmd ;\
    $join_cmd ;\
    mv step2_join/fastqjoin.join.fastq {output.joined_seqs} ;\
    mv step2_join/fastqjoin.un1.fastq {output.unjoined_forward_seqs} ;\
    mv step2_join/fastqjoin.un2.fastq {output.unjoined_reverse_seqs} ;\
    """

From lines 197-202 of master/Snakefile (FastQC of the joined sequences):
shell:
    """
    join_quality_cmd="fastqc {input} --outdir=step2_join/quality" ;\
    echo "Executed command:\n" $join_quality_cmd ;\
    $join_quality_cmd
    """

From lines 215-225 of master/Snakefile (FASTQ to FASTA conversion and sample relabelling):
shell:
    """
    for file in {input.fastq}; do \
        sample_id=$(echo $file | rev | cut -d'/' -f1 | rev |cut -d'_' -f1-4);\
        s_id=$(echo $sample_id | sed -e 's/_/\./g');\
        echo "Converting file $file";\
        echo "Original sample id: $sample_id";\
        echo "New sample id: $s_id";\
        sed -n '1~4s/^@/>'"$s_id"'_/p;2~4p' "$file" >> {output}; \
    done
    """
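
This rule converts each FASTQ file to the FASTA format expected by the QIIME steps while relabelling the reads: the sample id is taken from the first four underscore-separated fields of the file name, its underscores are replaced with dots, and the sed expression rewrites every fourth line starting at line 1 (the FASTQ headers) into FASTA headers of the form >sampleid_<original id>, while printing every fourth line starting at line 2 (the sequences).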

From lines 237-242 of master/Snakefile (OTU clustering):
shell:
    """
    cluster_otus_cmd="pick_otus.py -i {input} -m uclust -s {config[pick_otus][s]} -o step4_pick_otu" ;\
    echo "Executed command:\n" $cluster_otus_cmd ;\
    $cluster_otus_cmd
    """
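
Here pick_otus.py clusters the relabelled sequences with uclust, using the similarity threshold supplied by the pick_otus s parameter in config.yaml.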

From lines 255-260 of master/Snakefile (picking representative sequences):
shell:
    """
    pick_representatives_cmd="pick_rep_set.py -i {input.otu} -f {input.fasta} -m longest -o {output}" ;\
    echo "Executed command:\n" $pick_representatives_cmd ;\
    $pick_representatives_cmd
    """

From lines 274-282 of master/Snakefile (chimera detection and removal):
shell:
    """
    check_chimeric_seqs_cmd="parallel_identify_chimeric_seqs.py -i {input.dataset} -t {input.reference_txt} -r {input.reference_fasta} -m blast_fragments -o {output.chimeric_list} -O {config[threads]}" ;\
    echo "Executed command:\n" $check_chimeric_seqs_cmd ;\
    $check_chimeric_seqs_cmd
    remove_chimeric_seqs_cmd="filter_fasta.py -f {input.dataset} -o {output.rep_set} -s {output.chimeric_list} -n" ;\
    echo "Executed command:\n" $remove_chimeric_seqs_cmd ;\
    $remove_chimeric_seqs_cmd
    """
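
Chimera handling is a two-pass process: parallel_identify_chimeric_seqs.py flags chimeric representative sequences with the blast_fragments method against the reference, and filter_fasta.py with the -n (negate) flag then writes a representative set that excludes the flagged ids.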

From lines 296-302 of master/Snakefile (taxonomic classification):
shell:
    """
    classify_cmd="parallel_assign_taxonomy_rdp.py -i {input.dataset} -o step7_classify \
    -r {input.reference_fasta} -t {input.reference_txt} --rdp_max_memory 10000 -c {config[assign_taxonomy][c]} -O {config[threads]}" ;\
    echo "Executed command:\n" $classify_cmd ;\
    $classify_cmd
    """
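
Classification is done with the RDP classifier via parallel_assign_taxonomy_rdp.py; the assignment confidence threshold (-c) and the number of parallel jobs (-O) both come from config.yaml.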

From lines 316-321 of master/Snakefile (building the OTU table):
shell:
    """
    make_otu_cmd="make_otu_table.py -i {input.otu} -t {input.assigned_taxonomy} -o {output}" ;\
    echo "Executed command:\n" $make_otu_cmd ;\
    $make_otu_cmd
    """

From lines 332-337 of master/Snakefile (converting the OTU table to TSV):
shell:
    """
    convert_otu_table_cmd="biom convert -i {input} -o {output} --to-tsv --header-key taxonomy" ;\
    echo "Executed command:\n" $convert_otu_table_cmd ;\
    $convert_otu_table_cmd
    """
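
Finally, the BIOM-format OTU table produced by make_otu_table.py is converted to a tab-separated file, with the taxonomy assignments written as an additional taxonomy column.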

URL: https://github.com/AAFC-BICoE/snakemake-amplicon-metagenomics
Name: snakemake-amplicon-metagenomics
Version: V0.4
License: MIT License
