Analysis of 16S sequencing data via `DADA2` using a `snakemake` workflow


16S_DADA2

About

  • This repository can be used for the analysis of 16S sequencing data via DADA2 using a Snakemake workflow

Set up

git clone https://github.com/susheelbhanu/16S_DADA2.git
  • You might want to adjust some settings in the files config/sbatch.sh and config/config.yaml, e.g. the name of the snakemake conda environment, paths, and the number of cores/threads.

Setting up samples.tsv and metadata.tsv

Step 1:

  • The metadata file must be created outside of this workflow and then imported

  • NOTE: the first column of the metadata file should be named "Sample"

  • Import the file and place it in the config folder
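The workflow itself does not validate the metadata file, so a quick sanity check before placing it in config/ can save a failed run. The helper below is a sketch using only the standard library, not part of the repository; it verifies the "Sample" first-column requirement stated above and additionally assumes sample IDs should be unique:

```python
import csv

def check_metadata(path):
    """Sanity-check a tab-separated metadata file before placing it in
    config/. Hypothetical helper, not part of the workflow."""
    with open(path, newline="") as fh:
        rows = list(csv.reader(fh, delimiter="\t"))
    header, body = rows[0], rows[1:]
    # the workflow requires the first column to be named "Sample"
    assert header[0] == "Sample", "first column must be named 'Sample'"
    # uniqueness is an extra assumption, not stated by the workflow
    ids = [r[0] for r in body]
    assert len(ids) == len(set(ids)), "sample IDs should be unique"
    return header, body
```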

Step 2:

  • Edit line 15, i.e. the path to the fastq.gz files, in the notes/samples_tsv.sh script

  • From the 16S_DADA2 folder run the following:

bash notes/samples_tsv.sh
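For reference, the sample-sheet generation done by notes/samples_tsv.sh can be sketched in Python. This is an illustration only: the actual column layout is defined by the script in the repository, and the file-naming convention (`<sample>_R1.fastq.gz` / `<sample>_R2.fastq.gz`) and the column names `sample`, `r1`, `r2` are assumptions here:

```python
from pathlib import Path

def build_samples_tsv(fastq_dir, out_path="config/samples.tsv"):
    """Build a samples.tsv from paired fastq.gz files.
    Sketch of what notes/samples_tsv.sh does; naming convention
    and columns are assumptions, not taken from the repository."""
    rows = []
    for r1 in sorted(Path(fastq_dir).glob("*_R1.fastq.gz")):
        sample = r1.name.replace("_R1.fastq.gz", "")
        r2 = r1.with_name(f"{sample}_R2.fastq.gz")  # assumed mate-pair name
        rows.append((sample, str(r1), str(r2)))
    with open(out_path, "w") as fh:
        fh.write("sample\tr1\tr2\n")
        for row in rows:
            fh.write("\t".join(row) + "\n")
    return rows
```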

Conda

Conda user guide

# install miniconda3
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
chmod u+x Miniconda3-latest-Linux-x86_64.sh
./Miniconda3-latest-Linux-x86_64.sh # follow the instructions
# create the conda environment
conda env create -f requirements.yml -n "YourEnvName"

Run the analysis

# in an interactive session
./config/sbatch.sh
# as a slurm job
sbatch ./config/sbatch.sh

Code Snippets

Lines 13–14 of rules/asv.smk (Snakemake):

shell:
    "(date && wget -O {output} {params.url} && date) &> {log}"
Lines 117–134:

run:
    import re
    from pandas import read_csv
    # read in data
    counts = read_csv(input.counts, sep='\t', header=0, index_col=0) # sample x asv
    tax    = read_csv(input.tax,    sep='\t', header=0, index_col=0) # asv x taxonomy
    # create output files
    for sid in counts.index:
        sid_re = re.compile(r".*\.krona\.%s\.txt$" % sid) # pattern should match the output file pattern
        with open(list(filter(sid_re.match, output))[0], "w") as ofile: # matching output file
            for asv in counts.columns:
                if counts.loc[sid, asv] > 0:
                    assert asv in tax.index
                    c = counts.loc[sid, asv] # count
                    t = "\t".join(tax.loc[asv].fillna(value="NA")) # taxonomy
                    t = re.sub(r"^NA(\tNA)+$", "Unknown", t) # replace completely unknown taxonomy
                    t = re.sub(r"(\tNA)+$", "", t) # remove trailing NAs
                    ofile.write("%d\t%s\n" % (c, t))
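The two `re.sub` calls in the rule above are a small taxonomy-cleanup step: a completely unresolved lineage collapses to "Unknown", and trailing unresolved ranks are dropped. Isolated as a standalone function (a sketch for illustration, not part of the repository), the behavior is:

```python
import re

def clean_taxonomy(levels):
    """Join taxonomy levels as in the krona-export rule, collapsing
    missing ranks (None -> "NA") the same way the rule does."""
    t = "\t".join("NA" if l is None else l for l in levels)
    t = re.sub(r"^NA(\tNA)+$", "Unknown", t)  # fully unknown taxonomy
    t = re.sub(r"(\tNA)+$", "", t)            # drop trailing unknown ranks
    return t
```

So an ASV classified only down to phylum keeps "Bacteria\tFirmicutes", while an ASV with no classification at any rank becomes "Unknown".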
Lines 147–148:

shell:
    "ktImportText {input} -o {output} && sed -i 's/ASV\.krona\.//g' {output}"
Lines 15–17:

shell:
    "ln -sf $(realpath {input.r1}) {output.r1} && "
    "ln -sf $(realpath {input.r2}) {output.r2}"
Lines 102–103:

shell:
    "fastqc -q -f fastq -t {threads} -o $(dirname {output.zip}) {input} &> {log}"
Lines 121–122:

shell:
    "multiqc --interactive -p -f -m fastqc -o $(dirname {output.html}) $(dirname {input[0]}) &> {log}"
Lines 139–140:

shell:
    "multiqc --interactive -p -f -m cutadapt -o $(dirname {output.html}) $(dirname {input[0]}) &> {log}"


Maintainers: public
URL: https://github.com/susheelbhanu/16S_DADA2
Name: 16s_dada2
Version: 2
Copyright: Public Domain
License: None

Related Workflows

cellranger-snakemake-gke
A Snakemake workflow for running cellranger on a given bucket using Google Kubernetes Engine.