Metabarcoding Data Analysis Workflow Using Snakemake and Obitools Suite
About
This is a Snakemake workflow, based on the obitools suite of programs, that analyzes DNA metabarcoding data.
Sequence analysis is performed with cutadapt (Martin, 2011), sumaclust (Mercier et al. 2013) and the obitools software (Boyer et al. 2016), through a Snakemake pipeline (Mölder et al. 2021).
Usage
For now, the pipeline is meant to be run on a computing cluster managed by SLURM.
To run the workflow in a single command on the cluster:
sbatch sub_smk.sh
Two configuration files can be modified: workflow/cluster.yaml, which sets the resources available to each rule, and config/config.yaml, where you can edit the values of the parameters used by the rules. Your initial data (fastq and ngsfilter files) should be copied into a folder named resources (unless you change the folder name in config/config.yaml). All output files are written to a folder named results.
Code Snippets
    shell:
        """
        set +u; module load bioinfo/cutadapt-3.4; set -u
        cutadapt -a '{params.forwardadapter};min_overlap={params.minoverlap}' -A '{params.reverseadapter};min_overlap={params.minoverlap}' -o {output.R1} -p {output.R2} {input.R1} {input.R2} 2> {log}
        """

    shell:
        """
        set +u; module load bioinfo/obitools-v1.2.11; set -u
        obidistribute -n {params.nfiles} -p {params.R1} {input.R1}
        obidistribute -n {params.nfiles} -p {params.R2} {input.R2}
        """

    shell:
        """
        set +u; module load bioinfo/obitools-v1.2.11; set -u
        illuminapairedend -r {input.R2} {input.R1} > {output}
        """

    shell:
        """
        cat {input} > {output}
        """

    shell:
        """
        set +u; module load bioinfo/obitools-v1.2.11; set -u
        obiannotate -S library:'{params.lib} if score>=0 else "err"' {input} > {output} 2> {log}
        """

    shell:
        """
        set +u; module load bioinfo/obitools-v1.2.11; set -u
        obiannotate -S ali:'"good" if score>{params.minscore} else "bad"' {input} | obisplit -t ali -p {params.prefix} 2> {log}
        """

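The annotate-then-split step above can be mirrored in plain Python to make the thresholding rule explicit. This is only a sketch, not obitools code: the record dictionaries, the `split_by_alignment` name and the example threshold are hypothetical. Only the strict `score > minscore` test and the `good`/`bad` labels come from the rule.

```python
def split_by_alignment(records, minscore):
    """Partition records as the 'ali' tag does: strictly above the threshold is 'good'."""
    groups = {"good": [], "bad": []}
    for rec in records:
        label = "good" if rec["score"] > minscore else "bad"
        groups[label].append(rec)
    return groups


# Hypothetical reads with made-up alignment scores.
reads = [
    {"id": "r1", "score": 55.0},
    {"id": "r2", "score": 40.0},
    {"id": "r3", "score": 12.5},
]
groups = split_by_alignment(reads, minscore=40)
print([r["id"] for r in groups["good"]])
print([r["id"] for r in groups["bad"]])
```

Because the comparison is strict, a read whose score equals the threshold (r2 here) ends up in the "bad" group.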
    shell:
        """
        set +u; module load bioinfo/obitools-v1.2.11; set -u
        obiannotate --without-progress-bar --sanger -S 'Avgqphred:-int(math.log10(sum(sequence.quality)/len(sequence))*10)' {input} | ngsfilter --fasta-output -t {params.ngs} -u {output.unassigned} > {output.demultiplexed} 2> {log}
        """

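The `Avgqphred` expression in the rule above is ordinary Python arithmetic, so it can be tried outside the pipeline. The sketch below assumes that `sequence.quality` yields one error probability per base, which is what the formula implies but is not stated in this README. The function name and the example values are hypothetical.

```python
import math


def avg_phred(error_probs):
    """Same arithmetic as the Avgqphred expression, applied to a list of
    per-base error probabilities (assumed meaning of sequence.quality)."""
    mean_error = sum(error_probs) / len(error_probs)
    # int() truncates toward zero, so the score is cut to a whole number.
    return -int(math.log10(mean_error) * 10)


# Two bases with error probabilities 0.09 and 0.01 (made-up values).
print(avg_phred([0.09, 0.01]))
```

Under that assumption the average is taken over probabilities, not over Phred scores, so a single low-quality base lowers the result more than it would in a plain mean of Phred values.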
    shell:
        """
        cat {input} > {output} 2> {log}
        """

    shell:
        """
        set +u; module load bioinfo/obitools-v1.2.11; set -u
        obiannotate -S start:"hash(str(sequence))%{params.nfiles}" {input} | obisplit -t start -p {params.tmp}
        """

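The rule above tags each read with a hash of its sequence modulo the number of files, so identical sequences always land in the same file and each file can then be dereplicated independently. A minimal sketch of that idea follows. The rule relies on Python's built-in `hash`; the sketch swaps in `zlib.crc32` so that results are reproducible under Python 3, where string hashing is randomized per process by default. The function names and the example reads are hypothetical.

```python
import zlib
from collections import defaultdict


def shard_index(sequence, nfiles):
    """Stable stand-in for hash(str(sequence)) % nfiles."""
    return zlib.crc32(sequence.encode("ascii")) % nfiles


def shard(sequences, nfiles):
    """Group sequences by shard index; identical sequences share a shard."""
    shards = defaultdict(list)
    for seq in sequences:
        shards[shard_index(seq, nfiles)].append(seq)
    return dict(shards)


# Made-up reads, two of which occur twice.
reads = ["acgt", "ttga", "acgt", "ggcc", "ttga"]
shards = shard(reads, nfiles=3)
for index in sorted(shards):
    print(index, shards[index])
```

Shard sizes are not guaranteed to be balanced; the only guarantee is that all copies of a sequence end up together.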
    shell:
        """
        set +u; module load bioinfo/obitools-v1.2.11; set -u
        obiuniq -m sample {input} > {output}
        """

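As a rough mental model of the dereplication step above, identical sequences are collapsed into one record that keeps a total count and a per-sample breakdown. The sketch below is a simplified illustration under that assumption, not the obiuniq implementation. The record layout and the `dereplicate` name are hypothetical, and `merged_sample` is used here as the name of the per-sample attribute.

```python
from collections import Counter, defaultdict


def dereplicate(records):
    """Collapse identical sequences, summing counts overall and per sample."""
    per_sequence = defaultdict(Counter)
    for rec in records:
        # A record without an explicit count stands for a single read.
        per_sequence[rec["sequence"]][rec["sample"]] += rec.get("count", 1)
    return [
        {
            "sequence": seq,
            "count": sum(per_sample.values()),
            "merged_sample": dict(per_sample),
        }
        for seq, per_sample in per_sequence.items()
    ]


# Made-up reads from two samples.
reads = [
    {"sequence": "acgt", "sample": "s1"},
    {"sequence": "acgt", "sample": "s2"},
    {"sequence": "acgt", "sample": "s1"},
    {"sequence": "ttga", "sample": "s2"},
]
unique = dereplicate(reads)
for rec in unique:
    print(rec)
```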
    shell:
        """
        cat {input} > {output} 2> {log}
        """

    shell:
        """
        set +u; module load bioinfo/obitools-v1.2.11; set -u
        obiannotate --length -S 'GC_content:len(str(sequence).replace("a","").replace("t",""))*100/len(sequence)' {input} | obigrep -l {params.minlength} -s '^[acgt]+$' -p 'count>{params.mincount}' > {output} 2> {log}
        """

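The GC content expression and the three filters in the rule above can be reproduced in a few lines of Python. This is a sketch under stated assumptions: sequences are lowercase strings, `-l` is taken to mean "at least this long", and the record layout and function names are hypothetical. The `/` below is Python 3 float division; the interpreter used by obitools may truncate the result to an integer instead.

```python
import re

ACGT_ONLY = re.compile(r"^[acgt]+$")


def gc_content(sequence):
    """Same expression as in the rule: every character that is not 'a' or 't'
    is counted, so ambiguous bases would inflate the value."""
    return len(sequence.replace("a", "").replace("t", "")) * 100 / len(sequence)


def keep(record, minlength, mincount):
    """Mirror the three obigrep tests: length, unambiguous bases, abundance."""
    seq = record["sequence"]
    return (
        len(seq) >= minlength
        and ACGT_ONLY.match(seq) is not None
        and record["count"] > mincount
    )


# Made-up records and thresholds.
records = [
    {"sequence": "acgtacgt", "count": 5},
    {"sequence": "acgnacgt", "count": 5},  # contains an ambiguous base
    {"sequence": "acgtacgt", "count": 1},  # not above mincount
    {"sequence": "acg", "count": 9},       # too short
]
kept = [r for r in records if keep(r, minlength=8, mincount=1)]
print(gc_content("acgt"), len(kept))
```

Since the sequence filter discards any read containing characters other than a, c, g or t, the GC value of the reads that survive is not affected by ambiguous bases.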
    shell:
        """
        set +u; module load bioinfo/sumaclust_v1.0.31; set -u
        sumaclust -t {params.minsim} -p {threads} {input} > {output}
        """

    shell:
        """
        set +u; module load bioinfo/obitools-v1.2.11; set -u
        obiselect -c cluster -n 1 --merge sample -M -f count {input} > {output} 2> {log}
        """

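One way to read the selection step above is: within each cluster, keep the single record with the highest count and merge the sample information of all cluster members into it. The sketch below illustrates that reading only. The record layout, the `select_representatives` name and the way per-sample counts are summed are assumptions, and the representative's own `count` is deliberately left unchanged.

```python
def select_representatives(records):
    """Keep the most abundant record per cluster and merge per-sample counts."""
    best = {}     # cluster id -> record with the highest count so far
    samples = {}  # cluster id -> summed per-sample counts
    for rec in records:
        cid = rec["cluster"]
        if cid not in best or rec["count"] > best[cid]["count"]:
            best[cid] = rec
        merged = samples.setdefault(cid, {})
        for name, n in rec["merged_sample"].items():
            merged[name] = merged.get(name, 0) + n
    # Copy each representative so the input records are not modified.
    return [dict(best[cid], merged_sample=samples[cid]) for cid in best]


# Made-up clustered records.
records = [
    {"id": "a", "cluster": "c1", "count": 10, "merged_sample": {"s1": 10}},
    {"id": "b", "cluster": "c1", "count": 3, "merged_sample": {"s1": 1, "s2": 2}},
    {"id": "c", "cluster": "c2", "count": 4, "merged_sample": {"s2": 4}},
]
representatives = select_representatives(records)
for rec in representatives:
    print(rec["id"], rec["merged_sample"])
```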
    shell:
        """
        set +u; module load bioinfo/obitools-v1.2.11; set -u
        obitab -n NA -d -o {input} > {output} 2> {log}
        """

Support
- Future updates