Metabarcoding Data Analysis Workflow Using Snakemake and Obitools Suite

About

This is a Snakemake workflow, based on the OBITools suite of programs, that analyzes DNA metabarcoding data.

Sequence analysis is performed with cutadapt (Martin, 2011), sumaclust (Mercier et al., 2013) and the OBITools software suite (Boyer et al., 2016), chained together in a Snakemake pipeline (Mölder et al., 2021).

Usage

For now, the pipeline is meant to be executed on a computing cluster managed by SLURM.

To run the workflow in a single command on the cluster:

sbatch sub_smk.sh

Two configuration files can be modified: workflow/cluster.yaml, which sets the resources available to each rule, and config/config.yaml, where you can edit the values of the parameters used by the rules. Your initial data (fastq and ngsfilter files) should be copied into a folder named resources (unless you change the folder name in config/config.yaml). All output files are written to a folder named results.
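
Before submitting the job, it can help to confirm that the input folder is populated as described above. The following is a minimal sketch and is not part of the workflow: the function name `check_resources` and the file-name patterns (names ending in `.fastq` or `.fastq.gz`, and any name containing `ngsfilter`) are assumptions made for illustration, so adapt them to your own naming.

```python
from pathlib import Path


def check_resources(folder="resources"):
    """Return (fastq_files, ngsfilter_files) found directly in `folder`.

    The name patterns are illustrative assumptions, not names that the
    workflow is known to require.
    """
    root = Path(folder)
    if not root.is_dir():
        raise FileNotFoundError(f"input folder not found: {root}")
    names = sorted(p.name for p in root.iterdir() if p.is_file())
    fastq = [n for n in names if n.endswith((".fastq", ".fastq.gz"))]
    ngsfilter = [n for n in names if "ngsfilter" in n.lower()]
    return fastq, ngsfilter
```

Calling `check_resources()` from the repository root returns the two lists, which can be inspected before running `sbatch sub_smk.sh`.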

Code Snippets

shell:
	"""
	set +u; module load bioinfo/cutadapt-3.4; set -u  
	cutadapt -a '{params.forwardadapter};min_overlap={params.minoverlap}' -A '{params.reverseadapter};min_overlap={params.minoverlap}' -o {output.R1} -p {output.R2} {input.R1} {input.R2} 2> {log}
	"""
shell:
	"""
	set +u; module load bioinfo/obitools-v1.2.11; set -u
	obidistribute -n {params.nfiles} -p {params.R1}  {input.R1}
	obidistribute -n {params.nfiles} -p {params.R2} {input.R2}
	"""
shell:
	"""
	set +u; module load bioinfo/obitools-v1.2.11; set -u
	illuminapairedend -r {input.R2} {input.R1} > {output}
	"""
shell:
	"""
	cat {input} > {output}
	"""
shell:
	"""
	set +u; module load bioinfo/obitools-v1.2.11; set -u
	obiannotate -S library:'{params.lib} if score>=0 else "err"' {input} > {output} 2> {log}
	"""
shell:
	"""
	set +u; module load bioinfo/obitools-v1.2.11; set -u
	obiannotate -S ali:'"good" if score>{params.minscore} else "bad"' {input} | obisplit -t ali -p {params.prefix} 2> {log}
	"""
shell:
	"""
	set +u; module load bioinfo/obitools-v1.2.11; set -u
	obiannotate --without-progress-bar --sanger -S 'Avgqphred:-int(math.log10(sum(sequence.quality)/len(sequence))*10)' {input} | ngsfilter --fasta-output -t {params.ngs} -u {output.unassigned} > {output.demultiplexed} 2> {log}
	"""
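
The `Avgqphred` expression passed to `obiannotate -S` above is plain Python. As a worked sketch, assuming `sequence.quality` yields one error probability per base (an assumption about the OBITools sequence object that this page does not state), the expression averages those probabilities and converts the mean to a Phred-style score. The function name `avg_qphred` is hypothetical.

```python
import math


def avg_qphred(error_probs):
    """Mirror of -int(math.log10(sum(q)/len(q))*10) on a plain list.

    `error_probs` stands in for sequence.quality and is assumed to hold
    per-base error probabilities in (0, 1].
    """
    mean_error = sum(error_probs) / len(error_probs)
    # int() truncates toward zero before the sign is flipped.
    return -int(math.log10(mean_error) * 10)


# Mean error 0.037 has a log10 of about -1.43, hence a score of 14.
print(avg_qphred([0.001, 0.01, 0.1]))  # 14
```

Because the average is taken over probabilities rather than over Phred scores, a few low-quality bases pull the result down strongly.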
shell:
	"""
	cat {input} > {output} 2> {log}
	"""
shell:
	"""
	set +u; module load bioinfo/obitools-v1.2.11; set -u
	obiannotate -S start:"hash(str(sequence))%{params.nfiles}" {input} | obisplit -t start -p {params.tmp}
	"""
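
The `start` tag above assigns every read to one of `nfiles` buckets from a hash of its sequence, so identical sequences always land in the same file, presumably so that each file can then be dereplicated on its own. The sketch below illustrates the idea with `zlib.crc32` as a stand-in hash. That swap is an assumption made to keep the example reproducible, since Python 3 randomizes `hash()` on strings between processes. The name `bucket_for` is hypothetical.

```python
import zlib


def bucket_for(sequence, nfiles):
    """Return a bucket index in range(nfiles) that depends only on the sequence."""
    # crc32 is deterministic across runs, unlike hash() on str in Python 3.
    return zlib.crc32(sequence.encode("ascii")) % nfiles


reads = ["acgtacgt", "ttgacc", "acgtacgt", "ggcatt"]
buckets = {}
for seq in reads:
    buckets.setdefault(bucket_for(seq, 4), []).append(seq)

# Both copies of "acgtacgt" end up in the same bucket.
for index in sorted(buckets):
    print(index, buckets[index])
```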
shell:
	"""
	set +u; module load bioinfo/obitools-v1.2.11; set -u
	obiuniq -m sample {input} > {output}
	"""
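
Conceptually, `obiuniq -m sample` collapses identical sequences into one record while keeping track of how many reads came from each sample. The following sketch mimics that bookkeeping on in-memory tuples. It illustrates the idea only and is not the OBITools implementation; the function name `dereplicate` and the layout of the returned dictionary are assumptions.

```python
from collections import Counter, defaultdict


def dereplicate(reads):
    """Group (sequence, sample) pairs by sequence.

    Returns {sequence: {"count": total_reads, "merged_sample": {sample: reads}}}.
    """
    per_sample = defaultdict(Counter)
    for sequence, sample in reads:
        per_sample[sequence][sample] += 1
    return {
        seq: {"count": sum(samples.values()), "merged_sample": dict(samples)}
        for seq, samples in per_sample.items()
    }


reads = [("acgt", "s1"), ("acgt", "s1"), ("acgt", "s2"), ("ttga", "s2")]
for seq, info in dereplicate(reads).items():
    print(seq, info["count"], info["merged_sample"])
```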
shell:
	"""
	cat {input} > {output} 2> {log}
	"""
shell:
	"""
	set +u; module load bioinfo/obitools-v1.2.11; set -u
	obiannotate --length -S 'GC_content:len(str(sequence).replace("a","").replace("t",""))*100/len(sequence)' {input} | obigrep -l {params.minlength} -s '^[acgt]+$' -p 'count>{params.mincount}' > {output} 2> {log}
	"""
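
The `GC_content` expression and the three `obigrep` conditions above can be read as the following Python sketch. It assumes lowercase sequences and uses `//` to reproduce integer division, on the assumption that the expression is evaluated by a Python 2 interpreter. The names `gc_content` and `keep` are hypothetical, and the thresholds in the usage lines are placeholders, not the values set in config/config.yaml.

```python
import re

ACGT_ONLY = re.compile(r"^[acgt]+$")


def gc_content(sequence):
    """Percentage of characters that are neither 'a' nor 't' (integer division)."""
    return len(sequence.replace("a", "").replace("t", "")) * 100 // len(sequence)


def keep(sequence, count, min_length, min_count):
    """Apply the three obigrep conditions: -l, -s and -p."""
    return (
        len(sequence) >= min_length
        and ACGT_ONLY.match(sequence) is not None
        and count > min_count
    )


print(gc_content("aacg"))  # 50
print(keep("acgtacgt", count=5, min_length=8, min_count=1))  # True
print(keep("acgtnacgt", count=5, min_length=8, min_count=1))  # False
```

Note that the expression counts every character other than `a` and `t`, so an ambiguous base such as `n` would inflate the value; the `^[acgt]+$` filter then discards those sequences.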
shell:
	"""
	set +u; module load bioinfo/sumaclust_v1.0.31; set -u
	sumaclust -t {params.minsim} -p {threads} {input} > {output}
	"""
shell:
	"""
	set +u; module load bioinfo/obitools-v1.2.11; set -u
	obiselect -c cluster -n 1 --merge sample -M -f count {input} > {output} 2> {log}
	"""
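
Once `sumaclust` has tagged each sequence with a `cluster` identifier, the `obiselect` call above keeps one record per cluster, the one with the highest `count`. A rough sketch of that selection step only (the merging of `sample` information is left out), using dictionaries in place of sequence records; `pick_representatives` and the record layout are hypothetical:

```python
def pick_representatives(records):
    """Keep, for each cluster, the record with the largest count."""
    best = {}
    for record in records:
        current = best.get(record["cluster"])
        if current is None or record["count"] > current["count"]:
            best[record["cluster"]] = record
    return best


records = [
    {"id": "seq1", "cluster": "seq1", "count": 120},
    {"id": "seq2", "cluster": "seq1", "count": 3},
    {"id": "seq3", "cluster": "seq3", "count": 42},
]
for cluster, record in pick_representatives(records).items():
    print(cluster, record["id"], record["count"])
```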
shell:
	"""
	set +u; module load bioinfo/obitools-v1.2.11; set -u
	obitab -n NA -d -o {input} > {output} 2> {log}
	"""

Created: 1yr ago
Updated: 1yr ago
Maintainers: public
URL: https://github.com/AnneSoBen/ecofeed_workflow_obitools
Name: ecofeed_workflow_obitools
Version: 1
Copyright: Public Domain
License: None