Snakemake Workflow for ATAC-Seq Data Analysis: Replicating Bioinformatics Analysis

TP Snakemake

The goal of this project is to learn the Snakemake workflow language and to see what it offers compared with a rudimentary workflow written in bash.

The project builds a Snakemake workflow that reproduces the analysis steps of the ATACseq project, from quality control of the fastq.gz reads to the identification of accessible DNA regions in the two biological conditions considered. Because I was unable to instantiate a VM with the BioPipes image (in the cloud) or to install Snakemake on the Mesocenter compute cluster, I could not properly debug my Snakemake script.
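
The chain of steps described above can be pictured as a small dependency graph, which is roughly how Snakemake decides what to run and in what order. The sketch below uses only the Python standard library; the step names and the dependencies between them are hypothetical labels for the stages listed in this README, not rule names taken from the Snakefile.

```python
from graphlib import TopologicalSorter

# Hypothetical step names; each maps to the set of steps it depends on.
STEPS = {
    "gunzip": set(),
    "fastqc": {"gunzip"},
    "cutadapt": {"gunzip"},
    "bowtie2": {"cutadapt"},
    "mark_duplicates": {"bowtie2"},
    "macs2_callpeak": {"mark_duplicates"},
}


def execution_order(steps):
    """Return one valid order in which every step runs after its dependencies."""
    return list(TopologicalSorter(steps).static_order())


if __name__ == "__main__":
    print(execution_order(STEPS))
```

In a bash script this order has to be written out by hand; a workflow engine derives it from the declared inputs and outputs.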

The Snakemake workflow is defined in the Snakefile; its configuration files are config.yaml and env.yaml, the latter describing the working environment.
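
To illustrate how values from a file such as config.yaml typically drive a workflow, here is a plain-Python sketch of the kind of filename expansion that Snakemake's `expand` helper performs when listing final targets. The `samples` and `reads` keys, the sample names and the path pattern are all hypothetical; the actual contents of config.yaml are not shown in this README.

```python
from itertools import product

# Stand-in for the parsed config.yaml; keys and values are hypothetical.
config = {"samples": ["cond0h_rep1", "cond24h_rep1"], "reads": ["1", "2"]}


def expand_targets(pattern, **wildcards):
    """Fill `pattern` with every combination of the given wildcard values."""
    names = list(wildcards)
    combos = product(*(wildcards[name] for name in names))
    return [pattern.format(**dict(zip(names, combo))) for combo in combos]


targets = expand_targets(
    "results/cutadapt/{sample}_R{read}.fastq",
    sample=config["samples"],
    read=config["reads"],
)

if __name__ == "__main__":
    for path in targets:
        print(path)
```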

Code Snippets

shell:
	"""
	mkdir -p tmp
	gunzip -c {input} > {output}
	"""
From line 16 of main/Snakefile
shell:
	"""
	mkdir -p results/fastqc_init
	fastqc {input} -o "results/fastqc_init" -t {threads}
	"""
From line 31 of main/Snakefile
shell:
	"""
	mkdir -p results/cutadapt
	# Assumes the rule names its two outputs r1 and r2; the original wrote both reads to the same file
	cutadapt -a R1=CTGTCTCTTATACACATCTCCGAGCCCACGAGAC -A R2=CTGTCTCTTATACACATCTGACGCTGCCGACGA \
	--output={output.r1} --paired-output={output.r2} --error-rate=0.1 --times=1 --overlap=3 --minimum-length=20 --pair-filter=any --quality-cutoff=20 \
	-j {threads} {input}
	"""
From line 48 of main/Snakefile
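
Because paired-end trimming needs two distinct output files, it can help to assemble the command explicitly. The following Python sketch builds a cutadapt call as an argument list using the options from the snippet above; the function name and the file paths are hypothetical.

```python
import shlex

ADAPTER_R1 = "CTGTCTCTTATACACATCTCCGAGCCCACGAGAC"
ADAPTER_R2 = "CTGTCTCTTATACACATCTGACGCTGCCGACGA"


def cutadapt_command(in_r1, in_r2, out_r1, out_r2, threads=1):
    """Build a paired-end cutadapt command as a list of arguments."""
    if out_r1 == out_r2:
        raise ValueError("R1 and R2 outputs must be different files")
    return [
        "cutadapt",
        "-j", str(threads),
        "-a", f"R1={ADAPTER_R1}",
        "-A", f"R2={ADAPTER_R2}",
        f"--output={out_r1}",
        f"--paired-output={out_r2}",
        "--error-rate=0.1",
        "--times=1",
        "--overlap=3",
        "--minimum-length=20",
        "--pair-filter=any",
        "--quality-cutoff=20",
        in_r1,
        in_r2,
    ]


# Hypothetical paths, for illustration only.
cmd = cutadapt_command(
    "tmp/s_R1.fastq", "tmp/s_R2.fastq",
    "results/cutadapt/s_R1.fastq", "results/cutadapt/s_R2.fastq",
    threads=4,
)

if __name__ == "__main__":
    print(shlex.join(cmd))
```

The explicit check on the two output paths guards against the easy mistake of sending both reads to the same file.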
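shell:
	"""
	mkdir -p results/bowtie2
	# Assumes the second mate is named r2 in the rule inputs; -q 2 drops alignments with MAPQ below 2
	bowtie2 --very-sensitive -p {threads} -k 10 -x {input.genome} -1 {input.r1} -2 {input.r2} \
	| samtools view -q 2 -bS - | samtools sort - -o {output}
	"""
From line 67 of main/Snakefile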
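
The `-q 2` option of `samtools view` discards alignments whose mapping quality (MAPQ) is below 2. As a rough illustration of that filter only, the sketch below applies the same threshold to SAM text lines in plain Python; the function name and the example records are made up for the illustration, and it is not a replacement for samtools.

```python
def filter_by_mapq(sam_lines, min_mapq=2):
    """Yield header lines and alignments whose MAPQ (SAM column 5) is >= min_mapq."""
    for line in sam_lines:
        if line.startswith("@"):
            yield line  # header lines are passed through unchanged
            continue
        fields = line.rstrip("\n").split("\t")
        if int(fields[4]) >= min_mapq:
            yield line


# Made-up SAM records: read1 has MAPQ 42, read2 has MAPQ 1.
example = [
    "@SQ\tSN:chr1\tLN:1000",
    "read1\t0\tchr1\t100\t42\t10M\t*\t0\t0\tACGTACGTAC\tIIIIIIIIII",
    "read2\t0\tchr1\t200\t1\t10M\t*\t0\t0\tACGTACGTAC\tIIIIIIIIII",
]
kept = list(filter_by_mapq(example))

if __name__ == "__main__":
    print("\n".join(kept))
```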
shell:
	"""
	samtools index -b {input.bam}
	"""
From line 83 of main/Snakefile
shell:
	"""
	samtools idxstats {input} > {output}
	"""
From line 94 of main/Snakefile
shell:
	"""
	java -jar /opt/apps/picard-2.18.25/picard.jar MarkDuplicates \
	I={input} \
	O={output.bam} \
	M={output.met} \
	REMOVE_DUPLICATES=true
	"""
From line 107 of main/Snakefile
shell:
	"""
	samtools index -b {input}
	"""
From line 122 of main/Snakefile
shell:
	"""
	samtools idxstats {input} > {output}
	"""
From line 133 of main/Snakefile
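
`samtools idxstats` writes one tab-separated line per reference sequence: name, sequence length, number of mapped read segments and number of unmapped read segments. Since the workflow produces these statistics both before and after duplicate removal, a small Python sketch such as the one below could compare the two tables; the function names and the inline example text are hypothetical.

```python
def parse_idxstats(text):
    """Map each reference name to its mapped-read count from idxstats output."""
    counts = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        name, _length, mapped, _unmapped = line.split("\t")
        counts[name] = int(mapped)
    return counts


def removed_per_reference(before, after):
    """Number of mapped reads lost per reference between two idxstats tables."""
    return {name: before[name] - after.get(name, 0) for name in before}


# Made-up idxstats tables, before and after duplicate removal.
before = parse_idxstats("chr1\t1000\t50\t0\nchrM\t160\t30\t0\n*\t0\t0\t5\n")
after = parse_idxstats("chr1\t1000\t45\t0\nchrM\t160\t10\t0\n*\t0\t0\t5\n")

if __name__ == "__main__":
    print(removed_per_reference(before, after))
```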
shell:
	"""
	# NOTE: -n takes the experiment name, which is missing in the source
	macs2 callpeak -t {input.cult_24h} -c {input.cult_0h} -f BAM -g 'mm' -n
	"""
From line 149 of main/Snakefile

Created: 1yr ago
Updated: 1yr ago
Maintainers: public
URL: https://github.com/embusefala/Projet_Cloud
Name: projet_cloud
Version: 1
Downloaded: 0
Copyright: Public Domain
License: None