ATAC-Seq Data Analysis Workflow with Snakemake and Conda Environment


ATAC-seq workflow.

Mohamed Malek CHAOUCHI

Université Clermont Auvergne, 2021

Dependencies, in order of use:

	- FastQC (initial QC)
	- Trimmomatic
	- FastQC (post-trimming QC)
	- Bowtie2
	- Picard (MarkDuplicates)
	- deepTools (multiBamSummary)
	- deepTools (plotCorrelation)
	- deepTools (plotCoverage)
	- MACS2 (callpeak)

Launching workflow:

  • Execute the Snakefile inside a conda environment that provides Snakemake (an example invocation is sketched below).
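
A typical invocation from the repository root might look like the following; the environment name and core count are illustrative, and --use-conda assumes the rules reference the environment files in config/env:

    # Create and activate an environment providing Snakemake (names are illustrative)
    conda create -n snakemake -c conda-forge -c bioconda snakemake
    conda activate snakemake

    # Run the workflow; --use-conda builds the per-rule environments from config/env
    snakemake --use-conda --cores 4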

Directory structure

	├── .gitignore
	├── README.md
	├── Snakefile
	├── config
	│   ├── config.yaml
	│   └── env
	│       ├── qc.yaml
	│       ├── trim.yaml
	│       ├── bowtie2.yaml
	│       ├── deeptools.yaml
	│       ├── picard.yaml
	│       └── macs2.yaml
	├── data
	│   └── mydatalocal
	│       ├── bowtie2
	│       └── atacseq
	│           └── subset
	└── results

  • Datasets go into data/mydatalocal/atacseq/subset

  • Bowtie2 indexes go into data/mydatalocal/bowtie2

  • All output will be in results

  • Snakemake config is in config

  • Conda environment config files are in config/env (an illustrative example is sketched below)
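
As a point of reference, a per-rule environment file such as config/env/qc.yaml typically just pins the tool used by that step. The channels and version below are illustrative, not copied from the repository:

    # config/env/qc.yaml (illustrative content, not copied from the repository)
    channels:
      - conda-forge
      - bioconda
    dependencies:
      - fastqc=0.11.9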

Code Snippets


Decompress the raw FASTQ files (Snakefile lines 40–44):
shell:
    """
    mkdir -p tmp
    gunzip -c {input} > {output}
    """

Initial quality control with FastQC (Snakefile lines 54–58):
shell:
    """
    mkdir -p results/fastqc_init
    fastqc {input} -o "results/fastqc_init" -t {threads}
    """

Adapter and quality trimming with Trimmomatic (Snakefile lines 71–81):
shell:
    """
    mkdir -p results/trim
    trimmomatic PE -threads {threads} \
    -trimlog results/trim/trim.log -summary results/trim/stats \
    {input.r1} {input.r2} \
    {output.fwd_P} \
    {output.fwd_U} \
    {output.rvr_P} \
    {output.rvr_U} \
    ILLUMINACLIP:data/mydatalocal/atacseq/subset/NexteraPE-PE.fa:2:30:10:2:keepBothReads \
    LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 MINLEN:33
    """

Post-trimming quality control with FastQC (Snakefile lines 91–95):
shell:
    """
    mkdir -p results/fastqc_post
    fastqc {input} -o "results/fastqc_post" -t {threads}
    """

Alignment with Bowtie2, then filtering, sorting and indexing with samtools (Snakefile lines 105–109):
shell:
    """
    bowtie2 --very-sensitive -p 1 -k 10 \
    -x data/mydatalocal/bowtie2/all \
    -1 {input.r1} -2 {input.r2} \
    | samtools view -q 2 -bS - \
    | samtools sort - -o {output.aln}
    samtools index -b {output.aln}
    samtools index -b {output.aln}
    """

Duplicate removal with Picard MarkDuplicates (Snakefile lines 120–128):
shell:
    """
    picard MarkDuplicates \
    I={input.aln_a} \
    O={output.aln_clean} \
    M={output.aln_clean_txt} \
    REMOVE_DUPLICATES=true
    samtools index -b {output.aln_clean}
    """

Read-count summary across the aligned BAM files with deepTools multiBamSummary (Snakefile lines 137–141):
shell:
    """
    multiBamSummary bins -b {input} \
    -o {output:q}
    """

Sample correlation heatmap with deepTools plotCorrelation (Snakefile lines 152–161):
shell:
    """
    plotCorrelation \
    -in {input}  \
    --corMethod spearman --skipZeros \
    --plotTitle "Spearman Correlation of Read Counts" \
    --whatToPlot heatmap --colorMap RdYlBu --plotNumbers \
    -o {output.png}   \
    --outFileCorMatrix {output.matrix}
    """

Coverage assessment with deepTools plotCoverage (Snakefile lines 172–179):
shell:
    """
    plotCoverage --bamfiles {input} \
    --plotFile {output.png} \
    --plotTitle "Coverage" \
    --outRawCounts {output.matrix} \
    --ignoreDuplicates
    """

Peak calling with MACS2 (Snakefile lines 189–195):
shell:
    """
    macs2 callpeak -t {input}  \
    -f BAM \
    -n {wildcards.sample} \
    --outdir results/macs2
    """

Five further snippets from the Snakefile are not shown here.

URL: https://github.com/mmchaouchi/Cloud
Name: cloud
Version: 1
Copyright: Public Domain
License: None