ChIPseq Data Processing Workflow with Snakemake

public public 1yr ago 0 bookmarks

This workflow performs some genernal data process for ChIPseq

Workflow

dag1

Install Snakemake and Git

Please make sure that Snakemake and Git are correctly installed

Snakemake: https://snakemake.readthedocs.io/en/stable/getting_started/installation.html

Git: https://anaconda.org/anaconda/git

clone workflow into working directory

git clone https://github.com/mhu10/SMK_ChIPseq path/to/workdir

Edit config file and workfileas as needed

./SMK_ChIPseq/config/'config.yaml

./SMK_ChIPseq/Snakefile

Activate snakemake

conda activate snakemake

Dry run workflow

snakemake -n

Execute workflow

snakemake --use-conda --cores 12

Build DAG workflow chart

snakemake --rulegraph | dot -Tsvg > dag1.svg

Code Snippets

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
library(dplyr)

out_file1 <- snakemake@output[["bed_ctl"]]
out_file2 <- snakemake@output[["bed_treatment"]]

datainput_ctl <- read.table(snakemake@input[['peaks_ctl']], sep="\t")[,c(2,3,4,1,6,5)]
datainput_treatment <- read.table(snakemake@input[['peaks_treatment']], sep="\t")[,c(2,3,4,1,6,5)]


colnames(datainput_ctl) = c('Chromosome','Start','End','PeakID','Column5','Strand')
colnames(datainput_treatment) = c('Chromosome','Start','End','PeakID','Column5','Strand')


datainput_ctl = arrange(datainput_ctl, Chromosome)
datainput_treatment = arrange(datainput_treatment, Chromosome)


write.table(datainput_ctl, out_file1, quote = FALSE,sep='\t',row.names = FALSE)
write.table(datainput_treatment, out_file2, quote = FALSE,sep='\t',row.names = FALSE)
49
50
wrapper:
    "v1.7.1/bio/fastqc"
74
75
wrapper:
    "v1.7.1/bio/trimmomatic/pe"
SnakeMake From line 74 of main/Snakefile
89
90
91
92
93
94
shell:
    "bowtie2 -p 8 --local --no-discordant --no-mixed "
    "-x {params.bowtieindex}"
    "-1 {input.r1} "
    "-2 {input.r2} "
    "-S {output} "
104
105
shell:
    "samtools view -bS {input} > {output} "
116
117
shell:
    "samtools sort -@ 5 {input}  -o {output} "
128
129
shell:
    "samtools view -bh -q 10 {input} > {output} "
143
144
wrapper:
    "v1.20.0/bio/picard/markduplicates"
SnakeMake From line 143 of main/Snakefile
158
159
wrapper:
    "v1.25.0/bio/samtools/index"
SnakeMake From line 158 of main/Snakefile
172
173
174
175
176
177
shell:
    "multiBamSummary bins "
    "--ignoreDuplicates -p 8 "
    "--bamfile {input.bams} "
    "-out {output.countnpz} "
    "--outRawCounts {output.counttab}"
SnakeMake From line 172 of main/Snakefile
191
192
193
194
195
shell:
    "plotPCA --corData {input.countnpz} "
    "--plotFile ./results/MultiBamSummary/deepTools_pcaplot.png "
    "-T 'PCA of read counts' "
    "--outFileNameData ./results/MultiBamSummary/deeptools_pcaProfile.tab "
206
207
208
209
210
211
212
shell:
    "plotCorrelation --corData {input.countnpz} "
    "--plotFile ./results/MultiBamSummary/deepTools_heatmap_spearman.png "
    "--corMethod spearman "
    "--whatToPlot heatmap "
    "--plotTitle 'Spearman Correlation of Read Counts' "
    "--plotNumbers "
225
226
shell:
    "macs2 callpeak -t {input} -n {wildcards.sample} --outdir results/MACS2/{wildcards.sample} -g hs -f BAM --keep-dup auto --bdg "
239
240
wrapper:
    "v1.25.0/bio/homer/mergePeaks"
SnakeMake From line 239 of main/Snakefile
253
254
wrapper:
    "v1.25.0/bio/homer/mergePeaks"
SnakeMake From line 253 of main/Snakefile
266
267
script:
    "scripts/ConvertPeaks.R"
SnakeMake From line 266 of main/Snakefile
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
__author__ = "Johannes Köster, Christopher Schröder"
__copyright__ = "Copyright 2016, Johannes Köster"
__email__ = "[email protected]"
__license__ = "MIT"


import tempfile
from snakemake.shell import shell
from snakemake_wrapper_utils.java import get_java_opts

log = snakemake.log_fmt_shell()

extra = snakemake.params.get("extra", "")
java_opts = get_java_opts(snakemake)

bams = snakemake.input.bams
if isinstance(bams, str):
    bams = [bams]
bams = list(map("--INPUT {}".format, bams))

if snakemake.output.bam.endswith(".cram"):
    output = "/dev/stdout"
    if snakemake.params.embed_ref:
        view_options = "-O cram,embed_ref"
    else:
        view_options = "-O cram"
    convert = f" | samtools view -@ {snakemake.threads} {view_options} --reference {snakemake.input.ref} -o {snakemake.output.bam}"
else:
    output = snakemake.output.bam
    convert = ""

with tempfile.TemporaryDirectory() as tmpdir:
    shell(
        "(picard MarkDuplicates"  # Tool and its subcommand
        " {java_opts}"  # Automatic java option
        " {extra}"  # User defined parmeters
        " {bams}"  # Input bam(s)
        " --TMP_DIR {tmpdir}"
        " --OUTPUT {output}"  # Output bam
        " --METRICS_FILE {snakemake.output.metrics}"  # Output metrics
        " {convert} ) {log}"  # Logging
    )
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
__author__ = "Jan Forster"
__copyright__ = "Copyright 2020, Jan Forster"
__email__ = "[email protected]"
__license__ = "MIT"

from snakemake.shell import shell
import os.path as path
import sys

extra = snakemake.params.get("extra", "")
log = snakemake.log_fmt_shell(stdout=True, stderr=True)


class PrefixNotSupportedError(Exception):
    pass


if "-prefix" in extra:
    raise PrefixNotSupportedError(
        "The use of the -prefix parameter is not yet supported in this wrapper"
    )

shell("(mergePeaks" " {snakemake.input}" " {extra}" " > {snakemake.output})" " {log}")
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
__author__ = "Johannes Köster"
__copyright__ = "Copyright 2016, Johannes Köster"
__email__ = "[email protected]"
__license__ = "MIT"


from snakemake.shell import shell

extra = snakemake.params.get("extra", "")
log = snakemake.log_fmt_shell(stdout=True, stderr=True)

# Samtools takes additional threads through its option -@
# One thread for samtools merge
# Other threads are *additional* threads passed to the '-@' argument
threads = "" if snakemake.threads <= 1 else " -@ {} ".format(snakemake.threads - 1)

shell(
    "samtools index {threads} {extra} {snakemake.input[0]} {snakemake.output[0]} {log}"
)
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
__author__ = "Julian de Ruiter"
__copyright__ = "Copyright 2017, Julian de Ruiter"
__email__ = "[email protected]"
__license__ = "MIT"


from os import path
import re
from tempfile import TemporaryDirectory

from snakemake.shell import shell

log = snakemake.log_fmt_shell(stdout=True, stderr=True)


def basename_without_ext(file_path):
    """Returns basename of file path, without the file extension."""

    base = path.basename(file_path)
    # Remove file extension(s) (similar to the internal fastqc approach)
    base = re.sub("\\.gz$", "", base)
    base = re.sub("\\.bz2$", "", base)
    base = re.sub("\\.txt$", "", base)
    base = re.sub("\\.fastq$", "", base)
    base = re.sub("\\.fq$", "", base)
    base = re.sub("\\.sam$", "", base)
    base = re.sub("\\.bam$", "", base)

    return base


# Run fastqc, since there can be race conditions if multiple jobs
# use the same fastqc dir, we create a temp dir.
with TemporaryDirectory() as tempdir:
    shell(
        "fastqc {snakemake.params} -t {snakemake.threads} "
        "--outdir {tempdir:q} {snakemake.input[0]:q}"
        " {log}"
    )

    # Move outputs into proper position.
    output_base = basename_without_ext(snakemake.input[0])
    html_path = path.join(tempdir, output_base + "_fastqc.html")
    zip_path = path.join(tempdir, output_base + "_fastqc.zip")

    if snakemake.output.html != html_path:
        shell("mv {html_path:q} {snakemake.output.html:q}")

    if snakemake.output.zip != zip_path:
        shell("mv {zip_path:q} {snakemake.output.zip:q}")
 12
 13
 14
 15
 16
 17
 18
 19
 20
 21
 22
 23
 24
 25
 26
 27
 28
 29
 30
 31
 32
 33
 34
 35
 36
 37
 38
 39
 40
 41
 42
 43
 44
 45
 46
 47
 48
 49
 50
 51
 52
 53
 54
 55
 56
 57
 58
 59
 60
 61
 62
 63
 64
 65
 66
 67
 68
 69
 70
 71
 72
 73
 74
 75
 76
 77
 78
 79
 80
 81
 82
 83
 84
 85
 86
 87
 88
 89
 90
 91
 92
 93
 94
 95
 96
 97
 98
 99
100
101
102
103
__author__ = "Johannes Köster, Jorge Langa"
__copyright__ = "Copyright 2016, Johannes Köster"
__email__ = "[email protected]"
__license__ = "MIT"


from snakemake.shell import shell
from snakemake_wrapper_utils.java import get_java_opts

# Distribute available threads between trimmomatic itself and any potential pigz instances
def distribute_threads(input_files, output_files, available_threads):
    gzipped_input_files = sum(1 for file in input_files if file.endswith(".gz"))
    gzipped_output_files = sum(1 for file in output_files if file.endswith(".gz"))
    potential_threads_per_process = available_threads // (
        1 + gzipped_input_files + gzipped_output_files
    )
    if potential_threads_per_process > 0:
        # decompressing pigz creates at most 4 threads
        pigz_input_threads = (
            min(4, potential_threads_per_process) if gzipped_input_files != 0 else 0
        )
        pigz_output_threads = (
            (available_threads - pigz_input_threads * gzipped_input_files)
            // (1 + gzipped_output_files)
            if gzipped_output_files != 0
            else 0
        )
        trimmomatic_threads = (
            available_threads
            - pigz_input_threads * gzipped_input_files
            - pigz_output_threads * gzipped_output_files
        )
    else:
        # not enough threads for pigz
        pigz_input_threads = 0
        pigz_output_threads = 0
        trimmomatic_threads = available_threads
    return trimmomatic_threads, pigz_input_threads, pigz_output_threads


def compose_input_gz(filename, threads):
    if filename.endswith(".gz") and threads > 0:
        return "<(pigz -p {threads} --decompress --stdout {filename})".format(
            threads=threads, filename=filename
        )
    return filename


def compose_output_gz(filename, threads, compression_level):
    if filename.endswith(".gz") and threads > 0:
        return ">(pigz -p {threads} {compression_level} > {filename})".format(
            threads=threads, compression_level=compression_level, filename=filename
        )
    return filename


extra = snakemake.params.get("extra", "")
java_opts = get_java_opts(snakemake)
log = snakemake.log_fmt_shell(stdout=True, stderr=True)
compression_level = snakemake.params.get("compression_level", "-5")
trimmer = " ".join(snakemake.params.trimmer)

# Distribute threads
input_files = [snakemake.input.r1, snakemake.input.r2]
output_files = [
    snakemake.output.r1,
    snakemake.output.r1_unpaired,
    snakemake.output.r2,
    snakemake.output.r2_unpaired,
]

trimmomatic_threads, input_threads, output_threads = distribute_threads(
    input_files, output_files, snakemake.threads
)

input_r1, input_r2 = [
    compose_input_gz(filename, input_threads) for filename in input_files
]

output_r1, output_r1_unp, output_r2, output_r2_unp = [
    compose_output_gz(filename, output_threads, compression_level)
    for filename in output_files
]

shell(
    "trimmomatic PE -threads {trimmomatic_threads} {java_opts} {extra} "
    "{input_r1} {input_r2} "
    "{output_r1} {output_r1_unp} "
    "{output_r2} {output_r2_unp} "
    "{trimmer} "
    "{log}"
)
ShowHide 13 more snippets with no or duplicated tags.

Login to post a comment if you would like to share your experience with this workflow.

Do you know this workflow well? If so, you can request seller status , and start supporting this workflow.

Free

Created: 1yr ago
Updated: 1yr ago
Maitainers: public
URL: https://github.com/mhu10/SMK_ChIPseq
Name: smk_chipseq
Version: 1
Badge:
workflow icon

Insert copied code into your website to add a link to this workflow.

Downloaded: 0
Copyright: Public Domain
License: MIT License
  • Future updates

Related Workflows

cellranger-snakemake-gke
snakemake workflow to run cellranger on a given bucket using gke.
A Snakemake workflow for running cellranger on a given bucket using Google Kubernetes Engine. The usage of this workflow ...