Snakemake workflow template for preparing single-end ChIP-Seq data

public public 1yr ago Version: v0.1.1 0 bookmarks

This is a simple snakemake workflow template for preparing single-end ChIP-Seq data. The steps implemented are:

  1. Download raw fastq files from SRA

  2. Trim and Filter raw fastq files using

Code Snippets

16
17
18
19
20
21
22
23
24
25
26
shell:
    """
    AdapterRemoval \
        --file1 {input.sample} \
        {params.extra} \
        {params.adapters} \
        --threads {threads} \
        --output1 {output.fastq} \
        --discarded /dev/null \
        --settings {output.settings} &> {log}
    """
12
13
14
15
16
17
18
19
20
21
22
shell:
    """
    ## Sort the file
    echo -e "Started sorting at $(date)" >> {log}
    sort \
      -k1,1 -k2,2n \
      -S {resources.mem_mb}M \
      {input.bdg} | \
      egrep $'^chr[0-9XY]+\t' > {output.bdg}
    echo -e "Finished sorting at $(date)" >> {log}
    """	
38
39
40
41
42
43
shell:
    """
    echo -e "Started conversion at $(date)" >> {log}
    bedGraphToBigWig {input.bedgraph} {input.chrom_sizes} {output.bigwig}
    echo -e "Finished conversion at $(date)" >> {log}
    """
18
19
script:
    "../scripts/bowtie2.py"
35
36
37
38
39
40
41
42
43
44
45
46
shell:
    """
    DIR=$(dirname {output})
    if [[ ! -d $DIR ]]; then
        mkdir -p $DIR
    fi

    bowtie2-inspect --summary {params.ref} | \
      egrep '^Sequence' | \
      sed -r 's/ /\\t/g' | \
      cut -f2,4 > {output}
    """
15
16
17
18
19
20
21
22
shell:
    """
    fastqc \
      {params.extra} \
      -t {threads} \
      --outdir {params.outdir} \
      {input.fq} &> {log}
    """
72
73
74
75
76
77
78
79
80
81
82
shell:
    """
    macs2 callpeak \
        -t {input.bam}\
        -c {input.control} \
        -f BAM --bdg --SPMR \
        {params.extra} \
        -n {params.prefix} \
        --outdir {params.outdir} 2> {log}
    cp {log} {output.log}
    """
113
114
115
116
117
118
119
120
121
122
123
shell:
    """
    macs2 callpeak \
        -t {input.bam}\
        -c {input.control} \
        -f BAM --bdg --SPMR \
        {params.extra} \
        -n {params.prefix} \
        --outdir {params.outdir} 2> {log}
    cp {log} {output.log}
    """
149
150
151
152
153
154
155
156
shell:
    """
    macs2 bdgcmp \
        -t {input.treatment} \
        -c {input.control} \
        {params} \
        -o {output} 2> {log}
    """
169
170
171
172
shell:
    """
    Rscript --vanilla {input.script} {input.log} {output}  >> {log} 2>&1
    """
SnakeMake From line 169 of rules/macs2.smk
17
18
19
20
21
22
23
24
25
shell:
  """
  Rscript --vanilla \
    {input.script} \
    {input.bam} \
    {input.chrom_sizes} \
    {params.genome} \
    {output.greylist} >> {log} 2>&1
  """
17
18
19
20
21
22
23
24
25
shell:
    """
    multiqc \
      {params.extra} \
      --force \
      -o {params.outdir} \
      -n multiqc.html \
      {input} 2> {log}
    """
16
17
18
19
20
21
22
shell:
    """
    Rscript --vanilla \
        {input.script} \
        {input.bam} \
        {output} >> {log} 2>&1
    """
40
41
42
43
44
45
46
47
shell:
    """
    Rscript --vanilla \
        {input.script} \
        {input.peaks} \
        {input.bam} \
        {output} >> {log} 2>&1
    """ 
14
15
script:
    "../scripts/picard_markduplicates.py"
11
12
script:
    "../scripts/create_site_yaml.R"
33
34
35
36
shell:
    """
    R -e "rmarkdown::render_site('{input.rmd}')" >> {log} 2>&1
    """
56
57
58
59
shell:
    """
    R -e "rmarkdown::render_site('{input.rmd}')" >> {log} 2>&1
    """
83
84
85
86
shell:
    """
    R -e "rmarkdown::render_site('{input.rmd}')" >> {log} 2>&1
    """
107
108
109
110
shell:
    """
    R -e "rmarkdown::render_site('{input.rmd}')" >> {log} 2>&1
    """		
123
124
125
126
127
128
129
130
131
132
133
shell:
    """
    ## Create the generic markdown
    Rscript --vanilla \
        {input.r} \
        {wildcards.target} \
        {output.rmd} >> {log} 2>&1

    ## Add the module directly as literal code
    cat {input.rmd} >> {output.rmd}
    """
161
162
163
164
shell:
    """
    R -e "rmarkdown::render_site('{input.rmd}')" >> {log} 2>&1
    """
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
shell:
    """
    TEMPDIR=$(mktemp -d -t samXXXXXXXXXX)
    echo -e "Writing to $TEMPDIR" >>{log}

    samtools view -h {params.view} {input} |\
        samtools sort {params.sort} \
            -@ {threads} -m 4G \
            -T $TEMPDIR \
            -O BAM \
            -o {output} 2>> {log}

    echo -e "Deleting $TEMPDIR" >> {log}
    rm -rf $TEMPDIR
    """
46
47
48
49
shell:
    """
    samtools index -@ {threads} {input} {output}
    """
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
__author__ = "Johannes Köster"
__copyright__ = "Copyright 2016, Johannes Köster"
__email__ = "[email protected]"
__license__ = "MIT"


import os
from snakemake.shell import shell
from snakemake_wrapper_utils.samtools import get_samtools_opts


def get_format(path: str) -> str:
    """
    Return file format since Bowtie2 reads files that
    could be gzip'ed (extension: .gz) or bzip2'ed (extension: .bz2).
    """
    if path.endswith((".gz", ".bz2")):
        return path.split(".")[-2].lower()
    return path.split(".")[-1].lower()


bowtie2_threads = snakemake.threads - 1
if bowtie2_threads < 1:
    raise ValueError(
        f"This wrapper expected at least two threads, got {snakemake.threads}"
    )

# Setting parse_threads to false since samtools performs only
# bam compression. Thus the wrapper would use *twice* the amount
# of threads reserved by user otherwise.
samtools_opts = get_samtools_opts(snakemake, parse_threads=False)

extra = snakemake.params.get("extra", "")
log = snakemake.log_fmt_shell(stdout=True, stderr=True)


n = len(snakemake.input.sample)
assert (
    n == 1 
), "input->sample must have 1 (single-end) element."

reads = ""
if get_format(snakemake.input.sample[0]) in ("bam", "sam"):
    reads = f"-b {snakemake.input.sample[0]}"
else:
    if snakemake.params.get("interleaved", False):
        reads = f"--interleaved {snakemake.input.sample[0]}"
    else:
        reads = f"-U {snakemake.input.sample[0]}"


if all(get_format(sample) in ("fastq", "fq") for sample in snakemake.input.sample):
    extra += " -q "
elif all(get_format(sample) == "tab5" for sample in snakemake.input.sample):
    extra += " --tab5 "
elif all(get_format(sample) == "tab6" for sample in snakemake.input.sample):
    extra += " --tab6 "
elif all(
    get_format(sample) in ("fa", "mfa", "fasta") for sample in snakemake.input.sample
):
    extra += " -f "


metrics = snakemake.output.get("metrics")
if metrics:
    extra += f" --met-file {metrics} "

unaligned = snakemake.output.get("unaligned")
if unaligned:
    extra += f" --un {unaligned} "

unpaired = snakemake.output.get("unpaired")
if unpaired:
    extra += f" --al {unpaired} "

unconcordant = snakemake.output.get("unconcordant")
if unconcordant:
    extra += f" --un-conc {unconcordant} "

concordant = snakemake.output.get("concordant")
if concordant:
    extra += f" --al-conc {concordant} "


index = os.path.commonprefix(snakemake.input.idx).rstrip(".")


shell(
    "(bowtie2"
    " --threads {bowtie2_threads}"
    " {reads} "
    " -x {index}"
    " {extra}"
    "| samtools view --with-header "
    " {samtools_opts}"
    " -"
    ") {log}"
)
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
conda_pre <- system2("echo", "$CONDA_PREFIX", stdout = TRUE)
if (conda_pre != ""){
  conda_lib_path <- file.path(conda_pre, "lib", "R", "library")
  if (!dir.exists(conda_lib_path)) conda_lib_path <- NULL
  prev_paths <- .libPaths()
  paths_to_set <- unique(c(conda_lib_path, prev_paths))
  .libPaths(paths_to_set)
}

library(tidyverse)
library(yaml)
library(glue)
library(magrittr)

config <- read_yaml(here::here("config", "config.yml"))
samples <- read_tsv(here::here(config$samples))
targets <- sort(unique(samples$target))
dir <- basename(getwd())

site_yaml <- list(
  name = dir,
  output_dir = "../docs",
  navbar = list(
    title = dir,
    left = list(
      list(text = "Home", href = "index.html"),
      list(
        text = "QC", menu = list(
          list(text = "Raw Data", href = "raw_qc.html"),
          list(text = "Trimmed Data", href = "trimmed_qc.html"),
          list(text = "Alignments", href = "align_qc.html")
        )
      ),
      list(
        text = "Results", menu = list()
      )
    ),
    right = list(
      list(icon = "fa-github", href = "https://github.com/smped/prepareChIPs")
    )
  ),
  output = list(
    html_document = list(
      code_folding = "hide",
      toc = TRUE, toc_float = TRUE,
      theme = "sandstone", highlight = "textmate",
      includes = list(after_body = "footer.html")
    )
  )
)


## Create the output directory, if it doesn't exist
if (!is.null(site_yaml$output_dir)) {
  out_dir <- here::here("docs")
  message("Checking for directory: ", out_dir)
  if (!dir.exists(out_dir)) {
    message("Creating: ", out_dir)
    stopifnot(dir.create(out_dir))
  }
  message(out_dir, " exists")
}


site_yaml$navbar$left[[3]]$menu <- lapply(
  targets,
  function(x) {
    list(text = x, href = glue("{x}_macs2_summary.html"))
  }
)

## Create the analysis directory if it doesn't exist
analysis_dir <- here::here("analysis")
if  (!dir.exists(analysis_dir)) dir.create(analysis_dir)
write_yaml(site_yaml, file.path(analysis_dir, "_site.yml"))
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
__author__ = "Johannes Köster, Derek Croote"
__copyright__ = "Copyright 2020, Johannes Köster"
__email__ = "[email protected]"
__license__ = "MIT"

import os
import tempfile
from snakemake.shell import shell
from snakemake_wrapper_utils.snakemake import get_mem


log = snakemake.log_fmt_shell(stdout=True, stderr=True)
extra = snakemake.params.get("extra", "")


# Parse memory
mem_mb = get_mem(snakemake, "MiB")


# Outdir
outdir = os.path.dirname(snakemake.output[0])
if outdir:
    outdir = f"--outdir {outdir}"


# Output compression
compress = ""
mem = f"-m{mem_mb}" if mem_mb else ""

for output in snakemake.output:
    out_name, out_ext = os.path.splitext(output)
    if out_ext == ".gz":
        compress += f"pigz -p {snakemake.threads} {out_name}; "
    elif out_ext == ".bz2":
        compress += f"pbzip2 -p{snakemake.threads} {mem} {out_name}; "


with tempfile.TemporaryDirectory() as tmpdir:
    mem = f"--mem {mem_mb}M" if mem_mb else ""

    shell(
        "(fasterq-dump --temp {tmpdir} --threads {snakemake.threads} {mem} "
        "{extra} {outdir} {snakemake.wildcards.accession}; "
        "{compress}"
        ") {log}"
    )
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
__author__ = "Johannes Köster, Christopher Schröder"
__copyright__ = "Copyright 2016, Johannes Köster"
__email__ = "[email protected]"
__license__ = "MIT"


import tempfile
from pathlib import Path
from snakemake.shell import shell
from snakemake_wrapper_utils.java import get_java_opts
from snakemake_wrapper_utils.samtools import get_samtools_opts, infer_out_format


log = snakemake.log_fmt_shell()
extra = snakemake.params.get("extra", "")
# the --SORTING_COLLECTION_SIZE_RATIO default of 0.25 might
# indicate a JVM memory overhead of that proportion
java_opts = get_java_opts(snakemake, java_mem_overhead_factor=0.3)
samtools_opts = get_samtools_opts(snakemake)


tool = "MarkDuplicates"
if snakemake.params.get("withmatecigar", False):
    tool = "MarkDuplicatesWithMateCigar"


bams = snakemake.input.bams
if isinstance(bams, str):
    bams = [bams]
bams = list(map("--INPUT {}".format, bams))


output = snakemake.output.bam
output_fmt = infer_out_format(output)
convert = ""
if output_fmt == "CRAM":
    output = "/dev/stdout"

    # NOTE: output format inference should be done by snakemake-wrapper-utils. Keeping it here for backwards compatibility.
    if snakemake.params.get("embed_ref", False):
        samtools_opts += ",embed_ref"

    convert = f" | samtools view {samtools_opts}"
elif output_fmt == "BAM" and snakemake.output.get("idx"):
    extra += " --CREATE_INDEX"


with tempfile.TemporaryDirectory() as tmpdir:
    shell(
        "(picard {tool}"  # Tool and its subcommand
        " {java_opts}"  # Automatic java option
        " {extra}"  # User defined parmeters
        " {bams}"  # Input bam(s)
        " --TMP_DIR {tmpdir}"
        " --OUTPUT {output}"  # Output bam
        " --METRICS_FILE {snakemake.output.metrics}"  # Output metrics
        " {convert}) {log}"  # Logging
    )


output_prefix = Path(snakemake.output.bam).with_suffix("")
if snakemake.output.get("idx"):
    if output_fmt == "BAM" and snakemake.output.idx != str(output_prefix) + ".bai":
        shell("mv {output_prefix}.bai {snakemake.output.idx}")
ShowHide 20 more snippets with no or duplicated tags.

Login to post a comment if you would like to share your experience with this workflow.

Do you know this workflow well? If so, you can request seller status , and start supporting this workflow.

Free

Created: 1yr ago
Updated: 1yr ago
Maitainers: public
URL: https://doi.org/10.48546/workflowhub.workflow.528.1
Name: preparechips
Version: v0.1.1
Badge:
workflow icon

Insert copied code into your website to add a link to this workflow.

Downloaded: 0
Copyright: Public Domain
License: None
  • Future updates

Related Workflows

cellranger-snakemake-gke
snakemake workflow to run cellranger on a given bucket using gke.
A Snakemake workflow for running cellranger on a given bucket using Google Kubernetes Engine. The usage of this workflow ...