Snakemake pipeline for detection limit test from laser-microdissected samples


A Snakemake workflow for assessing detection limit from laser-microdissected samples.

Usage

  1. Requirements

    1. git-lfs

    2. miniconda / mamba

    3. snakemake

  2. Clone the repository and set it as the working directory.

git clone --recursive https://github.com/3d-omics/detection_limit_test.git
cd detection_limit_test
  3. Run the pipeline with the test data (takes 5 minutes):

snakemake \
    --use-conda \
    --conda-frontend mamba \
    -j 8
  4. Edit the following files:

    1. config/samples.tsv : the sample sheet with the sequencing libraries and their locations.

    2. config/features.tsv : the references against which to map the libraries: human, chicken / pig, MAG catalogue.

    3. config/params.tsv : parameters for every program. The defaults are reasonable.

  5. Run the pipeline on a cluster with SLURM and go for a walk:

./run_slurm
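For orientation, config/samples.tsv is a tab-separated sheet pointing each library at its FASTQ files. The sketch below is illustrative only: the column names are assumptions, not the workflow's actual schema, so copy the headers from the test file shipped in config/ instead:

```text
sample	library	forward	reverse
sampleA	lib1	reads/sampleA_1.fq.gz	reads/sampleA_2.fq.gz
```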

Brief description

  1. Trim reads and remove adaptors with fastp

  2. Map to human, chicken / pig, and the MAG catalogue, in that order:

    1. Map to the reference with bowtie2

    2. Extract the reads that have one or both ends unmapped with samtools

    3. Map those unmapped reads to the next reference

  3. Generate MAG statistics with:

    1. coverm

    2. singlem

    3. nonpareil

  4. Generate lots of reports in the reports/ folder
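The mapping cascade in step 2 can be modelled abstractly: each reference consumes the reads that map to it, and only the leftovers move on to the next reference. A toy Python model (not the workflow's code; the real work is done by bowtie2 and samtools):

```python
def split_by_reference(reads, reference):
    """Toy stand-in for bowtie2 + samtools: a read 'maps' if it
    contains the reference motif; the rest are carried forward."""
    mapped = [r for r in reads if reference in r]
    unmapped = [r for r in reads if reference not in r]
    return mapped, unmapped

def mapping_cascade(reads, references):
    """Map against each reference in turn; only reads left unmapped
    reach the next reference (steps 2.1-2.3 above)."""
    hits = {}
    for reference in references:
        hits[reference], reads = split_by_reference(reads, reference)
    return hits, reads  # leftovers feed the MAG statistics step

hits, leftovers = mapping_cascade(
    ["AAhost", "CCmag", "GGnovel"], ["host", "mag"]
)
# hits["host"] == ["AAhost"]; hits["mag"] == ["CCmag"]; leftovers == ["GGnovel"]
```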

Rulegraph


Code Snippets

shell:
    """
    bowtie2-build \
        --threads {threads} \
        {params.extra} \
        {input.reference} \
        {output.mock} \
    2> {log} 1>&2
    """
shell:
    """
    if [[ {params.is_paired} = "True" ]] ; then
        (bowtie2 \
            -x {input.mock} \
            -1 {input.forward_} \
            -2 {input.reverse_} \
            --threads {threads} \
            --rg-id '{params.rg_id}' \
            --rg '{params.rg_extra}' \
            {params.extra} \
        | samtools sort \
            --threads {threads} \
            -m {params.samtools_mem} \
        | samtools rmdup - - \
        | samtools view \
            --reference {input.reference} \
            --output {output.cram} \
            --output-fmt cram,level=9,nthreads={threads} \
            --write-index \
        ) 2> {log} 1>&2
    else
        (bowtie2 \
            -x {input.mock} \
            -U {input.forward_} \
            --threads {threads} \
            --rg-id '{params.rg_id}' \
            --rg '{params.rg_extra}' \
            {params.extra} \
        | samtools sort \
            --threads {threads} \
            -m {params.samtools_mem} \
        | samtools rmdup -s - - \
        | samtools view \
            --reference {input.reference} \
            --output {output.cram} \
            --output-fmt cram,level=9,nthreads={threads} \
            --write-index \
        ) 2> {log} 1>&2
    fi
    """
shell:
    """
    if [[ {params.is_paired} = "True" ]] ; then
        (samtools view \
            --reference {input.reference} \
            --threads {threads} \
            -u \
            -o /dev/stdout \
            -f 12 \
            {input.cram} \
        | samtools fastq \
            -1 {output.forward_} \
            -2 {output.reverse_} \
            -0 /dev/null \
            -c 9 \
            --threads {threads} \
        ) 2> {log} 1>&2
    else
        (samtools view \
            --reference {input.reference} \
            --threads {threads} \
            -u \
            -o /dev/stdout \
            -f 4 \
            {input.cram} \
        | samtools fastq \
            -0 {output.forward_} \
            -c 9 \
            --threads {threads} \
        ) 2> {log} 1>&2
    fi
    """
shell:
    """
    bowtie2-build \
        --threads {threads} \
        {params.extra} \
        {input.reference} \
        {output.mock} \
    2> {log} 1>&2
    """
shell:
    """
    if [[ {params.is_paired} == "True" ]] ; then
        (bowtie2 \
            -x {input.mock} \
            -1 {input.forward_} \
            -2 {input.reverse_} \
            --threads {threads} \
            --rg-id '{params.rg_id}' \
            --rg '{params.rg_extra}' \
            {params.extra} \
        | samtools sort \
            -l 9 \
            -M \
            -m {params.samtools_mem} \
            -o {output.cram} \
            --reference {input.reference} \
            --threads {threads} \
            --write-index \
        ) 2> {log} 1>&2
    else
        (bowtie2 \
            -x {input.mock} \
            -U {input.forward_} \
            --threads {threads} \
            --rg-id '{params.rg_id}' \
            --rg '{params.rg_extra}' \
            {params.extra} \
        | samtools sort \
            -l 9 \
            -M \
            -m {params.samtools_mem} \
            -o {output.cram} \
            --reference {input.reference} \
            --threads {threads} \
            --write-index \
        ) 2> {log} 1>&2
    fi
    """
shell:
    """
    if [[ {params.is_paired} = True ]] ; then
        fastp \
            --in1 {input.forward_} \
            --in2 {input.reverse_} \
            --out1 >(bgzip -l 1 -@ {threads} > {output.forward_}) \
            --out2 >(bgzip -l 1 -@ {threads} > {output.reverse_}) \
            --unpaired1 >(bgzip -l 1 -@ {threads} > {output.unpaired1}) \
            --unpaired2 >(bgzip -l 1 -@ {threads} > {output.unpaired2}) \
            --html {output.html} \
            --json {output.json} \
            --compression 1 \
            --verbose \
            --adapter_sequence {params.adapter_forward} \
            --adapter_sequence_r2 {params.adapter_reverse} \
            --length_required {params.length_required} \
            --thread {threads} \
            {params.extra} \
        2> {log} 1>&2
    else
        fastp \
            --in1 {input.forward_} \
            --out1 >(bgzip -l 1 -@ {threads} > {output.forward_}) \
            --html {output.html} \
            --json {output.json} \
            --compression 1 \
            --verbose \
            --adapter_sequence {params.adapter_forward} \
            --length_required {params.length_required} \
            --thread {threads} \
            {params.extra} \
        2> {log} 1>&2
    fi
    """
shell:
    "fastqc {input} 2> {log} 1>&2"
shell:
    """
    if [[ {params.is_paired} = "True" ]] ; then
        kraken2 \
            --db {input.database} \
            --threads {threads} \
            --paired \
            --gzip-compressed \
            --output >(pigz > {output.out_gz}) \
            --report {output.report} \
            {input.forward_} \
            {input.reverse_} \
        > {log} 2>&1
    else
        kraken2 \
            --db {input.database} \
            --threads {threads} \
            --gzip-compressed \
            --output >(pigz > {output.out_gz}) \
            --report {output.report} \
            {input.forward_} \
        > {log} 2>&1
    fi
    """
shell:
    """
    ln --symbolic $(readlink --canonicalize {input.forward_}) {output.forward_} 2> {log}
    ln --symbolic $(readlink --canonicalize {input.reverse_}) {output.reverse_} 2>> {log}
    """
shell:
    """
    (gzip \
        --decompress \
        --stdout {input.fa_gz} \
    | bgzip \
        --compress-level 9 \
        --threads {threads} \
        --stdout \
        /dev/stdin \
    > {output.fa_gz} \
    ) 2> {log}
    """
shell:
    """
    (gzip \
        --decompress \
        --stdout {input.fa_gz} \
    | bgzip \
        --compress-level 9 \
        --threads {threads} \
        --stdout \
        /dev/stdin \
    > {output.fa_gz} \
    ) 2> {log}
    """
shell:
    """
    multiqc \
        --title {params.library} \
        --force \
        --filename {params.library} \
        --outdir {params.out_dir} \
        --dirs \
        --dirs-depth 1 \
        {input} \
    2> {log} 1>&2
    """
shell:
    """
    multiqc \
        --filename reads \
        --title reads \
        --force \
        --outdir {params.dir} \
        {input} \
    2> {log} 1>&2
    """
shell:
    """
    multiqc \
        --title fastp \
        --force \
        --filename fastp \
        --outdir {params.dir} \
        {input} \
    2> {log} 1>&2
    """
shell:
    """
    multiqc \
        --title {params.title} \
        --force \
        --filename {params.title} \
        --outdir {params.dir} \
        --module kraken \
        {input} \
    2> {log} 1>&2
    """
shell:
    """
    multiqc \
        --title {params.title} \
        --force \
        --filename {params.title} \
        --outdir {params.dir} \
        --dirs \
        --dirs-depth 1 \
        {input.reports} \
    2> {log} 1>&2
    """
shell:
    """
    multiqc \
        --title {params.title} \
        --force \
        --filename {params.title} \
        --outdir {params.dir} \
        --dirs \
        --dirs-depth 1 \
        {input.reports} \
    2> {log} 1>&2
    """
shell:
    "samtools flagstats {input.cram} > {output.txt} 2> {log}"
shell:
    "samtools idxstats {input.cram} > {output.tsv} 2> {log}"
shell:
    """
    samtools stats \
        --reference {input.reference} \
        {input.cram} \
    > {output.tsv} \
    2> {log}
    """
shell:
    """
    samtools stats \
        --reference {input.reference} \
        {input.cram} \
    > {output.tsv} \
    2> {log}
    """
shell:
    """
    gzip -dc {input.forward_} > {output.forward_fq} 2> {log}

    nonpareil \
        -s {output.forward_fq} \
        -T kmer \
        -b {params.prefix} \
        -f fastq \
        -t {threads} \
    2>> {log} 1>&2 || true
    """
shell:
    """
    Rscript --no-init-file workflow/scripts/aggregate_nonpareil.R \
        --input-folder {params.input_dir} \
        --output-file {output} \
    2> {log} 1>&2
    """
shell:
    """
    if [[ {params.is_paired} = "True" ]] ; then
        singlem pipe \
            --forward {input.forward_} \
            --reverse {input.reverse_} \
            --otu-table {output.otu_table} \
            --archive-otu-table {output.archive_otu_table} \
            --taxonomic-profile {output.condense} \
            --metapackage {input.data} \
            --threads {threads} \
            --assignment-threads {threads} \
        2> {log} 1>&2 || true
    else
        singlem pipe \
            --forward {input.forward_} \
            --otu-table {output.otu_table} \
            --archive-otu-table {output.archive_otu_table} \
            --taxonomic-profile {output.condense} \
            --metapackage {input.data} \
            --threads {threads} \
            --assignment-threads {threads} \
        2> {log} 1>&2 || true
    fi
    """
shell:
    """
    singlem condense \
        --input-archive-otu-tables {input.archive_otu_tables} \
        --taxonomic-profile {output.condense} \
        --metapackage {input.data} \
    2> {log} 1>&2
    """
shell:
    """
    samtools view \
        -F 4 \
        --threads {threads} \
        --reference {input.reference} \
        --output {output.bam} \
        --fast \
        {input.cram} \
    2> {log}
    """
shell:
    """
    coverm genome \
        --bam-files {input.bam} \
        --methods {params.method} \
        --separator {params.separator} \
        --min-covered-fraction {params.min_covered_fraction} \
    > {output} 2> {log} || true
    """
shell:
    """
    Rscript --no-init-file workflow/scripts/aggregate_coverm.R \
        --input-folder {params.input_dir} \
        --output-file {output} \
    2> {log} 1>&2
    """
Snakemake, from line 245 of rules/stats.smk
shell:
    """
    coverm contig \
        --bam-files {input.bam} \
        --methods {params.method} \
        --proper-pairs-only \
    > {output} 2> {log} || true
    """
shell:
    """
    Rscript --no-init-file workflow/scripts/aggregate_coverm.R \
        --input-folder {params.input_dir} \
        --output-file {output} \
    2> {log} 1>&2
    """
Snakemake, from line 301 of rules/stats.smk

Created: 1yr ago
Updated: 1yr ago
Maintainers: public
URL: https://github.com/3d-omics/bioinfo_detection_limit_test
Name: bioinfo_detection_limit_test
Version: 1
Copyright: Public Domain
License: MIT License