Snakemake workflow for analysis of Tara Euk Metagenomes


Setup

Create a conda environment for running the pipeline:

conda env create --name snakemake-tara-euk --file environment.yaml
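The contents of environment.yaml are not reproduced on this page; the file in the repository is authoritative. A minimal, hypothetical sketch of what such a file typically contains (channel and package names assumed, not taken from the repo):

```yaml
name: snakemake-tara-euk
channels:
  - conda-forge
  - bioconda
dependencies:
  - snakemake
```

After creating the environment, activate it with `conda activate snakemake-tara-euk` before invoking snakemake.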

Code Snippets

From line 149 of master/Snakefile:
wrapper:
    "0.27.1/bio/fastqc"
wrapper:
    "0.27.1/bio/trimmomatic/pe"
From line 167 of master/Snakefile
From line 179 of master/Snakefile:
wrapper:
    "0.27.1/bio/fastqc"
From line 199 of master/Snakefile:
shell: 
    """
    multiqc -n multiqc.html {input.rawG}
    mv multiqc.html {output.html_rawG}
    mv multiqc_data/multiqc_general_stats.txt {output.stats_rawG} 
    rm -rf multiqc_data

    multiqc -n multiqc.html {input.trimmedG}
    mv multiqc.html {output.html_trimmedG}
    mv multiqc_data/multiqc_general_stats.txt {output.stats_trimmedG} 
    rm -rf multiqc_data

    multiqc -n multiqc.html {input.trimmedT}
    mv multiqc.html {output.html_trimmedT}
    mv multiqc_data/multiqc_general_stats.txt {output.stats_trimmedT} 
    rm -rf multiqc_data

    """ 
shell: 
    """
    interleave-reads.py {input.r2} {input.r1} | trim-low-abund.py --gzip -C 3 -Z 18 -M 30e9 -V - -o {output} 2> {log} 
    """
From line 233 of master/Snakefile
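This rule pipes interleaved paired reads into khmer's trim-low-abund.py for k-mer abundance trimming. Interleaving itself just alternates mates from the two input files into one stream; a toy Python sketch of the idea, not khmer's interleave-reads.py implementation:

```python
def interleave(r1_records, r2_records):
    """Alternate R1/R2 records so each pair sits together in one stream.

    Toy illustration only; real FASTQ interleaving also checks that
    read names match between mates.
    """
    out = []
    for fwd, rev in zip(r1_records, r2_records):
        out.append(fwd)
        out.append(rev)
    return out

# Pairs from the two files end up adjacent in the output stream.
print(interleave(["r1a", "r1b"], ["r2a", "r2b"]))  # ['r1a', 'r2a', 'r1b', 'r2b']
```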
shell: 
    """
    zcat {input.r1} {input.r2} | sourmash compute -k 21,31,51\
        --scaled 10000  --track-abundance \
        -o {output} - 2> {log}
    """
From line 248 of master/Snakefile
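The --scaled 10000 option makes sourmash keep only a deterministic fraction of k-mer hashes (those falling below 2^64/scaled), a scaled MinHash sketch. A toy Python illustration of the downsampling idea; the hash function here is a placeholder, not sourmash's actual implementation:

```python
import hashlib

def scaled_minhash(kmers, scaled=10000):
    """Keep only k-mer hashes below 2**64 // scaled, so a sketch
    retains roughly 1/scaled of all distinct k-mers."""
    threshold = 2 ** 64 // scaled
    sketch = set()
    for kmer in kmers:
        # Placeholder hash: first 8 bytes of SHA-1 as an unsigned int.
        h = int.from_bytes(hashlib.sha1(kmer.encode()).digest()[:8], "big")
        if h < threshold:
            sketch.add(h)
    return sketch
```

A useful property: the sketch of a union of k-mer sets equals the union of the individual sketches, which is what makes such sketches composable across samples.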
From line 273 of master/Snakefile:
shell: 
    """
    megahit -1 {params.inputr1} -2 {params.inputr2} --min-contig-len {params.min_contig_len} --memory {params.memory} -t {params.cpu_threads} --out-dir {params.megahit_output_name} {params.other_options}  >> {log} 2>&1
    """
From line 294 of master/Snakefile:
shell:
    """
    bwa index {input} 2> {log}
    """
shell:
    """
    cp {input} {output}
    """
From line 304 of master/Snakefile
From line 329 of master/Snakefile:
shell:
    """ 
    bwa mem -t {params.threads} {params.extra} {input.reference} {input.r1} {input.r2} | samtools sort -o {output} - >> {log} 2>&1
    """ 
shell:
    """
    jgi_summarize_bam_contig_depths --outputDepth {output} {input} > {log} 2>&1
    """
From line 343 of master/Snakefile
From line 361 of master/Snakefile:
shell:
    """
    metabat2 {params.other} --numThreads {params.threads} -i {input.assembly} -a {input.depth} -o {output} > {log} 2>&1
    """
From line 376 of master/Snakefile:
shell:
    '''
    concoct --coverage_file {input.depth} \
        --composition_file {input.assembly} \
        --basename {params.outdir} \
        --length_threshold {params.length} \
        --converge_out -t 12
    '''
shell: 
    '''
    merge_cutup_clustering.py {input} > {output}
    '''
From line 389 of master/Snakefile
shell:
    '''
    mkdir -p {params.outdir}
    extract_fasta_bins.py {input.assembly} {input.csv} --output_path {params.outdir}
    touch {output.done}
    '''
From line 399 of master/Snakefile
shell:
    """
    ulimit -s 65536
    filter_euk_bins.py {input.fastas} --minbpeuks {params.minbpeuks} --eukratio {params.eukratio} --output {output} --tempdir {params.tempdir}
    """
From line 411 of master/Snakefile
From line 429 of master/Snakefile:
shell: 
    """
    EukRep -i {input} -o {output} --prokarya {params.prok} --min {params.min_contig} > {log} 2>&1
    """
From line 447 of master/Snakefile:
shell:
    """
    metabat2 {params.other} --numThreads {params.threads} -i {input.assembly} -a {input.depth} -o {output} > {log} 2>&1
    """
From line 460 of master/Snakefile:
shell:
    """
    prodigal -i {input.assembly} -f gff -o {output.genes} -a {output.proteins} -p meta
    """
From line 480 of master/Snakefile:
shell:
    """
    bwa index {input} 2> {log}
    """
From line 505 of master/Snakefile:
shell:
    """ 
    bwa mem -t {params.threads} {params.extra} {input.reference} {input.r1} {input.r2} | samtools sort -o {output} - >> {log} 2>&1
    """ 
From line 518 of master/Snakefile:
shell:
    """
    coverm genome --bam-files {input.mapping} \
        --genome-fasta-directory {input.genome_dir} --genome-fasta-extension "fa" \
        --min-read-percent-identity 0.95 --min-read-aligned-percent 0.75 \
        --proper-pairs-only \
        --methods count length covered_bases covered_fraction reads_per_base mean variance trimmed_mean rpkm relative_abundance \
        --output-format dense --min-covered-fraction 0 \
        --contig-end-exclusion 75 --trim-min 0.05 --trim-max 0.95 \
        --quiet > {output}
    """
From line 534 of master/Snakefile:
shell:
    """
    coverm genome --bam-files {input.mapping} \
        --genome-fasta-directory {input.genome_dir} --genome-fasta-extension "fa" \
        --min-read-percent-identity 0.95 --min-read-aligned-percent 0.75 \
        --proper-pairs-only --methods coverage_histogram \
        --output-format dense --min-covered-fraction 0 \
        --contig-end-exclusion 75 --trim-min 0.05 --trim-max 0.95 \
        --quiet > {output}
    """
__author__ = "Julian de Ruiter"
__copyright__ = "Copyright 2017, Julian de Ruiter"
__email__ = "[email protected]"
__license__ = "MIT"


from os import path
from tempfile import TemporaryDirectory

from snakemake.shell import shell

log = snakemake.log_fmt_shell(stdout=False, stderr=True)

def basename_without_ext(file_path):
    """Returns basename of file path, without the file extension."""

    base = path.basename(file_path)

    split_ind = 2 if base.endswith(".gz") else 1
    base = ".".join(base.split(".")[:-split_ind])

    return base


# Run fastqc in a temporary directory: multiple jobs writing to the
# same fastqc output dir can race, so each run gets its own temp dir.
with TemporaryDirectory() as tempdir:
    shell("fastqc {snakemake.params} --quiet "
          "--outdir {tempdir} {snakemake.input[0]}"
          " {log}")

    # Move outputs into proper position.
    output_base = basename_without_ext(snakemake.input[0])
    html_path = path.join(tempdir, output_base + "_fastqc.html")
    zip_path = path.join(tempdir, output_base + "_fastqc.zip")

    if snakemake.output.html != html_path:
        shell("mv {html_path} {snakemake.output.html}")

    if snakemake.output.zip != zip_path:
        shell("mv {zip_path} {snakemake.output.zip}")
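The basename_without_ext helper above strips one trailing extension, or two when the file is gzipped. It can be exercised standalone (the sample paths here are illustrative):

```python
from os import path

def basename_without_ext(file_path):
    """Same logic as the wrapper above: drop ".ext", or ".ext.gz"."""
    base = path.basename(file_path)
    split_ind = 2 if base.endswith(".gz") else 1
    return ".".join(base.split(".")[:-split_ind])

print(basename_without_ext("/data/sample_R1.fastq.gz"))  # sample_R1
print(basename_without_ext("reads.fq"))                  # reads
```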
__author__ = "Johannes Köster"
__copyright__ = "Copyright 2016, Johannes Köster"
__email__ = "[email protected]"
__license__ = "MIT"


from snakemake.shell import shell

extra = snakemake.params.get("extra", "")
log = snakemake.log_fmt_shell(stdout=True, stderr=True)
trimmer = " ".join(snakemake.params.trimmer)

shell("trimmomatic PE {extra} "
      "{snakemake.input.r1} {snakemake.input.r2} "
      "{snakemake.output.r1} {snakemake.output.r1_unpaired} "
      "{snakemake.output.r2} {snakemake.output.r2_unpaired} "
      "{trimmer} "
      "{log}")
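The wrapper flattens the rule's trimmer list into the trailing Trimmomatic step string. A standalone sketch with hypothetical trimmer values (the workflow's real settings live in its Snakefile):

```python
# Hypothetical trimming steps; not taken from this workflow's Snakefile.
trimmer = ["LEADING:2", "TRAILING:2", "SLIDINGWINDOW:4:2", "MINLEN:25"]
command_tail = " ".join(trimmer)
print(command_tail)  # LEADING:2 TRAILING:2 SLIDINGWINDOW:4:2 MINLEN:25
```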
(13 more snippets with no or duplicated tags are not shown here.)



Created: 1yr ago
Updated: 1yr ago
Maintainers: public
URL: https://github.com/AlexanderLabWHOI/tara-euk-metaG
Name: tara-euk-metag
Version: 1

Copyright: Public Domain
License: MIT License
