MToolBox pipeline written in snakemake to allow a better scalability

public public 1yr ago Version: Prototype.2 0 bookmarks

This is an work-in-progress update of MToolBox ( PMID:25028726 ). Please find more at the official documentation .

Code Snippets

 99
100
101
102
103
shell:
    """
    mkdir -p {params.outDir}
    fastqc -t {threads} -o {params.outDir} {input} &> {log}
    """
119
120
121
122
123
shell:
    """
    #module load gsnap
    gmap_build -D {params.gmap_db_dir} -d {params.gmap_db} -s none {input.mt_genome_fasta} &> {log}
    """
SnakeMake From line 119 of master/Snakefile
145
146
147
148
149
150
shell:
    """
    cat {input.mt_genome_fasta} {input.n_genome_fasta} > {output.mt_n_fasta}
    gmap_build -D {params.gmap_db_dir} -d {params.gmap_db} -s none {output.mt_n_fasta} &> {log}
    # rm {input.mt_genome_fasta}_{input.n_genome_fasta}.fasta
    """
SnakeMake From line 145 of master/Snakefile
172
173
174
175
176
177
178
shell:
    """

    mkdir -p {params.outDir}
    fastqc -t {threads} -o {params.outDir} {input} &> {log}

    """
208
209
210
211
run:
    #trimmomatic_adapters_path = get_trimmomatic_adapters_path()
    shell("export tap=$(which trimmomatic | sed 's/bin\/trimmomatic/share\/trimmomatic\/adapters\/TruSeq3-PE.fa/g'); trimmomatic PE {params.options} -threads {threads} {input.R1} {input.R2} {params.out1P} {params.out1U} {params.out2P} {params.out2U} ILLUMINACLIP:$tap:2:30:10 {params.processing_options} &> {log}")
    shell("zcat {params.out1U} {params.out2U} | gzip > {output.out1U} && rm {params.out1U} {params.out2U}")
234
235
236
237
238
239
240
241
242
243
run:
    if seq_type == "pe":
        print("PE mode")
        shell("gsnap -D {params.gmap_db_dir} -d {params.gmap_db} -o {params.uncompressed_output} -A sam --gunzip --nofails --pairmax-dna=500 --query-unk-mismatch=1 {params.RG_tag} -n 1 -Q -O -t {threads} {input[0]} {input[1]} &> {log} && gzip {params.uncompressed_output} &>> {log}")
    if seq_type == "se":
        print("SE mode")
        shell("gsnap -D {params.gmap_db_dir} -d {params.gmap_db} -o {params.uncompressed_output} -A sam --gunzip --nofails --pairmax-dna=500 --query-unk-mismatch=1 {params.RG_tag} -n 1 -Q -O -t {threads} {input[0]} &> {log} && gzip {params.uncompressed_output} &>> {log}")
    elif seq_type == "both":
        print("PE + SE mode")
        shell("gsnap -D {params.gmap_db_dir} -d {params.gmap_db} -o {params.uncompressed_output} -A sam --gunzip --nofails --pairmax-dna=500 --query-unk-mismatch=1 {params.RG_tag} -n 1 -Q -O -t {threads} {input[0]} {input[1]} {input[2]} &> {log} && gzip {params.uncompressed_output} &>> {log}")
SnakeMake From line 234 of master/Snakefile
257
258
259
run:
    sam_to_fastq(samfile=input.outmt_sam, outmt1=output.outmt1,
                 outmt2=output.outmt2, outmt=output.outmt)
SnakeMake From line 257 of master/Snakefile
283
284
285
286
287
run:
    if os.path.isfile(input.outmt):
        shell("gsnap -D {params.gmap_db_dir} -d {params.gmap_db} -o {params.uncompressed_output} --gunzip -A sam --nofails --query-unk-mismatch=1 -O -t {threads} {input.outmt} &> {log.logS} && gzip {params.uncompressed_output} &>> {log.logS}")
    else:
        open(output.outS, 'a').close()
SnakeMake From line 283 of master/Snakefile
314
315
316
317
318
run:
    if os.path.isfile(input.outmt1):
        shell("gsnap -D {params.gmap_db_dir} -d {params.gmap_db} -o {params.uncompressed_output} --gunzip -A sam --nofails --query-unk-mismatch=1 -O -t {threads} {input.outmt1} {input.outmt2} &> {log.logP} && gzip {params.uncompressed_output} &>> {log.logP}")
    else:
        open(output.outP, 'a').close()
SnakeMake From line 314 of master/Snakefile
332
333
334
335
336
337
run:
    filter_alignments(outmt=input.outmt,
                      outS=input.outS,
                      outP=input.outP,
                      OUT=output.sam,
                      ref_mt_fasta=params.ref_mt_fasta)
SnakeMake From line 332 of master/Snakefile
348
349
350
351
shell:
    """
    zcat {input.sam} | samtools view -b -o {output} - &> {log}
    """
364
365
366
367
368
shell:
    """
    samtools sort -o {output.sorted_bam} -T {params.TMP} {input.bam} &> {log}
    # samtools sort -o {output.sorted_bam} -T ${{TMP}} {input.bam}
    """
381
382
383
384
385
386
387
388
389
390
391
392
393
run:
    if params.mark_duplicates == True:
        shell("picard MarkDuplicates \
            INPUT={input.sorted_bam} \
            OUTPUT={output.sorted_bam_md} \
            METRICS_FILE={output.metrics_file} \
            ASSUME_SORTED=true \
            REMOVE_DUPLICATES=true \
            TMP_DIR={params.TMP}")
    else:
        shutil.copy2(input.sorted_bam, output.sorted_bam_md)
        with open(output.metrics_file, "w") as f:
            f.write("")
407
408
409
410
411
shell:
    """
    samtools merge {output.merged_bam} {input} &> {log}
    samtools index {output.merged_bam} {output.merged_bam_index}
    """
421
422
423
424
shell:
    """
    samtools faidx {input.mt_n_fasta} &> {log}
    """
434
435
run:
    shell("picard CreateSequenceDictionary R={input.mt_n_fasta} O={output.genome_dict}")
450
451
452
453
454
455
456
457
458
shell:
    """
    java -Xmx6G -jar {params.source_dir}/modules/GenomeAnalysisTK.jar \
        -R {input.mt_n_fasta} \
        -T LeftAlignIndels \
        -I {input.merged_bam} \
        -o {output.merged_bam_left_realigned} \
        --filter_reads_with_N_cigar
    """
472
473
474
475
shell:
    """
    samtools mpileup -B -f {params.genome_fasta} -o {output.pileup} {input.merged_bam} &> {log}
    """
488
489
490
run:
    mt_table_data = pileup2mt_table(pileup=input.pileup, ref_fasta=params.ref_mt_fasta)
    write_mt_table(mt_table_data=mt_table_data, mt_table_file=output.mt_table)
SnakeMake From line 488 of master/Snakefile
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
run:
    # function (and related ones) from mtVariantCaller
    # vcf_dict = mtvcf_main_analysis(sam_file = input.sam, mtable_file = input.mt_table, name2 = wildcards.sample)
    tmp_sam = os.path.split(input.merged_bam)[1].replace(".bam", ".sam")
    shell("samtools view {merged_bam} > {tmp_dir}/{tmp_sam}".format(merged_bam=input.merged_bam,
                                                                    tmp_dir=params.TMP,
                                                                    tmp_sam=tmp_sam))
    vcf_dict = mtvcf_main_analysis(sam_file="{tmp_dir}/{tmp_sam}".format(tmp_dir=params.TMP,
                                                                         tmp_sam=tmp_sam),
                                   mtable_file=input.mt_table, name2=wildcards.sample)
    # ref_genome_mt will be used in the VCF descriptive field
    # seq_name in the VCF data
    seq_name = get_seq_name(params.ref_mt_fasta)
    VCF_RECORDS = VCFoutput(vcf_dict, reference=wildcards.ref_genome_mt,
                            seq_name=seq_name, vcffile=output.single_vcf)
    bed_output(VCF_RECORDS, seq_name=seq_name, bedfile=output.single_bed)
    # fasta output
    #contigs = pileup2mt_table(pileup=input.pileup, fasta=params.ref_mt_fasta, mt_table=in.mt_table)
    mt_table_data = pileup2mt_table(pileup=input.pileup,
                                    ref_fasta=params.ref_mt_fasta)
    gapped_fasta = mt_table_handle2gapped_fasta(mt_table_data=mt_table_data)
    contigs = gapped_fasta2contigs(gapped_fasta=gapped_fasta)
    fasta_output(vcf_dict=vcf_dict, ref_mt=params.ref_mt_fasta,
                 fasta_out=output.single_fasta, contigs=contigs)
546
547
run:
    shell("bcftools index {input.single_vcf}")
561
562
run:
    shell("bcftools merge {input.single_vcf_list} -O v -o {output.merged_vcf}")
ShowHide 16 more snippets with no or duplicated tags.

Login to post a comment if you would like to share your experience with this workflow.

Do you know this workflow well? If so, you can request seller status , and start supporting this workflow.

Free

Created: 1yr ago
Updated: 1yr ago
Maitainers: public
URL: https://github.com/mitoNGS/MToolBox_snakemake
Name: mtoolbox_snakemake
Version: Prototype.2
Badge:
workflow icon

Insert copied code into your website to add a link to this workflow.

Downloaded: 0
Copyright: Public Domain
License: None
  • Future updates

Related Workflows

cellranger-snakemake-gke
snakemake workflow to run cellranger on a given bucket using gke.
A Snakemake workflow for running cellranger on a given bucket using Google Kubernetes Engine. The usage of this workflow ...