Mitochondria Variant Discovery Using Snakemake

public public 1yr ago 0 bookmarks

https://gatk.broadinstitute.org/hc/en-us/articles/4403870837275-Mitochondrial-short-variant-discovery-SNVs-Indels-

Preparation

  • copy env file to '.env'

  • set parameters to

Code Snippets

40
41
42
43
44
45
46
47
48
49
50
run:
    name = os.path.basename(output[0]).replace("_1P.fq.gz","")
    r1 = info[info["sample"]==name]["r1"].squeeze()
    r2 = info[info["sample"]==name]["r2"].squeeze()
    print(f"R1: {r1}\tR2: {r2}")
    cmd  = f"[ ! -d {outdir}/remove_adapter ] && mkdir {outdir}/remove_adapter; \n"
    cmd += f"trimmomatic PE -threads {threads} -phred33 "
    cmd += f"  -trimlog {outdir}/remove_adapter/{name}.trimmomatic.log -summary {outdir}/remove_adapter/{name}.trimmomatic.summary "
    cmd += f"  {r1} {r2} -baseout {params.prefix}.fq.gz "
    cmd += f"  LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 MINLEN:36 > >(tee {log}) 2>&1; \n"
    subprocess.run(cmd, shell=True, executable='/bin/bash')
73
74
75
76
77
78
79
80
81
shell:
    "[ ! -d {outdir}/mapping ] && mkdir {outdir}/mapping; \n"
    "{params.bwa} -t {threads} -R '@RG\\tID:{wildcards.sample}\\tSM:{wildcards.sample}\\tLB:MT\\tPL:Illumina' {genome} {input} | "
    "  {params.samtools} -o {output.sort} - > >(tee {log}) 2>&1; "
    "{params.gatk} -conf 'spark.executor.cores={threads}' --input {output.sort} --output {output.mark} --metrics-file {output.metrics} > >(tee {log}) 2>&1;\n"
    "[ ! -d {outdir}/qc/{wildcards.sample} ] && mkdir {outdir}/qc/{wildcards.sample}; \n"
    "{params.bamdst} -p {mt_bed} -o {outdir}/qc/{wildcards.sample} {output.mark} ; \n"
    "paste <(echo -e '#Sample\n{wildcards.sample}') {outdir}/qc/{wildcards.sample}/chromosomes.report > {outdir}/qc/{wildcards.sample}/chromosomes.report.1; \n"
    "sed -n '4,$p' {outdir}/qc/{wildcards.sample}/coverage.report | sed 's| \+ ||g'|sed '1i [Sample] Sample Name\t{wildcards.sample}' > {outdir}/qc/{wildcards.sample}/coverage.report.1; \n"
101
102
103
104
105
shell:
    "[ ! -d {outdir}/mapping ] && mkdir {outdir}/mapping; \n"
    "{params.bwa} -t {threads} -R '@RG\\tID:{wildcards.sample}\\tSM:{wildcards.sample}\\tLB:MT\\tPL:Illumina' {genome} {input} | "
    "  {params.samtools} -o {output.sort} - > >(tee {log}) 2>&1; \n"
    "{params.gatk} -conf 'spark.executor.cores={threads} ' --input {output.sort} --output {output.mark} --metrics-file {output.metrics} > >(tee {log}) 2>&1; \n"
119
120
121
122
shell:
    "gatk CollectWgsMetrics --VALIDATION_STRINGENCY SILENT --USE_FAST_ALGORITHM true --INCLUDE_BQ_HISTOGRAM true "
    "-CAP {params.coverage_cap} --READ_LENGTH {params.read_length}  "
    "-R {genome} -I {input} -O {output} > >(tee {log}) 2>&1"
144
145
146
shell:
    "[ ! -d {outdir}/call-variants ] && mkdir {outdir}/call-variants; "
    "{params.gatk_mutect2} --native-pair-hmm-threads {threads} -R {input.genome} -L {params.region} -I {input.bam} --bam-output {params.prefix}.nonctr.bam -O {output.vcf} > >(tee {log}) 2>&1 ; \n"
170
171
172
shell:
    "[ ! -d {outdir}/call-variants ] && mkdir {outdir}/call-variants; "
    "{params.gatk_mutect2} --native-pair-hmm-threads {threads} -R {input.genome} -L {params.region} -I {input.bam} --bam-output {params.prefix}.ctrshift.bam -O {output.vcf} > >(tee {log}) 2>&1;\n "
194
195
196
shell:
    "gatk LiftoverVcf -R {genome} -C {input.chain} -I {input.vcf_shift} -O {params.prefix}.ctr.vcf --REJECT {params.prefix}.ctr.rejected.vcf 2> >(tee {log} >&2) "
    "&& gatk MergeVcfs -I {input.vcf} -I {params.prefix}.ctr.vcf -O {params.prefix}.raw.vcf > >(tee {log}) 2>&1;\n"
211
212
213
shell:
    "[ ! -d {outdir}/final ] && mkdir {outdir}/final; \n"
    "vcfs=({outdir}/call-variants/*raw.vcf.gz); if ((${{#vcfs[@]}} == 1)); then cp ${{vcfs}} {output.raw}; else bcftools merge {input.raw} -o {output.raw}; fi;\n"
227
228
229
230
shell:
    "[ ! -d {outdir}/final ] && mkdir {outdir}/final; "
    "cat {outdir}/qc/*/chromosomes.report.1 | sed -n '{rows}' > {outdir}/final/chromosomes.report; "
    "paste {outdir}/qc/*/coverage.report.1 | cut -f{cols}> {outdir}/final/coverage.report; "
245
246
247
248
249
250
251
shell:
    "[ ! -d {outdir}/sv ] && mkdir {outdir}/sv; \n"
    "gatk CollectSVEvidence -R {genome} -L {mt_interval} -DI {mt_interval} -I {input.bam} -F {input.vcf} --sample-name {wildcards.sample} -RD {params.prefix}.RD.txt -SR {params.prefix}.SR.txt -PE {params.prefix}.PE.txt -SD {params.prefix}.SD.txt; \n"
    "[ ! -d {outdir}/sv/manta ] && mkdir {outdir}/sv/manta; \n"
    "python2 {params.manta} --referenceFasta {genome}  --callRegions {mt_bed}.gz --tumorBam {input.bam} --runDir {outdir}/sv/manta/{wildcards.sample} && python2 {outdir}/sv/manta/{wildcards.sample}/runWorkflow.py --mode local; \n"
    "cp {outdir}/sv/manta/{wildcards.sample}/results/variants/tumorSV.vcf.gz {outdir}/sv/{wildcards.sample}.sv.manta.vcf.gz; \n"
    "{params.whamg} -f {input.bam} -a {genome} > {outdir}/sv/{wildcards.sample}.sv.whamg.vcf; \n"
ShowHide 6 more snippets with no or duplicated tags.

Login to post a comment if you would like to share your experience with this workflow.

Do you know this workflow well? If so, you can request seller status , and start supporting this workflow.

Free

Created: 1yr ago
Updated: 1yr ago
Maitainers: public
URL: https://github.com/Sue9104/mito-scrna
Name: mito-scrna
Version: 1
Badge:
workflow icon

Insert copied code into your website to add a link to this workflow.

Downloaded: 0
Copyright: Public Domain
License: None
  • Future updates

Related Workflows

cellranger-snakemake-gke
snakemake workflow to run cellranger on a given bucket using gke.
A Snakemake workflow for running cellranger on a given bucket using Google Kubernetes Engine. The usage of this workflow ...