Revised WGS Genotyping V2 Workflow

  • Cristian Gonzalez-Colin

  • Vijayanand Lab (https://www.lji.org/labs/vijayanand/)

  • La Jolla Institute for Immunology (LJI)

  • La Jolla, CA USA


Code Snippets

run:
    # Drop the "reseq" tag from the donor name so every run of a donor gets the same read group
    smlldonor = params.donor.replace("reseq", "")
    shell("{bwa_app} mem -M -t {threads} -R '@RG\\tID:{smlldonor}\\tSM:{smlldonor}\\tPL:ILLUMINA' {input.fasta_ref} {input.R1} {input.R2} | {samtools_app} view -bh - > {output.map_file} ")
SnakeMake From line 66 of main/Snakefile
run:
    if len(input) == 1:
        shell("cp {input.run} {output.bam} ")
    else:
        shell("{samtools_app} merge --threads {threads} {output.bam} {input.run} {input.reseq} ")
SnakeMake From line 80 of main/Snakefile
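The rule above copies the BAM when a donor has a single input and merges the inputs otherwise. A minimal Python sketch of that decision follows; the helper name `merge_or_copy_command`, the bare `samtools` executable name, and the file names in the usage note are assumptions for illustration and are not taken from the Snakefile.

```python
import shlex


def merge_or_copy_command(bams, out_bam, threads=1, samtools="samtools"):
    """Return the shell command for one donor (hypothetical helper).

    A single input BAM is copied as-is; two or more are merged
    with `samtools merge`.
    """
    if not bams:
        raise ValueError("at least one input BAM is required")
    if len(bams) == 1:
        return "cp {} {}".format(shlex.quote(bams[0]), shlex.quote(out_bam))
    inputs = " ".join(shlex.quote(b) for b in bams)
    return "{} merge --threads {} {} {}".format(
        samtools, threads, shlex.quote(out_bam), inputs)
```

For example, `merge_or_copy_command(["run.bam", "reseq.bam"], "donor.bam", threads=4)` builds a merge command, while a one-element list yields a plain `cp`.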
shell:
    "{samtools_app} sort -@ {threads} {input.bam_file} -o {output.bam_sort}"
SnakeMake From line 93 of main/Snakefile
shell:
    "java -jar {picard_app} MarkDuplicates I={input.bam_file} "
    "O={output.bam_noDup} M={output.dup_file} REMOVE_DUPLICATES={params.rm_dup} "
SnakeMake From line 105 of main/Snakefile
shell:
    "{samtools_app} stats  {input.bam_file}  --threads {threads} > {output.txt_file}"
SnakeMake From line 115 of main/Snakefile
shell:
    "java -jar {picard_app} CollectWgsMetrics I={input.bam_file} "
    "O={output.collect} R={input.fasta_ref} "
SnakeMake From line 124 of main/Snakefile
shell:
    "{samtools_app} index {input.bam_file} {output.index_file}"
SnakeMake From line 133 of main/Snakefile
shell:
    "{gatk_app} BaseRecalibrator -I {input.bam_file} -R {input.fasta_ref} "
    " --known-sites {params.snps_1000} --known-sites {params.indels_Mills} "
    " --known-sites {params.hg38_indels} --known-sites {params.hg38_snps} "
    " --known-sites {params.snps_hapmap} -O {output.recal_table} "
SnakeMake From line 150 of main/Snakefile
shell:
    "{gatk_app} --java-options \"-Djava.io.tmpdir={params.tempdir} -Xms2G -Xmx2G -XX:ParallelGCThreads=2\" ApplyBQSR "
    " -I {input.bam} -R {input.fasta_ref} "
    " --bqsr-recal-file {input.recal_table} --create-output-bam-index true "
    " -O {output.bamfile}"
SnakeMake From line 166 of main/Snakefile
shell:
    "{gatk_app} HaplotypeCaller "
    "-I {input.bam_file} -R {input.fasta_ref} -O {output.vcf_file} -ERC GVCF "
SnakeMake From line 179 of main/Snakefile
shell:
    "{samtools_app} view -hb -q 20 {input.bam_file} > {output.filter_bam}"
SnakeMake From line 190 of main/Snakefile
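`samtools view -q 20` skips alignments whose mapping quality is below 20, and `-h` keeps the header. The sketch below mirrors that rule on plain SAM text lines; the function name `keep_alignment` is hypothetical and only illustrates the threshold logic, not how samtools is implemented.

```python
def keep_alignment(sam_line, min_mapq=20):
    """Return True for header lines and for alignments with MAPQ >= min_mapq."""
    if sam_line.startswith("@"):
        return True  # header lines are passed through, as with -h
    fields = sam_line.rstrip("\n").split("\t")
    return int(fields[4]) >= min_mapq  # column 5 of a SAM record is MAPQ
```

A record with MAPQ exactly 20 is kept, because only values smaller than the threshold are skipped.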
shell:
    "grep -w {params.chr} {input.interval_file} > {output.chr_file}"
SnakeMake From line 201 of main/Snakefile
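`grep -w` matches the chromosome name only as a whole word, so `chr1` does not pull in `chr10` or `chr1_KI270706v1_random`. A rough Python equivalent, assuming a hypothetical helper `lines_for_contig` and made-up interval lines:

```python
import re


def lines_for_contig(lines, contig):
    """Keep lines that contain `contig` as a whole word, like `grep -w`."""
    # A "word" character is a letter, digit, or underscore.
    pattern = re.compile(r"(?<!\w)" + re.escape(contig) + r"(?!\w)")
    return [line for line in lines if pattern.search(line)]
```

As with `grep -w`, the match may occur anywhere on the line, not only in the first column.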
run:
    iFiles = " ".join(["-V " + file for file in input.vcf_file ])
    shell("mkdir -p " + params.tempdir)
    shell("{gatk_app}  GenomicsDBImport " + iFiles + "  --genomicsdb-workspace-path {output.database} -L {input.interval_file} --tmp-dir {params.tempdir}")
SnakeMake From line 213 of main/Snakefile
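The `" ".join([...])` idiom above repeats one flag per input file. The same pattern can be sketched as a small reusable function; `repeat_flag` and the file names are hypothetical and do not appear in the Snakefile.

```python
def repeat_flag(flag, paths):
    """Return 'flag p1 flag p2 ...' for a list of paths (hypothetical helper)."""
    return " ".join("{} {}".format(flag, p) for p in paths)


# Example with made-up file names:
gvcfs = ["donorA.g.vcf.gz", "donorB.g.vcf.gz"]
variant_args = repeat_flag("-V", gvcfs)
```

The resulting string can be concatenated into the command exactly as `iFiles` is in the rule above.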
shell:
    "{gatk_app} --java-options \"-Xmx4g\" GenotypeGVCFs -R {input.fasta_ref} -V gendb://{input.database} -O {output.cohort_vcf}"
SnakeMake From line 225 of main/Snakefile
run:
    iFiles = " ".join(["-I " + file for file in input.vcf_files ])
    shell("java -jar {picard_app} GatherVcfs " + iFiles + " -O {output.cohort} ")
SnakeMake From line 233 of main/Snakefile
shell:
    "{bcftools_app} norm -f {input.fasta_ref} -m -any -Oz -o {output.cohort} {input.vcf_file} "
SnakeMake From line 243 of main/Snakefile
shell:
    "tabix -p vcf {input.cohort} "
SnakeMake From line 251 of main/Snakefile
shell:
    "{gatk_app} --java-options \"-Djava.io.tmpdir={params.tempdir} -Xms4G -Xmx4G -XX:ParallelGCThreads=2\" VariantRecalibrator \
      -tranche 100.0 -tranche 99.95 -tranche 99.9 -tranche 99.8 \
      -tranche 99.5 -tranche 99.0 -tranche 97.0 -tranche 96.0 \
      -tranche 95.0 -tranche 94.0 \
      -tranche 93.5 -tranche 93.0 -tranche 92.0 -tranche 91.0 -tranche 90.0 \
      -R {input.fasta_ref} \
      -V {input.cohort} \
      --resource:hapmap,known=false,training=true,truth=true,prior=15.0 {input.snps_hapmap} \
      --resource:omni,known=false,training=true,truth=false,prior=12.0 {input.snps_omni} \
      --resource:1000G,known=false,training=true,truth=false,prior=10.0 {input.snps_1000} \
      -an QD -an MQ -an MQRankSum -an ReadPosRankSum -an FS -an SOR  \
      -mode SNP -O {output.recal} --tranches-file {output.tranches} \
      --rscript-file {output.script} "
SnakeMake From line 268 of main/Snakefile
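Each `--resource` argument above packs a label, three boolean roles, and a prior into one token followed by the VCF path. A minimal sketch of how such a token could be assembled; `resource_arg` is a hypothetical helper and the path in the usage note is a placeholder.

```python
def resource_arg(name, path, known, training, truth, prior):
    """Format one VariantRecalibrator --resource argument (hypothetical helper)."""
    def flag(value):
        return "true" if value else "false"

    return "--resource:{},known={},training={},truth={},prior={:.1f} {}".format(
        name, flag(known), flag(training), flag(truth), prior, path)
```

Calling `resource_arg("hapmap", "hapmap.vcf.gz", False, True, True, 15.0)` reproduces the shape of the HapMap line in the SNP command.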
shell:
    "{gatk_app} --java-options \"-Djava.io.tmpdir={params.tempdir} -Xms4G -Xmx4G -XX:ParallelGCThreads=2\" VariantRecalibrator \
      -tranche 100.0 -tranche 99.95 -tranche 99.9 -tranche 99.8 \
      -tranche 99.5 -tranche 99.0 -tranche 97.0 -tranche 96.0 \
      -tranche 95.0 -tranche 94.0 -tranche 93.5 -tranche 93.0 \
      -tranche 92.0 -tranche 91.0 -tranche 90.0 \
      -R {input.fasta_ref} \
      -V {input.cohort} \
      --resource:mills,known=false,training=true,truth=true,prior=12.0 {input.indels_Mills} \
      --resource:dbsnp,known=true,training=false,truth=false,prior=2.0 {input.hg38_snps} \
      -an QD -an MQRankSum -an ReadPosRankSum -an FS -an SOR -an DP \
      -mode INDEL -O {output.recal} --tranches-file {output.tranches} \
      --rscript-file {output.script} "
SnakeMake From line 296 of main/Snakefile
shell:
    "{gatk_app} --java-options \"-Djava.io.tmpdir={params.tempdir} \
      -Xms2G -Xmx2G -XX:ParallelGCThreads=2\" ApplyVQSR \
      -V {input.cohort} \
      --recal-file {input.recal} \
      -mode SNP \
      --tranches-file {input.tranches} \
      --truth-sensitivity-filter-level 99.8 \
      --create-output-variant-index true \
      -O {output.vcf}"
SnakeMake From line 319 of main/Snakefile
shell:
    "{gatk_app} --java-options \"-Djava.io.tmpdir={params.tempdir} \
      -Xms2G -Xmx2G -XX:ParallelGCThreads=2\" ApplyVQSR \
      -V {input.vcf} \
      -mode INDEL \
      --recal-file {input.recal} \
      --tranches-file {input.tranches} \
      --truth-sensitivity-filter-level 99.9 \
      --create-output-variant-index true \
      -O {output.vcf} "
SnakeMake From line 339 of main/Snakefile
shell:
    "{gatk_app} --java-options \"-Djava.io.tmpdir={params.tempdir} -Xms2G -Xmx2G -XX:ParallelGCThreads=2\" CalculateGenotypePosteriors "
    " -V {input.vcf} "
    " --supporting-callsets {input.wgs_1000GP} -O {output.vcf} "
SnakeMake From line 358 of main/Snakefile
shell:
    "{bcftools_app} +fill-tags {input.vcf} -Ou | {bcftools_app} view -t {params.chr_str} -Ou | {bcftools_app} annotate --set-id '%CHROM\\_%POS\\_%REF\\_%ALT' -Oz -o {output.vcf} "
SnakeMake From line 370 of main/Snakefile
shell:
    "{bcftools_app} filter -S . -e 'FMT/DP<3 | FMT/GQ<20' -Oz -o {output.vcf} {input.vcf} "
SnakeMake From line 378 of main/Snakefile
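With `-S .`, `bcftools filter` sets the genotype of each failing sample to missing instead of dropping the site, and the single `|` makes the test per sample. The per-sample rule can be sketched as follows; `mask_genotype` is a hypothetical name, the diploid `./.` output is a simplification, and treating a missing DP or GQ as "not failing" is an assumption of this sketch.

```python
def mask_genotype(gt, dp, gq, min_dp=3, min_gq=20):
    """Return the genotype, or './.' if depth or genotype quality is too low.

    dp or gq may be None (missing); a missing value never triggers
    masking here, which is an assumption of this sketch.
    """
    low_dp = dp is not None and dp < min_dp
    low_gq = gq is not None and gq < min_gq
    return "./." if (low_dp or low_gq) else gt
```

A sample with DP=2 and GQ=50 is masked, while one with DP=3 and GQ=20 is kept, since both thresholds are strict.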
shell:
    "{bcftools_app} filter -i 'FILTER=\"PASS\"' -Ou {input.vcf}| {bcftools_app} filter -i 'INFO/HWE > 1e-6' -Ou | {bcftools_app} filter -i 'F_MISSING < 0.15' -Oz -o {output.vcf}"
SnakeMake From line 386 of main/Snakefile
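The three chained filters keep a site only if it passed VQSR, has an HWE p-value above 1e-6, and has fewer than 15% missing genotypes. A minimal sketch of that combined test, with hypothetical function names and with a genotype counted as missing only when every allele is `.` (an assumption, which may differ from how bcftools counts partially missing genotypes):

```python
import re


def is_missing(gt):
    """True when every allele in the genotype string is '.' (sketch assumption)."""
    return all(allele == "." for allele in re.split(r"[/|]", gt))


def passes_site_filters(filter_field, hwe, genotypes,
                        min_hwe=1e-6, max_missing=0.15):
    """Apply the three site-level conditions in the order used above."""
    if filter_field != "PASS":
        return False
    if not hwe > min_hwe:
        return False
    f_missing = sum(is_missing(gt) for gt in genotypes) / len(genotypes)
    return f_missing < max_missing
```

With ten samples, one missing genotype (10%) passes the missingness test and two (20%) fail it.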
shell:
    "java -Xmx24g -jar {beagle_app} gt={input.vcf} nthreads={threads} out={params.prefix}  impute=false"
SnakeMake From line 397 of main/Snakefile
shell:
    "tabix -p vcf {input} "
SnakeMake From line 405 of main/Snakefile
shell:
    "{bcftools_app} +fill-tags  -Oz -o {output.vcf} {input.vcf}"
SnakeMake From line 414 of main/Snakefile
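shell:
    # Doubled braces survive Snakemake's string formatting and reach bash as {{1..22},X} and ${chr}
    "for chr in {{{{1..22}},X}}; do echo chr${{chr}} ${{chr}} >> {params.chrs}; done; "
    "{bcftools_app} annotate --rename-chrs {params.chrs} {input} -Oz -o {output} --set-id '%CHROM\\:%POS\\:%REF\\:%ALT' "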
SnakeMake From line 424 of main/Snakefile
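The shell loop above builds the two-column file that `--rename-chrs` expects, one `old_name new_name` pair per line. The same file could be produced in Python, as in the sketch below; both function names are hypothetical. Opening the file in write mode overwrites earlier content, whereas the `>>` redirection in the rule appends on every run.

```python
def chr_rename_lines():
    """Return 'chrN N' mapping lines for autosomes 1-22 and X."""
    names = [str(n) for n in range(1, 23)] + ["X"]
    return ["chr{0} {0}".format(name) for name in names]


def write_chr_map(path):
    """Write the mapping file, overwriting any previous content."""
    with open(path, "w") as handle:
        handle.write("\n".join(chr_rename_lines()) + "\n")
```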
shell:
    "{bcftools_app} filter -i 'INFO/MAF > {params.maf}' -Oz -o {output.vcf} {input.vcf} "
SnakeMake From line 435 of main/Snakefile
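The filter above relies on the `INFO/MAF` tag. For a biallelic site (the workflow splits multiallelic records with `bcftools norm -m -any`), the minor allele frequency can be sketched from the genotype strings as below. The function name is hypothetical, and this is an illustration of the quantity, not the plugin's code.

```python
import re


def minor_allele_frequency(genotypes):
    """MAF of a biallelic site from GT strings such as '0/1' or '1|1'.

    Missing alleles ('.') are ignored. Returns None if no allele is called.
    """
    alleles = [a for gt in genotypes for a in re.split(r"[/|]", gt) if a != "."]
    if not alleles:
        return None
    alt_freq = sum(1 for a in alleles if a != "0") / len(alleles)
    return min(alt_freq, 1.0 - alt_freq)
```

Taking the minimum of the alternate frequency and its complement means a site where the alternate allele is the common one is still judged by its rarer allele.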
shell:
    "{bcftools_app} +dosage {input.vcf} -- -t GT > {output.table}"
SnakeMake From line 444 of main/Snakefile
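With `-t GT`, the dosage table is derived from hard genotype calls, so each entry is a count of non-reference alleles. A minimal sketch of that conversion for a single genotype string follows; `gt_to_dosage` is a hypothetical name, and returning `None` for missing genotypes is a choice of this sketch that does not reproduce the plugin's own missing-value encoding.

```python
import re


def gt_to_dosage(gt):
    """Count non-reference alleles in a GT string; None if any allele is missing."""
    alleles = re.split(r"[/|]", gt)
    if any(a == "." for a in alleles):
        return None
    return float(sum(1 for a in alleles if a != "0"))
```

Phased and unphased genotypes give the same dosage, so `0|1` and `1/0` both map to 1.0.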
shell:
    "{bcftools_app} query -f '%ID\\n' {input.vcf}  > {output.table}"
SnakeMake From line 452 of main/Snakefile
shell:
    "Rscript {params.script} -g {input.genotype} -v {input.variants} -r {params.results}"
SnakeMake From line 464 of main/Snakefile
Created: 1yr ago
Updated: 1yr ago
Maintainers: public
URL: https://github.com/cristian2420/WGS_genotyping
Name: wgs_genotyping
Version: 1
Copyright: Public Domain
License: MIT License