Variant calling workflow using GATK4

public public 1yr ago 0 bookmarks

Variant calling workflow using GATK4

For now, for simplicity, we assume that each sample is a 'platform unit'.

Usage

  1. Fill out/Edit the config.yaml file.

  2. Make samplesheet. You can go to the bin/ folder and run bash make_samples_template.sh .

  3. A file named grouped_contigs.tsv with a column named 'name' and a second column named 'contigs' containing comma-delimited sequence names (chromosome/contig names) is needed. You may run Rscript group_contigs.R .

  4. qsub -q bbc bin/run_snakemake.sh .

Workflow

Workflow

Code Snippets

88
89
90
91
92
93
94
95
96
97
shell:
    """
    if [ `printf '{input}' | wc -w` -gt 1 ]
    then
        cat {input} > {output}
    else
        ln -sr {input} {output}
    fi

    """
SnakeMake From line 88 of main/Snakefile
120
121
122
123
shell:
    """
    fastqc --outdir {params.outdir} {input}
    """
145
146
147
148
shell:
    """
    fastq_screen --threads {threads} --outdir analysis/fastq_screen/ {input}
    """
SnakeMake From line 145 of main/Snakefile
171
172
173
174
shell:
    """
    trim_galore --paired {input} --output_dir analysis/trim_galore/ --cores {threads} --fastqc
    """
197
198
199
200
shell:
    """
    trim_galore {input} --output_dir analysis/trim_galore/ --cores {threads} --fastqc
    """
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
shell:
    """
    bwa mem \
    -t {threads} \
    -R {params.bwa_RG} \
    {params.bwa_idx} \
    {input} | \
    samblaster {params.samblaster_params} 2>{output.samblaster_err} | \
    samtools sort \
    -m 6G \
    -@ {threads} \
    -O "BAM" \
    -o {output.outbam} \
    -

    echo "END bwamem"
    echo "END bwamem" 1>&2

    samtools index -@ {threads} {output.outbam}

    echo "END indexing"
    echo "END indexing" 1>&2

    samtools idxstats {output.outbam} > {output.idxstat}

    echo "END idxstats"
    echo "END idxstats" 1>&2

    """
276
277
278
279
280
281
282
283
284
285
286
287
shell:
    """
    gatk --java-options "-Xms8g -Xmx{resources.mem_gb}g -Djava.io.tmpdir=./tmp" \
    HaplotypeCaller \
    -R {params.ref_fasta} \
    -I {input.bam} \
    -O {output} \
    -ERC GVCF \
    --native-pair-hmm-threads {threads} \
    {params.dbsnp} \
    {params.contigs}
    """
309
310
311
312
313
314
315
316
shell:
    """
    gatk --java-options "-Xms8g -Xmx{resources.mem_gb}g -Djava.io.tmpdir=./tmp" \
    GenomicsDBImport \
    {params.sample_gvcfs} \
    --genomicsdb-workspace-path {output.genomicsdb} \
    {params.contigs}
    """
336
337
338
339
340
341
342
343
shell:
    """
    gatk --java-options "-Xms8g -Xmx{resources.mem_gb}g -Djava.io.tmpdir=./tmp" \
    GenotypeGVCFs \
    -R {params.ref_fasta} \
    -V gendb://{params.genomicsdb} \
    -O {output.vcf}
    """
365
366
367
368
369
370
371
372
shell:
    """
    gatk --java-options "-Xms8g -Xmx{resources.mem_gb}g -Djava.io.tmpdir=./tmp" \
    SortVcf \
    -I {input.vcf} \
    -O {output.sorted_vcf} \
    -SD {params.dictionary} 
    """
399
400
401
402
403
404
405
406
407
408
shell:
    """
    gatk --java-options "-Xms8g -Xmx{resources.mem_gb}g -Djava.io.tmpdir=./tmp" \
    MergeVcfs \
    {params.in_vcfs} \
    --SEQUENCE_DICTIONARY {params.dictionary} \
    --OUTPUT {output.raw} 

    vt peek -r {params.ref_fasta} {output.raw} 2> {output.vt_peek_raw} 1>>{log.stdout}
    """
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
shell:
    """
    gatk --java-options "-Xms8g -Xmx{resources.mem_gb}g -Djava.io.tmpdir=./tmp" \
    SelectVariants \
    -R {params.ref_fasta} \
    -V {input} \
    --select-type-to-include {wildcards.var_type} \
    -O {output.raw}

    echo "SelectVariants 1 done." >> {log.stdout}
    echo "SelectVariants 1 done." >> {log.stderr}

    gatk --java-options "-Xms8g -Xmx{resources.mem_gb}g -Djava.io.tmpdir=./tmp" \
    VariantFiltration \
    --R {params.ref_fasta} \
    --V {output.raw} \
    {params.filt_params} \
    -O {output.filt} 

    echo "VariantFiltration done." >> {log.stdout}
    echo "VariantFiltration done." >> {log.stderr}

    gatk --java-options "-Xms8g -Xmx{resources.mem_gb}g -Djava.io.tmpdir=./tmp" \
    SelectVariants \
    -R {params.ref_fasta} \
    -V {output.filt} \
    --exclude-filtered \
    -O {output.pass_only} 

    echo "SelectVariants 2 done." >> {log.stdout}
    echo "SelectVariants 2 done." >> {log.stderr}

    vt peek -r {params.ref_fasta} {output.pass_only} 2> {output.vt_peek_pass} 1>>{log.stdout}
    """
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
shell:
    """
    gatk --java-options "-Xms8g -Xmx{resources.mem_gb}g -Djava.io.tmpdir=./tmp" \
    BaseRecalibrator \
    -R {params.ref_fasta} \
    -I {input.bam} \
    --known-sites {input.known_snps_vcf} \
    --known-sites {input.known_indels_vcf} \
    -O {output.recal_table} 

    echo "BaseRecalibrator done." >> {log.stdout}
    echo "BaseRecalibrator done." >> {log.stderr}

    gatk --java-options "-Xms8g -Xmx{resources.mem_gb}g -Djava.io.tmpdir=./tmp" \
    ApplyBQSR \
    -R {params.ref_fasta} \
    -I {input.bam} \
    -bqsr {output.recal_table} \
    -O {output.recal_bam}

    echo "ApplyBQSR done." >> {log.stdout}
    echo "ApplyBQSR done." >> {log.stderr}

    """
554
555
556
557
558
shell:
    """
    qualimap bamqc -bam {input} --java-mem-size={resources.mem_gb}G --paint-chromosome-limits -outdir analysis/qualimap/{wildcards.sample} -nt {threads}

    """
585
586
587
588
589
590
591
592
593
594
595
shell:
    """
    # merge
    bcftools concat --threads {threads} --allow-overlaps -O z -o {output.merged_vcf} {input.vcfs}
    bcftools index --tbi --threads {threads} {output.merged_vcf}

    # annotate
    java -Xms8g -Xmx80g -Djava.io.tmpdir=./tmp -jar $SNPEFF/snpEff.jar {params.snpEff_dataDir} -v -stats {output.html} {params.snpEff_genome_id} {output.merged_vcf} | bcftools view --threads {threads} -O z -o {output.annot_vcf} -
    bcftools index --tbi --threads {threads} {output.annot_vcf}

    """
633
634
635
636
637
638
639
640
641
shell:
    """
    multiqc \
    --force \
    --outdir {params.workdir} \
    --filename {params.outfile} \
    {params.dirs} 

    """
ShowHide 8 more snippets with no or duplicated tags.

Login to post a comment if you would like to share your experience with this workflow.

Do you know this workflow well? If so, you can request seller status , and start supporting this workflow.

Free

Created: 1yr ago
Updated: 1yr ago
Maitainers: public
URL: https://github.com/vari-bbc/variants-gatk4
Name: variants-gatk4
Version: 1
Badge:
workflow icon

Insert copied code into your website to add a link to this workflow.

Downloaded: 0
Copyright: Public Domain
License: GNU General Public License v3.0
  • Future updates

Related Workflows

cellranger-snakemake-gke
snakemake workflow to run cellranger on a given bucket using gke.
A Snakemake workflow for running cellranger on a given bucket using Google Kubernetes Engine. The usage of this workflow ...