DiVA WGS: Next-Generation Sequencing Whole Genome Data Analysis Pipeline

DiVA WGS is a pipeline for Next-Generation Sequencing Whole Genome data analysis.

All solida-core workflows follow the GATK Best Practices for Germline Variant Discovery, with further improvements and refinements added after testing on real data from research sequencing projects at the CRS4 Next Generation Sequencing Core Facility.

Pipelines are based on Snakemake, a workflow management system that provides all the features needed to create reproducible and scalable data analyses.

Software dependencies are specified in the environment.yaml file and managed directly by Snakemake using Conda, which keeps the workflow reproducible across different computing environments such as workstations, clusters and cloud environments.

Pipeline Overview

The pipeline workflow is composed of two major analysis sections:

  • Mapping: single-end and/or paired-end reads in FASTQ format are aligned against a reference genome to produce a deduplicated and recalibrated BAM file. This section is executed by the DiMA pipeline.

  • Variant Calling: a joint call is performed across all of the project's BAM files.

In parallel, statistics collected during these steps are used to generate Quality Control reports.

A complete view of the analysis workflow is provided by the pipeline's graph.

Pipeline Handbook

DiVA WGS pipeline documentation can be found in the docs/ directory:

  1. Pipeline Structure

  2. Pipeline Workflow

  3. Required Files

  4. Running the pipeline

Contact us

support@solida-core

Code Snippets

shell:
    "gatk SplitIntervals --java-options {params.custom} "
    "-R {params.genome} "
    "-L {params.intervals} "
    "-mode {params.mode} "
    "--scatter-count {params.scattercount} "
    "-O split "
    ">& {log} "
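The later snippets refer to the scattered files as split/{wildcards.interval}-scattered.interval_list. The following is a minimal Python sketch of how those wildcard values could be enumerated. It assumes GATK's default zero-padded output naming (0000-scattered.interval_list, 0001-scattered.interval_list, ...). The function names are hypothetical and are not taken from the pipeline. Depending on the subdivision mode, SplitIntervals may write fewer files than the requested scatter count, so the listing should be checked against the split/ directory.

```python
def scattered_interval_names(scatter_count: int) -> list[str]:
    """Return the assumed wildcard values, one per scattered interval file."""
    if scatter_count < 1:
        raise ValueError("scatter_count must be at least 1")
    # Assumption: files are numbered from 0 with four-digit zero padding.
    return [f"{index:04d}" for index in range(scatter_count)]


def interval_list_path(interval: str, outdir: str = "split") -> str:
    """Build the path used by the -L option in the later snippets."""
    return f"{outdir}/{interval}-scattered.interval_list"


if __name__ == "__main__":
    for name in scattered_interval_names(3):
        print(interval_list_path(name))
```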
shell:
    "gatk HaplotypeCaller --java-options {params.custom} "
    "-R {params.genome} "
    "-I {input.cram} "
    "-O {output.gvcf} "
    "-ERC GVCF "
    "-G StandardAnnotation "
    "-L split/{wildcards.interval}-scattered.interval_list "
    ">& {log} "
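Because HaplotypeCaller is restricted to one scattered interval per invocation, each sample is called once per interval. The sketch below, in plain Python, lists the resulting sample-by-interval job matrix. The sample names and the function name are hypothetical. Only the interval path template comes from the snippet above.

```python
from itertools import product


def haplotypecaller_jobs(samples, intervals):
    """Return one job description per (sample, interval) combination."""
    return [
        {
            "sample": sample,
            "interval": interval,
            # Same template as the -L argument in the shell command.
            "intervals_file": f"split/{interval}-scattered.interval_list",
        }
        for sample, interval in product(samples, intervals)
    ]


if __name__ == "__main__":
    # Hypothetical sample names and interval identifiers.
    for job in haplotypecaller_jobs(["sampleA", "sampleB"], ["0000", "0001"]):
        print(job["sample"], job["intervals_file"])
```

With S samples and N intervals this yields S × N independent jobs, which is what allows the calling step to be distributed across a cluster.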
shell:
    "cp {input.bam} {output.bam} && "
    "cp {input.bai} {output.bai} "
shell:
    "mkdir -p db; "
    "gatk GenomicsDBImport --java-options {params.custom} "
    "{params.gvcfs} "
    "--genomicsdb-workspace-path db/{wildcards.interval} "
    "-L split/{wildcards.interval}-scattered.interval_list "
    ">& {log} "
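The snippet does not show how {params.gvcfs} is built. GenomicsDBImport accepts one -V option per input gVCF, so a plausible construction is sketched below. This is an assumption about the pipeline, not its actual code. The helper name and the file paths are hypothetical.

```python
import shlex


def gvcf_arguments(gvcf_paths):
    """Join gVCF paths into a '-V path -V path ...' argument string."""
    if not gvcf_paths:
        raise ValueError("at least one gVCF is required")
    # shlex.quote guards against spaces or shell metacharacters in paths.
    return " ".join(f"-V {shlex.quote(path)}" for path in gvcf_paths)


if __name__ == "__main__":
    # Hypothetical per-sample gVCFs for a single scattered interval.
    print(gvcf_arguments(["sampleA.0000.g.vcf.gz", "sampleB.0000.g.vcf.gz"]))
```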
shell:
    "gatk GenotypeGVCFs --java-options {params.custom} "
    "-R {params.genome} "
    "-V gendb://db/{wildcards.interval} "
    "-G StandardAnnotation "
    "-O {output} "
    ">& {log} "
shell:
    "picard {params.custom} CollectInsertSizeMetrics "
    "I={input.bam} "
    "O={output.metrics} "
    "H={output.histogram} "
    "&> {log} "
shell:
    "picard {params.custom} CollectWgsMetrics "
    "{params.arguments} "
    "I={input.bam} "
    "O={output.metrics} "
    "R={params.genome} "
    "&> {log} "
shell:
    "multiqc "
    "{input} "
    "{params.fastqc} "
    "{params.trimming} "
    "{params.params} "
    "-o {params.outdir} "
    "-n {params.outname} "
    "--sample-names {params.reheader} "
    ">& {log}"
shell:
    "bcftools concat -a {input.vcfs} | bgzip -cf > {output}; "
    "tabix -p vcf {output}"
shell:
    "gatk VariantRecalibrator --java-options {params.custom} "
    "-R {params.genome} "
    "-V {input.vcf} "
    "{params.recal} "
    "-tranche 100.0 -tranche 99.9 -tranche 99.0 -tranche 90.0 "
    "--output {output.recal} "
    "--tranches-file {output.tranches} "
    "--rscript-file {output.plotting} "
    ">& {log}"
shell:
    "gatk ApplyVQSR --java-options {params.custom} "
    "-R {params.genome} "
    "-V {input.vcf} -mode {params.mode} "
    "--recal-file {input.recal} -ts-filter-level 99.0 "
    "--tranches-file {input.tranches} -O {output} "
    ">& {log}"

Created: 1yr ago
Updated: 1yr ago
Maintainers: public
URL: https://github.com/solida-core/diva.wgs
Name: diva-wgs
Version: 1
Copyright: Public Domain
License: GNU General Public License v3.0