DiVA WGS: Next-Generation Sequencing Whole Genome Data Analysis Pipeline

DiVA WGS is a pipeline for Next-Generation Sequencing Whole Genome data analysis.

All solida-core workflows follow GATK Best Practices for Germline Variant Discovery, with further improvements and refinements added after testing on real data from research sequencing projects at the CRS4 Next Generation Sequencing Core Facility.

Pipelines are based on Snakemake, a workflow management system that provides all the features needed to create reproducible and scalable data analyses.

Software dependencies are specified in the environment.yaml file and managed directly by Snakemake using Conda, which keeps the workflow reproducible across a wide range of computing environments such as workstations, clusters and cloud environments.

Pipeline Overview

The pipeline workflow consists of two major analysis sections:

Mapping: single and/or paired-end reads in FASTQ format are aligned against a reference genome to produce a deduplicated and recalibrated BAM file. This section is executed by the DiMA pipeline.
Variant Calling: a joint call is performed across all of the project's BAM files.

In parallel, statistics collected during these steps are used to generate reports for Quality Control.

A complete view of the analysis workflow is provided by the pipeline's graph.
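
The order of the variant-calling steps can be sketched as a small dependency graph. The step names below come from the GATK and bcftools commands listed under Code Snippets. The dependency edges are an assumption inferred from each command's inputs and outputs and are not copied from the pipeline's actual rule graph.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Each step maps to the set of steps it depends on.
# ASSUMPTION: edges are inferred from the shell commands, not read from the Snakefile.
DEPENDENCIES = {
    "SplitIntervals": set(),
    "HaplotypeCaller": {"SplitIntervals"},
    "GenomicsDBImport": {"HaplotypeCaller"},
    "GenotypeGVCFs": {"GenomicsDBImport"},
    "bcftools concat": {"GenotypeGVCFs"},
    "VariantRecalibrator": {"bcftools concat"},
    "ApplyVQSR": {"bcftools concat", "VariantRecalibrator"},
}


def execution_order(dependencies):
    """Return one valid execution order for the given dependency graph."""
    return list(TopologicalSorter(dependencies).static_order())


if __name__ == "__main__":
    for step in execution_order(DEPENDENCIES):
        print(step)
```

Snakemake derives this kind of ordering itself from rule inputs and outputs, so the sketch is only meant as a reading aid.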

Pipeline Handbook

DiVA WGS pipeline documentation can be found in the docs/ directory:

- Pipeline Structure:
  - Snakefile
  - Configfile
  - Rules
  - Envs
- Pipeline Workflow
- Required Files:
  - Reference files
  - User files
- Running the pipeline:
  - Manual Snakemake Usage
  - SOLIDA:
    - CLI - Command Line Interface
    - GUI - Graphical User Interface

Contact us

support@solida-core

Code Snippets

shell:
    "gatk SplitIntervals --java-options {params.custom} "
    "-R {params.genome} "
    "-L {params.intervals} "
    "-mode {params.mode} "
    "--scatter-count {params.scattercount} "
    "-O split "
    ">& {log} "

SnakeMake gatk From line 16 of rules/call_variants.smk
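
Later rules refer to files such as split/{wildcards.interval}-scattered.interval_list, so it helps to know which interval identifiers to expect. The following is a minimal sketch, assuming GATK's default SplitIntervals naming (a four-digit zero-padded index followed by -scattered.interval_list). The function names are hypothetical and not part of the pipeline.

```python
def scattered_interval_ids(scatter_count, num_digits=4):
    """Return the zero-padded identifiers expected for a given scatter count.

    ASSUMPTION: SplitIntervals is run with its default file naming and
    produces exactly `scatter_count` files; some subdivision modes can
    yield fewer files than requested.
    """
    return [f"{index:0{num_digits}d}" for index in range(scatter_count)]


def scattered_interval_path(interval_id, directory="split"):
    """Build the interval list path in the form used by the downstream rules."""
    return f"{directory}/{interval_id}-scattered.interval_list"


if __name__ == "__main__":
    for interval_id in scattered_interval_ids(3):
        print(scattered_interval_path(interval_id))
```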

shell:
    "gatk HaplotypeCaller --java-options {params.custom} "
    "-R {params.genome} "
    "-I {input.cram} "
    "-O {output.gvcf} "
    "-ERC GVCF "
    "-G StandardAnnotation "
    "-L split/{wildcards.interval}-scattered.interval_list "
    ">& {log} "

SnakeMake gatk From line 41 of rules/call_variants.smk

shell:
    "cp {input.bam} {output.bam} && "
    "cp {input.bai} {output.bai} "

SnakeMake From line 24 of rules/delivery.smk

shell:
    "mkdir -p db; "
    "gatk GenomicsDBImport --java-options {params.custom} "
    "{params.gvcfs} "
    "--genomicsdb-workspace-path db/{wildcards.interval} "
    "-L split/{wildcards.interval}-scattered.interval_list "
    ">& {log} "

SnakeMake gatk From line 21 of rules/joint_call.smk
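
The {params.gvcfs} placeholder has to expand to one -V argument per sample GVCF, since GenomicsDBImport accepts the flag repeatedly. How the pipeline builds this string is not shown here. The helper below is a hypothetical sketch of one way to do it, and the file names in the usage example are invented for illustration.

```python
import shlex


def gvcf_arguments(gvcf_paths):
    """Join GVCF paths into repeated '-V <path>' arguments for GenomicsDBImport.

    Each path is shell-quoted so that unusual characters do not break
    the command line. Hypothetical helper, not the pipeline's own code.
    """
    return " ".join(f"-V {shlex.quote(path)}" for path in gvcf_paths)


if __name__ == "__main__":
    # Invented example paths.
    print(gvcf_arguments(["sampleA.0000.g.vcf.gz", "sampleB.0000.g.vcf.gz"]))
```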

shell:
    "gatk GenotypeGVCFs --java-options {params.custom} "
    "-R {params.genome} "
    "-V gendb://db/{wildcards.interval} "
    "-G StandardAnnotation "
    "-O {output} "
    ">& {log} "

SnakeMake gatk From line 43 of rules/joint_call.smk

shell:
    "picard {params.custom} CollectInsertSizeMetrics "
    "I={input.bam} "
    "O={output.metrics} "
    "H={output.histogram} "
    "&> {log} "

SnakeMake Picard From line 16 of rules/picard_stats.smk

shell:
    "picard {params.custom} CollectWgsMetrics "
    "{params.arguments} "
    "I={input.bam} "
    "O={output.metrics} "
    "R={params.genome} "
    "&> {log} "

SnakeMake Picard From line 38 of rules/picard_stats.smk

shell:
    "multiqc "
    "{input} "
    "{params.fastqc} "
    "{params.trimming} "
    "{params.params} "
    "-o {params.outdir} "
    "-n {params.outname} "
    "--sample-names {params.reheader} "
    ">& {log}"

SnakeMake MultiQC From line 24 of rules/qc.smk

shell:
    "bcftools concat -a {input.vcfs} | bgzip -cf > {output}; "
    "tabix -p vcf {output}"

SnakeMake BCFtools tabix From line 13 of rules/vqsr.smk

shell:
    "gatk VariantRecalibrator --java-options {params.custom} "
    "-R {params.genome} "
    "-V {input.vcf} "
    "{params.recal} "
    "-tranche 100.0 -tranche 99.9 -tranche 99.0 -tranche 90.0 "
    "--output {output.recal} "
    "--tranches-file {output.tranches} "
    "--rscript-file {output.plotting} "
    ">& {log}"

SnakeMake gatk From line 56 of rules/vqsr.smk
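
The tranche levels are hard-coded in the command above. If they were to be made configurable, the repeated -tranche flags could be generated from a list. This is a minimal sketch under that assumption; the helper is hypothetical and the pipeline itself does not necessarily work this way.

```python
def tranche_arguments(levels):
    """Turn truth-sensitivity levels into repeated '-tranche' flags.

    Levels are percentages in the range (0, 100]. Duplicates are dropped
    and the result is ordered from the highest level to the lowest.
    """
    values = [float(level) for level in levels]
    for value in values:
        if not 0.0 < value <= 100.0:
            raise ValueError(f"tranche level out of range: {value}")
    ordered = sorted(set(values), reverse=True)
    return " ".join(f"-tranche {value}" for value in ordered)


if __name__ == "__main__":
    print(tranche_arguments([99.0, 100.0, 90.0, 99.9]))
```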

shell:
    "gatk ApplyVQSR --java-options {params.custom} "
    "-R {params.genome} "
    "-V {input.vcf} -mode {params.mode} "
    "--recal-file {input.recal} -ts-filter-level 99.0 "
    "--tranches-file {input.tranches} -O {output} "
    ">& {log}"