DiVA (DNA Variant Analysis) is a pipeline for the analysis of Next-Generation Sequencing (NGS) exome data.
This workflow performs read mapping and variant calling following the GATK Best Practices for Germline Variant Discovery. DiVA is part of solida-core, a collection of Snakemake-based pipelines developed and maintained at CRS4.
Authors
- Matteo Massidda (@massiddaMT)
- Rossano Atzeni (@ratzeni)
Usage
The usage of this workflow is described in the Snakemake Workflow Catalog.
If you use this workflow in a paper, please credit the authors by citing the URL of this (original) repository and its DOI (see above).
INSTRUCTIONS
Create a conda environment (here using mamba) with the command:
mamba create -c bioconda -c conda-forge --name snakemake snakemake=6.15 snakedeploy
and activate it:
conda activate snakemake
To test the pipeline, download some public test data by cloning the test-data repository from GitHub into this folder:
git clone https://github.com/solida-core/test-data-DNA.git
You can then deploy the pipeline, specifying a destination directory (my_dest_dir) for the analysis output and a tag for a specific pipeline version (replace XXXX with the release tag you want):
snakedeploy deploy-workflow https://github.com/solida-core/diva my_dest_dir --tag XXXX
To run the pipeline, go inside the deployed pipeline folder and use the command:
snakemake --use-conda -p --cores all
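Before a full run, it can help to preview the jobs with Snakemake's dry-run flag (-n). The sketch below wraps the same command from Python; the helper names snakemake_argv and run_pipeline and the dry_run switch are hypothetical conveniences, not part of DiVA.

```python
import shlex
import subprocess


def snakemake_argv(cores="all", dry_run=False):
    """Build the argument list for the run command above (hypothetical helper)."""
    argv = ["snakemake", "--use-conda", "-p", "--cores", str(cores)]
    if dry_run:
        # -n / --dry-run: list the jobs Snakemake would execute without running them
        argv.append("-n")
    return argv


def run_pipeline(cores="all", dry_run=False):
    """Run Snakemake from the deployed pipeline folder; raises CalledProcessError on failure."""
    argv = snakemake_argv(cores, dry_run)
    print("Running:", shlex.join(argv))
    subprocess.run(argv, check=True)


if __name__ == "__main__":
    # Only prints the command; call run_pipeline() to actually execute it.
    print(shlex.join(snakemake_argv(dry_run=True)))
```

Returning the argument list instead of a single string avoids shell quoting issues when the list is passed to subprocess.run.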
You can generate an analysis report with the command:
snakemake --report report.zip --cores all
Code Snippets
shell:
    "gatk HaplotypeCaller --java-options {params.custom} "
    "-R {params.genome} "
    "-I {input.bam} "
    "-O {output.gvcf} "
    "-ERC GVCF "
    "-L {params.intervals} "
    "-ip 200 "
    "-G StandardAnnotation "
    "--max-reads-per-alignment-start 0 "
    "--min-base-quality-score 20 "
    "--add-output-vcf-command-line false "
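In a Snakemake rule, the adjacent string literals above are joined into one command, and the {params.*}, {input.*} and {output.*} placeholders are filled from the rule's attributes. The sketch below approximates that expansion with Python's str.format attribute lookup so you can see the final HaplotypeCaller call; the file names and the -Xmx8g Java option are made-up example values, not DiVA's configuration.

```python
from types import SimpleNamespace

# Adjacent string literals are concatenated by Python, as in the Snakemake rule.
HAPLOTYPECALLER = (
    "gatk HaplotypeCaller --java-options {params.custom} "
    "-R {params.genome} "
    "-I {input.bam} "
    "-O {output.gvcf} "
    "-ERC GVCF "            # emit a per-sample GVCF for joint genotyping
    "-L {params.intervals} "
    "-ip 200 "              # pad each target interval by 200 bp
    "-G StandardAnnotation "
    "--max-reads-per-alignment-start 0 "  # 0 disables downsampling
    "--min-base-quality-score 20 "
    "--add-output-vcf-command-line false "
)


def render(template, **namespaces):
    """Approximate Snakemake's placeholder expansion via str.format attribute access."""
    objs = {name: SimpleNamespace(**values) for name, values in namespaces.items()}
    return template.format(**objs).strip()


# Hypothetical example values, for illustration only.
cmd = render(
    HAPLOTYPECALLER,
    params={"custom": "'-Xmx8g'", "genome": "ref.fasta", "intervals": "targets.bed"},
    input={"bam": "sample1.bam"},
    output={"gvcf": "sample1.g.vcf.gz"},
)
print(cmd)
```

Snakemake's own formatter also handles wildcards and lists, so treat this as a way to inspect a command rather than a drop-in replacement.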
shell:
    "vcftools "
    "--gzvcf {input} "
    "--out {params.out_basename} "
    "--relatedness2 "
    ">& {log}"
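The --relatedness2 option writes a tab-separated table of pairwise kinship estimates, which is useful for spotting sample swaps or unexpected relatives. The sketch below reads such a table, assuming the columns INDV1, INDV2 and RELATEDNESS_PHI, and labels pairs with the commonly cited KING kinship cut-offs; the thresholds are a rough guide and the example rows are synthetic.

```python
import csv
import io

# Commonly cited KING kinship cut-offs (Manichaikul et al., 2010); a rough guide only.
KING_CUTOFFS = [
    (0.354, "duplicate/MZ twin"),
    (0.177, "first-degree"),
    (0.0884, "second-degree"),
    (0.0442, "third-degree"),
]


def classify(phi):
    """Map a kinship coefficient to the closest relationship class."""
    for cutoff, label in KING_CUTOFFS:
        if phi > cutoff:
            return label
    return "unrelated"


def related_pairs(handle, min_phi=0.0442):
    """Yield (indv1, indv2, phi, label) for distinct sample pairs with phi above min_phi."""
    seen = set()
    for row in csv.DictReader(handle, delimiter="\t"):
        a, b = row["INDV1"], row["INDV2"]
        if a == b:
            continue  # skip each sample compared with itself
        key = tuple(sorted((a, b)))
        if key in seen:
            continue  # a pair may be listed in both orders (A-B and B-A)
        seen.add(key)
        phi = float(row["RELATEDNESS_PHI"])
        if phi > min_phi:
            yield key[0], key[1], phi, classify(phi)


# Synthetic example table, not real output.
EXAMPLE_TSV = (
    "INDV1\tINDV2\tN_AaAa\tN_AAaa\tN1_Aa\tN2_Aa\tRELATEDNESS_PHI\n"
    "S1\tS1\t100\t0\t100\t100\t0.5\n"
    "S1\tS2\t40\t2\t100\t90\t0.21\n"
    "S2\tS1\t40\t2\t90\t100\t0.21\n"
    "S1\tS3\t10\t30\t100\t95\t-0.2\n"
)

for pair in related_pairs(io.StringIO(EXAMPLE_TSV)):
    print(pair)
```

In practice, open the file vcftools writes next to {params.out_basename} (with a .relatedness2 suffix) instead of the in-memory example.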
shell:
    "mkdir -p {params.base_db} ; "
    "gatk GenomicsDBImport --java-options {params.custom} "
    "{params.gvcfs} "
    "--genomicsdb-workspace-path {params.db} "
    "-L {params.intervals} "
    "-ip 200 "
    "--merge-input-intervals "
    ">& {log} "
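GenomicsDBImport takes each per-sample GVCF through its own -V argument, so {params.gvcfs} presumably expands to a string such as "-V s1.g.vcf.gz -V s2.g.vcf.gz". The rule also creates only the parent {params.base_db}, which fits the tool's expectation that the workspace directory itself is new. The helpers gvcf_args and check_workspace below are hypothetical illustrations of both points, not DiVA code.

```python
import shlex
from pathlib import Path


def gvcf_args(gvcfs):
    """Turn a list of per-sample GVCF paths into repeated -V arguments (hypothetical helper)."""
    if not gvcfs:
        raise ValueError("GenomicsDBImport needs at least one GVCF")
    # shlex.quote protects paths that contain spaces or shell metacharacters
    return " ".join(f"-V {shlex.quote(str(p))}" for p in gvcfs)


def check_workspace(db):
    """Fail early if the workspace path already exists as a file or non-empty directory."""
    db = Path(db)
    if db.exists() and (not db.is_dir() or any(db.iterdir())):
        raise FileExistsError(f"workspace {db} already exists; remove it before re-importing")


print(gvcf_args(["sample1.g.vcf.gz", "sample2.g.vcf.gz"]))
```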
shell:
    "gatk GenotypeGVCFs --java-options {params.custom} "
    "-R {params.genome} "
    "-V gendb://{params.db} "
    "-G StandardAnnotation "
shell:
    "gatk VariantRecalibrator --java-options {params.custom} "
    "-R {params.genome} "
    "-V {input.vcf} "
    "{params.recal} "
    "-tranche 100.0 -tranche 99.9 -tranche 99.0 -tranche 90.0 "
    "--output {output.recal} "
    "--tranches-file {output.tranches} "
    "--rscript-file {output.plotting} "
    ">& {log}"
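The {params.recal} placeholder presumably carries the training resources, annotations and mode that VariantRecalibrator needs. The sketch below builds such a string using the GATK 4.1+ "--resource:name,known=...,training=...,truth=...,prior=... file" syntax and the resource priors from the GATK SNP recalibration example; the file names are placeholders and DiVA's actual configuration may differ.

```python
from dataclasses import dataclass


@dataclass
class Resource:
    name: str
    path: str
    known: bool
    training: bool
    truth: bool
    prior: float

    def arg(self):
        """Render one --resource argument in GATK 4.1+ syntax."""
        flags = (
            f"known={str(self.known).lower()},"
            f"training={str(self.training).lower()},"
            f"truth={str(self.truth).lower()}"
        )
        return f"--resource:{self.name},{flags},prior={self.prior} {self.path}"


def recal_params(resources, annotations, mode):
    """Assemble a hypothetical {params.recal} string: resources, -an annotations, -mode."""
    parts = [r.arg() for r in resources]
    parts += [f"-an {a}" for a in annotations]
    parts.append(f"-mode {mode}")
    return " ".join(parts)


# Placeholder file names; priors follow the GATK SNP recalibration example.
SNP_RESOURCES = [
    Resource("hapmap", "hapmap.vcf.gz", False, True, True, 15.0),
    Resource("omni", "omni.vcf.gz", False, True, False, 12.0),
    Resource("1000G", "1000G.snps.vcf.gz", False, True, False, 10.0),
    Resource("dbsnp", "dbsnp.vcf.gz", True, False, False, 2.0),
]
SNP_ANNOTATIONS = ["QD", "MQ", "MQRankSum", "ReadPosRankSum", "FS", "SOR"]

print(recal_params(SNP_RESOURCES, SNP_ANNOTATIONS, "SNP"))
```

A separate INDEL run would use its own resources and -mode INDEL, which matches ApplyVQSR taking the mode as {params.mode}.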
shell:
    "gatk ApplyVQSR --java-options {params.custom} "
    "-R {params.genome} "
    "-V {input.vcf} -mode {params.mode} "
    "--recal-file {input.recal} -ts-filter-level 99.0 "
    "--tranches-file {input.tranches} -O {output} "
    ">& {log}"
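With -ts-filter-level 99.0, ApplyVQSR keeps variants whose VQSLOD reaches the minimum VQSLOD of the 99.0% truth-sensitivity tranche and filters the rest. The sketch below models that rule in simplified form, assuming the tranches file is CSV with "#" comment lines and the columns targetTruthSensitivity and minVQSLod; the tranche values are synthetic, and real ApplyVQSR output writes the tranche name into FILTER rather than a generic label.

```python
import csv

# Synthetic tranches file, for illustration only.
EXAMPLE_TRANCHES = """# Variant quality score tranches file
# Version number 5
targetTruthSensitivity,numKnown,numNovel,knownTiTv,novelTiTv,minVQSLod,filterName,model,accessibleTruthSites,callsAtTruthSites,truthSensitivity
90.00,1000,100,2.50,2.40,3.2000,VQSRTrancheSNP0.00to90.00,SNP,5000,4500,0.9000
99.00,1200,150,2.45,2.30,-1.5000,VQSRTrancheSNP90.00to99.00,SNP,5000,4950,0.9900
"""


def min_vqslod(tranches_text, ts_level=99.0):
    """Return minVQSLod of the tranche whose target truth sensitivity equals ts_level."""
    lines = [line for line in tranches_text.splitlines() if line and not line.startswith("#")]
    for row in csv.DictReader(lines):
        if abs(float(row["targetTruthSensitivity"]) - ts_level) < 1e-9:
            return float(row["minVQSLod"])
    raise KeyError(f"no tranche at truth sensitivity {ts_level}")


def apply_filter(vqslods, threshold):
    """Simplified ApplyVQSR decision: PASS at or above the threshold, otherwise filtered."""
    return ["PASS" if v >= threshold else "FILTERED" for v in vqslods]


threshold = min_vqslod(EXAMPLE_TRANCHES, 99.0)
print(threshold, apply_filter([2.0, -1.5, -3.0], threshold))
```

Raising the filter level (for example to 99.9) lowers the VQSLOD threshold, keeping more true variants at the cost of more false positives.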