Deep Variant as a Nextflow pipeline


deepvariant


A Nextflow pipeline for running the Google DeepVariant variant caller.

What is DeepVariant and why in Nextflow?

In December 2017 the Google Brain team released DeepVariant, a variant caller based on deep learning.

In practice, DeepVariant first builds images from the BAM file, then applies a deep-learning image-recognition model to call the variants, and finally converts the predictions into the standard VCF format.
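The three stages correspond to the three wrapper scripts quoted in the Code Snippets section. The following is a minimal dry-run sketch of that sequence for a single BAM file. It only prints the commands and runs nothing. The helper name `dv_plan`, the example file names, and `MODEL_PLACEHOLDER` are assumptions made for illustration; the flags are copied from the snippets, with `--cores` omitted for brevity.

```shell
# Dry-run sketch: print the three DeepVariant stages for one BAM file.
# dv_plan is a hypothetical helper; it only echoes commands and runs nothing.
dv_plan() {
    bam=$1               # input alignments, e.g. sample.bam
    ref=$2               # bgzip-compressed reference, e.g. ref.fa.gz
    bed=$3               # regions to call
    base=${bam%.bam}     # sample.bam -> sample

    # Stage 1: build the images ("examples") from the BAM file.
    echo "dv_make_examples.py --sample $bam --ref $ref --reads $bam --regions $bed --logdir logs --examples ${base}_shardedExamples"
    # Stage 2: run the image-recognition model on the examples.
    echo "dv_call_variants.py --sample $bam --outfile ${base}_call_variants_output.tfrecord --examples ${base}_shardedExamples --model MODEL_PLACEHOLDER"
    # Stage 3: convert the predictions into a VCF file.
    echo "dv_postprocess_variants.py --ref $ref --infile ${base}_call_variants_output.tfrecord --outfile ${bam}.vcf"
}

dv_plan sample.bam ref.fa.gz regions.bed
```

Each stage consumes the output of the previous one, which is why the pipeline runs them in this order for every BAM file.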

Running DeepVariant as a Nextflow pipeline offers several advantages. Preprocessing steps automatically create the indexed and compressed files that DeepVariant requires as input, which users would otherwise have to produce manually. Variant calling can be performed on multiple BAM files at the same time, and Nextflow's internal parallelization keeps the available resources in use. Nextflow's Docker support makes the results computationally reproducible, because every step runs inside a Docker container.

For more detailed information about Google's DeepVariant, please refer to google/deepvariant or this blog post.
For more information about DeepVariant in Nextflow, please refer to this blog post.

Quick Start

Warning: DeepVariant can be very computationally intensive to run.

To test the pipeline you can run:

nextflow run nf-core/deepvariant -profile test,docker

A typical run on whole genome data looks like this:

nextflow run nf-core/deepvariant --genome hg19 --bam yourBamFile --bed yourBedFile -profile standard,docker

In this case variants are called on the BAM file passed with --bam, using the hg19 version of the reference genome. One VCF file is produced and can be found in the "results" folder.

A typical run on whole exome data looks like this:

nextflow run nf-core/deepvariant --exome --genome hg19 --bam_folder myBamFolder --bed myBedFile -profile standard,docker
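The genome and exome invocations differ only in the `--exome` flag. A small sketch of a wrapper that assembles the command line without running it may make that explicit. `dv_command` is a hypothetical helper name; every flag in it is taken from the examples above.

```shell
# Sketch: assemble the nextflow command line without running it.
# dv_command is a hypothetical helper; all flags come from the Quick Start examples.
dv_command() {
    mode=$1          # "exome" or "genome"
    genome=$2        # reference genome key, e.g. hg19
    bam_folder=$3    # folder containing the BAM files
    bed=$4           # BED file with the regions to call

    extra=""
    if [ "$mode" = "exome" ]; then
        extra="--exome "   # the only difference between the two run types
    fi
    echo "nextflow run nf-core/deepvariant ${extra}--genome $genome --bam_folder $bam_folder --bed $bed -profile standard,docker"
}

dv_command exome hg19 myBamFolder myBedFile
```

Printing the command first is a cheap way to check the arguments before starting a computationally intensive run.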

Documentation

The nf-core/deepvariant documentation is split into the following files:

  1. Installation

  2. Running the pipeline

  3. Pipeline configuration

  4. Output and how to interpret the results

  5. Troubleshooting

  6. More about DeepVariant

More about the pipeline

As shown in the following picture, the workflow contains both preprocessing steps (light blue) and the actual variant calling steps (darker blue).

Some input files are optional; if they are not given, they are created automatically during the preprocessing steps. If they are given, the corresponding preprocessing steps are skipped. For more information about preprocessing, please refer to the "INPUT PARAMETERS" section.
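The skip-if-present behaviour can be illustrated with a short sketch that lists which companion files of a reference FASTA are still missing. `missing_ref_files` is a hypothetical helper, and the four suffixes are assumptions based on the usual output names of `samtools faidx` and `bgzip -i`; the pipeline itself decides this from its input parameters, not from a file-system check like this one.

```shell
# Sketch: report which reference companion files would still need to be created.
# missing_ref_files is a hypothetical helper; the suffixes are assumed from the
# usual output names of "samtools faidx" and "bgzip -i".
missing_ref_files() {
    fasta=$1
    for suffix in .fai .gz .gz.fai .gz.gzi; do
        # A file that already exists means its preprocessing step can be skipped.
        [ -e "${fasta}${suffix}" ] || echo "${fasta}${suffix}"
    done
}

# Lists every companion file that is absent next to reference.fa.
missing_ref_files reference.fa
```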

The workflow accepts one reference genome and multiple BAM files as input. Each input BAM file is processed completely independently and produces its own VCF result file. The advantage of this approach is that Nextflow can run the variant calling for the different BAM files in parallel and use all the cores of the machine to deliver results as quickly as possible.
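The one-BAM-to-one-VCF mapping can be sketched as follows. `plan_jobs` is a hypothetical helper that only prints input and output names. The `.vcf` suffix follows the postprocessing snippet and the `results` folder follows the Quick Start text; the actual scheduling of the jobs is left to Nextflow and is not modelled here.

```shell
# Sketch: every BAM file in a folder is an independent job with its own VCF.
# plan_jobs is a hypothetical helper that prints "input -> output" pairs.
plan_jobs() {
    bam_folder=$1
    for bam in "$bam_folder"/*.bam; do
        [ -e "$bam" ] || continue      # folder is missing or holds no BAM files
        name=${bam##*/}                # strip the directory part
        echo "$name -> results/${name}.vcf"
    done
}

plan_jobs myBamFolder
```

Because no job reads another job's output, the jobs can run in any order or all at once.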

Credits

This pipeline was originally developed at Lifebit, by @luisas, to ease variant calling analyses and reduce their cost.

Many thanks to nf-core and those who have helped out along the way too, including (but not limited to): @ewels, @MaxUlysse, @apeltzer, @sven1103 & @pditommaso.

Code Snippets

"""
samtools faidx $fasta
"""
NextFlow From line 239 of master/main.nf
"""
bgzip -c ${fasta} > ${fasta}.gz
"""
NextFlow From line 257 of master/main.nf
"""
samtools faidx $fastagz
"""
NextFlow From line 276 of master/main.nf
"""
bgzip -c -i ${fasta} > ${fasta}.gz
"""
NextFlow From line 294 of master/main.nf
"""
mkdir ready
[[ `samtools view -H ${bam} | grep '@RG' | wc -l`   > 0 ]] && { mv $bam ready;}|| { picard AddOrReplaceReadGroups \
I=${bam} \
O=ready/${bam} \
RGID=${params.rgid} \
RGLB=${params.rglb} \
RGPL=${params.rgpl} \
RGPU=${params.rgpu} \
RGSM=${params.rgsm};}
cd ready ;samtools index ${bam};
"""
NextFlow From line 317 of master/main.nf
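The snippet from line 317 keeps a BAM file as it is when its header already contains read groups and otherwise adds them with picard. It counts `@RG` lines and compares the count with `>` inside `[[ ]]`, which bash evaluates as a string comparison. A sketch of a more direct test is shown below. `has_read_group` is a hypothetical helper, not part of the pipeline, and it reads header text on standard input so that it can be tried without samtools.

```shell
# Sketch: decide from SAM header text whether read groups are already present.
# has_read_group is a hypothetical helper; it reads the header on stdin.
has_read_group() {
    # Read-group header lines start with "@RG"; grep -q exits 0 on the first match.
    grep -q '^@RG'
}

# Usage with a hand-written header (no samtools needed):
printf '@HD\tVN:1.6\n@RG\tID:1\tSM:sample\n' | has_read_group && echo "read groups present"

# With a real file the header would come from: samtools view -H file.bam
```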
"""
mkdir logs
mkdir ${bam.baseName}_shardedExamples
dv_make_examples.py \
--cores ${task.cpus} \
--sample ${bam} \
--ref ${fastagz} \
--reads ${bam} \
--regions ${bed} \
--logdir logs \
--examples ${bam.baseName}_shardedExamples
"""
NextFlow From line 354 of master/main.nf
"""
dv_call_variants.py \
  --cores ${task.cpus} \
  --sample ${bam} \
  --outfile ${bam.baseName}_call_variants_output.tfrecord \
  --examples $shardedExamples \
  --model ${model}
"""
NextFlow From line 383 of master/main.nf
"""
dv_postprocess_variants.py \
--ref ${fastagz} \
--infile call_variants_output.tfrecord \
--outfile "${bam}.vcf"
"""
NextFlow From line 416 of master/main.nf
"""
echo $workflow.manifest.version &> v_nf_deepvariant.txt
echo $workflow.nextflow.version &> v_nextflow.txt
ls /opt/conda/pkgs/ &> v_deepvariant.txt
python --version &> v_python.txt
pip --version &> v_pip.txt
samtools --version &> v_samtools.txt
lbzip2 --version &> v_lbzip2.txt
bzip2 --version &> v_bzip2.txt
scrape_software_versions.py &> software_versions_mqc.yaml
"""
NextFlow From line 433 of master/main.nf


Created: 1yr ago
Updated: 1yr ago
Maintainers: public
URL: https://nf-co.re/deepvariant
Name: deepvariant
Version: 1.0
Copyright: Public Domain
License: None