A Snakemake workflow that calls variants from multi-sample Illumina reads using DeepVariant and GLnexus


Description:

  • A GPU-accelerated Snakemake workflow that calls variants from multi-sample Illumina reads using DeepVariant and GLnexus

Files to prepare:

  • A sample sheet - sample_sheet.csv: a comma-delimited file with 3 columns (no header row):

    • sample name, path to Illumina read 1, path to Illumina read 2
  • Modify configuration file - config.yaml:

    • reference: path to reference fasta file

    • sample_sheet: path to the sample sheet prepared above

    • outdir: path to the output directory

    • suffix: suffixes of the forward and reverse Illumina reads. For example:

      • test1_R1.fastq.gz and test1_R2.fastq.gz should be ["_R1","_R2"]
    • cpu: number of cores provided to the pipeline; should match the --cores command-line parameter

    • w_size: size (in bp) of the non-overlapping windows used to report average depth along the genome.
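To make the two input files concrete, here is a minimal sketch of what they might look like. All sample names, paths, and values below are illustrative placeholders, not part of the workflow itself:

```shell
# Hypothetical sample_sheet.csv: one line per sample, no header row
cat > sample_sheet.csv <<'EOF'
test1,reads/test1_R1.fastq.gz,reads/test1_R2.fastq.gz
test2,reads/test2_R1.fastq.gz,reads/test2_R2.fastq.gz
EOF

# Hypothetical config.yaml matching the keys described above
cat > config.yaml <<'EOF'
reference: ref/genome.fasta
sample_sheet: sample_sheet.csv
outdir: results
suffix: ["_R1","_R2"]
cpu: 16
w_size: 10000
EOF
```

The suffix entries here correspond to the `_R1`/`_R2` naming in the example file names above.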

Environment:

  • Make sure Snakemake is installed in the current environment.

  • Docker is required.

  • Pull the Docker image: docker pull nvcr.io/nvidia/clara/clara-parabricks:4.0.0-1

  • Pull the Docker image: docker pull google/deepvariant:1.4.0-gpu

Usage:

snakemake --cores [cpu] --use-conda

Code Snippets

shell:
    """
    fastqc --quiet --outdir {params.outdir} --noextract -f fastq {input} -t 2
    """
shell:
    """
    bamtools stats -in {input} | grep -v "*" > {output}
    """
shell:
    """
    bedtools makewindows -g {input} -w {params.w_size} > {output}
    """
shell:
    """
    bedtools coverage \
        -a {input.bed} -b {input.bam} \
        > {output}
    """
shell:
    """
    bcftools stats {input.vcf} > {output.vcf_stat}
    """
shell:
    """
    plot-vcfstats \
        -p {params.outdir} \
        --no-PDF \
        {input.vcf_stat}
    """
shell:
    """
    vcftools --gzvcf {input.vcf} --freq2 --out {params.outdir}/allele_frequency --max-alleles 2
    vcftools --gzvcf {input.vcf} --depth --out {params.outdir}/depth_per_indv
    vcftools --gzvcf {input.vcf} --site-mean-depth --out {params.outdir}/depth_per_site
    vcftools --gzvcf {input.vcf} --site-quality --out {params.outdir}/quality_per_site
    vcftools --gzvcf {input.vcf} --missing-indv --out {params.outdir}/missing_rate_per_indv
    vcftools --gzvcf {input.vcf} --missing-site --out {params.outdir}/missing_rate_per_site
    vcftools --gzvcf {input.vcf} --het --out {params.outdir}/heterozygosity
    """
shell:
    """
    multiqc \
    -o {params.output_dir} \
    {params.input_dir}
    """
shell:
    """
    fastp \
        --in1 {input.r1} \
        --out1 {output.r1} \
        --in2 {input.r2} \
        --out2 {output.r2} \
        --unpaired1 {output.r1_unpaired} \
        --unpaired2 {output.r2_unpaired} \
        --thread {threads} \
        --json {output.json_report} \
        --html {output.html_report} \
        > {log} 2>&1
    """
shell:
    """
    cp {input.original_ref} {output.copied_ref}
    """
shell:
    """
    bwa index {input.ref}
    samtools faidx {input.ref}
    """
shell:
    """
    docker run \
        --gpus all \
        -w /workdir \
        --volume {params.ref_path}:/ref_dir \
        --volume {params.read_path}:/read_dir \
        --volume {params.tmp_path}:/outputdir \
        nvcr.io/nvidia/clara/clara-parabricks:4.0.0-1 \
        pbrun fq2bam \
            --ref /ref_dir/ref.fasta \
            --in-fq /read_dir/{params.reads_dict[r1_name]} /read_dir/{params.reads_dict[r2_name]} \
            --out-bam /outputdir/{params.out_bam_name} \
        > {log} 2>&1
    cp {params.tmp_path}/* {params.out_bam_path}
    """
shell:
    '''
    docker run \
        --gpus all \
        -w /workdir \
        --volume {params.ref_path}:/ref_dir \
        --volume {params.bam_path}:/bam_dir \
        --volume {params.tmp_path}:/outputdir \
        google/deepvariant:1.4.0-gpu \
        /opt/deepvariant/bin/run_deepvariant \
            --model_type=WGS \
            --ref=/ref_dir/ref.fasta \
            --reads=/bam_dir/{params.bam_name} \
            --sample_name={params.sample_name} \
            --output_vcf=/outputdir/{params.out_vcf_name} \
            --output_gvcf=/outputdir/{params.out_gvcf_name} \
            --num_shards={threads} \
        > {log} 2>&1
    cp {params.tmp_path}/* {params.out_vcf_path}
    mv {params.out_vcf_path}/{params.out_report_name} {output.report}
    '''
shell:
    '''
    rm -rf ./GLnexus.DB
    [ -f $CONDA_PREFIX/lib/libjemalloc.so ] && export LD_PRELOAD=$CONDA_PREFIX/lib/libjemalloc.so
    glnexus_cli \
        --config DeepVariantWGS \
        --threads {threads} \
        {input.gvcf} | \
    bcftools view - | \
    bgzip -c \
    > {output.vcf} \
    2> {log}
    rm -rf ./GLnexus.DB
    '''
shell:
    """
    bcftools view \
        -m2 -M2 \
        -v snps \
        -O z \
        -o {output} \
        {input}
    """
shell:
    """
    echo "Job done!"
    echo "Use the following command to clean up temporary files (needs sudo):"
    echo "sudo rm -rf ../../experiment/variant_calling_snakemake/tmp/"
    """
From line 27 of main/Snakefile


Created: 1yr ago
Updated: 1yr ago
Maintainers: public
URL: https://github.com/ZexuanZhao/gpu-accelarated-variant-calling-pipeline
Name: gpu-accelarated-variant-calling-pipeline
Version: 1
Copyright: Public Domain
License: None
