A Snakemake workflow that calls variants from multi-sample Illumina reads using DeepVariant and GLnexus


Description:

  • A GPU-accelerated Snakemake workflow that calls variants from multi-sample Illumina reads using DeepVariant and GLnexus

Files to prepare:

  • A sample sheet - sample_sheet.csv: a comma-delimited file with 3 columns (no header row):

    • sample name, path to Illumina read 1, path to Illumina read 2
  • Modify configuration file - config.yaml:

    • reference: path to reference fasta file

    • sample_sheet: path to the sample sheet prepared above

    • outdir: path to the output directory

    • suffix: suffixes of the forward and reverse Illumina reads. For example:

      • test1_R1.fastq.gz and test1_R2.fastq.gz should be ["_R1","_R2"]
    • cpu: number of cores provided to the pipeline; should match the --cores command-line parameter

    • w_size: size (in bp) of the non-overlapping windows used to report average depth along the genome.
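To make the two input files concrete, here is a minimal sketch of what they might look like. All sample names, paths, and values below are illustrative placeholders, not part of the workflow itself:

```shell
# Hypothetical sample_sheet.csv: one line per sample, no header row
cat > sample_sheet.csv <<'EOF'
test1,reads/test1_R1.fastq.gz,reads/test1_R2.fastq.gz
test2,reads/test2_R1.fastq.gz,reads/test2_R2.fastq.gz
EOF

# Hypothetical config.yaml matching the keys described above
cat > config.yaml <<'EOF'
reference: ref/genome.fasta
sample_sheet: sample_sheet.csv
outdir: results
suffix: ["_R1","_R2"]
cpu: 16
w_size: 10000
EOF
```

The suffix entries here correspond to the `_R1`/`_R2` naming in the example file names above.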

Environment:

  • Make sure Snakemake is installed in the current environment.

  • Docker is required.

  • Pull the Docker image: docker pull nvcr.io/nvidia/clara/clara-parabricks:4.0.0-1

  • Pull the Docker image: docker pull google/deepvariant:1.4.0-gpu

Usage:

snakemake --cores [cpu] --use-conda

Code Snippets

shell:
    """
    fastqc --quiet --outdir {params.outdir} --noextract -f fastq {input} -t 2
    """
shell:
    """
    bamtools stats -in {input} | grep -v "*" > {output}
    """
shell:
    """
    bedtools makewindows -g {input} -w {params.w_size} > {output}
    """
shell:
    """
    bedtools coverage \
        -a {input.bed} -b {input.bam} \
        > {output}
    """
shell:
    """
    bcftools stats {input.vcf} > {output.vcf_stat}
    """
shell:
    """
    plot-vcfstats \
        -p {params.outdir} \
        --no-PDF \
        {input.vcf_stat}
    """
shell:
    """
    vcftools --gzvcf {input.vcf} --freq2 --out {params.outdir}/allele_frequency --max-alleles 2
    vcftools --gzvcf {input.vcf} --depth --out {params.outdir}/depth_per_indv
    vcftools --gzvcf {input.vcf} --site-mean-depth --out {params.outdir}/depth_per_site
    vcftools --gzvcf {input.vcf} --site-quality --out {params.outdir}/quality_per_site
    vcftools --gzvcf {input.vcf} --missing-indv --out {params.outdir}/missing_rate_per_indv
    vcftools --gzvcf {input.vcf} --missing-site --out {params.outdir}/missing_rate_per_site
    vcftools --gzvcf {input.vcf} --het --out {params.outdir}/heterozygosity
    """
shell:
    """
    multiqc \
    -o {params.output_dir} \
    {params.input_dir}
    """
shell:
    """
    fastp \
        --in1 {input.r1} \
        --out1 {output.r1} \
        --in2 {input.r2} \
        --out2 {output.r2} \
        --unpaired1 {output.r1_unpaired} \
        --unpaired2 {output.r2_unpaired} \
        --thread {threads} \
        --json {output.json_report} \
        --html {output.html_report} \
        > {log} 2>&1
    """
shell:
    """
    cp {input.original_ref} {output.copied_ref}
    """
shell:
    """
    bwa index {input.ref}
    samtools faidx {input.ref}
    """
shell:
    """
    docker run \
        --gpus all \
        -w /workdir \
        --volume {params.ref_path}:/ref_dir \
        --volume {params.read_path}:/read_dir \
        --volume {params.tmp_path}:/outputdir \
        nvcr.io/nvidia/clara/clara-parabricks:4.0.0-1 \
        pbrun fq2bam \
            --ref /ref_dir/ref.fasta \
            --in-fq /read_dir/{params.reads_dict[r1_name]} /read_dir/{params.reads_dict[r2_name]} \
            --out-bam /outputdir/{params.out_bam_name} \
        > {log} 2>&1
    cp {params.tmp_path}/* {params.out_bam_path}
    """
shell:
    '''
    docker run \
        --gpus all \
        -w /workdir \
        --volume {params.ref_path}:/ref_dir \
        --volume {params.bam_path}:/bam_dir \
        --volume {params.tmp_path}:/outputdir \
        google/deepvariant:1.4.0-gpu \
        /opt/deepvariant/bin/run_deepvariant \
            --model_type=WGS \
            --ref=/ref_dir/ref.fasta \
            --reads=/bam_dir/{params.bam_name} \
            --sample_name={params.sample_name} \
            --output_vcf=/outputdir/{params.out_vcf_name} \
            --output_gvcf=/outputdir/{params.out_gvcf_name} \
            --num_shards={threads} \
        > {log} 2>&1
    cp {params.tmp_path}/* {params.out_vcf_path}
    mv {params.out_vcf_path}/{params.out_report_name} {output.report}
    '''
shell:
    '''
    rm -rf ./GLnexus.DB
    [ -f $CONDA_PREFIX/lib/libjemalloc.so ] && export LD_PRELOAD=$CONDA_PREFIX/lib/libjemalloc.so
    glnexus_cli \
        --config DeepVariantWGS \
        --threads {threads} \
        {input.gvcf} | \
    bcftools view - | \
    bgzip -c \
    > {output.vcf} \
    2> {log}
    rm -rf ./GLnexus.DB
    '''
shell:
    """
    bcftools view \
        -m2 -M2 \
        -v snps \
        -O z \
        -o {output} \
        {input}
    """
shell:
    """
    echo "Job done!"
    echo "Use the following command to clean up temporary files (needs sudo):"
    echo "sudo rm -rf ../../experiment/variant_calling_snakemake/tmp/"
    """
From line 27 of main/Snakefile


Created: 1yr ago
Updated: 1yr ago
Maintainers: public
URL: https://github.com/ZexuanZhao/gpu-accelarated-variant-calling-pipeline
Name: gpu-accelarated-variant-calling-pipeline
Version: 1
Copyright: Public Domain
License: None
