A Snakemake workflow that calls variants from multi-sample Illumina reads using DeepVariant and GLnexus
Description:
- A GPU-accelerated Snakemake workflow that calls variants from multi-sample Illumina reads using DeepVariant and GLnexus
Files to prepare:
- A sample sheet - sample_sheet.csv: a comma-delimited file with 3 columns (no header row):
  - sample name, path to Illumina read 1, path to Illumina read 2
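  A minimal illustrative sample_sheet.csv (sample names and paths are placeholders, not taken from the workflow):

      sampleA,/data/reads/sampleA_R1.fastq.gz,/data/reads/sampleA_R2.fastq.gz
      sampleB,/data/reads/sampleB_R1.fastq.gz,/data/reads/sampleB_R2.fastq.gz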
- Modify the configuration file - config.yaml (an example is sketched below):
  - reference: path to the reference FASTA file
  - sample_sheet: path to the sample sheet prepared above
  - outdir: path to the output directory
  - suffix: suffix of the forward and reverse Illumina reads. For example, test1_R1.fastq.gz and test1_R2.fastq.gz should be ["_R1","_R2"]
  - cpu: number of cores provided to the pipeline; should match the --cores command-line parameter
  - w_size: non-overlapping window size used when reporting average depth along the genome
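  A minimal sketch of config.yaml, assuming placeholder paths and values (only the key names come from the list above):

      reference: /data/ref/genome.fasta
      sample_sheet: /data/sample_sheet.csv
      outdir: /data/variant_calling_output
      suffix: ["_R1","_R2"]
      cpu: 32
      w_size: 10000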
Environment:
- Make sure Snakemake is installed in the current environment.
- Docker is required.
- Pull the Docker image: nvcr.io/nvidia/clara/clara-parabricks:4.0.0-1
- Pull the Docker image: google/deepvariant:1.4.0-gpu
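  For example, both images can be pulled ahead of time with:

      docker pull nvcr.io/nvidia/clara/clara-parabricks:4.0.0-1
      docker pull google/deepvariant:1.4.0-gpu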
Usage:
snakemake --cores [cpu] --use-conda
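For example, with cpu: 32 in config.yaml (an assumed value), the matching invocation would be:

    snakemake --cores 32 --use-conda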
Code Snippets
shell:
    """
    fastqc --quiet --outdir {params.outdir} --noextract -f fastq {input} -t 2
    """
shell:
    """
    bamtools stats -in {input} | grep -v "*" > {output}
    """
shell:
    """
    bedtools makewindows -g {input} -w {params.w_size} > {output}
    """
shell:
    """
    bedtools coverage \
        -a {input.bed} -b {input.bam} \
        > {output}
    """
shell:
    """
    bcftools stats {input.vcf} > {output.vcf_stat}
    """
shell:
    """
    plot-vcfstats \
        -p {params.outdir} \
        --no-PDF \
        {input.vcf_stat}
    """
shell:
    """
    vcftools --gzvcf {input.vcf} --freq2 --out {params.outdir}/allele_frequency --max-alleles 2
    vcftools --gzvcf {input.vcf} --depth --out {params.outdir}/depth_per_indv
    vcftools --gzvcf {input.vcf} --site-mean-depth --out {params.outdir}/depth_per_site
    vcftools --gzvcf {input.vcf} --site-quality --out {params.outdir}/quality_per_site
    vcftools --gzvcf {input.vcf} --missing-indv --out {params.outdir}/missing_rate_per_indv
    vcftools --gzvcf {input.vcf} --missing-site --out {params.outdir}/missing_rate_per_site
    vcftools --gzvcf {input.vcf} --het --out {params.outdir}/heterozygosity
    """
shell:
    """
    multiqc \
        -o {params.output_dir} \
        {params.input_dir}
    """
shell:
    """
    fastp \
        --in1 {input.r1} \
        --out1 {output.r1} \
        --in2 {input.r2} \
        --out2 {output.r2} \
        --unpaired1 {output.r1_unpaired} \
        --unpaired2 {output.r2_unpaired} \
        --thread {threads} \
        --json {output.json_report} \
        --html {output.html_report} \
        > {log} 2>&1
    """
shell:
    """
    cp {input.original_ref} {output.copied_ref}
    """
shell:
    """
    bwa index {input.ref}
    samtools faidx {input.ref}
    """
shell:
    """
    # GPU-accelerated alignment with Parabricks fq2bam
    docker run \
        --gpus all \
        -w /workdir \
        --volume {params.ref_path}:/ref_dir \
        --volume {params.read_path}:/read_dir \
        --volume {params.tmp_path}:/outputdir \
        nvcr.io/nvidia/clara/clara-parabricks:4.0.0-1 \
        pbrun fq2bam \
        --ref /ref_dir/ref.fasta \
        --in-fq /read_dir/{params.reads_dict[r1_name]} /read_dir/{params.reads_dict[r2_name]} \
        --out-bam /outputdir/{params.out_bam_name} \
        > {log} 2>&1
    cp {params.tmp_path}/* {params.out_bam_path}
    """
shell:
    '''
    # GPU-accelerated per-sample variant calling with DeepVariant
    docker run \
        --gpus all \
        -w /workdir \
        --volume {params.ref_path}:/ref_dir \
        --volume {params.bam_path}:/bam_dir \
        --volume {params.tmp_path}:/outputdir \
        google/deepvariant:1.4.0-gpu \
        /opt/deepvariant/bin/run_deepvariant \
        --model_type=WGS \
        --ref=/ref_dir/ref.fasta \
        --reads=/bam_dir/{params.bam_name} \
        --sample_name={params.sample_name} \
        --output_vcf=/outputdir/{params.out_vcf_name} \
        --output_gvcf=/outputdir/{params.out_gvcf_name} \
        --num_shards={threads} \
        > {log} 2>&1
    cp {params.tmp_path}/* {params.out_vcf_path}
    mv {params.out_vcf_path}/{params.out_report_name} {output.report}
    '''
shell:
    '''
    # Joint genotyping of per-sample gVCFs with GLnexus
    rm -rf ./GLnexus.DB
    [ -f $CONDA_PREFIX/lib/libjemalloc.so ] && export LD_PRELOAD=$CONDA_PREFIX/lib/libjemalloc.so
    glnexus_cli \
        --config DeepVariantWGS \
        --threads {threads} \
        {input.gvcf} | \
    bcftools view - | \
    bgzip -c \
        > {output.vcf} \
        2> {log}
    rm -rf ./GLnexus.DB
    '''
shell:
    """
    bcftools view \
        -m2 -M2 \
        -v snps \
        -O z \
        -o {output} \
        {input}
    """
shell:
    """
    echo "Job done!"
    echo "Use the following command to clean up temporary files (needs sudo):"
    echo "sudo rm -rf ../../experiment/variant_calling_snakemake/tmp/"
    """
Support
- Future updates