Germline Variant Calling Workflow for Non-Model Organisms

Working with a non-model organism means there are no known SNP sets or prepared interval lists in the GATK resource bundle. Instead, I did the following:

  • Optional Base Quality Score Recalibration (BQSR) (working...)

  • Apply hard filters to call sets.

  • Split genome into scaffolds as intervals.
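The scaffold-as-interval idea can be sketched as follows. This is a minimal illustration, not the workflow's own code: it assumes the reference has a samtools `.fai` index (tab-separated, name in column 1, length in column 2), and the function names, the `min_length` parameter, and the output file name are hypothetical. GATK accepts a plain-text file ending in `.list` with one interval per line, and a bare contig name counts as an interval.

```python
from pathlib import Path


def scaffolds_from_fai(fai_path, min_length=0):
    """Return scaffold names from a samtools .fai index, in file order."""
    names = []
    for line in Path(fai_path).read_text().splitlines():
        if not line.strip():
            continue  # skip blank lines
        fields = line.split("\t")
        name, length = fields[0], int(fields[1])
        # min_length is a hypothetical knob for dropping tiny scaffolds.
        if length >= min_length:
            names.append(name)
    return names


def write_interval_list(names, out_path):
    """Write one scaffold name per line, suitable for `gatk ... -L file.list`."""
    Path(out_path).write_text("".join(f"{name}\n" for name in names))
```

Each scaffold name could then drive one job per interval, for example `write_interval_list(scaffolds_from_fai("ref.fa.fai"), "scaffolds.list")`, with `ref.fa.fai` standing in for the real index path.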

How to run:

  1. Create the environment with conda env create -f environment.yaml and install bamcov ...

  2. Prepare metadata.tsv.

  3. Modify config.yaml if needed.

  4. Run snakemake.

To run on JCU HPC with PBSpro:

snakemake -p --cluster-config jcu_hpc.json --cluster "qsub -j oe -l walltime={cluster.time} -l select=1:ncpus={cluster.ncpus}:mem={cluster.mem}" --jobs 100 --latency-wait 5
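The `{cluster.time}`, `{cluster.ncpus}`, and `{cluster.mem}` placeholders are filled from `jcu_hpc.json`. The sketch below shows one plausible shape for that file and how a rule's entry overrides the `__default__` entry. Only the three key names come from the command above; the rule name `haplotype_caller` and every resource value are made up for illustration, and the merge function is a simplified stand-in for what Snakemake does internally.

```python
import json

# Hypothetical values and rule name; only the keys time/ncpus/mem are
# taken from the placeholders in the qsub command.
cluster_config = {
    "__default__": {"time": "24:00:00", "ncpus": 1, "mem": "8gb"},
    "haplotype_caller": {"time": "48:00:00", "ncpus": 4, "mem": "16gb"},
}

QSUB_TEMPLATE = (
    "qsub -j oe -l walltime={time} "
    "-l select=1:ncpus={ncpus}:mem={mem}"
)


def resources_for(rule, config):
    """Start from __default__ and let the rule's own entry override it."""
    merged = dict(config.get("__default__", {}))
    merged.update(config.get(rule, {}))
    return merged


def qsub_command(rule, config):
    """Fill the qsub template with the merged resources for one rule."""
    return QSUB_TEMPLATE.format(**resources_for(rule, config))


if __name__ == "__main__":
    # Print the JSON that would go into the cluster config file.
    print(json.dumps(cluster_config, indent=2))
    print(qsub_command("haplotype_caller", cluster_config))
```

Rules without their own entry fall back to `__default__`. Newer Snakemake releases deprecate `--cluster-config` in favor of profiles, so this layout applies to the Snakemake versions that still accept the flag.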

Clean everything:

snakemake clean

Reference:

https://snakemake.readthedocs.io/en/stable/
https://github.com/gatk-workflows/gatk4-germline-snps-indels
https://github.com/snakemake-workflows/dna-seq-gatk-variant-calling
https://zhuanlan.zhihu.com/p/33891718
https://software.broadinstitute.org/gatk/documentation/article?id=11097
https://gatkforums.broadinstitute.org/gatk/discussion/12443/genomicsdbimport-run-slowly-with-multiple-samples

Code Snippets

# Keep only SNPs from the input call set.
shell:
    """
    gatk SelectVariants \
    -V {input} \
    -select-type SNP \
    -O {output}
    """
# Keep only indels from the input call set.
shell:
    """
    gatk SelectVariants \
    -V {input} \
    -select-type INDEL \
    -O {output}
    """
# Hard-filter SNPs: a site matching any expression is tagged with that filter name.
shell:
    """
    gatk VariantFiltration \
    -V {input} \
    -filter "QD < 2.0" --filter-name "QD2" \
    -filter "QUAL < 30.0" --filter-name "QUAL30" \
    -filter "SOR > 3.0" --filter-name "SOR3" \
    -filter "FS > 60.0" --filter-name "FS60" \
    -filter "MQ < 40.0" --filter-name "MQ40" \
    -filter "MQRankSum < -12.5" --filter-name "MQRankSum-12.5" \
    -filter "ReadPosRankSum < -8.0" --filter-name "ReadPosRankSum-8" \
    -O {output}
    """
# Hard-filter indels with looser FS and ReadPosRankSum thresholds.
shell:
    """
    gatk VariantFiltration \
    -V {input} \
    -filter "QD < 2.0" --filter-name "QD2" \
    -filter "QUAL < 30.0" --filter-name "QUAL30" \
    -filter "FS > 200.0" --filter-name "FS200" \
    -filter "ReadPosRankSum < -20.0" --filter-name "ReadPosRankSum-20" \
    -O {output}
    """
# Remove generated outputs, intermediate data, the sample map, and logs.
shell:
    "rm -rf analysis data/ubam data/trimmed sample_map.txt logs"
# Remove the *.db entries under analysis/genomicsDB.
shell:
    "rm -rf analysis/genomicsDB/*.db"
Created: 1yr ago
Updated: 1yr ago
Maintainers: public
URL: https://github.com/bakeronit/snakemake-gatk4-non-model
Name: snakemake-gatk4-non-model
Version: 1
Copyright: Public Domain
License: None