Germline variant calling
Working with a non-model organism means there are no known SNPs or prepared interval lists in the GATK resource bundle. Instead, I did the following:
- Optional Base Quality Score Recalibration (BQSR) (work in progress).
- Apply hard filters to the call sets.
- Split the genome into scaffolds and use them as intervals.
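The workflow's own interval rule is not shown in this README. A minimal sketch of the scaffold-splitting idea, assuming the reference has a samtools faidx index (`.fai`, whose first two tab-separated columns are scaffold name and length), turns each scaffold into a GATK-style `name:1-length` interval. The size-balanced batching helper and the `batch_<i>.list` file names are hypothetical additions of mine, useful when a draft assembly has thousands of small scaffolds.

```python
from pathlib import Path


def read_fai(lines):
    """Parse .fai lines into (scaffold_name, length) pairs.

    A .fai line is tab-separated: NAME, LENGTH, OFFSET, LINEBASES, LINEWIDTH.
    Only the first two columns are used here.
    """
    scaffolds = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        fields = line.split("\t")
        scaffolds.append((fields[0], int(fields[1])))
    return scaffolds


def to_gatk_intervals(scaffolds):
    """One GATK-style interval string (1-based, inclusive) per scaffold."""
    return [f"{name}:1-{length}" for name, length in scaffolds]


def batch_by_size(scaffolds, n_batches):
    """Greedily spread scaffolds over n_batches so total lengths stay similar.

    Largest scaffolds are placed first, each into the currently lightest batch.
    (Hypothetical helper, not part of the original workflow.)
    """
    batches = [[] for _ in range(n_batches)]
    totals = [0] * n_batches
    for name, length in sorted(scaffolds, key=lambda s: s[1], reverse=True):
        i = totals.index(min(totals))
        batches[i].append((name, length))
        totals[i] += length
    return batches


def write_interval_lists(batches, out_dir):
    """Write each batch to a hypothetical out_dir/batch_<i>.list file."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    for i, batch in enumerate(batches):
        text = "\n".join(to_gatk_intervals(batch)) + "\n"
        (out / f"batch_{i}.list").write_text(text)


if __name__ == "__main__":
    # Example index lines; scaffold names and values are made up.
    example_fai = [
        "scaffold_1\t500000\t12\t60\t61",
        "scaffold_2\t120000\t508353\t60\t61",
        "scaffold_3\t80000\t630378\t60\t61",
    ]
    scaffolds = read_fai(example_fai)
    print(to_gatk_intervals(scaffolds))
    print(batch_by_size(scaffolds, 2))
```

Each resulting `.list` file could then be passed to a GATK tool with `-L`, so one job runs per batch instead of one per scaffold.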
How to run:
- conda env create -f environment.yaml
  and install bamcov ...
- Prepare metadata.tsv.
- Modify config.yaml if needed.
- snakemake
To run on the JCU HPC with PBS Pro:
snakemake -p --cluster-config jcu_hpc.json --cluster "qsub -j oe -l walltime={cluster.time} -l select=1:ncpus={cluster.ncpus}:mem={cluster.mem}" --jobs 100 --latency-wait 5
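The `{cluster.time}`, `{cluster.ncpus}` and `{cluster.mem}` placeholders in the command above are filled from `jcu_hpc.json`. That file is not shown here, so the sketch below assumes the usual cluster-config layout (a `__default__` entry plus optional per-rule overrides); the rule name `haplotype_caller` and every resource value are hypothetical. It renders the `qsub` line roughly the way Snakemake would for one rule.

```python
import json
from types import SimpleNamespace

# Hypothetical contents for jcu_hpc.json; the real file is not shown in this README.
CLUSTER_CONFIG = {
    "__default__": {"time": "02:00:00", "ncpus": 1, "mem": "4gb"},
    "haplotype_caller": {"time": "24:00:00", "ncpus": 4, "mem": "16gb"},
}

# Same template as the --cluster argument in the command above.
QSUB_TEMPLATE = (
    "qsub -j oe -l walltime={cluster.time} "
    "-l select=1:ncpus={cluster.ncpus}:mem={cluster.mem}"
)


def resources_for(rule, config):
    """Start from __default__ and apply any rule-specific overrides."""
    merged = dict(config["__default__"])
    merged.update(config.get(rule, {}))
    return merged


def qsub_command(rule, config):
    """Fill the {cluster.*} placeholders via attribute lookup in str.format."""
    cluster = SimpleNamespace(**resources_for(rule, config))
    return QSUB_TEMPLATE.format(cluster=cluster)


if __name__ == "__main__":
    # JSON text that could be saved as jcu_hpc.json.
    print(json.dumps(CLUSTER_CONFIG, indent=2))
    print(qsub_command("haplotype_caller", CLUSTER_CONFIG))
    print(qsub_command("some_other_rule", CLUSTER_CONFIG))
```

Rules without their own entry fall back to `__default__`, so only the heavy steps need explicit resources. Note that recent Snakemake releases mark `--cluster-config` as deprecated in favour of profiles.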
Clean everything:
snakemake clean
References:
- https://snakemake.readthedocs.io/en/stable/
- https://github.com/gatk-workflows/gatk4-germline-snps-indels
- https://github.com/snakemake-workflows/dna-seq-gatk-variant-calling
- https://zhuanlan.zhihu.com/p/33891718
- https://software.broadinstitute.org/gatk/documentation/article?id=11097
- https://gatkforums.broadinstitute.org/gatk/discussion/12443/genomicsdbimport-run-slowly-with-multiple-samples
Code Snippets
Select SNPs from the call set:

    shell:
        """
        gatk SelectVariants \
            -V {input} \
            -select-type SNP \
            -O {output}
        """
Select indels from the call set:

    shell:
        """
        gatk SelectVariants \
            -V {input} \
            -select-type INDEL \
            -O {output}
        """
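SelectVariants decides what counts as a SNP or an indel from the REF and ALT alleles. A simplified sketch of that split, modelled on (but not identical to) htsjdk's variant typing: single-base REF and ALT gives SNP, equal lengths above one give MNP, unequal lengths give INDEL, and a mix of kinds gives MIXED. The function names are hypothetical, and this sketch simply drops `.` and `*` alleles instead of reproducing htsjdk's handling of them.

```python
def variant_type(ref, alts):
    """Roughly classify one VCF record from its REF and ALT alleles."""
    alts = [a for a in alts if a not in (".", "*")]  # simplification
    if not alts:
        return "NO_VARIATION"
    kinds = set()
    for alt in alts:
        if len(ref) == len(alt) == 1:
            kinds.add("SNP")
        elif len(ref) == len(alt):
            kinds.add("MNP")
        else:
            kinds.add("INDEL")
    return kinds.pop() if len(kinds) == 1 else "MIXED"


def split_vcf_records(lines):
    """Route VCF data lines into SNP and INDEL lists.

    Header lines are copied to both outputs; MNP and MIXED records are
    dropped, as neither `-select-type SNP` nor `-select-type INDEL` keeps them.
    """
    snps, indels = [], []
    for line in lines:
        if line.startswith("#"):
            snps.append(line)
            indels.append(line)
            continue
        fields = line.rstrip("\n").split("\t")
        vtype = variant_type(fields[3], fields[4].split(","))
        if vtype == "SNP":
            snps.append(line)
        elif vtype == "INDEL":
            indels.append(line)
    return snps, indels


if __name__ == "__main__":
    vcf = [
        "#CHROM\tPOS\tID\tREF\tALT\n",
        "scaffold_1\t10\t.\tA\tG\n",
        "scaffold_1\t20\t.\tAT\tA\n",
        "scaffold_1\t30\t.\tA\tAT,G\n",
    ]
    snps, indels = split_vcf_records(vcf)
    print(len(snps), len(indels))
```

Separating the two classes first is what allows the different hard-filter thresholds to be applied to each.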
Hard-filter SNPs:

    shell:
        """
        gatk VariantFiltration \
            -V {input} \
            -filter "QD < 2.0" --filter-name "QD2" \
            -filter "QUAL < 30.0" --filter-name "QUAL30" \
            -filter "SOR > 3.0" --filter-name "SOR3" \
            -filter "FS > 60.0" --filter-name "FS60" \
            -filter "MQ < 40.0" --filter-name "MQ40" \
            -filter "MQRankSum < -12.5" --filter-name "MQRankSum-12.5" \
            -filter "ReadPosRankSum < -8.0" --filter-name "ReadPosRankSum-8" \
            -O {output}
        """
Hard-filter indels:

    shell:
        """
        gatk VariantFiltration \
            -V {input} \
            -filter "QD < 2.0" --filter-name "QD2" \
            -filter "QUAL < 30.0" --filter-name "QUAL30" \
            -filter "FS > 200.0" --filter-name "FS200" \
            -filter "ReadPosRankSum < -20.0" --filter-name "ReadPosRankSum-20" \
            -O {output}
        """
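The SNP and indel VariantFiltration calls differ only in their threshold tables. A minimal sketch that encodes both tables, using the same annotations, cutoffs and filter names as the commands above, and computes the FILTER value for one record: failing filter names joined by `;`, or `PASS`. QUAL comes from its own VCF column rather than INFO, so the sketch expects the caller to add it. A missing annotation is treated as passing, which I believe matches VariantFiltration's default behaviour; the helper names are hypothetical.

```python
import operator

# (annotation, comparison, threshold, filter name): same values as the commands above.
SNP_FILTERS = [
    ("QD", operator.lt, 2.0, "QD2"),
    ("QUAL", operator.lt, 30.0, "QUAL30"),
    ("SOR", operator.gt, 3.0, "SOR3"),
    ("FS", operator.gt, 60.0, "FS60"),
    ("MQ", operator.lt, 40.0, "MQ40"),
    ("MQRankSum", operator.lt, -12.5, "MQRankSum-12.5"),
    ("ReadPosRankSum", operator.lt, -8.0, "ReadPosRankSum-8"),
]

INDEL_FILTERS = [
    ("QD", operator.lt, 2.0, "QD2"),
    ("QUAL", operator.lt, 30.0, "QUAL30"),
    ("FS", operator.gt, 200.0, "FS200"),
    ("ReadPosRankSum", operator.lt, -20.0, "ReadPosRankSum-20"),
]


def parse_info(info):
    """Parse numeric single-valued INFO entries (KEY=VALUE) into floats."""
    values = {}
    for entry in info.split(";"):
        if "=" in entry:
            key, raw = entry.split("=", 1)
            try:
                values[key] = float(raw)
            except ValueError:
                pass  # non-numeric or multi-valued entries are ignored
    return values


def apply_hard_filters(annotations, filters):
    """Return the FILTER value: failed names joined by ';', or 'PASS'."""
    failed = [
        name
        for key, compare, threshold, name in filters
        if key in annotations and compare(annotations[key], threshold)
    ]
    return ";".join(failed) if failed else "PASS"


if __name__ == "__main__":
    record = "scaffold_1\t10\t.\tA\tG\t25.0\t.\tQD=1.5;FS=70.0;DB"
    fields = record.split("\t")
    annotations = parse_info(fields[7])
    annotations["QUAL"] = float(fields[5])  # QUAL is column 6, not INFO
    print(apply_hard_filters(annotations, SNP_FILTERS))
```

Keeping the thresholds in a table makes it easy to see that indels get much looser FS and ReadPosRankSum cutoffs than SNPs.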
Remove the analysis outputs, intermediate data, sample map and logs:

    shell:
        "rm -rf analysis data/ubam data/trimmed sample_map.txt logs"
Remove only the GenomicsDB workspaces:

    shell:
        "rm -rf analysis/genomicsDB/*.db"