Clinical Genomics Uppsala inheritance disease pipeline for WGS
Clinical Genomics Uppsala inheritance disease pipeline for WGS, built as a Snakemake workflow.
The pipeline will be built one step at a time, with steps 1 and 2 being the most crucial. Where possible, hydra-genetics modules (https://github.com/hydra-genetics) will be used. Parts of the pipeline will not be in hydra-genetics from the beginning but will be converted into modules when time allows.
Step 1: SNV and indel analysis
- GATK best practices to get an analysis-ready BAM
- DeepVariant (+ GLnexus?) for calling
- kinship and sex check with peddy (maybe also a simple read-count-based check, which could also find XXY and homozygous females)
- coverage for gene panels
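The read-count idea mentioned next to peddy could look roughly like the sketch below. It is only an illustration: the function names, the chromosome naming (chrX, chrY) and the input dictionaries are assumptions, and it ignores mappability (large parts of chrY are hard to map), so real data would need calibrated thresholds rather than plain rounding.

```python
def infer_sex_chromosome_copies(read_counts, chrom_lengths):
    """Estimate (X copies, Y copies) from per-chromosome read counts.

    Assumption: reads are spread evenly, so the read density on the
    autosomes corresponds to two copies.
    """
    autosomes = [c for c in read_counts if c not in ("chrX", "chrY")]
    autosome_density = (
        sum(read_counts[c] for c in autosomes)
        / sum(chrom_lengths[c] for c in autosomes)
    )
    density_per_copy = autosome_density / 2
    x_copies = round(read_counts["chrX"] / chrom_lengths["chrX"] / density_per_copy)
    y_copies = round(read_counts["chrY"] / chrom_lengths["chrY"] / density_per_copy)
    return x_copies, y_copies


def karyotype_label(x_copies, y_copies):
    """Turn copy numbers into a label such as 'XX', 'XY' or 'XXY'."""
    return "X" * x_copies + "Y" * y_copies
```

A sample whose label is neither XX nor XY would then be flagged for manual review instead of being called automatically.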
Step 2: CNVs and other SVs (inversions, deletions and duplications) for Moon
- Manta
- CNVnator
- Once these and the other parts of the pipeline work, this part can be extended. Which other callers are good right now? (Tiddit, CNVkit, delly, others?)
- Combine the results from the different callers into one VCF file with SVdb
  - Will SVdb help remove false positives?
- Regions of homozygosity and uniparental disomy
  - AutoMap (https://github.com/mquinodo/AutoMap) and https://github.com/bjhall/upd
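To make the idea of combining callers concrete, here is a minimal sketch that merges calls by reciprocal overlap and records which callers support each merged call. This is not SVdb's actual algorithm; the `SVCall` class, the `merge_calls` function and the 0.6 overlap threshold are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class SVCall:
    chrom: str
    start: int
    end: int
    svtype: str
    callers: set = field(default_factory=set)


def reciprocal_overlap(a, b):
    """Smallest fraction of either call that is covered by the other."""
    overlap = min(a.end, b.end) - max(a.start, b.start)
    if overlap <= 0:
        return 0.0
    return min(overlap / (a.end - a.start), overlap / (b.end - b.start))


def merge_calls(calls, min_overlap=0.6):
    """Merge calls of the same type on the same chromosome that overlap enough.

    The coordinates of the first call seen are kept; later matching calls
    only add their caller names.
    """
    merged = []
    for call in sorted(calls, key=lambda c: (c.chrom, c.svtype, c.start)):
        for kept in merged:
            same_kind = kept.chrom == call.chrom and kept.svtype == call.svtype
            if same_kind and reciprocal_overlap(kept, call) >= min_overlap:
                kept.callers |= call.callers
                break
        else:
            merged.append(
                SVCall(call.chrom, call.start, call.end, call.svtype, set(call.callers))
            )
    return merged
```

One conceivable false-positive filter on top of this would be to keep only merged calls with at least two supporting callers, at the cost of losing events that only one caller can detect.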
Step 3: SMA
- SMNCopyNumberCaller (https://github.com/Illumina/SMNCopyNumberCaller, https://www.nature.com/articles/s41436-020-0754-0?proof=t)
- SMNca (https://onlinelibrary.wiley.com/doi/full/10.1002/humu.24120)
- other ways to handle SMN1 and SMN2?
Step 4: Repeat expansions
- ExpansionHunter
- if annotation is needed: STRanger
- histogram with size distribution per sample
  - REViewer (Illumina)?
- Fragile X
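The per-sample size histogram could be prototyped along these lines. The input is assumed to be a plain list of repeat sizes (for example repeat-unit counts per allele) for one sample; how those values would be extracted from the caller's output is not shown, and the function names and bin width are hypothetical.

```python
from collections import Counter


def size_histogram(repeat_sizes, bin_width=5):
    """Count repeat sizes per bin; keys are the lower edge of each bin."""
    bins = Counter((size // bin_width) * bin_width for size in repeat_sizes)
    return dict(sorted(bins.items()))


def render_histogram(histogram, bin_width=5):
    """Render the histogram as text, one line per bin."""
    lines = []
    for bin_start, count in histogram.items():
        bin_end = bin_start + bin_width - 1
        lines.append(f"{bin_start:>4}-{bin_end:<4} {'#' * count}")
    return "\n".join(lines)
```

A text rendering is enough for a quick look in a log; a real report would more likely use a plotting library.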
Step 5: Mitochondria
- heteroplasmy (sensitivity)
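As a rough way to reason about heteroplasmy sensitivity, the sketch below computes the heteroplasmy fraction from allele read counts and the probability of seeing at least a given number of alternate reads at a given depth. It assumes a simple binomial sampling model without sequencing errors; the function names and the idea of a minimum alt-read threshold are assumptions, not something the pipeline defines.

```python
from math import comb


def heteroplasmy_fraction(ref_reads, alt_reads):
    """Fraction of reads carrying the alternate allele."""
    total = ref_reads + alt_reads
    return alt_reads / total if total else 0.0


def detection_probability(depth, fraction, min_alt_reads):
    """P(at least min_alt_reads alt reads) under Binomial(depth, fraction)."""
    p_too_few = sum(
        comb(depth, k) * fraction**k * (1 - fraction) ** (depth - k)
        for k in range(min_alt_reads)
    )
    return 1.0 - p_too_few
```

Evaluating `detection_probability` over a range of depths shows how deep the mitochondrial coverage must be before a low heteroplasmy level is likely to be seen at all.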
Step 6: RNA
Software or thoughts for the future
- Telomerecat is a tool for estimating the average telomere length (TL) for a paired-end whole genome sequencing (WGS) sample (Panos might be interested in the answer)
- Cyrius for reliable calling of CYP2D6
- What data is needed besides the VCF? QC and figures.
Code Snippets
    shell:
        "pbrun deepvariant_germline \
        --ref {input.ref} \
        --in-fq {input.reads} \
        --out-bam {output.bam} \
        --gvcf --out-variants {output.vcf} \
        --num-gpus {params.n} \
        --tmp-dir {params.dir} \
        --read-group-sm {wildcards.sample} \
        --read-group-lb illumina \
        --read-group-pl {params.date}_deepvariant_germline \
        --read-group-id-prefix {wildcards.sample} &> {log}"

    shell:
        "vcftools --gzvcf {input} --remove-filtered \".\" --recode --recode-INFO-all --out {wildcards.sample} &> {log}"

    shell:
        "( python {params}/scripts/ref_vcf.py {input.vcf} {input.ref} {output} ) &> {log}"

    shell:
        """( awk '{{gsub(/chrM/,"chrMT"); print}}' {input} > {output} ) &> {log}"""

    shell:
        "( bgzip {input} && tabix {input}.gz ) &> {log}"

    import sys
    from pysam import VariantFile

    # Input VCF; bgzipped and plain-text files are both recognized automatically
    vcf_in = VariantFile(sys.argv[1])

    # Add a reference line to the header that will be used for the new VCF
    new_header = vcf_in.header
    new_header.add_line("##reference=" + sys.argv[2])

    # Start the new VCF with the extended header
    vcf_out = VariantFile(sys.argv[3], 'w', header=new_header)

    for record in vcf_in.fetch():
        vcf_out.write(record)

    # Close both files so that the output is flushed to disk
    vcf_out.close()
    vcf_in.close()
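Among the snippets above, the awk rename replaces every occurrence of chrM on every line, so a line that already contains chrMT would become chrMTT. If the intent is only to rename the contig (an assumption), a stricter variant could touch just the `##contig` header line and the CHROM column. The function below is a hypothetical illustration and not part of the workflow.

```python
def rename_contig_line(line, old="chrM", new="chrMT"):
    """Rename a contig in one VCF line, leaving everything else untouched."""
    contig_prefix = "##contig=<ID=" + old
    if line.startswith(contig_prefix + ",") or line.startswith(contig_prefix + ">"):
        # Header definition of the contig itself
        return line.replace("ID=" + old, "ID=" + new, 1)
    if line.startswith("#"):
        # Any other header line is kept as it is
        return line
    chrom, sep, rest = line.partition("\t")
    if chrom == old:
        # Data line: only the CHROM column (first field) is renamed
        return new + sep + rest
    return line
```

Applied line by line to an uncompressed VCF, this leaves records that are already on chrMT unchanged.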