Clinical Genomics Uppsala inheritance disease pipeline for WGS


Clinical Genomics Uppsala inherited disease pipeline for WGS, built as a Snakemake workflow.

The pipeline will be built one step at a time, with steps 1 and 2 being the most crucial. Where possible, hydra-genetics modules (https://github.com/hydra-genetics) will be used. Parts of the pipeline will not be in hydra-genetics from the beginning but will be converted into modules when there is time.

Step 1: SNV and indel analysis

  • GATK best practices to produce an analysis-ready BAM

  • DeepVariant (+ GLnexus?) for calling

  • kinship and sex check with peddy (maybe also add a simple read-count check: the number of reads per sex chromosome tells the same story and can find XXY and homozygous females)

  • coverage for gene panels
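
The read-count idea in the sex-check bullet above could be sketched roughly as follows. This is a minimal illustration and not part of the pipeline: the function name `infer_sex_karyotype`, the tolerance `tol` and the use of mean coverage per chromosome are all assumptions. In real data chrY coverage is usually lower than expected because of poor mappability, so the thresholds would need calibration.

```python
def infer_sex_karyotype(autosome_cov, chrx_cov, chry_cov, tol=0.25):
    """Guess the sex-chromosome complement from mean coverage values.

    Copy number is estimated as 2 * (chromosome coverage / autosome coverage),
    because autosomes are present in two copies.
    All names and the tolerance are hypothetical, for illustration only.
    """
    if autosome_cov <= 0:
        raise ValueError("autosome coverage must be positive")
    x_copies = 2 * chrx_cov / autosome_cov
    y_copies = 2 * chry_cov / autosome_cov
    x = round(x_copies)
    y = round(y_copies)
    # Estimates far from an integer copy number are not trusted
    if abs(x_copies - x) > tol or abs(y_copies - y) > tol:
        return "uncertain"
    if x == 0 and y == 0:
        return "uncertain"
    return "X" * x + "Y" * y
```

A sample with 30x autosomal coverage, 30x on chrX and 15x on chrY would be reported as "XXY" by this sketch, which is the kind of case peddy's sex check alone may not spell out.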

Step 2: CNVs and other SVs (inversions, deletions and duplications) for Moon

  • manta

  • CNVnator

  • Once these and the other parts of the pipeline work, this part can be extended. Which callers are good right now? (TIDDIT, CNVkit, Delly, others?)

  • Combine the results from the different callers into one VCF file with SVDB

    • Will SVDB help remove false positives?
  • Regions of homozygosity (ROH) and uniparental disomy (UPD)

    • AutoMap (https://github.com/mquinodo/AutoMap) and https://github.com/bjhall/upd
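
To illustrate why combining callers may help with false positives, here is a toy sketch of merging calls by reciprocal overlap. It is not SVDB's algorithm. The tuple layout `(chrom, start, end, svtype)`, the function names and the 0.6 threshold are assumptions made for the example.

```python
def reciprocal_overlap(a, b):
    """Reciprocal overlap (0..1) of two (chrom, start, end, svtype) calls."""
    if a[0] != b[0] or a[3] != b[3]:
        return 0.0
    overlap = min(a[2], b[2]) - max(a[1], b[1])
    if overlap <= 0:
        return 0.0
    return min(overlap / (a[2] - a[1]), overlap / (b[2] - b[1]))


def merge_calls(calls_by_caller, min_overlap=0.6):
    """Greedily cluster calls from several callers.

    calls_by_caller maps a caller name to a list of calls.
    Each cluster is [representative_call, set_of_supporting_callers];
    the first call seen is kept as the representative.
    """
    clusters = []
    for caller, calls in calls_by_caller.items():
        for call in calls:
            for cluster in clusters:
                if reciprocal_overlap(cluster[0], call) >= min_overlap:
                    cluster[1].add(caller)
                    break
            else:
                # No existing cluster matched, so start a new one
                clusters.append([call, {caller}])
    return clusters


example = {
    "manta": [("chr1", 1000, 2000, "DEL"), ("chr2", 500, 900, "DUP")],
    "cnvnator": [("chr1", 1100, 2100, "DEL")],
}
merged = merge_calls(example)
```

Clusters supported by only one caller could then be flagged for closer review rather than removed outright, since some callers are the only ones sensitive to certain SV types.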

Step 3: SMA

  • SMNCopyNumberCaller (https://github.com/Illumina/SMNCopyNumberCaller, https://www.nature.com/articles/s41436-020-0754-0?proof=t)

  • SMNca (https://onlinelibrary.wiley.com/doi/full/10.1002/humu.24120)

  • other ways to handle SMN1 and SMN2?

Step 4: Repeat expansions

  • ExpansionHunter

  • if annotation is needed: STRanger

  • histogram with size distribution per sample

    • REViewer? Illumina
  • Fragile X
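
The size-distribution histogram mentioned above could start from something as simple as the sketch below. It assumes the repeat counts have already been extracted from the ExpansionHunter output into a plain list of integers. The function names and the bin width are hypothetical.

```python
from collections import Counter


def repeat_size_histogram(repeat_sizes, bin_width=5):
    """Bin repeat counts and return {bin_start: number_of_alleles}, sorted by bin."""
    if bin_width <= 0:
        raise ValueError("bin_width must be positive")
    counts = Counter((size // bin_width) * bin_width for size in repeat_sizes)
    return dict(sorted(counts.items()))


def render_histogram(histogram, bin_width=5):
    """Render the histogram as plain text, one line per bin."""
    lines = []
    for start, n in histogram.items():
        lines.append(f"{start:>4}-{start + bin_width - 1:<4} {'#' * n}")
    return "\n".join(lines)


hist = repeat_size_histogram([18, 19, 21, 23, 45])
print(render_histogram(hist))
```

A text rendering like this is only a quick check. A real report would more likely use a plotting library and mark the pathogenic threshold for each locus.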

Step 5: Mitochondria

  • heteroplasmy (sensitivity)
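
As a rough illustration of what "sensitivity" means for heteroplasmy, the sketch below computes the heteroplasmy fraction from allele depths and the lowest fraction that would still give a chosen number of supporting reads. Both function names and the default of 5 alternate reads are assumptions. Sequencing errors, NUMTs and sampling noise are ignored.

```python
def heteroplasmy_fraction(ref_depth, alt_depth):
    """Fraction of reads carrying the alternate allele, or None without coverage."""
    total = ref_depth + alt_depth
    if total == 0:
        return None
    return alt_depth / total


def min_detectable_fraction(depth, min_alt_reads=5):
    """Lowest heteroplasmy level expected to yield min_alt_reads at this depth.

    Simplified: ignores sequencing error and sampling variation.
    """
    if depth <= 0:
        raise ValueError("depth must be positive")
    return min(1.0, min_alt_reads / depth)
```

With these simplifications, a site covered at 1000x could in principle show variants down to 0.5 % heteroplasmy, which suggests why the high mitochondrial depth in WGS data is useful here.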

Step 6: RNA

Software or thoughts for the future

  • Telomerecat is a tool for estimating the average telomere length (TL) of a paired-end whole genome sequencing (WGS) sample (Panos may be interested in the answer)

  • Cyrius for reliable calling of CYP2D6

  • What data is needed besides the VCF? QC and figures.

Code Snippets

shell:
    "pbrun deepvariant_germline \
    --ref {input.ref} \
    --in-fq {input.reads} \
    --out-bam {output.bam} \
    --gvcf --out-variants {output.vcf} \
    --num-gpus {params.n} \
    --tmp-dir {params.dir} \
    --read-group-sm {wildcards.sample} \
    --read-group-lb illumina \
    --read-group-pl {params.date}_deepvariant_germline \
    --read-group-id-prefix {wildcards.sample}  &> {log}"

shell:
    "vcftools --gzvcf {input} --remove-filtered \".\" --recode --recode-INFO-all --out {wildcards.sample} &> {log}"

shell:
    "( python {params}/scripts/ref_vcf.py {input.vcf} {input.ref} {output} ) &> {log}"

shell:
    """( awk '{{gsub(/chrM/,"chrMT"); print}}' {input} > {output} ) &> {log}"""

shell:
    "( bgzip {input} && tabix {input}.gz ) &> {log}"

import sys
from pysam import VariantFile

# Usage: python ref_vcf.py <input.vcf[.gz]> <reference> <output.vcf>
# pysam recognizes automatically whether the input is bgzipped or not
vcf_in = VariantFile(sys.argv[1])

# Add a ##reference line to the header used for the output
new_header = vcf_in.header
new_header.add_line("##reference=" + sys.argv[2])

# Start the new VCF with the updated header
vcf_out = VariantFile(sys.argv[3], 'w', header=new_header)

# Copy every record unchanged
for record in vcf_in.fetch():
    vcf_out.write(record)

# Close both files explicitly so the output is flushed to disk
vcf_out.close()
vcf_in.close()
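
One thing to watch in the awk snippet above: `gsub(/chrM/,"chrMT")` rewrites every occurrence of `chrM` on every line, so an input that already uses `chrMT` would end up with `chrMTT`. A more targeted alternative, sketched here under the assumption that the VCF is uncompressed text, would touch only the CHROM column and the matching `##contig` header line. The function name `rename_chrm` is hypothetical and not part of the pipeline.

```python
def rename_chrm(line, old="chrM", new="chrMT"):
    """Rename the mitochondrial contig in a single VCF line.

    Only the ##contig header entry and the CHROM column are changed;
    all other text, including INFO fields, is left untouched.
    """
    contig_prefix = "##contig=<ID=" + old
    if line.startswith(contig_prefix + ",") or line.startswith(contig_prefix + ">"):
        return line.replace("ID=" + old, "ID=" + new, 1)
    if line.startswith("#"):
        return line
    chrom, sep, rest = line.partition("\t")
    if chrom == old:
        return new + sep + rest
    return line
```

Applied line by line, this leaves records that are already on `chrMT` unchanged, at the cost of not renaming mentions of the contig elsewhere in the file.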

Created: 1yr ago
Updated: 1yr ago
Maintainers: public
URL: https://github.com/zezzipa/Poirot_RD-WGS
Name: poirot_rd-wgs
Version: 1
Copyright: Public Domain
License: GNU General Public License v3.0