GermlineStructuralV-nf: Comprehensive Structural Variant Identification Pipeline for Human Genome Using Multi-Caller Approach


GermlineStructuralV-nf

Description

GermlineStructuralV-nf is a pipeline for identifying structural variant events in human Illumina short-read whole genome sequence data. It identifies structural variant and copy number events from BAM files using Manta, Smoove, and TIDDIT. Variants are then merged using SURVIVOR and annotated with AnnotSV. The pipeline is written in Nextflow and uses Singularity or Docker to run containerised tools.

Structural variant and copy number detection is challenging. Most structural variant detection tools infer these events from read mapping patterns, which can often resemble sequencing and read alignment artefacts. To address this, GermlineStructuralV-nf employs three general-purpose structural variant callers, each supporting a combination of detection methods. Manta, Smoove and TIDDIT use typical detection approaches that consider:

  • Discordant read pair alignments
  • Split reads that span a breakpoint
  • Read depth profiling
  • Local de novo assembly

This approach is currently considered the best way to maximise sensitivity with short-read data (Cameron et al. 2019, Mahmoud et al. 2019). By combining tools that use different methods, we improve our ability to detect variant events of different types and sizes.
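
The first two signals can be illustrated on plain SAM text. The sketch below is not how Manta, Smoove or TIDDIT work internally: it labels each first-in-pair read as split (it carries an `SA:Z:` supplementary-alignment tag), discordant (mate on another chromosome, or template length above a fixed cut-off) or concordant. The `classify_reads` name and the fixed `max_insert` threshold are hypothetical simplifications; real callers estimate the insert-size distribution from the library and also check read orientation.

```shell
#!/usr/bin/env bash
# Hypothetical helper: classify first-in-pair reads in SAM text read from stdin.
# Assumption: a single fixed maximum insert size; orientation is not checked.
classify_reads() {
  max_insert="$1"
  awk -F'\t' -v max="$max_insert" '
    /^@/ { next }                               # skip SAM header lines
    int($2 / 64) % 2 != 1 { next }              # keep first-in-pair reads (flag bit 64)
    {
      split_read = 0
      for (i = 12; i <= NF; i++)                # optional tags start at column 12
        if ($i ~ /^SA:Z:/) split_read = 1       # supplementary alignment: split read
      tlen = ($9 < 0) ? -$9 : $9                # absolute template length
      if (split_read) label = "split"
      else if ($7 != "=" && $7 != "*" && $7 != $3) label = "discordant_interchrom"
      else if (tlen > max + 0) label = "discordant_insert"
      else label = "concordant"
      print $1 "\t" label
    }'
}

# Example: a normal pair, then a pair whose mate maps to another chromosome
printf 'r1\t99\tchr1\t100\t60\t100M\t=\t300\t300\t*\t*\nr2\t65\tchr1\t500\t60\t100M\tchr5\t900\t0\t*\t*\n' |
  classify_reads 1000
```

Read depth and local assembly need coverage tracks and sequence content respectively, so they are left out of this sketch.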

User guide

To run this pipeline, you will need to prepare your input files and reference data, and clone this repository. Before proceeding, ensure Nextflow is installed on the system you're working on.
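
SV callers such as Manta expect indexed BAM files, so a quick pre-flight check of the sample list can catch missing files before Nextflow starts. The sketch below assumes a tab-separated sample sheet with a header row followed by `sampleID<TAB>/path/to/sample.bam` rows; that layout and the `check_samples` helper are hypothetical, so confirm the expected input format in the repository README.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check. Assumed sample sheet layout (not taken from the
# pipeline): a header row, then "sampleID<TAB>/path/to/sample.bam" rows.
# Prints one line per problem and returns 1 if any BAM or BAM index is missing.
check_samples() {
  sheet="$1"
  status=0
  tab="$(printf '\t')"
  while IFS="$tab" read -r sample bam; do
    [ -z "$sample" ] && continue                # ignore blank lines
    if [ ! -s "$bam" ]; then
      echo "MISSING_BAM $sample $bam"; status=1
    elif [ ! -s "$bam.bai" ] && [ ! -s "${bam%.bam}.bai" ]; then
      echo "MISSING_INDEX $sample $bam"; status=1
    fi
  done <<EOF
$(tail -n +2 "$sheet")
EOF
  return "$status"
}

# Example (file name is illustrative):
# check_samples samples.tsv || echo "fix the sample sheet before running the pipeline"
```
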

Code Snippets

"""
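# annotate the merged SV calls with AnnotSV (GRCh38 annotation sources);
# -includeCI 1 widens SV start/end positions by the VCF CIPOS/CIEND confidence intervals
AnnotSV \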
	-SVinputFile ${sampleID}_merged.vcf \
	-annotationsDir ${params.annotsvDir} \
	-bedtools bedtools -bcftools bcftools \
	-annotationMode ${mode} \
	-genomeBuild GRCh38 \
	-includeCI 1 \
	-overwrite 1 \
	-outputFile ${outputFile} ${extraArgs}
"""
"""
# copy the user-supplied sample list into the task work directory
cat "${params.input}" > samples.txt
"""
"""
# configure manta SV analysis workflow
configManta.py \
	--normalBam ${bam} \
	--referenceFasta ${params.ref} \
	--runDir manta \
	${intervals} ${extraArgs}

# run SV detection 
manta/runWorkflow.py -m local -j ${task.cpus}

# rename outputs to include the sample ID
mv manta/results/variants/candidateSmallIndels.vcf.gz \
	manta/Manta_${sampleID}.candidateSmallIndels.vcf.gz
mv manta/results/variants/candidateSmallIndels.vcf.gz.tbi \
	manta/Manta_${sampleID}.candidateSmallIndels.vcf.gz.tbi
mv manta/results/variants/candidateSV.vcf.gz \
	manta/Manta_${sampleID}.candidateSV.vcf.gz
mv manta/results/variants/candidateSV.vcf.gz.tbi \
	manta/Manta_${sampleID}.candidateSV.vcf.gz.tbi
mv manta/results/variants/diploidSV.vcf.gz \
	manta/Manta_${sampleID}.diploidSV.vcf.gz
mv manta/results/variants/diploidSV.vcf.gz.tbi \
	manta/Manta_${sampleID}.diploidSV.vcf.gz.tbi

# convert Manta inversions reported as paired BND records into single INV records
convertInversion.py \$(which samtools) ${params.ref} \
	manta/Manta_${sampleID}.diploidSV.vcf.gz \
	> manta/Manta_${sampleID}.diploidSV_converted.vcf

# zip and index converted vcf
bgzip manta/Manta_${sampleID}.diploidSV_converted.vcf
tabix manta/Manta_${sampleID}.diploidSV_converted.vcf.gz
"""
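
The six `mv` commands above apply one naming rule. A loop can express it once, as in this hypothetical `rename_manta_outputs` helper (a sketch, not part of the pipeline). If it were pasted into a Nextflow `script:` block, each shell `$` would need escaping as `\$`, as the pipeline already does for `\$(which samtools)`.

```shell
#!/usr/bin/env bash
# Hypothetical helper: move Manta's variant files up into run_dir with a
# "Manta_<sampleID>." prefix, mirroring the mv commands in the process above.
rename_manta_outputs() {
  sample_id="$1"
  run_dir="$2"
  for name in candidateSmallIndels candidateSV diploidSV; do
    for ext in vcf.gz vcf.gz.tbi; do
      mv "$run_dir/results/variants/$name.$ext" \
         "$run_dir/Manta_$sample_id.$name.$ext"
    done
  done
}

# Example: rename_manta_outputs SAMPLE1 manta
```
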
"""
# create new header for merged vcf
printf "${sampleID}_manta\n" > ${sampleID}_rehead_manta.txt

# replace sampleID with caller_sample for merging
bcftools reheader \
	Manta_${sampleID}.diploidSV_converted.vcf.gz \
	-s ${sampleID}_rehead_manta.txt \
	-o Manta_${sampleID}.vcf.gz

# gunzip vcf
gunzip Manta_${sampleID}.vcf.gz
"""
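
The comments in the reheader steps indicate that each caller's sample column is renamed (here to `<sampleID>_manta`) so that the callers stay distinguishable when their VCFs are merged. To confirm the new name in an uncompressed VCF, one option is to read the `#CHROM` header line, as in the hypothetical `vcf_samples` helper below; for compressed files, `bcftools query -l` lists sample names directly.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the sample names (columns 10 onward of the
# #CHROM header line) of an uncompressed VCF, one per line.
vcf_samples() {
  awk -F'\t' '/^#CHROM/ { for (i = 10; i <= NF; i++) print $i; exit }' "$1"
}

# Example: vcf_samples Manta_SAMPLE1.vcf   # should print SAMPLE1_manta after reheadering
```
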
"""
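# call SVs with smoove (lumpy-based): -d adds duphold depth annotations
# and --genotype genotypes the calls with svtyper
smoove call -d --name ${sampleID} \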
	--fasta ${params.ref} \
	--outdir smoove \
	--processes ${task.cpus} \
	--genotype ${bam} ${extraArgs}
"""
"""
# create new header for merged vcf
printf "${sampleID}_smoove\n" > ${sampleID}_rehead_smoove.txt

# replace sampleID with caller_sample for merging 	
bcftools reheader \
	${sampleID}-smoove.genotyped.vcf.gz \
	-s ${sampleID}_rehead_smoove.txt \
	-o Smoove_${sampleID}.vcf.gz

# gunzip vcf
gunzip Smoove_${sampleID}.vcf.gz

#clean up
#rm -r ${sampleID}_rehead_smoove.txt
"""
"""
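# write the caller VCF paths to a list file, one per line
echo ${mergeFile} | xargs -n1 > ${sampleID}_survivor.txt

# merge caller VCFs; SURVIVOR merge positional arguments are: list file,
# max breakpoint distance, min number of supporting callers, use SV type (1 = yes),
# use strand (1 = yes), estimate distance from SV size (0 = no), min SV size, output VCF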
SURVIVOR merge ${sampleID}_survivor.txt \
	${params.survivorMaxDist} \
	${params.survivorConsensus} \
	${params.survivorType} \
	${params.survivorStrand} \
	0 \
	${params.survivorSize} \
	${sampleID}_merged.vcf
"""
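
The merge above keeps calls that agree within a maximum breakpoint distance and are supported by a minimum number of callers. To illustrate only that idea, the sketch below greedily clusters calls of the same type on the same chromosome whose start positions fall within `max_dist` of the cluster's first call, then reports clusters supported by at least `min_support` distinct callers. It is a simplification, not SURVIVOR's algorithm: end positions and strand are ignored, and the four-column input (`caller`, `chrom`, `pos`, `type`) and the `consensus_calls` name are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of caller-consensus merging (not SURVIVOR's algorithm).
# Input (stdin): caller<TAB>chrom<TAB>pos<TAB>type
# Output: chrom<TAB>first_pos<TAB>type<TAB>n_callers for well-supported clusters.
consensus_calls() {
  max_dist="$1"
  min_support="$2"
  # group by chromosome and type, then order by position within each group
  sort -t "$(printf '\t')" -k2,2 -k4,4 -k3,3n |
  awk -F'\t' -v d="$max_dist" -v n="$min_support" '
    function emit_cluster() { if (count >= n + 0) print chrom "\t" start "\t" type "\t" count }
    {
      # start a new cluster on a new chromosome/type or when too far from its first call
      if (NR == 1 || $2 != chrom || $4 != type || $3 - start > d + 0) {
        if (NR > 1) emit_cluster()
        cid++; chrom = $2; type = $4; start = $3; count = 0
      }
      # count each caller at most once per cluster
      if (!((cid, $1) in seen)) { seen[cid, $1] = 1; count++ }
    }
    END { if (NR > 0) emit_cluster() }'
}

# Example: three callers report a deletion near chr1:1000; only manta reports the inversion
printf 'manta\tchr1\t1000\tDEL\nsmoove\tchr1\t1020\tDEL\ntiddit\tchr1\t1100\tDEL\nmanta\tchr2\t5000\tINV\n' |
  consensus_calls 500 2
```

With a distance of 500 and a minimum support of 2, the example keeps only the chr1 deletion, backed by all three callers.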
"""
# convert the merged VCF with SURVIVOR vcftobed (0 = minimum SV size, -1 = no maximum)
SURVIVOR vcftobed ${sampleID}_merged.vcf \
	0 -1 \
	${sampleID}_merged.bed

# summarise the merged calls; -1 disables the min size, max size and min read-support filters
SURVIVOR stats ${sampleID}_merged.vcf \
	-1 -1 -1 \
	${sampleID}_merged.stats.txt
"""
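
SURVIVOR records caller support in the INFO column of the merged VCF: `SUPP` is the number of supporting inputs and `SUPP_VEC` is a 0/1 string in the order the input files were listed (here, the order of `${sampleID}_survivor.txt`). The sketch below tallies how many merged calls each combination of callers supports. The `tally_support` helper is hypothetical, and the caller order passed to it (for example `manta smoove tiddit`) is an assumption that should be checked against the list file or the merged VCF's sample columns.

```shell
#!/usr/bin/env bash
# Hypothetical helper: count merged SVs per caller combination using SURVIVOR's
# SUPP_VEC INFO field. Caller names must be given in the merge-list order.
tally_support() {
  vcf="$1"; shift
  awk -F'\t' -v names="$*" '
    BEGIN { n = split(names, caller, " ") }
    /^#/ { next }
    {
      if (!match($8, /SUPP_VEC=[01]+/)) next
      vec = substr($8, RSTART + 9, RLENGTH - 9)   # drop the "SUPP_VEC=" prefix
      combo = ""
      for (i = 1; i <= n && i <= length(vec); i++)
        if (substr(vec, i, 1) == "1") combo = combo (combo == "" ? "" : "+") caller[i]
      count[combo]++
    }
    END { for (c in count) print c "\t" count[c] }' "$vcf" | sort
}

# Example: tally_support SAMPLE1_merged.vcf manta smoove tiddit
```
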
"""
# estimate binned read depth with TIDDIT coverage mode
tiddit \
	--cov \
	--bam ${bam} \
	--ref ${params.ref} \
	-o ${sampleID}_cov  ${extraArgs}
"""
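
The coverage run provides the read depth profiling signal listed earlier. As a rough illustration of how depth hints at copy number, the sketch below flags bins whose depth is below half, or above 1.5 times, the mean across all bins. The four-column `chrom start end depth` input, the 0.5/1.5 cut-offs and the `flag_depth_bins` name are assumptions for illustration; check the TIDDIT documentation for the exact columns it writes, and note that TIDDIT's own calling is more involved.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: flag bins whose depth departs strongly from the mean.
# Assumed input: chrom<TAB>start<TAB>end<TAB>depth; lines starting with # are skipped.
flag_depth_bins() {
  awk -F'\t' '
    /^#/ { next }
    { chrom[NR] = $1; start[NR] = $2; stop[NR] = $3; depth[NR] = $4 + 0; sum += $4; n++ }
    END {
      if (n == 0) exit
      mean = sum / n                            # simple mean, outlier bins included
      for (i = 1; i <= NR; i++) {
        if (!(i in depth)) continue             # header lines were not stored
        if (depth[i] < 0.5 * mean) call = "LOSS"
        else if (depth[i] > 1.5 * mean) call = "GAIN"
        else continue
        print chrom[i] "\t" start[i] "\t" stop[i] "\t" call
      }
    }' "$1"
}
```
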
"""
tiddit \
	--sv \
	-q 20 \
	--bam ${bam} \
	--ref ${params.ref} \
	-o ${sampleID}_sv \
	--threads ${task.cpus} ${extraArgs}

# rename vcf to show it is from tiddit
mv ${sampleID}_sv.vcf \
	Tiddit_${sampleID}_sv.vcf

# keep header lines and PASS variants only
grep -E "#|PASS" Tiddit_${sampleID}_sv.vcf \
	> Tiddit_${sampleID}_PASSsv.vcf
"""
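
The `grep -E "#|PASS"` filter keeps any line containing `#` or `PASS` anywhere, which works for typical TIDDIT output but would also keep a record whose INFO field happened to contain either string. A stricter variant, sketched here as a hypothetical `pass_only` helper, keeps header lines and records whose FILTER column (column 7) is exactly `PASS`; `bcftools view -f PASS` is an established alternative for the same job.

```shell
#!/usr/bin/env bash
# Hypothetical helper: keep VCF header lines and records with FILTER == PASS.
pass_only() {
  awk -F'\t' '/^#/ || $7 == "PASS"' "$1"
}

# Inside a Nextflow script block the awk field reference would need escaping as \$7.
```
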
"""
# bgzip and index tiddit vcf 
bgzip Tiddit_${sampleID}_PASSsv.vcf
tabix Tiddit_${sampleID}_PASSsv.vcf.gz

# create new header for merged vcf
printf "${sampleID}_tiddit\n" > ${sampleID}_rehead_tiddit.txt

# replace sampleID with caller_sample for merging 	
bcftools reheader \
	Tiddit_${sampleID}_PASSsv.vcf.gz \
	-s ${sampleID}_rehead_tiddit.txt \
	-o Tiddit_${sampleID}_final.vcf.gz

# gunzip vcf
gunzip Tiddit_${sampleID}_final.vcf.gz

#clean up
rm -r ${sampleID}_rehead_tiddit.txt
"""

Created: 1yr ago
Updated: 1yr ago
Maintainers: public
URL: https://github.com/Sydney-Informatics-Hub/Germline-StructuralV-nf
Name: germlinestructuralv-nf
Version: Version 1
Copyright: Public Domain
License: None