GermlineStructuralV-nf: Comprehensive Structural Variant Identification Pipeline for Human Genome Using Multi-Caller Approach

Version 1

GermlineStructuralV-nf

Description

GermlineStructuralV-nf is a pipeline for identifying structural variant events in human Illumina short-read whole-genome sequence data. It identifies structural variant and copy number events from BAM files using Manta, Smoove, and TIDDIT. Variants are then merged using SURVIVOR and annotated by AnnotSV. The pipeline is written in Nextflow and uses Singularity/Docker to run containerised tools.

Structural variant and copy number detection is challenging. Most structural variant detection tools infer these events from read mapping patterns, which can resemble sequencing and read alignment artefacts. To address this, GermlineStructuralV-nf employs three general-purpose structural variant callers, each of which supports a combination of detection methods. Manta, Smoove and TIDDIT use typical detection approaches that consider:

Discordant read pair alignments
Split reads that span a breakpoint
Read depth profiling
Local de novo assembly

Using multiple callers that each combine these detection methods is currently considered the best approach for maximising sensitivity with short-read data (Cameron et al. 2019, Mahmoud et al. 2019). Because the tools rely on different methods, combining them improves our ability to detect variant events of different types and sizes.
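To make the idea of combining callers concrete, here is a toy sketch of consensus support: a call is kept only if enough distinct callers report an event of the same type on the same chromosome with both breakpoints nearby. It is a simplified illustration, not SURVIVOR's actual merging algorithm; the input table, the 1,000 bp tolerance and the two-caller threshold are all hypothetical.

```shell
#!/usr/bin/env bash
# Toy consensus filter: NOT SURVIVOR's algorithm, only the idea behind "minimum supporting callers".
set -euo pipefail

max_dist=1000    # hypothetical breakpoint tolerance (bp)
min_support=2    # hypothetical minimum number of distinct callers

# Hypothetical calls: caller, chrom, start, end, SV type (tab-separated; printf reuses the format per 5 args)
printf '%s\t%s\t%s\t%s\t%s\n' \
  manta  chr1 10000 15000 DEL \
  smoove chr1 10150 14900 DEL \
  tiddit chr1 10400 15200 DEL \
  manta  chr2 50000 52000 DUP \
  tiddit chr3 70000 70500 INV > calls.tsv

awk -F'\t' -v OFS='\t' -v d="$max_dist" -v k="$min_support" '
  function abs(x) { return x < 0 ? -x : x }
  { caller[NR] = $1; chr[NR] = $2; s[NR] = $3; e[NR] = $4; t[NR] = $5 }
  END {
    for (i = 1; i <= NR; i++) {
      split("", seen)   # reset the set of callers supporting call i
      n = 0
      for (j = 1; j <= NR; j++) {
        # same chromosome and type, both breakpoints within d bp, each caller counted once
        if (chr[j] == chr[i] && t[j] == t[i] &&
            abs(s[j] - s[i]) <= d && abs(e[j] - e[i]) <= d && !(caller[j] in seen)) {
          seen[caller[j]] = 1
          n++
        }
      }
      # report the call with its number of supporting callers (itself included)
      if (n >= k) print caller[i], chr[i], s[i], e[i], t[i], n
    }
  }
' calls.tsv > supported.tsv
```

Each of the three DEL calls is kept with a support count of 3, while the single-caller DUP and INV calls are dropped. A real merger would also collapse the three DEL records into one event; this sketch only counts support.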


User guide

To run this pipeline, you will need to prepare your input files and reference data, and clone this repository. Before proceeding, ensure Nextflow is installed on the system you're working on.

Code Snippets

"""
AnnotSV \
	-SVinputFile ${sampleID}_merged.vcf \
	-annotationsDir ${params.annotsvDir} \
	-bedtools bedtools -bcftools bcftools \
	-annotationMode ${mode} \
	-genomeBuild GRCh38 \
	-includeCI 1 \
	-overwrite 1 \
	-outputFile ${outputFile} ${extraArgs}
"""

NextFlow BCFtools BEDTools AnnotSV From line 31 of modules/annotsv.nf

"""
cat "${params.input}" > samples.txt
"""

NextFlow From line 14 of modules/check_cohort.nf

"""
# configure manta SV analysis workflow
configManta.py \
	--normalBam ${bam} \
	--referenceFasta ${params.ref} \
	--runDir manta \
	${intervals} ${extraArgs}

# run SV detection 
manta/runWorkflow.py -m local -j ${task.cpus}

# clean up outputs
mv manta/results/variants/candidateSmallIndels.vcf.gz \
	manta/Manta_${sampleID}.candidateSmallIndels.vcf.gz
mv manta/results/variants/candidateSmallIndels.vcf.gz.tbi \
	manta/Manta_${sampleID}.candidateSmallIndels.vcf.gz.tbi
mv manta/results/variants/candidateSV.vcf.gz \
	manta/Manta_${sampleID}.candidateSV.vcf.gz
mv manta/results/variants/candidateSV.vcf.gz.tbi \
	manta/Manta_${sampleID}.candidateSV.vcf.gz.tbi
mv manta/results/variants/diploidSV.vcf.gz \
	manta/Manta_${sampleID}.diploidSV.vcf.gz
mv manta/results/variants/diploidSV.vcf.gz.tbi \
	manta/Manta_${sampleID}.diploidSV.vcf.gz.tbi

# convert multiline inversion BNDs from manta vcf to single line
convertInversion.py \$(which samtools) ${params.ref} \
	manta/Manta_${sampleID}.diploidSV.vcf.gz \
	> manta/Manta_${sampleID}.diploidSV_converted.vcf

# zip and index converted vcf
bgzip manta/Manta_${sampleID}.diploidSV_converted.vcf
tabix manta/Manta_${sampleID}.diploidSV_converted.vcf.gz
"""

NextFlow tabix manta From line 24 of modules/manta.nf
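The six `mv` commands in the Manta snippet apply one naming rule: each output VCF and its `.tbi` index gains a `Manta_{sampleID}.` prefix and moves up to `manta/`. A loop expresses the same rule; the sketch below runs it against empty placeholder files with a hypothetical sample ID. Inside a Nextflow `script` block, the loop variables would have to be written as `\${f}` and `\${ext}` so that Groovy does not try to interpolate them.

```shell
#!/usr/bin/env bash
# Loop form of the Manta output renaming, run on empty placeholder files.
set -euo pipefail

sampleID="S1"    # hypothetical sample ID

# Stand-in for Manta's results directory
mkdir -p manta/results/variants
for f in candidateSmallIndels candidateSV diploidSV; do
  touch "manta/results/variants/${f}.vcf.gz" "manta/results/variants/${f}.vcf.gz.tbi"
done

# Same effect as the six mv commands: prefix each VCF and its index with Manta_<sampleID>
for f in candidateSmallIndels candidateSV diploidSV; do
  for ext in vcf.gz vcf.gz.tbi; do
    mv "manta/results/variants/${f}.${ext}" "manta/Manta_${sampleID}.${f}.${ext}"
  done
done
```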

"""
# create new header for merged vcf
printf "${sampleID}_manta\n" > ${sampleID}_rehead_manta.txt

# replace sampleID with caller_sample for merging
bcftools reheader \
	Manta_${sampleID}.diploidSV_converted.vcf.gz \
	-s ${sampleID}_rehead_manta.txt \
	-o Manta_${sampleID}.vcf.gz

# gunzip vcf
gunzip Manta_${sampleID}.vcf.gz
"""

NextFlow BCFtools From line 73 of modules/manta.nf
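Before merging, each caller's VCF has its sample renamed from the plain sample ID to `{sampleID}_{caller}`; `bcftools reheader -s` reads the new sample names, one per line, from the file written by `printf`. For a single-sample, uncompressed VCF the effect can be emulated with `awk`, which is handy for checking the naming without bcftools. The sample ID and toy VCF below are hypothetical, and this is an illustration rather than a replacement for `bcftools reheader`.

```shell
#!/usr/bin/env bash
# Emulate `bcftools reheader -s` on a single-sample, uncompressed VCF (illustration only).
set -euo pipefail

sampleID="S1"    # hypothetical sample ID
caller="manta"

# Same idea as the pipeline: write the new sample name to a one-line file
printf "%s_%s\n" "$sampleID" "$caller" > "${sampleID}_rehead_${caller}.txt"

# Minimal hypothetical VCF whose single sample column is named after the sample
printf '##fileformat=VCFv4.2\n#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\t%s\nchr1\t10000\tsv1\tN\t<DEL>\t.\tPASS\tSVTYPE=DEL;END=15000\tGT\t0/1\n' \
  "$sampleID" > toy.vcf

new_name=$(head -n 1 "${sampleID}_rehead_${caller}.txt")

# Rewrite only column 10 of the #CHROM header line; every other line passes through unchanged
awk -F'\t' -v OFS='\t' -v name="$new_name" '
  /^#CHROM/ { $10 = name }
  { print }
' toy.vcf > toy_rehead.vcf
```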

"""
smoove call -d --name ${sampleID} \
	--fasta ${params.ref} \
	--outdir smoove \
	--processes ${task.cpus} \
	--genotype ${bam} ${extraArgs}
"""

NextFlow smoove From line 22 of modules/smoove.nf

"""
# create new header for merged vcf
printf "${sampleID}_smoove\n" > ${sampleID}_rehead_smoove.txt

# replace sampleID with caller_sample for merging 	
bcftools reheader \
	${sampleID}-smoove.genotyped.vcf.gz \
	-s ${sampleID}_rehead_smoove.txt \
	-o Smoove_${sampleID}.vcf.gz

# gunzip vcf
gunzip Smoove_${sampleID}.vcf.gz

#clean up
#rm -r ${sampleID}_rehead_smoove.txt
"""

NextFlow BCFtools From line 43 of modules/smoove.nf

"""
echo ${mergeFile} | xargs -n1 > ${sampleID}_survivor.txt

SURVIVOR merge ${sampleID}_survivor.txt \
	${params.survivorMaxDist} \
	${params.survivorConsensus} \
	${params.survivorType} \
	${params.survivorStrand} \
	0 \
	${params.survivorSize} \
	${sampleID}_merged.vcf
"""

NextFlow SURVIVOR From line 16 of modules/survivor_merge.nf
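`SURVIVOR merge` takes its settings as positional arguments, so their order in the snippet matters. The sketch below rebuilds the same call with one commented argument per line and prints it instead of running it (a dry run). The numeric values and file names are illustrative only; in the pipeline they come from `params.survivor*` and from the staged caller VCFs. The meaning of the sixth argument (passed as a constant `0` by the pipeline) has differed between SURVIVOR releases, so check the `SURVIVOR merge` usage text for your version.

```shell
#!/usr/bin/env bash
# Dry run: rebuild the SURVIVOR merge call with each positional argument labelled.
set -euo pipefail

sampleID="S1"                                               # hypothetical
mergeFile="Manta_S1.vcf Smoove_S1.vcf Tiddit_S1_final.vcf"  # hypothetical caller VCFs

# One VCF path per line, as `echo ${mergeFile} | xargs -n1` does in the pipeline
echo "$mergeFile" | xargs -n1 > "${sampleID}_survivor.txt"

args=(
  "${sampleID}_survivor.txt"  # file listing the VCFs to merge
  1000                        # max distance between breakpoints (bp); params.survivorMaxDist
  2                           # min number of supporting input VCFs (callers); params.survivorConsensus
  1                           # 1 = merged calls must agree on SV type; params.survivorType
  1                           # 1 = merged calls must agree on strand; params.survivorStrand
  0                           # constant 0 in the pipeline; see usage text for your SURVIVOR version
  30                          # min SV size (bp); params.survivorSize
  "${sampleID}_merged.vcf"    # output VCF
)

# Print the command rather than running it
printf 'SURVIVOR merge %s\n' "${args[*]}"
```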

"""
SURVIVOR vcftobed ${sampleID}_merged.vcf \
	0 -1 \
	${sampleID}_merged.bed

SURVIVOR stats ${sampleID}_merged.vcf \
	-1 -1 -1 \
	${sampleID}_merged.stats.txt
"""

NextFlow SURVIVOR From line 13 of modules/survivor_summary.nf
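`SURVIVOR stats` reports summary statistics for the merged calls. As a rough, independent cross-check, SV types can also be tallied directly from the `SVTYPE` INFO tag with `awk`. The toy VCF is hypothetical, and the output is a simple two-column table, not SURVIVOR's report format.

```shell
#!/usr/bin/env bash
# Tally SV types from the SVTYPE INFO tag of a VCF (illustration; not SURVIVOR's output format).
set -euo pipefail

# Hypothetical merged VCF with four records
{
  printf '##fileformat=VCFv4.2\n'
  printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n'
  printf 'chr1\t10000\tsv1\tN\t<DEL>\t.\tPASS\tSVTYPE=DEL;END=15000\n'
  printf 'chr1\t90000\tsv2\tN\t<DEL>\t.\tPASS\tSVTYPE=DEL;END=90800\n'
  printf 'chr2\t50000\tsv3\tN\t<DUP>\t.\tPASS\tSVTYPE=DUP;END=52000\n'
  printf 'chr3\t70000\tsv4\tN\t<INV>\t.\tPASS\tSVTYPE=INV;END=70500\n'
} > toy_merged.vcf

# For each data line, extract the value after "SVTYPE=" (7 characters) from INFO (column 8) and count it
awk -F'\t' '
  !/^#/ && match($8, /SVTYPE=[^;]+/) { count[substr($8, RSTART + 7, RLENGTH - 7)]++ }
  END { for (t in count) print t "\t" count[t] }
' toy_merged.vcf | sort > sv_type_counts.tsv
```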

"""
"""

NextFlow From line 36 of modules/survivor_summary.nf

"""
tiddit \
	--cov \
	--bam ${bam} \
	--ref ${params.ref} \
	-o ${sampleID}_cov  ${extraArgs}
"""

NextFlow TIDDIT From line 16 of modules/tiddit_cov.nf
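`tiddit --cov` reports read coverage across the genome in bins. The read-depth profiling approach listed among the detection methods can be illustrated on such a binned table: bins whose depth departs strongly from the genome-wide median suggest a copy number loss or gain. The four-column table (chrom, start, end, mean depth), the 0.5x and 1.5x thresholds and the bin values are all hypothetical; TIDDIT's own coverage output format may differ, and real callers also correct for biases such as GC content and mappability.

```shell
#!/usr/bin/env bash
# Toy read-depth profiling: flag bins far from the median depth (illustration only).
set -euo pipefail

# Hypothetical bins: chrom, start, end, mean depth
printf '%s\t%s\t%s\t%s\n' \
  chr1 0    1000 30 \
  chr1 1000 2000 31 \
  chr1 2000 3000 29 \
  chr1 3000 4000 13 \
  chr1 4000 5000 14 \
  chr1 5000 6000 60 \
  chr1 6000 7000 30 > bins.tsv

# Median depth across bins (middle value, or mean of the two middle values)
median=$(cut -f4 bins.tsv | sort -n | awk '
  { v[NR] = $1 }
  END { med = (NR % 2) ? v[(NR + 1) / 2] : (v[NR / 2] + v[NR / 2 + 1]) / 2; print med }')

# Flag bins below 0.5x or above 1.5x the median as candidate losses or gains, with the depth ratio
awk -F'\t' -v OFS='\t' -v m="$median" '
  $4 < 0.5 * m { print $1, $2, $3, sprintf("%.2f", $4 / m), "LOSS" }
  $4 > 1.5 * m { print $1, $2, $3, sprintf("%.2f", $4 / m), "GAIN" }
' bins.tsv > cn_flags.tsv
```

With a median depth of 30, the two bins at depth 13 and 14 are flagged as candidate losses and the bin at depth 60 as a candidate gain.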

"""
tiddit \
	--sv \
	-q 20 \
	--bam ${bam} \
	--ref ${params.ref} \
	-o ${sampleID}_sv \
	--threads ${task.cpus} ${extraArgs}

# rename vcf to show its from tiddit 
mv ${sampleID}_sv.vcf \
	Tiddit_${sampleID}_sv.vcf

# filter to PASS-only variants: keep header lines and records whose FILTER column (7th) is PASS
awk -F '\t' '/^#/ || \$7 == "PASS"' Tiddit_${sampleID}_sv.vcf \
	> Tiddit_${sampleID}_PASSsv.vcf
"""

NextFlow TIDDIT From line 18 of modules/tiddit_sv.nf
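To see what a FILTER-based selection keeps, the toy run below applies it to a small VCF. Testing the FILTER column (the 7th) for an exact `PASS` avoids keeping records where the text `PASS` merely appears elsewhere on the line, and anchoring `#` to the start of the line keeps only genuine header lines. The records, the `LowQual` filter value and the `NOTE` INFO key are hypothetical. Inside a Nextflow `script` block the column reference must be written `\$7`.

```shell
#!/usr/bin/env bash
# Keep VCF header lines plus records whose FILTER column is exactly PASS (toy data).
set -euo pipefail

{
  printf '##fileformat=VCFv4.2\n'
  printf '#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n'
  printf 'chr1\t10000\tsv1\tN\t<DEL>\t.\tPASS\tSVTYPE=DEL;END=15000\n'
  printf 'chr1\t20000\tsv2\tN\t<DUP>\t.\tLowQual\tSVTYPE=DUP;END=21000\n'
  # Non-PASS record whose INFO happens to contain the text "PASS"
  printf 'chr2\t30000\tsv3\tN\t<INV>\t.\tLowQual\tSVTYPE=INV;END=31000;NOTE=BYPASS\n'
} > toy_sv.vcf

# Exact match on column 7 (FILTER); header lines start with "#"
awk -F'\t' '/^#/ || $7 == "PASS"' toy_sv.vcf > toy_PASSsv.vcf

# For comparison, a substring match also counts the sv3 record (2 headers + sv1 + sv3)
grep -cE "#|PASS" toy_sv.vcf
```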

"""
# bgzip and index tiddit vcf 
bgzip Tiddit_${sampleID}_PASSsv.vcf
tabix Tiddit_${sampleID}_PASSsv.vcf.gz

# create new header for merged vcf
printf "${sampleID}_tiddit\n" > ${sampleID}_rehead_tiddit.txt

# replace sampleID with caller_sample for merging 	
bcftools reheader \
	Tiddit_${sampleID}_PASSsv.vcf.gz \
	-s ${sampleID}_rehead_tiddit.txt \
	-o Tiddit_${sampleID}_final.vcf.gz

# gunzip vcf
gunzip Tiddit_${sampleID}_final.vcf.gz

# clean up
rm ${sampleID}_rehead_tiddit.txt
"""