SV calling pipeline

public public 1yr ago 0 bookmarks

HiFi-sv-snakemake

A snakemake pipeline for structural variant(SV) analysis base on HiFi reads
With it, you can get a set of SV dataset including five types, which are Insertion , Deletion , Inversion , Duplication , and Translocation .

Input files

To use the pipeline you should first copy the Snakefile and config.yaml files in the working directory. Then prepare the following files:

  1. a reference file named {genome}.fa

  2. a assembled genome file named {sample}.fa

  3. a HiFi reads fastq file named {sample}.fq.gz

The {genome} and {sample} labels should match the label in the configuration (config.yaml file or in the command line).

Dependencies

Conda environment files containing the dependencies can be found in the envs directory. These can be automatically installed and managed by providing the --use-conda argument to snakemake: snakemake --use-conda ... .
The folder scripts contains python scripts and text necessary for the pipeline.

Examples

This pipeline mainly contains four steps:

  1. read_map_all to obtain all necessary ailgnment file for SV calling.
snakemake --configfile config.yaml --cores 32 --use-conda read_map_all
  1. call_original_all to calling SV by mutiple callers and aligners.
snakemake --configfile config.yaml --cores 16 --use-conda call_original_all
  1. call_filter_all to filter all SV detasets.
snakemake --configfile config.yaml --cores 5 --use-conda call_filter_all
  1. final_results_all to get variant suitable for each SV type.
snakemake --configfile config.yaml --cores 16 --use-conda final_results_all

In this pipeline, we defined:
Insertion: detected by NGMLR + pbmm2 - SVIM ;
Deletion: detected by NGMLR + pbmm2 - Sniffles and Assemblytics ;
Inversion: detected by NGMLR - cuteSV + Sniffles ;
Duplication: detected by pbmm2 - pbsv + cuteSV ;
Tranlocation: detedcted by NGMLR + pbmm2 - pbsv + Sniffles + cuteSV and SyRI .

Overview of the workflow

We can easily visualize the workflow with Snakemake .

Rule graph

snakemake --rulegraph --configfile config.yaml -f final_results_all | dot -Tsvg > rulegraph.svg

DAG of jobs

snakemake -np -f final_results_all --configfile config.yaml --dag | dot -Tsvg > DAGgraph.svg

Code Snippets

71
72
shell:
    'ngmlr -r {input.ref} -q {input.fq} -t {threads} -x pacbio --rg-sm {wildcards.sample} --rg-lb {params.para} --rg-pl PacBio -o {output} 2> {log}'
86
87
shell:
    "pbmm2 align {input} {output} --unmapped --preset CCS -N 2 --sort --rg '@RG\\tID:{wildcards.sample}\\tSM:{wildcards.sample}\\tLB:{params.para}\\tPL:PB' 2> {log}"
104
105
106
shell:
    'samtools sort {input.bam_ngmlr} -o {output.sort_ngmlr} | samtools index 2> {log}' \
    'samtools sort {input.bam_pbmm2} -o {output.sort_pbmm2} | samtools index 2>> {log}'
125
126
127
shell:
    'nucmer --maxmatch {params.para} {input.ref} {input.asm} -p {wildcards.sample}-{params.name} -t {threads} 2> {log}' \
    'mv {wildcards.sample}-{params.name}.delta {output}'
141
142
shell:
    'minimap2 -ax asm5 --eqx {input.ref} {input.asm} > {output.sam} | samtools view -b -o {output.bam} >2 {log}'
164
165
166
167
shell:
    'trf {input.ref} {params.para} 2> {log.tdm}' \
    'pbsv discover {input.bam} --tandem-repeats {output.repeat} {output.svsig} 2> {log.sv}' \
    'pbsv call --ccs -x {params.depth} {input.ref} {output.svsig} {output.vcf} 2>> {log.sv}'
182
183
184
shell:
    'svim alignment --sequence_alleles --read_names {input.ref} {input.bam} {params.outdir}/ 2> {log}' \
    'mv {params.outdir}/final_results.vcf {output}'
204
205
206
207
shell:
    'samtools calmd -u {input.pbmm2_bam} {input.ref} > {output.pbmm2_MD}' \
    'sniffles --skip_parameter_estimation -s {params.para} --ccs_reads -m {input.ngmlr_bam} -v {output.vcf_ngmlr} 2> {log.log_ngmlr}' \
    'sniffles --skip_parameter_estimation -s {params.para} --ccs_reads -m {output.pbmm2_MD} -v {output.vcf_pbmm2} 2> {log.log_pbmm2}'
225
226
227
228
shell:
    'samtools calmd -u {input.pbmm2_bam} {input.ref} > {output.pbmm2_MD}' \
    'sniffles --skip_parameter_estimation -s {params.para} --ccs_reads -m {input.ngmlr_bam} -v {output.vcf_ngmlr} 2> {log.log_ngmlr}' \
    'sniffles --skip_parameter_estimation -s {params.para} --ccs_reads -m {output.pbmm2_MD} -v {output.vcf_pbmm2} 2> {log.log_pbmm2}'
243
244
245
shell:
    'mkdir {params.work_dir}' \
    'cuteSV {input} {output} {params.work_dir} --max_cluster_bias_INS 1000 --diff_ratio_merging_INS 0.9 --max_cluster_bias_DEL 1000 --diff_ratio_merging_DEL 0.5 --threads {threads} --sample {wildcards.sample}-{params.map_name} --min_support 10 --genotype 2> {log}'
260
261
262
shell:
    'gzip {input} | scripts/Assemblytics {wildcards.sample}-{params.name} 10000 20 100000 2> {log}' \
    'mv {wildcards.sample}-{params.name}/{wildcards.sample}-{params.name}.Assemblytics_structural_variants.bed {output}'
SnakeMake From line 260 of main/Snakefile
279
280
281
282
283
shell:
    'python ref_chr.py -i {input.ref} -o {output.ref_chr} -c {params.max_chr}' \
    'python asm_chr.py -i {input.asm} -o {output.asm_chr}' \
    'syri -c {input.bam} -r {output.ref_chr} -q {output.asm_chr} -k -F B --nosnp 2> {log}' \
    'mv syri.vcf {output.vcf}'
SnakeMake From line 279 of main/Snakefile
300
301
shell:
    'python svim_ins_filter.py -i {input} -q {params.qual} -o {output}'
SnakeMake From line 300 of main/Snakefile
313
314
315
shell:
    'ls {input} > sample_files_ins' \
    'SURVIVOR merge sample_files_ins 1000 1 1 1 0 20 {output} 2> {log}'
329
330
331
332
shell:
    'python assemblytics_format_change.py -i {input.bed} -a {input.txt} -o {output.vcf_temp}' \
    'bcftools norm -f {input.ref} -m-both -c s -o {output.norm} {output.vcf_temp}' \
    'python genotype.py -i {output.vcf_temp} -o {output.vcf}'
339
340
run:
    shell('python sniffles_del_fileter.py -i {input} -o {output}')
SnakeMake From line 339 of main/Snakefile
351
352
353
run:
    shell('(ls {input} > sample_files_del)')
    shell('(SURVIVOR merge sample_files_del 1000 1 1 1 0 20 {output}) 2> {log}')
SnakeMake From line 351 of main/Snakefile
366
367
368
run:
    shell('python sniffles_inv_fileter.py -i {input.Sniffles} -o {output.Sniffles_vcf}')
    shell('python cutesv_inv_fliter.py -i {input.cuteSV} -o {output.cuteSV_vcf}')
SnakeMake From line 366 of main/Snakefile
378
379
380
run:
    shell('(ls {input} > sample_files_inv)')
    shell('(SURVIVOR merge sample_files_inv 1000 1 1 1 0 20 {output}) 2> {log}')
SnakeMake From line 378 of main/Snakefile
392
393
394
run:
    shell('python pbsv_dup_fileter.py -i {input.pbsv} -o {output.pbsv_vcf}')
    shell('python cutesv_dup_fliter.py -i {input.cuteSV} -o {output.cuteSV_vcf}')
SnakeMake From line 392 of main/Snakefile
404
405
406
run:
    shell('(ls {input} > sample_files_dup)')
    shell('(SURVIVOR merge sample_files_dup 1000 1 1 1 0 20 {output}) 2> {log}')
SnakeMake From line 404 of main/Snakefile
416
417
run:
    shell('python syri_tra_filter.py -i {input} -o {output}')
SnakeMake From line 416 of main/Snakefile
428
429
430
431
run:
    shell('python pbsv_tra_filter.py -i {input.pbsv} -o {output.pbsv_vcf}')
    shell('python sniffles_tra_filter.py -i {input.Sniffles} -o {output.Sniffles_vcf}')
    shell('python cutesv_tra_filter.py -i {input.cuteSV} -o {output.cuteSV_vcf}')
SnakeMake From line 428 of main/Snakefile
446
447
448
run:
    shell('(ls {input} > sample_files_tra)')
    shell('(SURVIVOR merge sample_files_tra 1000 2 1 1 0 20 {output}) 2> {log}')
SnakeMake From line 446 of main/Snakefile
455
456
run:
    shell('python tra_remove_repeat.py -i {input} -o {output}')
SnakeMake From line 455 of main/Snakefile
ShowHide 14 more snippets with no or duplicated tags.

Login to post a comment if you would like to share your experience with this workflow.

Do you know this workflow well? If so, you can request seller status , and start supporting this workflow.

Free

Created: 1yr ago
Updated: 1yr ago
Maitainers: public
URL: https://github.com/Triciasnow/HiFi-sv-snakemake
Name: hifi-sv-snakemake
Version: 1
Badge:
workflow icon

Insert copied code into your website to add a link to this workflow.

Downloaded: 0
Copyright: Public Domain
License: Apache License 2.0
  • Future updates

Related Workflows

cellranger-snakemake-gke
snakemake workflow to run cellranger on a given bucket using gke.
A Snakemake workflow for running cellranger on a given bucket using Google Kubernetes Engine. The usage of this workflow ...