HiFi-sv-snakemake
A Snakemake pipeline for structural variant (SV) analysis based on HiFi reads.
With it, you can obtain an SV dataset covering five types: Insertion, Deletion, Inversion, Duplication, and Translocation.
Input files
To use the pipeline, first copy the Snakefile and config.yaml files into the working directory.
Then prepare the following files:
- a reference genome file named {genome}.fa
- an assembled genome file named {sample}.fa
- a HiFi reads FASTQ file named {sample}.fq.gz
The {genome} and {sample} labels should match the labels in the configuration (the config.yaml file or the command line).
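For example, assuming the configuration keys are simply named genome and sample (check config.yaml for the actual key names; hg38 and sample01 are placeholder values), the labels can also be set on the command line with Snakemake's --config option:
snakemake --cores 16 --use-conda final_results_all --config genome=hg38 sample=sample01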
Dependencies
Conda environment files containing the dependencies can be found in the envs directory. These can be installed and managed automatically by passing the --use-conda argument to Snakemake:
snakemake --use-conda ...
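To build all environments up front without running any jobs (a standard Snakemake option, not specific to this pipeline), combine --use-conda with --conda-create-envs-only:
snakemake --use-conda --conda-create-envs-only --cores 1 --configfile config.yaml final_results_all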
The scripts folder contains the Python scripts and text files required by the pipeline.
Examples
This pipeline mainly contains four steps:
- read_map_all to obtain all alignment files needed for SV calling.
snakemake --configfile config.yaml --cores 32 --use-conda read_map_all
- call_original_all to call SVs with multiple callers and aligners.
snakemake --configfile config.yaml --cores 16 --use-conda call_original_all
- call_filter_all to filter all SV datasets.
snakemake --configfile config.yaml --cores 5 --use-conda call_filter_all
- final_results_all to obtain the variants suitable for each SV type.
snakemake --configfile config.yaml --cores 16 --use-conda final_results_all
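Before launching any of these targets, a dry run (standard Snakemake flags, not specific to this pipeline) previews the jobs that would be executed without running them:
snakemake -n -p --configfile config.yaml --use-conda final_results_all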
In this pipeline, we defined:
- Insertion: detected by SVIM on NGMLR and pbmm2 alignments;
- Deletion: detected by Sniffles on NGMLR and pbmm2 alignments, and by Assemblytics;
- Inversion: detected by cuteSV and Sniffles on NGMLR alignments;
- Duplication: detected by pbsv and cuteSV on pbmm2 alignments;
- Translocation: detected by pbsv, Sniffles, and cuteSV on NGMLR and pbmm2 alignments, and by SyRI.
Overview of the workflow
We can easily visualize the workflow with Snakemake.
Rule graph
snakemake --rulegraph --configfile config.yaml -f final_results_all | dot -Tsvg > rulegraph.svg
DAG of jobs
snakemake -np -f final_results_all --configfile config.yaml --dag | dot -Tsvg > DAGgraph.svg
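dot can also render other formats; for example, to produce a PNG of the DAG instead of an SVG:
snakemake --dag --configfile config.yaml final_results_all | dot -Tpng > DAGgraph.png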
Code Snippets
shell:
    'ngmlr -r {input.ref} -q {input.fq} -t {threads} -x pacbio --rg-sm {wildcards.sample} --rg-lb {params.para} --rg-pl PacBio -o {output} 2> {log}'

shell:
    "pbmm2 align {input} {output} --unmapped --preset CCS -N 2 --sort --rg '@RG\\tID:{wildcards.sample}\\tSM:{wildcards.sample}\\tLB:{params.para}\\tPL:PB' 2> {log}"

shell:
    'samtools sort {input.bam_ngmlr} -o {output.sort_ngmlr} 2> {log}; samtools index {output.sort_ngmlr}; '
    'samtools sort {input.bam_pbmm2} -o {output.sort_pbmm2} 2>> {log}; samtools index {output.sort_pbmm2}'

shell:
    'nucmer --maxmatch {params.para} {input.ref} {input.asm} -p {wildcards.sample}-{params.name} -t {threads} 2> {log}; '
    'mv {wildcards.sample}-{params.name}.delta {output}'

shell:
    'minimap2 -ax asm5 --eqx {input.ref} {input.asm} > {output.sam} 2> {log}; '
    'samtools view -b -o {output.bam} {output.sam}'

shell:
    'trf {input.ref} {params.para} 2> {log.tdm}; '
    'pbsv discover {input.bam} --tandem-repeats {output.repeat} {output.svsig} 2> {log.sv}; '
    'pbsv call --ccs -x {params.depth} {input.ref} {output.svsig} {output.vcf} 2>> {log.sv}'

shell:
    'svim alignment --sequence_alleles --read_names {input.ref} {input.bam} {params.outdir}/ 2> {log}; '
    'mv {params.outdir}/final_results.vcf {output}'

shell:
    'samtools calmd -u {input.pbmm2_bam} {input.ref} > {output.pbmm2_MD}; '
    'sniffles --skip_parameter_estimation -s {params.para} --ccs_reads -m {input.ngmlr_bam} -v {output.vcf_ngmlr} 2> {log.log_ngmlr}; '
    'sniffles --skip_parameter_estimation -s {params.para} --ccs_reads -m {output.pbmm2_MD} -v {output.vcf_pbmm2} 2> {log.log_pbmm2}'

shell:
    'samtools calmd -u {input.pbmm2_bam} {input.ref} > {output.pbmm2_MD}; '
    'sniffles --skip_parameter_estimation -s {params.para} --ccs_reads -m {input.ngmlr_bam} -v {output.vcf_ngmlr} 2> {log.log_ngmlr}; '
    'sniffles --skip_parameter_estimation -s {params.para} --ccs_reads -m {output.pbmm2_MD} -v {output.vcf_pbmm2} 2> {log.log_pbmm2}'

shell:
    'mkdir {params.work_dir}; '
    'cuteSV {input} {output} {params.work_dir} --max_cluster_bias_INS 1000 --diff_ratio_merging_INS 0.9 --max_cluster_bias_DEL 1000 --diff_ratio_merging_DEL 0.5 --threads {threads} --sample {wildcards.sample}-{params.map_name} --min_support 10 --genotype 2> {log}'

shell:
    'gzip {input} | scripts/Assemblytics {wildcards.sample}-{params.name} 10000 20 100000 2> {log}; '
    'mv {wildcards.sample}-{params.name}/{wildcards.sample}-{params.name}.Assemblytics_structural_variants.bed {output}'

shell:
    'python ref_chr.py -i {input.ref} -o {output.ref_chr} -c {params.max_chr}; '
    'python asm_chr.py -i {input.asm} -o {output.asm_chr}; '
    'syri -c {input.bam} -r {output.ref_chr} -q {output.asm_chr} -k -F B --nosnp 2> {log}; '
    'mv syri.vcf {output.vcf}'

shell:
    'python svim_ins_filter.py -i {input} -q {params.qual} -o {output}'

shell:
    'ls {input} > sample_files_ins; '
    'SURVIVOR merge sample_files_ins 1000 1 1 1 0 20 {output} 2> {log}'

shell:
    'python assemblytics_format_change.py -i {input.bed} -a {input.txt} -o {output.vcf_temp}; '
    'bcftools norm -f {input.ref} -m-both -c s -o {output.norm} {output.vcf_temp}; '
    'python genotype.py -i {output.vcf_temp} -o {output.vcf}'

run:
    shell('python sniffles_del_fileter.py -i {input} -o {output}')

run:
    shell('(ls {input} > sample_files_del)')
    shell('(SURVIVOR merge sample_files_del 1000 1 1 1 0 20 {output}) 2> {log}')

run:
    shell('python sniffles_inv_fileter.py -i {input.Sniffles} -o {output.Sniffles_vcf}')
    shell('python cutesv_inv_fliter.py -i {input.cuteSV} -o {output.cuteSV_vcf}')

run:
    shell('(ls {input} > sample_files_inv)')
    shell('(SURVIVOR merge sample_files_inv 1000 1 1 1 0 20 {output}) 2> {log}')

run:
    shell('python pbsv_dup_fileter.py -i {input.pbsv} -o {output.pbsv_vcf}')
    shell('python cutesv_dup_fliter.py -i {input.cuteSV} -o {output.cuteSV_vcf}')

run:
    shell('(ls {input} > sample_files_dup)')
    shell('(SURVIVOR merge sample_files_dup 1000 1 1 1 0 20 {output}) 2> {log}')

run:
    shell('python syri_tra_filter.py -i {input} -o {output}')

run:
    shell('python pbsv_tra_filter.py -i {input.pbsv} -o {output.pbsv_vcf}')
    shell('python sniffles_tra_filter.py -i {input.Sniffles} -o {output.Sniffles_vcf}')
    shell('python cutesv_tra_filter.py -i {input.cuteSV} -o {output.cuteSV_vcf}')

run:
    shell('(ls {input} > sample_files_tra)')
    shell('(SURVIVOR merge sample_files_tra 1000 2 1 1 0 20 {output}) 2> {log}')

run:
    shell('python tra_remove_repeat.py -i {input} -o {output}')
Support
- Future updates