nanopore-SV-analysis

Structural variant filtering and analysis of Nanopore human WGS data.

Installation instructions

Download the latest code from GitHub:

git clone https://github.com/mike-molnar/nanopore-SV-analysis.git

Dependencies

There are many dependencies, so it is best to create a new Conda environment using the provided YAML file:

conda env create -n nanopore-SV-analysis -f nanopore-SV-analysis.yml
conda activate nanopore-SV-analysis

Below is a list of the Conda dependencies, followed by a quick way to check that they installed correctly:

  • BCFtools

  • bedtools

  • karyoploteR

  • cuteSV

  • Longshot

  • NanoFilt v2.8.0

  • NanoPlot v1.20.0

  • pybedtools

  • PySAM

  • PyVCF

  • seaborn v0.10.0

  • Snakemake

  • Sniffles2

  • SURVIVOR

  • SVIM

  • WhatsHap

  • Winnowmap2
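
After activating the environment, you can confirm that the main tools are on your PATH. This is a minimal sketch; the executable names below are the binaries these Conda packages typically install and may differ between package versions:

# Report any of the expected executables that cannot be found
for tool in bcftools bedtools cuteSV longshot NanoFilt NanoPlot snakemake sniffles SURVIVOR svim whatshap winnowmap; do
    command -v "$tool" > /dev/null || echo "missing: $tool"
done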

Reference genome

You will need to download the reference genome manually before running the workflow. I have not included the download as part of the workflow because it is designed to run on a cluster that may not have internet access. You can use a local copy of GRCh38 if you have one, but the reference must contain only the autosomes and sex chromosomes, and the chromosomes must be named chr1, chr2, ..., chrX, chrY. To download and index the reference genome, change to the reference directory of the workflow and run the download_reference.sh script:

cd /path/to/nanopore-SV-analysis/reference
chmod u+x download_reference.sh
./download_reference.sh
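
If you use a local copy of GRCh38 instead, you can check that it contains only the expected chromosomes by listing the sequence names in the FASTA index. This is a minimal sketch; it assumes samtools is on your PATH and a reference file named GRCh38.fa, so adjust the file name for your copy:

# Build the .fai index if it does not already exist
samtools faidx GRCh38.fa

# Print the sequence names; the output should be exactly chr1-chr22, chrX, and chrY
cut -f1 GRCh38.fa.fai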

To run on a grid engine

Copy the Snakefile and config.yaml files to the directory where you want to run the workflow. Modify the sample names and FASTQ locations in config.yaml for your data. There are a few different grid engines, so the exact submission command may differ for your particular grid engine:

snakemake --jobs 500 --rerun-incomplete --keep-going --latency-wait 60 --cluster "qsub -cwd -V -o snakemake.output.log -e snakemake.error.log -q queue_name -P project_name -pe smp {threads} -l h_vmem={params.memory_per_thread} -l h_rt={params.run_time} -b y"

You will have to replace queue_name and project_name with the necessary values to run on your cluster.
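
If you want to preview the jobs before submitting, or run the workflow on a single machine instead of a grid engine, the standard Snakemake options work as well (a sketch; adjust the core count to your hardware):

# Dry run: list the jobs that would be executed without running them
snakemake -n --rerun-incomplete

# Run locally with 8 cores
snakemake --cores 8 --rerun-incomplete --keep-going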

Code Snippets

Below are the shell commands from the workflow's Snakemake rules:

Plotting read quality and length with NanoPlot:

shell:
    "{conda_dir}/NanoPlot {params.scale} -t {threads} -p {wildcards.sample} \
    --title {wildcards.sample} --fastq {input} -o {wildcards.sample}/analysis/nanoplot &> {log}"
Calculating read depth per chromosome with samtools:

shell:
    "{conda_dir}/samtools depth -r {wildcards.chromosomes} {input.bam} > {output} 2> {log}"
Binning the depth into windows with coverage_by_window.py, then concatenating the per-chromosome results (this pair of commands is repeated by several coverage rules):

shell:
    "{conda_dir}/python {scripts_dir}/coverage_by_window.py -i {input.depth} \
    -o {output} -w {params.window} -c {params.coverage} &> {log}"

shell:
    "cat {input} > {output} 2> {log}"
Plotting genome-wide coverage for the sample and matched normal with karyoploteR:

shell:
    "{conda_dir}/Rscript {scripts_dir}/karyoploter_coverage.R {input.normal} {input.sample} \
    {output} {params.genome} {wildcards.normal} {wildcards.sample} &> {log}"
Plotting coverage of the small chromosomes:

shell:
    "{conda_dir}/Rscript {scripts_dir}/karyoploter_coverage_small_chr.R {input.normal} {input.sample} \
    {output} {params.genome} {wildcards.normal} {wildcards.sample} &> {log}"
Plotting coverage of the large chromosomes:

shell:
    "{conda_dir}/Rscript {scripts_dir}/karyoploter_coverage_large_chr.R {input.normal} {input.sample} \
    {output} {params.genome} {wildcards.normal} {wildcards.sample} &> {log}"
Plotting per-chromosome coverage and B-allele frequency:

shell:
    "{conda_dir}/Rscript {scripts_dir}/karyoploter_coverage_chr.R {input.normal_cov} {input.sample_cov} \
    {input.normal_baf} {input.sample_baf} {wildcards.normal} {wildcards.sample} {params.genome} \
    {wildcards.chromosomes} {output} &> {log}"
Filtering the candidate SV calls with filter_SVs.py:

shell:
    "{conda_dir}/python {scripts_dir}/filter_SVs.py -ins {input.INS} -del {input.DEL} -dup {input.DUP} \
    -inv {input.INV} -trans {input.trans} -depth {input.depth} -split {input.split_alignments} \
    -low_map {input.low_map} -gaps {genome_gaps} -ins_out {output.INS} -del_out \
    {output.DEL} -dup_out {output.DUP} -inv_out {output.INV} -trans_out {output.trans} -cnv_out \
    {output.CNVs} -cov {params.coverage} &> {log}"
Filtering reads by quality and length with NanoFilt, then recompressing with bgzip:

shell:
    "{conda_dir}/NanoFilt --logfile {log} -q {params.quality} -l {params.min_length} \
    {input} | {conda_dir}/bgzip -c > {output} 2>> {log}"
Building a bgzip index for the compressed reads:

shell:
    "{conda_dir}/bgzip -r {input} 2> {log}"
Aligning the reads with Winnowmap2 and sorting the alignments with samtools:

shell:
    "{conda_dir}/winnowmap {params.preset_options} {params.include_MD_tag} \
    -t {threads} -W {high_frequency_kmers} {reference} {input.fastq} 2> {log} | \
    {conda_dir}/samtools sort -o {output} &>> {log}"
Indexing the sorted BAM:

shell:
    "{conda_dir}/samtools index {input} &> {log}"
Calling SNVs with Longshot:

shell:
    "{conda_dir}/longshot {params.thresholds} -r {wildcards.regions} --bam {input.bam} \
    --ref {reference} --out {output} &> {log}"
Compressing the VCF with bgzip:

shell:
    "{conda_dir}/bgzip --threads {threads} -c {input} > {output} 2> {log}"
Indexing with tabix:

shell:
    "{conda_dir}/tabix -p {params.file_type} {input} &> {log}"
Concatenating and sorting the per-region VCFs with BCFtools:

shell:
    "{conda_dir}/bcftools concat {params.concat} {input.zip} 2> {log} | \
    {conda_dir}/bcftools sort {params.sort} -o {output} &>> {log}"
Indexing the concatenated VCF:

shell:
    "{conda_dir}/tabix -p {params.file_type} {input} &> {log}"
Haplotagging the reads with WhatsHap:

shell:
    "{conda_dir}/whatshap haplotag {params.RG} {params.noSNVs} --regions {wildcards.regions} \
    --reference {reference} {input.vcf} {input.bam} 2> {log} | {conda_dir}/samtools sort -o {output} &>> {log}"
Merging the haplotagged BAMs:

shell:
    "{conda_dir}/samtools merge {output} {input} &> {log}"
Indexing the merged BAM:

shell:
    "{conda_dir}/samtools index {input} &> {log}"
Extracting SNV fields with BCFtools:

shell:
    "zcat {input.zip} 2> {log} | {conda_dir}/bcftools view -f {params.view_param} \
    2>> {log} | {conda_dir}/bcftools query -f {params.query_param} > {output} 2>> {log}"
Calculating B-allele frequencies:

shell:
    "{conda_dir}/python {scripts_dir}/calculate_b_allele_frequency.py -i {input} -o {output} 2> {log}"
Extracting alignments below a mapping-quality threshold:

shell:
    "{conda_dir}/samtools view -h {input.bam} 2> {log} | \
    awk '$5<{params.quality} {{print $0}}' > {output} 2>> {log}"
Calculating read depth of the low mapping-quality alignments:

shell:
    "{conda_dir}/samtools view -S -b -h {input} 2> {log} | \
    {conda_dir}/samtools depth - > {output} 2>> {log}"
Defining low mapping-quality regions with SURVIVOR and bedtools:

shell:
    "{conda_dir}/SURVIVOR bincov {input} {params.distance} {params.min_coverage} 2> {log} | \
    {conda_dir}/bedtools slop -i stdin -g {chromosome_sizes} -b {params.slop} 2>> {log} | \
    {conda_dir}/bedtools merge -i stdin -d {params.merge_length} > {output} 2>> {log}"
Calling SVs with cuteSV:

shell:
    "{conda_dir}/cuteSV --diff_ratio_merging_DEL {params.diff_ratio_merging_DEL} \
    --min_support {params.min_support} --max_split_parts {params.max_splits} \
    --threads {threads} --max_cluster_bias_DEL {params.max_cluster_bias_DEL} {input.bam} \
    {reference} {output} {params.working_dir} &> {log}"
Converting the cuteSV VCF to BEDPE:

shell:
    "{conda_dir}/python {scripts_dir}/vcf2bedpe.py -i {input} -o {output} &> {log}"
Calling SVs with Sniffles2:

shell:
    "{conda_dir}/sniffles --input {input.bam} --snf {output.snf} --vcf {output.vcf} --non-germline \
    --minsupport {params.min_reads} --max-splits-base {params.max_splits} -t {threads} &> {log}"
Converting the Sniffles2 VCF to BEDPE:

shell:
    "{conda_dir}/python {scripts_dir}/vcf2bedpe.py -i {input} -o {output} &> {log}"
Calling SVs with SVIM:

shell:
    "{conda_dir}/svim alignment {params.working_dir} {input.bam} {reference} &> {log}"
Converting the SVIM VCF to BEDPE:

shell:
    "{conda_dir}/python {scripts_dir}/vcf2bedpe.py -i {input} -o {output} &> {log}"
Reformatting the BEDPE columns and removing duplicate calls with awk:

shell:
    "awk {params.awk} {input} 2> {log} | awk {params.fix_columns} 2>> {log} | \
    awk {params.remove_duplicates} > {output} 2>> {log}"
Merging insertion calls from the three callers with bedtools:

shell:
    "awk {params.awk_INS} {input.cuteSV} {input.sniffles} {input.svim} 2> {log} | \
    sort {params.sort} 2>> {log} | {conda_dir}/bedtools merge -i stdin -d {params.merge_length} \
    {params.columns} {params.output_type} > {output} 2>> {log}"
Merging deletion calls from the three callers:

shell:
    "awk {params.awk_DEL} {input.cuteSV} {input.sniffles} {input.svim} 2> {log} | \
    sort {params.sort} 2>> {log} | {conda_dir}/bedtools merge -i stdin -d {params.merge_length} \
    {params.columns} {params.output_type} > {output} 2>> {log}"
Merging duplication calls from the three callers:

shell:
    "grep {params.grep_DUP} {input.cuteSV} {input.sniffles} {input.svim} 2> {log} | \
    cut {params.select_columns} 2>> {log} | sort {params.sort} 2>> {log} | \
    {conda_dir}/bedtools merge -i stdin -d {params.merge_length} {params.columns} \
    {params.output_type} > {output} 2>> {log}"
Merging inversion calls from the three callers:

shell:
    "grep {params.grep_INV} {input.cuteSV} {input.sniffles} {input.svim} 2> {log} | \
    cut {params.select_columns} 2>> {log} | sort {params.sort} 2>> {log} | \
    {conda_dir}/bedtools merge -i stdin -d {params.merge_length} {params.columns} \
    {params.output_type} > {output} 2>> {log}"
Combining translocation and breakend calls from the three callers:

shell:
    "awk {params.awk_BND_TRA} {input.cuteSV} {input.sniffles} {input.svim} 2> {log} | \
    awk {params.fix_columns} 2>> {log} | cut {params.select_columns} 2>> {log} | \
    sort {params.sort} > {output} 2>> {log}"
Extracting split alignments to BEDPE with merge_split_alignments.py:

shell:
    "{conda_dir}/python {scripts_dir}/merge_split_alignments.py \
    --read-to-reference-bam {input.bam} --output-bedpe {output} &> {log}"
Sorting and merging the split alignments:

shell:
    "sort {params.sort} {input} 2> {log} | {conda_dir}/bedtools merge -i stdin \
    -d {params.merge_length} -c {params.columns} -o {params.output_type} > {output} 2>> {log}"


License

This workflow is distributed under the MIT License.
