Structural variant filtering and anlaysis of Nanopore human WGS data.

public 1yr ago Version: v0.1.0 0 bookmarks

View Workflow

Help improve this workflow!

This workflow has been published but could be further improved with some additional meta data:

Keyword(s) in categories input, output, operation, topic

You can help improve this workflow by suggesting the addition or removal of keywords, suggest changes and report issues, or request to become a maintainer of the Workflow .

Structural variant filtering and analysis of Nanopore human WGS data.

nanopore-SV-analysis

Installation instructions

Download the latest code from GitHub:

git clone https://github.com/mike-molnar/nanopore-SV-analysis.git

Dependencies

There are many dependencies so it is best to create a new Conda environment using the provided YAML file:

conda env create -n nanopore-SV-analysis -f nanopore-SV-analysis.yml
conda activate nanopore-SV-analysis

Below is a list of the Conda dependencies:

BCFtools
bedtools
karyoploteR
cuteSV
Longshot
NanoFilt v2.8.0
NanoPlot v1.20.0
pybedtools
PySAM
PyVCF
seaborn v0.10.0
Snakemake
Sniffles2
SURVIVOR
SVIM
WhatsHap
Winnowmap2

Reference genome

You will need to download the reference genome manually before running the workflow. I have not included the download as part of the workflow because it is designed to run on a cluster that may not have internet access. You can use a local copy of GRCh38 if you have one, but the reference can only contain the autosomes and sex chromosomes, and the chromosomes must be named chr1, chr2, ...,chrX, chrY . To download the reference genome and index it, change to the reference directory of the workflow and run the download_reference.sh script:

cd /path/to/nanopore-SV-analysis/reference
chmod u+x download_reference.sh
./download_reference.sh

To run on a grid engine

Copy the Snakefile and config.yaml files to the directory that you want to run the workflow. Modify the information in config.yaml for your sample names and FASTQ locations. There are a few different grid engines, so the exact format to run the workflow may be different for your particular grid engine:

snakemake --jobs 500 --rerun-incomplete --keep-going --latency-wait 60 --cluster "qsub -cwd -V -o snakemake.output.log -e snakemake.error.log -q queue_name -P project_name -pe smp {threads} -l h_vmem={params.memory_per_thread} -l h_rt={params.run_time} -b y"

You will have to replace queue_name and project_name with the necessary values to run on your cluster.

Code Snippets

shell:
    "{conda_dir}/NanoPlot {params.scale} -t {threads} -p {wildcards.sample} \
    --title {wildcards.sample} --fastq {input} -o {wildcards.sample}/analysis/nanoplot &> {log}"

SnakeMake NanoPlot From line 17 of rules/analysis.smk

shell:
    "{conda_dir}/samtools depth -r {wildcards.chromosomes} {input.bam} > {output} 2> {log}"

SnakeMake SAMtools From line 37 of rules/analysis.smk

shell:
    "{conda_dir}/python {scripts_dir}/coverage_by_window.py -i {input.depth} \
    -o {output} -w {params.window} -c {params.coverage} &> {log}"

SnakeMake From line 57 of rules/analysis.smk

shell:
    "cat {input} > {output} 2> {log}"

SnakeMake From line 74 of rules/analysis.smk

shell:
    "{conda_dir}/python {scripts_dir}/coverage_by_window.py -i {input.depth} \
    -o {output} -w {params.window} -c {params.coverage} &> {log}"

SnakeMake From line 94 of rules/analysis.smk

shell:
    "cat {input} > {output} 2> {log}"

SnakeMake From line 111 of rules/analysis.smk

shell:
    "{conda_dir}/python {scripts_dir}/coverage_by_window.py -i {input.depth} \
    -o {output} -w {params.window} -c {params.coverage} &> {log}"

SnakeMake From line 131 of rules/analysis.smk

shell:
    "cat {input} > {output} 2> {log}"

SnakeMake From line 148 of rules/analysis.smk

shell:
    "{conda_dir}/python {scripts_dir}/coverage_by_window.py -i {input.depth} \
    -o {output} -w {params.window} -c {params.coverage} &> {log}"

SnakeMake From line 168 of rules/analysis.smk

shell:
    "cat {input} > {output} 2> {log}"

SnakeMake From line 185 of rules/analysis.smk

shell:
    "{conda_dir}/python {scripts_dir}/coverage_by_window.py -i {input.depth} \
    -o {output} -w {params.window} -c {params.coverage} &> {log}"

SnakeMake From line 205 of rules/analysis.smk

shell:
    "cat {input} > {output} 2> {log}"

SnakeMake From line 222 of rules/analysis.smk

shell:
    "{conda_dir}/Rscript {scripts_dir}/karyoploter_coverage.R {input.normal} {input.sample} \
    {output} {params.genome} {wildcards.normal} {wildcards.sample} &> {log}"

SnakeMake From line 240 of rules/analysis.smk

shell:
    "{conda_dir}/Rscript {scripts_dir}/karyoploter_coverage_small_chr.R {input.normal} {input.sample} \
    {output} {params.genome} {wildcards.normal} {wildcards.sample} &> {log}"

SnakeMake From line 259 of rules/analysis.smk

shell:
    "{conda_dir}/Rscript {scripts_dir}/karyoploter_coverage_large_chr.R {input.normal} {input.sample} \
    {output} {params.genome} {wildcards.normal} {wildcards.sample} &> {log}"

SnakeMake From line 278 of rules/analysis.smk

shell:
    "{conda_dir}/Rscript {scripts_dir}/karyoploter_coverage_chr.R {input.normal_cov} {input.sample_cov} \
    {input.normal_baf} {input.sample_baf} {wildcards.normal} {wildcards.sample} {params.genome} \
    {wildcards.chromosomes} {output} &> {log}"

SnakeMake From line 299 of rules/analysis.smk

shell:
    "{conda_dir}/python {scripts_dir}/filter_SVs.py -ins {input.INS} -del {input.DEL} -dup {input.DUP} \
    -inv {input.INV} -trans {input.trans} -depth {input.depth} -split {input.split_alignments} \
    -low_map {input.low_map} -gaps {genome_gaps} -ins_out {output.INS} -del_out \
    {output.DEL} -dup_out {output.DUP} -inv_out {output.INV} -trans_out {output.trans} -cnv_out \
    {output.CNVs} -cov {params.coverage} &> {log}"

SnakeMake From line 331 of rules/analysis.smk

shell:
    "{conda_dir}/NanoFilt --logfile {log} -q {params.quality} -l {params.min_length} \
    {input} | {conda_dir}/bgzip -c > {output} 2>> {log}"

SnakeMake nanofilt From line 17 of rules/mapping.smk

shell:
    "{conda_dir}/bgzip -r {input} 2> {log}"

SnakeMake From line 35 of rules/mapping.smk

shell:
    "{conda_dir}/winnowmap {params.preset_options} {params.include_MD_tag} \
    -t {threads} -W {high_frequency_kmers} {reference} {input.fastq} 2> {log} | \
    {conda_dir}/samtools sort -o {output} &>> {log}"

SnakeMake SAMtools From line 55 of rules/mapping.smk

shell:
    "{conda_dir}/samtools index {input} &> {log}"

SnakeMake SAMtools From line 74 of rules/mapping.smk

shell:
    "{conda_dir}/longshot {params.thresholds} -r {wildcards.regions} --bam {input.bam} \
    --ref {reference} --out {output} &> {log}"

SnakeMake Longshot From line 93 of rules/mapping.smk

shell:
    "{conda_dir}/bgzip --threads {threads} -c {input} > {output} 2> {log}"

SnakeMake From line 111 of rules/mapping.smk

shell:
    "{conda_dir}/tabix -p {params.file_type} {input} &> {log}"

SnakeMake tabix From line 129 of rules/mapping.smk

shell:
    "{conda_dir}/bcftools concat {params.concat} {input.zip} 2> {log} | \
    {conda_dir}/bcftools sort {params.sort} -o {output} &>> {log}"

SnakeMake BCFtools From line 149 of rules/mapping.smk

shell:
    "{conda_dir}/tabix -p {params.file_type} {input} &> {log}"

SnakeMake tabix From line 168 of rules/mapping.smk

shell:
    "{conda_dir}/whatshap haplotag {params.RG} {params.noSNVs} --regions {wildcards.regions} \
    --reference {reference} {input.vcf} {input.bam} 2> {log} | {conda_dir}/samtools sort -o {output} &>> {log}"

SnakeMake SAMtools WhatsHap From line 190 of rules/mapping.smk

shell:
    "{conda_dir}/samtools merge {output} {input} &> {log}"

SnakeMake SAMtools From line 208 of rules/mapping.smk

shell:
    "{conda_dir}/samtools index {input} &> {log}"

SnakeMake SAMtools From line 225 of rules/mapping.smk

shell:
    "zcat {input.zip} 2> {log} | {conda_dir}/bcftools view -f {params.view_param} \
    2>> {log} | {conda_dir}/bcftools query -f {params.query_param} > {output} 2>> {log}"

SnakeMake BCFtools From line 245 of rules/mapping.smk

shell:
    "{conda_dir}/python {scripts_dir}/calculate_b_allele_frequency.py -i {input} -o {output} 2> {log}"

SnakeMake From line 263 of rules/mapping.smk

shell:
    "{conda_dir}/samtools view -h {input.bam} 2> {log} | \
    awk '$5<{params.quality} {{print $0}}' > {output} 2>> {log}"

SnakeMake SAMtools From line 282 of rules/mapping.smk

shell:
    "{conda_dir}/samtools view -S -b -h {input} 2> {log} | \
    {conda_dir}/samtools depth - > {output} 2>> {log}"

SnakeMake SAMtools From line 300 of rules/mapping.smk

shell:
    "{conda_dir}/SURVIVOR bincov {input} {params.distance} {params.min_coverage} 2> {log} | \
    {conda_dir}/bedtools slop -i stdin -g {chromosome_sizes} -b {params.slop} 2>> {log} | \
    {conda_dir}/bedtools merge -i stdin -d {params.merge_length} > {output} 2>> {log}"

SnakeMake BEDTools SURVIVOR From line 322 of rules/mapping.smk

shell:
    "{conda_dir}/cuteSV --diff_ratio_merging_DEL {params.diff_ratio_merging_DEL} \
    --min_support {params.min_support} --max_split_parts {params.max_splits} \
    --threads {threads} --max_cluster_bias_DEL {params.max_cluster_bias_DEL} {input.bam} \
    {reference} {output} {params.working_dir} &> {log}"

SnakeMake cuteSV From line 21 of rules/structural_variants.smk

shell:
    "{conda_dir}/python {scripts_dir}/vcf2bedpe.py -i {input} -o {output} &> {log}"

SnakeMake From line 41 of rules/structural_variants.smk

shell:
    "{conda_dir}/sniffles --input {input.bam} --snf {output.snf} --vcf {output.vcf} --non-germline \
     --minsupport {params.min_reads} --max-splits-base {params.max_splits} -t {threads} &> {log}"

SnakeMake sniffles From line 62 of rules/structural_variants.smk

shell:
    "{conda_dir}/python {scripts_dir}/vcf2bedpe.py -i {input} -o {output} &> {log}"

SnakeMake From line 80 of rules/structural_variants.smk

shell:
    "{conda_dir}/svim alignment {params.working_dir} {input.bam} {reference} &> {log}"

SnakeMake svim From line 99 of rules/structural_variants.smk

shell:
    "{conda_dir}/python {scripts_dir}/vcf2bedpe.py -i {input} -o {output} &> {log}"

SnakeMake From line 116 of rules/structural_variants.smk

shell:
    "awk {params.awk} {input} 2> {log} | awk {params.fix_columns} 2>> {log} | \
    awk {params.remove_duplicates} > {output} 2>> {log}"

SnakeMake From line 136 of rules/structural_variants.smk

shell:
    "awk {params.awk_INS} {input.cuteSV} {input.sniffles} {input.svim} 2> {log} | \
    sort {params.sort} 2>> {log} | {conda_dir}/bedtools merge -i stdin -d {params.merge_length} \
    {params.columns} {params.output_type} > {output} 2>> {log}"

SnakeMake BEDTools From line 165 of rules/structural_variants.smk

shell:
    "awk {params.awk_DEL} {input.cuteSV} {input.sniffles} {input.svim} 2> {log} | \
    sort {params.sort} 2>> {log} | {conda_dir}/bedtools merge -i stdin -d {params.merge_length} \
    {params.columns} {params.output_type} > {output} 2>> {log}"

SnakeMake BEDTools From line 191 of rules/structural_variants.smk

shell:
    "grep {params.grep_DUP} {input.cuteSV} {input.sniffles} {input.svim} 2> {log} | \
    cut {params.select_columns} 2>> {log} | sort {params.sort} 2>> {log} | \
    {conda_dir}/bedtools merge -i stdin -d {params.merge_length} {params.columns} \
    {params.output_type} > {output} 2>> {log}"

SnakeMake BEDTools From line 218 of rules/structural_variants.smk

shell:
    "grep {params.grep_INV} {input.cuteSV} {input.sniffles} {input.svim} 2> {log} | \
    cut {params.select_columns} 2>> {log} | sort {params.sort} 2>> {log} | \
    {conda_dir}/bedtools merge -i stdin -d {params.merge_length} {params.columns} \
    {params.output_type} > {output} 2>> {log}"

SnakeMake BEDTools From line 246 of rules/structural_variants.smk

shell:
    "awk {params.awk_BND_TRA} {input.cuteSV} {input.sniffles} {input.svim} 2> {log} | \
    awk {params.fix_columns} 2>> {log} | cut {params.select_columns} 2>> {log} | \
    sort {params.sort} > {output} 2>> {log}"

SnakeMake From line 272 of rules/structural_variants.smk

shell:
    "{conda_dir}/python {scripts_dir}/merge_split_alignments.py \
    --read-to-reference-bam {input.bam} --output-bedpe {output} &> {log}"

SnakeMake From line 292 of rules/structural_variants.smk

shell:
    "sort {params.sort} {input} 2> {log} | {conda_dir}/bedtools merge -i stdin \
    -d {params.merge_length} -c {params.columns} -o {params.output_type} > {output} 2>> {log}"