Help improve this workflow!
This workflow has been published but could be further improved with some additional meta data:- Keyword(s) in categories input, output, operation, topic
You can help improve this workflow by suggesting the addition or removal of keywords, suggest changes and report issues, or request to become a maintainer of the Workflow .
Description
This snakemake pipeline is for Nanopore long-read DNA sequencing data analysis. The pipeline may be run on an HPC or in a local environment.
Major steps in this workflow include:
-
Modified basing calling using Megalodon
-
Structural variant calling using Nanopore pipeline-structural-variation
-
Haplotype-aware variant calling using PEPPER-Margin-DeepVariant
Software Requirements
User's guide
I. Input requirements
-
Edited config/config.yaml
-
Demultiplexed Nanopore fast5 data
-
Reference genome file
II. Editing the config.yaml
Basic parameters:
-
raw: Path to the directory where demultiplexed fast5 data is stored. The name of sub-directory for each sample should be in this format: {sampleID}_fast5
-
out: Output directory, default without input: "output" in working directory
-
reference: Path to primary reference genome fasta file
-
pepper: Path to the directory where PEPPER-Margin-DeepVariant singularity is downloaded
-
bind: Singularity bind mounting
-
snpeff: Path to SNPEFF installed location
Parameters for modified base calling:
-
outputs: Desired outputs
-
base_conv: modified base conversion
-
motif: Restrict modified base results to the specified motifs
-
thresh: Hard threshold for modified base aggregation (probability of modified/canonical base), default 0.75
-
config: Guppy model config file
Parameters for SV calling:
-
min_sv_length: Minimum SV length
-
max_sv_length: Maximum SV length
-
min_read_length: Minimum read length
-
min_read_mapping_quality: Minimum mapping quality
-
min_read_support: Minimum read support required to call a SV (auto for auto-detect)
-
snpsift: Run snpsift filtering or not
-
snpsift_filter: SnpSift filtering parameters if the input of the above parameter is "yes"
-
target_bed: (Optional) BED file containing targeted regions where SVs will only be called
Parameters for Haplotype-aware variant calling:
-
snpsift2: Run snpsift filtering or not
-
snpsift_filter2: SnpSift filtering parameters if the input of the above parameter is "yes"
Optional parameters to run the full analysis on HPV genomes:
(This function was originally designed for the HPV project, however, it can be applied to any secondary genome analysis of any species with references.)
-
hpv: Run the analysis on HPV genomes or not
-
hpv_ref: Path to HPV reference genome fasta file
-
snpeff_hpv: Name of the pre-build HPV snpEff annotation
-
snpeff_config: Path to the HPV snpEff config file
III. To run
-
Clone the repository to your working directory
git clone https://github.com/NCI-CGR/Nanopore_DNA-seq.git
-
Install required software
-
Edit and save config/config.yaml
-
To run on an HPC using slurm job scheduler like NIH Biowulf (Pascal GPU or higher architecture required):
Edit config/cluster_config.yaml according to your HPC information and adjust the computing resources as needed. Run sbatch.sh to initiate pipeline running.
bash sbatch.sh
-
To run in a local environment:
snakemake -p --cores 16 --keep-going --rerun-incomplete --jobs 300 --latency-wait 120 all
-
Look in log directory for logs for each rule
-
To view the snakemkae rule graph:
snakemake --rulegraph | dot -T png > nanopore.png
IV Output directory structure
{output directory}
├── mbc # Modified base calling on primary (eg human) genome
│ ├── basic # using the default model
│ │ └── {sampleID}
│ └── model_based # using a specific model
│ └── {sampleID}
├── mbc_hpv # Modified base calling on secondary (eg HPV) genome
│ ├── basic # using the default model
│ │ └── {sampleID}
│ └── model_based # using a specific model
│ └── {sampleID}
├── sv # SV calling on primary genome
│ └── {sampleID}
├── sv_hpv # SV calling on secondary genome
│ └── {sampleID}
├── vc # Haplotype-aware variant calling on primary genome
│ └── {smapleID}
└── vc_hpv # Haplotype-aware variant calling on secondary genome
└── {smapleID}
Code Snippets
15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 | shell: """ megalodon {input[0]} \ --guppy-server-path ${{MEGALODON_GUPPY_PATH}} \ --outputs {params.outputs} \ --output-directory {out}mbc/basic/{wildcards.sample} \ --overwrite \ --reference {input[1]} \ --mod-map-emulate-bisulfite \ --mod-map-base-conv {params.base_conv} \ --mod-map-base-conv Z C \ --mod-motif m CG 0 \ --mod-binary-threshold {params.thresh} \ --devices ${{CUDA_VISIBLE_DEVICES}} \ --processes ${{SLURM_CPUS_PER_TASK}} 2>log/{wildcards.sample}_basic.err touch {output} """ |
45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 | shell: """ megalodon {input[0]} \ --guppy-server-path ${{MEGALODON_GUPPY_PATH}} \ --guppy-config {params.config} \ --outputs {params.outputs} \ --overwrite \ --output-directory {out}mbc/model_based/{wildcards.sample} \ --reference {input[1]} \ --mod-map-emulate-bisulfite \ --mod-map-base-conv {params.base_conv} \ --mod-map-base-conv Z C \ --mod-motif m CG 0 \ --mod-binary-threshold {params.thresh} \ --devices ${{CUDA_VISIBLE_DEVICES}} \ --processes ${{SLURM_CPUS_PER_TASK}} 2>log/{wildcards.sample}_model.err touch {output} """ |
72 73 74 75 76 77 78 | shell: """ samtools sort {out}mbc/basic/{wildcards.sample}/mappings.bam > {out}mbc/basic/{wildcards.sample}/mappings_sorted.bam 2>log/{wildcards.sample}_basic_bam_sort1.err samtools index {out}mbc/basic/{wildcards.sample}/mappings_sorted.bam 2>log/{wildcards.sample}_basic_bam_index1.err samtools sort {out}mbc/basic/{wildcards.sample}/mod_mappings.5mC.bam > {out}mbc/basic/{wildcards.sample}/mod_mappings.5mC_sorted.bam 2>log/{wildcards.sample}_basic_bam_sort3.err samtools index {out}mbc/basic/{wildcards.sample}/mod_mappings.5mC_sorted.bam 2>log/{wildcards.sample}_basic_bam_index3.err """ |
88 89 90 91 92 93 94 | shell: """ samtools sort {out}mbc/model_based/{wildcards.sample}/mappings.bam > {out}mbc/model_based/{wildcards.sample}/mappings_sorted.bam 2>log/{wildcards.sample}_model_based_bam_sort1.err samtools index {out}mbc/model_based/{wildcards.sample}/mappings_sorted.bam 2>log/{wildcards.sample}_model_based_bam_index1.err samtools sort {out}mbc/model_based/{wildcards.sample}/mod_mappings.5mC.bam > {out}mbc/model_based/{wildcards.sample}/mod_mappings.5mC_sorted.bam 2>log/{wildcards.sample}_model_based_bam_sort3.err samtools index {out}mbc/model_based/{wildcards.sample}/mod_mappings.5mC_sorted.bam 2>log/{wildcards.sample}_model_based_bam_index3.err """ |
15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 | shell: """ megalodon {input[0]} \ --guppy-server-path ${{MEGALODON_GUPPY_PATH}} \ --outputs {params.outputs} \ --output-directory {out}mbc_hpv/basic/{wildcards.sample} \ --overwrite \ --reference {input[1]} \ --mod-map-emulate-bisulfite \ --mod-map-base-conv {params.base_conv} \ --mod-map-base-conv Z C \ --mod-motif m CG 0 \ --mod-binary-threshold {params.thresh} \ --devices ${{CUDA_VISIBLE_DEVICES}} \ --processes ${{SLURM_CPUS_PER_TASK}} 2>log/{wildcards.sample}_basic_hpv.err touch {output} """ |
45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 | shell: """ megalodon {input[0]} \ --guppy-server-path ${{MEGALODON_GUPPY_PATH}} \ --guppy-config {params.config} \ --outputs {params.outputs} \ --overwrite \ --output-directory {out}mbc_hpv/model_based/{wildcards.sample} \ --reference {input[1]} \ --mod-map-emulate-bisulfite \ --mod-map-base-conv {params.base_conv} \ --mod-map-base-conv Z C \ --mod-motif m CG 0 \ --mod-binary-threshold {params.thresh} \ --devices ${{CUDA_VISIBLE_DEVICES}} \ --processes ${{SLURM_CPUS_PER_TASK}} 2>log/{wildcards.sample}_model_hpv.err touch {output} """ |
72 73 74 75 76 77 78 | shell: """ samtools sort {out}mbc_hpv/basic/{wildcards.sample}/mappings.bam > {out}mbc_hpv/basic/{wildcards.sample}/mappings_sorted.bam 2>log/{wildcards.sample}_basic_bam_sort1_hpv.err samtools index {out}mbc_hpv/basic/{wildcards.sample}/mappings_sorted.bam 2>log/{wildcards.sample}_basic_bam_index1_hpv.err samtools sort {out}mbc_hpv/basic/{wildcards.sample}/mod_mappings.5mC.bam > {out}mbc_hpv/basic/{wildcards.sample}/mod_mappings.5mC_sorted.bam 2>log/{wildcards.sample}_basic_bam_sort3_hpv.err samtools index {out}mbc_hpv/basic/{wildcards.sample}/mod_mappings.5mC_sorted.bam 2>log/{wildcards.sample}_basic_bam_index3_hpv.err """ |
88 89 90 91 92 93 94 | shell: """ samtools sort {out}mbc_hpv/model_based/{wildcards.sample}/mappings.bam > {out}mbc_hpv/model_based/{wildcards.sample}/mappings_sorted.bam 2>log/{wildcards.sample}_model_based_bam_sort1_hpv.err samtools index {out}mbc_hpv/model_based/{wildcards.sample}/mappings_sorted.bam 2>log/{wildcards.sample}_model_based_bam_index1_hpv.err samtools sort {out}mbc_hpv/model_based/{wildcards.sample}/mod_mappings.5mC.bam > {out}mbc_hpv/model_based/{wildcards.sample}/mod_mappings.5mC_sorted.bam 2>log/{wildcards.sample}_model_based_bam_sort3_hpv.err samtools index {out}mbc_hpv/model_based/{wildcards.sample}/mod_mappings.5mC_sorted.bam 2>log/{wildcards.sample}_model_based_bam_index3_hpv.err """ |
24 25 26 27 | shell: """ lra index -ONT {input} 2>log/index_lra.err """ |
40 41 42 43 | shell: """ catfishq -r {out}mbc/basic/{wildcards.sample}/ | seqtk seq -A - | lra align -ONT -t 16 {input.REF} - -p s | samtools addreplacerg -r \"@RG\tID:{wildcards.sample}\tSM:{wildcards.sample}\" - | samtools sort -@ 16 -T {wildcards.sample} -O BAM -o {output.BAM} - && samtools index -@ 16 {output.BAM} 2>log/{wildcards.sample}_lra_map.err """ |
60 61 62 63 64 65 66 67 68 69 70 71 | shell: """ cuteSV -t 16 \ --min_size {params.min_size} \ --max_size {params.max_size} \ -S {wildcards.sample} \ --retain_work_dir \ --report_readid \ --min_support {params.min_read_support} \ --genotype {input.BAM} {input.REF} {output.VCF} \ {out}sv/{wildcards.sample}/sv_calls/ 2>log/{wildcards.sample}_sv_call.err """ |
82 83 84 85 86 | shell: """ mkdir -p {output.DIR} mosdepth -x -t 16 -n -b {params.BED} {output.DIR}/{wildcards.sample} {input.BAM} 2>log/{wildcards.sample}_depth.err """ |
111 112 113 114 | shell: """ vcfsort {input.VCF} > {output.VCF} 2>log/{wildcards.sample}_sort_vcf.err """ |
123 124 125 126 | shell: """ cat {input.VCF} | bgziptabix {output.VCF} 2>log/{wildcards.sample}_index_vcf.err """ |
135 136 137 138 139 140 141 142 143 144 | shell: """ NanoPlot -t 16 \ --bam {input.BAM} \ --raw -o {output.DIR} \ -p {wildcards.sample}_ \ --N50 \ --title {wildcards.sample} \ --downsample 100000 2>log/{wildcards.sample}_qc.err """ |
152 153 154 155 156 | shell: """ cd {out}sv/{wildcards.sample}/sv_calls/ java -Xmx16g -jar $SNPEFF_JAR -v hg38 {input.VCF} > {output.VCF} 2>{run}log/{wildcards.sample}_snpeff.err """ |
166 167 168 169 170 | run: if config["snpsift"] == "yes": shell("""cat {input.VCF} | java -jar $SNPSIFT_JAR filter "{params.filter}" > {output.VCF} 2>log/{wildcards.sample}_snpsift.err""") else: shell("""touch {output.VCF}""") |
28 29 30 31 | shell: """ lra index -ONT {input} 2>log/index_lra_hpv.err """ |
44 45 46 47 | shell: """ catfishq -r {out}mbc_hpv/basic/{wildcards.sample}/ | seqtk seq -A - | lra align -ONT -t 16 {input.REF} - -p s | samtools addreplacerg -r \"@RG\tID:{wildcards.sample}\tSM:{wildcards.sample}\" - | samtools sort -@ 16 -T {wildcards.sample} -O BAM -o {output.BAM} - && samtools index -@ 16 {output.BAM} 2>log/{wildcards.sample}_lra_map_hpv.err """ |
64 65 66 67 68 69 70 71 72 73 74 75 | shell: """ cuteSV -t 16 \ --min_size {params.min_size} \ --max_size {params.max_size} \ -S {wildcards.sample} \ --retain_work_dir \ --report_readid \ --min_support {params.min_read_support} \ --genotype {input.BAM} {input.REF} {output.VCF} \ {out}sv_hpv/{wildcards.sample}/sv_calls/ 2>log/{wildcards.sample}_sv_call_hpv.err """ |
86 87 88 89 90 | shell: """ mkdir -p {output.DIR} mosdepth -x -t 16 -n -b {params.BED} {output.DIR}/{wildcards.sample} {input.BAM} 2>log/{wildcards.sample}_depth_hpv.err """ |
115 116 117 118 | shell: """ vcfsort {input.VCF} > {output.VCF} 2>log/{wildcards.sample}_sort_vcf_hpv.err """ |
127 128 129 130 | shell: """ cat {input.VCF} | bgziptabix {output.VCF} 2>log/{wildcards.sample}_index_vcf_hpv.err """ |
138 139 140 141 142 143 144 145 146 147 | shell: """ NanoPlot -t 16 \ --bam {input.BAM} \ --raw -o {output.DIR} \ -p {wildcards.sample}_ \ --N50 \ --title {wildcards.sample} \ --downsample 100000 2>log/{wildcards.sample}_qc_hpv.err """ |
155 156 157 158 159 | shell: """ cd {out}sv_hpv/{wildcards.sample}/sv_calls/ java -Xmx16g -jar {snpeff} -c {snpeff_config} -v {snpeff_hpv} {input.VCF} > {output.VCF} 2>{run}log/{wildcards.sample}_snpeff_hpv.err """ |
12 13 14 15 | shell: """ minimap2 -ax map-ont {input[1]} {out}mbc/basic/{wildcards.sample}/basecalls.fastq > {output} 2> log/{wildcards.sample}_minimap.err """ |
24 25 26 27 28 | shell: """ samtools view -b {input} | samtools sort > {output[0]} 2>log/{wildcards.sample}_minimap_sort.err samtools index {output} 2>log/{wildcards.sample}_minimap_index.err """ |
42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 | shell: """ {bind} singularity exec \ {pepper}/pepper_deepvariant_r0.4.sif \ run_pepper_margin_deepvariant call_variant \ -b {input[0]} \ -f {input[1]} \ -p {wildcards.sample}_PEPPER_Margin_DeepVariant \ -o {out}vc/{wildcards.sample} \ -t 30 \ --ont \ --phased_output 2>log/{wildcards.sample}_pmd.err touch {output[1]} """ |
64 65 66 67 68 | shell: """ cd {out}vc/{wildcards.sample}/ java -Xmx16g -jar $SNPEFF_JAR -v hg38 {input} > {output} 2>{run}log/{wildcards.sample}_snpeff2.err """ |
78 79 80 81 82 | run: if config["snpsift2"] == "yes": shell("""cat {input} | java -jar $SNPSIFT_JAR filter "{params.filter}" > {output} 2>log/{wildcards.sample}_snpsift2.err""") else: shell("""touch {output}""") |
90 91 92 93 94 | shell: """ cd {out}vc/{wildcards.sample}/filter/ java -Xmx16g -jar $SNPEFF_JAR -v hg38 {input} > {output} 2>{run}log/{wildcards.sample}_snpeff3.err """ |
16 17 18 19 | shell: """ minimap2 -ax map-ont {input[1]} {out}mbc_hpv/basic/{wildcards.sample}/basecalls.fastq > {output} 2> log/{wildcards.sample}_minimap_hpv.err """ |
28 29 30 31 32 | shell: """ samtools view -b {input} | samtools sort > {output[0]} 2>log/{wildcards.sample}_minimap_sort_hpv.err samtools index {output} 2>log/{wildcards.sample}_minimap_index_hpv.err """ |
41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 | shell: """ {bind} singularity exec \ {pepper}/pepper_deepvariant_r0.4.sif \ run_pepper_margin_deepvariant call_variant \ -b {input[0]} \ -f {input[1]} \ -p {wildcards.sample}_PEPPER_Margin_DeepVariant \ -o {out}vc_hpv/{wildcards.sample} \ -t 16 \ --ont \ --phased_output 2>log/{wildcards.sample}_pmd_hpv.err touch {output} """ |
65 66 67 68 69 70 | shell: """ cd {out}vc_hpv/{wildcards.sample}/ java -Xmx16g -jar {snpeff} -c {snpeff_config} -v {snpeff_hpv} {params.vcf} > {output} 2>{run}log/{wildcards.sample}_snpeff2_hpv.err """ |
Support
- Future updates