singularity & snakemake workflow to detect mutations with several tools to detect mutations

public public 1yr ago Version: v0.1.0 0 bookmarks

detect Mutations

Sylvain Schmitt April 20, 2021

singularity & snakemake workflow to detect mutations with several alignment and mutation detection tools.

Installation

  • [x] Python ≥3.5

  • [x] Snakemake ≥5.24.1

  • [x] Golang ≥1.15.2

  • [x] Singularity ≥3.7.3

  • [x] This workflow

# Python
sudo apt-get install python3.5
# Snakemake
sudo apt install snakemake`
# Golang
export VERSION=1.15.8 OS=linux ARCH=amd64 # change this as you need
wget -O /tmp/go${VERSION}.${OS}-${ARCH}.tar.gz https://dl.google.com/go/go${VERSION}.${OS}-${ARCH}.tar.gz && \
sudo tar -C /usr/local -xzf /tmp/go${VERSION}.${OS}-${ARCH}.tar.gz
echo 'export GOPATH=${HOME}/go' >> ~/.bashrc && \
echo 'export PATH=/usr/local/go/bin:${PATH}:${GOPATH}/bin' >> ~/.bashrc && \
source ~/.bashrc
# Singularity
mkdir -p ${GOPATH}/src/github.com/sylabs && \
 cd ${GOPATH}/src/github.com/sylabs && \
 git clone https://github.com/sylabs/singularity.git && \
 cd singularity
git checkout v3.7.3
cd ${GOPATH}/src/github.com/sylabs/singularity && \
 ./mconfig && \
 cd ./builddir && \
 make && \
 sudo make install
# detect Mutations
git clone [email protected]:sylvainschmitt/detectMutations.git
cd detectMutations

Usage

Get data

Generate data using the generate Mutations workflow.

git clone [email protected]:sylvainschmitt/generateMutations.git
cd ../generateMutations
snakemake --use-singularity --cores 4
cd ../detectMutations
bash scripts/get_data.sh

Locally

snakemake -np # dry run
snakemake --dag | dot -Tsvg > dag/dag.svg # dag
snakemake --use-singularity --cores 4 # run
snakemake --use-singularity --cores 1 --verbose # debug
snakemake --report report.html # report

HPC

module purge ; module load bioinfo/snakemake-5.25.0 # for test on node
snakemake -np # dry run
sbatch job.sh ; watch 'squeue -u sschmitt' # run
less detMut.*.err # snakemake outputs, use MAJ+F
less detMut.*.out # snakemake outputs, use MAJ+F
snakemake --dag | dot -Tsvg > dag/dag.svg # dag
module purge ; module load bioinfo/snakemake-5.8.1 ; module load system/Python-3.6.3 # for report
snakemake --report report.html # report
module purge ; module load system/R-3.6.2 ; R # to build results

Workflow

Reference

Index reference and SNPs for software to work with.

bwa_index

  • Tools: BWA index

  • Singularity: oras://registry.forgemia.inra.fr/gafl/singularity/bwa/bwa:latest

samtools_faidx

  • Tools: samtools faidx

  • Singularity: oras://registry.forgemia.inra.fr/gafl/singularity/samtools/samtools:latest

gatk_dict

gatk_idx

Reads

CReport quality and trim.

fastqc

  • Tools: fastQC

  • Singularity: docker://biocontainers/fastqc:v0.11.9_cv8

trimmomatic

  • Tools: Trimmomatic

  • Singularity: oras://registry.forgemia.inra.fr/gafl/singularity/trimmomatic/trimmomatic:latest

Alignments

Align reads against reference, mark duplicated, and report alignment quality.

bwa_mem

  • Tools: BWA mem

  • Singularity: oras://registry.forgemia.inra.fr/gafl/singularity/bwa/bwa:latest

samtools_sort

  • Tools: Samtools sort

  • Singularity: oras://registry.forgemia.inra.fr/gafl/singularity/samtools/samtools:latest

samtools_index

  • Tools: Samtools index

  • Singularity: oras://registry.forgemia.inra.fr/gafl/singularity/samtools/samtools:latest

gatk_markduplicates

samtools_index_md

  • Tools: Samtools index

  • Singularity: oras://registry.forgemia.inra.fr/gafl/singularity/samtools/samtools:latest

samtools_mpileup

  • Tools: Samtools mpileup

  • Singularity: oras://registry.forgemia.inra.fr/gafl/singularity/samtools/samtools:latest

samtools_stats

  • Tools: Samtools stats

  • Singularity: oras://registry.forgemia.inra.fr/gafl/singularity/samtools/samtools:latest

qualimap

  • Tools: QualiMap

  • Singularity: docker://pegi3s/qualimap:2.2.1

Detection

Detect mutations.

gatk_mutect2

  • Tools: gatk Mutect2

  • Singularity: docker://broadinstitute/gatk:4.2.6.1

freebayes

  • Tools: freebayes

  • Singularity: oras://registry.forgemia.inra.fr/gafl/singularity/freebayes/freebayes:latest

gatk_haplotypecaller

gatk_genotypegvcfs

strelka2

  • Tools: Strelka2

  • Singularity: docker://quay.io/wtsicgp/strelka2-manta

varscan

  • Tools: VarScan

  • Singularity: docker://alexcoppe/varscan

varscan2vcf

somaticsniper

muse

  • Tools: MuSe

  • Singularity: docker://opengenomics/muse:v0.1.1

Mutations

cp_vcfs

  • Tools: cp

bedtools_substract

  • Tools: bedtools substract

  • Singularity: oras://registry.forgemia.inra.fr/gafl/singularity/bedtools/bedtools:latest

Quality check

Combined quality information from QualiMap , Picard , Samtools , Trimmomatic , and FastQC (see previous steps) and assess calls performance.

multiqc

  • Tools: MultiQC

  • Singularity: oras://registry.forgemia.inra.fr/gafl/singularity/multiqc/multiqc:latest

evaluate_call

Results

module load system/singularity-3.7.3
singularity pull https://github.com/sylvainschmitt/singularity-r-bioinfo/releases/download/0.0.3/sylvainschmitt-singularity-r-bioinfo.latest.sif
singularity shell sylvainschmitt-singularity-r-bioinfo.latest.sif
library(tidyverse)
lapply(list.files("results/stats", full=T), read_tsv) %>% 
 bind_rows() %>% 
 write_tsv("stats.tsv")
quit()
exit

Code Snippets

13
14
shell:
    "bedtools subtract -header -a {input[0]} -b {input[1]} > {output}"
13
14
shell:
    "bwa index {input}"
21
22
23
shell:
    "bwa mem -M -R '{params.rg_mutated}' -t {threads} {input[0]} {input[1]} {input[2]} > {output[0]} ; "
    "bwa mem -M -R '{params.rg_base}' -t {threads} {input[0]} {input[3]} {input[4]} > {output[1]} ; "
12
13
shell:
    "cp {input} {output}"
13
14
script:
    "../scripts/evaluate_call.R"
16
17
shell:
    "fastqc -t {threads} -q {input} --outdir=results/{wildcards.library}/"
16
17
18
19
shell:
    "freebayes -f {input[0]} --pooled-continuous --pooled-discrete --genotype-qualities --report-genotype-likelihood-max"
    " --allele-balance-priors-off --min-alternate-fraction 0.03 --min-repeat-entropy 1 --min-alternate-count 2"
    " {input[2]} {input[1]} > {output}"
12
13
shell:
    "gatk CreateSequenceDictionary R={input} O={output}"
15
16
shell:
    "gatk GenotypeGVCFs -R {input[0]} -V {input[1]} -O {output}"
16
17
shell:
    "gatk HaplotypeCaller -R {input[0]} -I {input[1]} -O {output} -ERC GVCF"
12
13
shell:
    "gatk IndexFeatureFile -I {input}"
16
17
18
shell:
    "gatk MarkDuplicates --java-options \"-Xmx16G -Xms1G -Djava.io.tmpdir=tmp\" I={input[0]} O={output[0]} M={output[2]} ; "
    "gatk MarkDuplicates --java-options \"-Xmx16G -Xms1G -Djava.io.tmpdir=tmp\" I={input[1]} O={output[1]} M={output[3]}"
18
19
20
shell:
    "gatk Mutect2 -R {input[0]} -I {input[1]} -tumor {wildcards.lib}_REP{wildcards.REP}_mutated  -I {input[2]} -normal {wildcards.lib}_REP{wildcards.REP}_base "
    " --panel-of-normals {input[3]} -dont-use-soft-clipped-bases true -O {output}"
16
17
18
19
shell:
    "multiqc results/{wildcards.library}/ -o results/{wildcards.library} ; "
    "rm -r results/{wildcards.library}/qualimap_mutated ; "
    "rm -r results/{wildcards.library}/qualimap_base ; "
21
22
shell:
    "muse.py -f {input[0]} --tumor-bam {input[1]} --normal-bam {input[2]} -O {output} -n {threads} -w results/{wildcards.lib}_REP{wildcards.REP}/muse/"
SnakeMake From line 21 of rules/muse.smk
15
16
17
18
19
shell:
    "qualimap bamqc -bam {input[0]} --paint-chromosome-limits -nt {threads} -skip-duplicated --skip-dup-mode " 
    "0 -outdir results/{wildcards.library}/qualimap_mutated -outformat HTML ; "
    "qualimap bamqc -bam {input[1]} --paint-chromosome-limits -nt {threads} -skip-duplicated --skip-dup-mode " 
    "0 -outdir results/{wildcards.library}/qualimap_base -outformat HTML"
12
13
shell:
    "samtools faidx {input}"
12
13
shell:
    "samtools index {input[0]} ; samtools index {input[1]}"
12
13
shell:
    "samtools index {input[0]} ; samtools index {input[1]}"
16
17
18
shell:
    "samtools mpileup -f {input[0]} {input[1]} > {output[0]} ; "
    "samtools mpileup -f {input[0]} {input[2]} > {output[1]}"
12
13
14
shell:
    "samtools sort --threads {threads} {input[0]} > {output[0]} ; "
    "samtools sort --threads {threads} {input[1]} > {output[1]}"
12
13
14
shell:
    "samtools stats --threads {threads} {input[0]} > {output[0]} ; "
    "samtools stats --threads {threads} {input[1]} > {output[1]}"
16
17
shell:
    "/opt/somatic-sniper/build/bin/bam-somaticsniper -q 25 -Q 15 -s 0.0001 -F vcf -f {input[0]} {input[1]}  {input[2]} {output}"
19
20
21
22
23
24
25
26
27
shell:
    "configureStrelkaSomaticWorkflow.py "
    "--normalBam {input[2]} "
    "--tumorBam {input[1]} "
    "--referenceFasta {input[0]} "
    "--runDir results/{wildcards.lib}_{wildcards.REP}/strelka2/ ; "
    "results/{wildcards.lib}_{wildcards.REP}/strelka2/runWorkflow.py -m local -j {threads} ; "
    "zcat results/{wildcards.lib}_{wildcards.REP}/strelka2/results/variants/somatic.snvs.vcf.gz > {output} ; "
    "rm -r results/{wildcards.lib}_{wildcards.REP}/strelka2"
17
18
19
20
21
shell:
    "trimmomatic PE -threads {threads} {input[0]} {input[1]} {output[0]} {output[4]} {output[1]} {output[5]} "
    "ILLUMINACLIP:TruSeq3-PE.fa:2:30:10:2:keepBothReads SLIDINGWINDOW:4:15 2> {output[8]} ; "
    "trimmomatic PE -threads {threads} {input[2]} {input[3]} {output[2]} {output[6]} {output[3]} {output[7]} "
    "ILLUMINACLIP:TruSeq3-PE.fa:2:30:10:2:keepBothReads SLIDINGWINDOW:4:15 2> {output[9]}"
15
16
script:
    "../scripts/varscan2vcf.R"
12
13
shell:
    "/.run somatic {input[1]} {input[0]} results/{wildcards.library}/varscan/{wildcards.library}"
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
callfile <- snakemake@input[[1]]
mutationsfile <- snakemake@input[[2]]
statsfile <-  snakemake@output[[1]] 

library(tidyverse)
library(vcfR)

mutations <- read_tsv(mutationsfile, col_types = cols(
  CHROM = col_character(),
  POS = col_double(),
  REF = col_character(),
  TYPE = col_character(),
  ALT = col_character()
)) %>% 
  mutate(True = 1) %>%
  filter(REF != "N")
call <- try(read.vcfR(callfile, verbose = F)@fix, silent = T)
if(!inherits(call, "try-error")){
  call <- as.data.frame(call) %>% 
    mutate_at(c("CHROM", "REF", "ALT"), as.character) %>% 
    mutate(POS = as.numeric(as.character(POS))) %>% 
    mutate(Called = 1) %>% 
    mutate(REF = substr(REF, 1, 1), ALT = substr(ALT, 1, 1))
} else {
  call <- data.frame(CHROM = character(), POS = double(), REF = character(), ALT = character(), Called = integer())
}
stats <- full_join(mutations, call, by = c("CHROM", "POS", "REF", "ALT")) %>% 
  mutate_at(c("True", "Called"), funs(ifelse(is.na(.), 0, .))) %>% 
  mutate(Confusion = recode(paste0(True, Called), "01" = "FP", "10" = "FN", "11" = "TP")) %>% 
  group_by(Confusion) %>% 
  summarise(N = n()) %>% 
  reshape2::dcast(mutationsfile + callfile ~ Confusion, value.var = "N")
if(!("FP" %in% names(stats)))
  stats$FP <- 0
if(!("FN" %in% names(stats)))
  stats$FN <- 0
if(!("TP" %in% names(stats)))
  stats$TP <- 0
stats <- mutate(stats, Precision = round(TP/(TP+FP), 2), Recall = round(TP/(TP+FN), 2)) %>% 
  mutate(callfile = callfile) %>% 
  mutate(mutationsfile = mutationsfile)
write_tsv(stats, statsfile)
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
varscanfile <- snakemake@input[[1]]
snpfile <- snakemake@input[[2]]
# output
vcffile <-  snakemake@output[[1]] 

# manual
# varscanfile <- "results/N100_R2_AF0.1_NR7000/varscan/N100_R2_AF0.1_NR7000.snp"
# snpfile <- file.path("results/reference/", yaml::read_yaml("config/config.dev.yml")$snps)
# vcffile <-  "results/N100_R2_AF0.1_NR7000/varscan/N100_R2_AF0.1_NR7000.vcf"

library(tidyverse)
library(Biostrings)

cat("##fileformat=VCFv4.1\n##fileDate=20210521\n##source=varscan\n#CHROM	POS	ID	REF	ALT\n",
    file = vcffile)

snps <-   read_tsv(snpfile, skip = 11) %>% 
  dplyr::rename(CHROM = `#CHROM`) %>% 
  select(CHROM, POS, ID, REF, ALT) %>% 
  mutate(ID = ".") %>% 
  as_tibble() %>% 
  mutate_all(as.character) %>% 
  mutate_at("POS", as.numeric)

read_tsv(varscanfile) %>% 
  mutate(ID = ".", CHROM	= chrom, POS = position, REF =ref,	ALT	= var) %>% 
  dplyr::select(CHROM,	POS,	ID,	REF,	ALT) %>% 
  anti_join(snps) %>% 
  write_tsv(file = vcffile, append = T, col_names = F)
ShowHide 19 more snippets with no or duplicated tags.

Login to post a comment if you would like to share your experience with this workflow.

Do you know this workflow well? If so, you can request seller status , and start supporting this workflow.

Free

Created: 1yr ago
Updated: 1yr ago
Maitainers: public
URL: https://github.com/sylvainschmitt/detectMutations
Name: detectmutations
Version: v0.1.0
Badge:
workflow icon

Insert copied code into your website to add a link to this workflow.

Downloaded: 0
Copyright: Public Domain
License: GNU General Public License v3.0
  • Future updates

Related Workflows

cellranger-snakemake-gke
snakemake workflow to run cellranger on a given bucket using gke.
A Snakemake workflow for running cellranger on a given bucket using Google Kubernetes Engine. The usage of this workflow ...