snakemake snp calling pipeline based on freebayes

public public 1yr ago 0 bookmarks

requires

sudo apt-get install vcftools

clone workflow into working directory

git clone https://github.com/dwheelerau/snakemake-snp
cd snakemake-snp

edit config and workflow as needed

vim config.yaml

install dependencies into isolated environment

conda env create -n snp --file environment.yml

activate environment

source activate snp

execute workflow

snakemake all --cores 24

Code Snippets

 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
import sys

# simple script to print QC results

infile = sys.argv[1]

flagin = 0
flagout = 0
with open(infile) as f:
    r1 = []
    r2 = []
    for line in f:
        bits = line.split(' ')
        for bit in bits:
            if bit.find("in1=") == 0 and len(r1) == 0:
                r1.append(bit)
                print(bit)
            elif bit.find("in2=") == 0 and len(r2) == 0:
                r2.append(bit)
                print(bit)
            elif bit.find("Input:") == 0:
                print(line)
                flagin = 1
                data = []
            elif bit.find('Result:') == 0 and flagin == 1:
                flagin = 0
                r1 = []
                r2 = []
                print("".join(data) + "\n------\n")
                data = []
            elif flagin == 1:
                data.append(bit)
            else:
                pass
38
39
40
41
42
43
44
45
shell:
    """
    rm -f genome/genome.*
    rm -f logs/*
    rm -f bams/*
    rm -f clean_reads/*gz
    rm -f vcf/*
    """
49
50
51
52
53
54
    shell:
        "mkdir -p "+' '.join(DIRS)

rule get_genome:
    output:
        "genome/genome.fa"
59
60
61
62
63
shell:
    """
    wget {params.genome} -O {output}.gz -o {log}
    gunzip {output}.gz
    """
72
73
shell:
    "bowtie2-build {input} genome/genome > {log}"
93
94
95
96
97
shell:
    "bbduk.sh in1={input.r1} in2={input.r2} out1={output.r1_out} out2={output.r2_out} "
    "minlen={params.minlen} qtrim={params.qtrim} trimq={params.trimq} "
    "ktrim={params.ktrim} k={params.kwin} mink={params.mink} "
    "ref={params.adapt} hdist={params.hdist} 2>&1 | tee -a {log}"
121
122
123
124
125
shell:
    "bowtie2 --rg 'ID:{wildcards.sample}\tSM:{wildcards.sample}' "
    "-p {params.thr} -x genome/genome --met-file {log} {params.mode} "
    "{params.mates} {params.sensitivity} "
    "-1 {input.r1} -2 {input.r2} | samtools view -bS - > {output}"
132
133
shell:
    "samtools sort -o {output} {input}"
140
141
142
143
144
145
146
shell:
    """
    samtools flagstat {input} > {output.aln}
    echo alignment report for sample {output.aln}
    grep 'mapped' {output.aln}
    python scripts/report_qc.py logs/read_qc.log | tee logs/qc_report.txt
    """
156
157
158
shell:
    "picard MarkDuplicates I={input} "
    "O={output.bam} M={output.info} VALIDATION_STRINGENCY=LENIENT 2> {log}"
167
168
shell:
    "freebayes -f genome/genome.fa -b {input} -p {params.ploidy} > {output}"
175
176
shell:
    "bgzip {input}"
SnakeMake From line 175 of master/Snakefile
183
184
shell:
    "tabix -p vcf {input}"
191
192
shell:
    "vcf-merge vcf/*gz > {output}"
SnakeMake From line 191 of master/Snakefile
ShowHide 8 more snippets with no or duplicated tags.

Login to post a comment if you would like to share your experience with this workflow.

Do you know this workflow well? If so, you can request seller status , and start supporting this workflow.

Free

Created: 1yr ago
Updated: 1yr ago
Maitainers: public
URL: https://github.com/dwheelerau/snakemake-snp
Name: snakemake-snp
Version: 1
Badge:
workflow icon

Insert copied code into your website to add a link to this workflow.

Downloaded: 0
Copyright: Public Domain
License: None
  • Future updates

Related Workflows

cellranger-snakemake-gke
snakemake workflow to run cellranger on a given bucket using gke.
A Snakemake workflow for running cellranger on a given bucket using Google Kubernetes Engine. The usage of this workflow ...