GBS Data Analysis Workflow: Variant Calling, Phasing, and Population Genetics

Workflow name: snakemake_gbs_phasing_workflow

This workflow operates on a merged BAM file of sample data. It uses FreeBayes to create a VCF file, filters it with bcftools/vcftools, performs phasing and imputation with Beagle, and then calculates population-genetics statistics with PLINK.

Usage

Step 1: Install workflow

Clone this workflow to your local computer.

Step 2: Configure workflow

Configure the workflow according to your needs by editing the file config.yaml.

Step 3: Execute workflow

Test your configuration by performing a dry-run via

snakemake --use-conda -n

Execute the workflow locally via

snakemake --use-conda --cores $N

using $N cores, or run it in a cluster environment via

snakemake --use-conda --cluster qsub --jobs 100
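
For scripted runs, the three invocations above can be assembled programmatically. The sketch below only builds the argument lists using the flags shown in this section; the function name snakemake_command and its defaults are assumptions made for illustration, not part of the workflow.

```python
from typing import List, Optional


def snakemake_command(
    cores: int = 1,
    cluster: Optional[str] = None,
    jobs: int = 100,
    dry_run: bool = False,
) -> List[str]:
    """Build one of the Snakemake invocations shown in the usage steps.

    Hypothetical helper: it returns an argument list and runs nothing.
    """
    cmd = ["snakemake", "--use-conda"]
    if dry_run:
        # Dry run: only report what would be executed.
        cmd.append("-n")
    elif cluster is not None:
        # Cluster execution: submit jobs through the given command.
        cmd += ["--cluster", cluster, "--jobs", str(jobs)]
    else:
        # Local execution on a fixed number of cores.
        cmd += ["--cores", str(cores)]
    return cmd
```

The returned list can be passed to subprocess.run; running the dry-run variant first mirrors the order of the steps above.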

Code Snippets

wrapper:
    "0.73.0/bio/sambamba/markdup"

SnakeMake From line 64 of main/Snakefile

wrapper:
    "0.73.0/bio/samtools/index"

SnakeMake From line 75 of main/Snakefile

wrapper:
    "0.73.0/bio/freebayes"

SnakeMake FreeBayes From line 93 of main/Snakefile

wrapper:
    "0.73.0/bio/freebayes"

SnakeMake FreeBayes From line 111 of main/Snakefile

shell:
    "bcftools stats {input} > {output}"

SnakeMake BCFtools From line 124 of main/Snakefile

shell:
    "bcftools stats {input} > {output}"

SnakeMake BCFtools From line 137 of main/Snakefile

shell:
    "vcftools --vcf {input} --max-missing 0.5 --mac 3 --minQ 20 --maf 0.03 --recode --recode-INFO-all --out {params} 2> {log}"

SnakeMake VCFtools From line 152 of main/Snakefile

shell:
    "vcftools --vcf {input} --max-missing 0.75 --mac 3 --minQ 20 --maf 0.05 --recode --recode-INFO-all --out {params} 2> {log}"

SnakeMake VCFtools From line 167 of main/Snakefile

shell:
    "vcftools --vcf {input} --max-missing 0.70 --mac 3 --minQ 20 --maf 0.05 --recode --recode-INFO-all --out {params} 2> {log}"

SnakeMake VCFtools From line 182 of main/Snakefile

shell:
    "vcftools --vcf {input} --max-missing 0.25 --mac 3 --minQ 20 --maf 0.05 --recode --recode-INFO-all --out {params} 2> {log}"

SnakeMake VCFtools From line 197 of main/Snakefile

shell:
    "vcftools --vcf {input} --max-missing 0.90 --mac 3 --minQ 20 --maf 0.05 --recode --recode-INFO-all --out {params} 2> {log}"

SnakeMake VCFtools From line 212 of main/Snakefile

shell:
    "vcftools --vcf {input} --max-missing 0.10 --mac 3 --minQ 20 --maf 0.05 --recode --recode-INFO-all --out {params} 2> {log}"

SnakeMake VCFtools From line 227 of main/Snakefile

shell:
    "vcftools --vcf {input} --max-missing 1 --mac 3 --minQ 20 --maf 0.05 --recode --recode-INFO-all --out {params} 2> {log}"

SnakeMake VCFtools From line 242 of main/Snakefile
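
The seven site-filtering rules above differ only in their --max-missing value (and, for the first rule, --maf 0.03 rather than 0.05). As a rough illustration of what these thresholds test, the following Python sketch applies the same four checks to a single biallelic site. It is not how vcftools is implemented: the function name site_passes and the genotype representation are assumptions made for this example, multiallelic sites are ignored, and the thresholds are treated as inclusive.

```python
from typing import Optional, Sequence, Tuple

Genotype = Optional[Tuple[int, ...]]  # allele indices (0 = REF, 1 = ALT) or None if missing


def site_passes(
    genotypes: Sequence[Genotype],
    qual: float,
    max_missing: float = 0.5,
    mac: int = 3,
    min_q: float = 20.0,
    maf: float = 0.03,
) -> bool:
    """Approximate the --max-missing/--minQ/--mac/--maf checks for one biallelic site."""
    called = [g for g in genotypes if g is not None]
    total_alleles = sum(len(g) for g in called)
    if not genotypes or total_alleles == 0:
        return False
    # --max-missing: minimum fraction of samples that must have a call
    # (0 tolerates fully missing sites, 1 tolerates no missing calls).
    if len(called) / len(genotypes) < max_missing:
        return False
    # --minQ: site quality threshold.
    if qual < min_q:
        return False
    # Count ALT alleles; valid because allele indices are only 0 or 1 here.
    alt_count = sum(allele for g in called for allele in g)
    minor_count = min(alt_count, total_alleles - alt_count)
    # --mac: minimum minor allele count.
    if minor_count < mac:
        return False
    # --maf: minimum minor allele frequency among called alleles.
    return minor_count / total_alleles >= maf
```

Raising max_missing from 0.5 towards 1, as the later rules do, keeps progressively fewer sites because more samples must carry a call.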

shell:
    "vcftools --vcf {input} --missing-indv --stdout > {output} 2> {log}"

SnakeMake VCFtools From line 255 of main/Snakefile

shell:
    "vcftools --vcf {input} --missing-indv --stdout > {output} 2> {log}"

SnakeMake VCFtools From line 268 of main/Snakefile

shell:
    "vcftools --vcf {input} --missing-indv --stdout > {output} 2> {log}"

SnakeMake VCFtools From line 281 of main/Snakefile

shell:
    "mawk '$5 > 0.5' {input} | cut -f1 > {output}"

SnakeMake mawk From line 292 of main/Snakefile

shell:
    "mawk '$5 > 0.5' {input} | cut -f1 > {output}"

SnakeMake mawk From line 303 of main/Snakefile

shell:
    "mawk '$5 > 0.5' {input} | cut -f1 > {output}"

SnakeMake mawk From line 314 of main/Snakefile
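
The --missing-indv and mawk rules above work as a pair: vcftools writes a per-sample missingness table, and the mawk/cut pipeline extracts the names of samples whose fifth column exceeds 0.5. The Python sketch below does the same selection, assuming the usual tab-separated layout with the header INDV, N_DATA, N_GENOTYPES_FILTERED, N_MISS, F_MISS. The function name high_missing_samples is hypothetical. Unlike the awk one-liner, which likely lets the header line through because a non-numeric field is compared as a string, this version skips the header.

```python
import csv
import io
from typing import List


def high_missing_samples(imiss_text: str, threshold: float = 0.5) -> List[str]:
    """Return sample names whose missing-genotype fraction exceeds the threshold.

    imiss_text is the content of a per-individual missingness table,
    assumed to be tab-separated with INDV and F_MISS columns.
    """
    reader = csv.DictReader(io.StringIO(imiss_text), delimiter="\t")
    return [row["INDV"] for row in reader if float(row["F_MISS"]) > threshold]
```

The resulting names, written one per line, correspond to the list that the later rules pass to vcftools --remove.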

shell:
    "vcftools --vcf {input.vcf} --remove {input.miss} --recode --recode-INFO-all --out {params} 2> {log}"

SnakeMake VCFtools From line 330 of main/Snakefile

shell:
    "vcftools --vcf {input} --remove-indels --recode --recode-INFO-all --out {params} 2> {log}"

SnakeMake VCFtools From line 345 of main/Snakefile

shell:
    "grep -v super {input} |  perl -pe 's/\s\.:/\t.\/.:/g'  > {output}"

SnakeMake From line 354 of main/Snakefile

shell:
    "grep -v super {input} |  perl -pe 's/\s\.:/\t.\/.:/g'  > {output}"

SnakeMake From line 363 of main/Snakefile
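
The grep/perl pipeline used in these rules does two things: it drops every line containing the string super, and it rewrites genotype fields that begin with a lone "." (a missing call written as if haploid) to "./.". The sketch below reproduces that behaviour in Python, using the same regular expression. The name clean_vcf_lines is hypothetical, and the reading that "super" targets scaffold names is an assumption.

```python
import re
from typing import Iterable, Iterator

# Same pattern as the perl substitution: whitespace, then ".:".
_LONE_DOT_GENOTYPE = re.compile(r"\s\.:")


def clean_vcf_lines(lines: Iterable[str]) -> Iterator[str]:
    """Drop lines mentioning 'super' and rewrite '.:' genotypes as './.:'."""
    for line in lines:
        if "super" in line:
            # Equivalent to grep -v super: header or record, the line is skipped.
            continue
        # Equivalent to perl -pe 's/\s\.:/\t.\/.:/g'
        yield _LONE_DOT_GENOTYPE.sub("\t./.:", line)
```

Because the pattern requires a colon right after the dot, "." values in the ID or INFO columns are left untouched.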

shell:
    "vcftools --vcf {input.vcf} --remove {input.miss} --recode --recode-INFO-all --out {params} 2> {log}"

SnakeMake VCFtools From line 379 of main/Snakefile

shell:
    "vcftools --vcf {input.vcf} --remove {input.miss} --recode --recode-INFO-all --out {params} 2> {log}"

SnakeMake VCFtools From line 395 of main/Snakefile

shell:
    "grep -v super {input} |  perl -pe 's/\s\.:/\t.\/.:/g'  > {output}"

SnakeMake From line 408 of main/Snakefile

shell:
    "java -jar beagle/beagle.28Jun21.220.jar gt={input} out={params} > {log} 2>&1"

SnakeMake From line 421 of main/Snakefile

shell:
    "grep -v super {input} |  perl -pe 's/\s\.:/\t.\/.:/g'  > {output}"

SnakeMake From line 430 of main/Snakefile

shell:
    "java -jar beagle/beagle.28Jun21.220.jar gt={input} out={params} > {log} 2>&1"

SnakeMake From line 443 of main/Snakefile

shell:
    "vcftools --gzvcf {input} --remove-indels --recode --recode-INFO-all --out {params} 2> {log}"

SnakeMake VCFtools From line 458 of main/Snakefile

shell:
    "java -jar beagle/beagle.28Jun21.220.jar gt={input} out={params} > {log} 2>&1"

SnakeMake From line 473 of main/Snakefile

shell:
    "vcftools --gzvcf {input} --missing-indv --stdout > {output} 2> {log}"

SnakeMake VCFtools From line 487 of main/Snakefile

shell:
    "vcftools --gzvcf {input} --missing-indv --stdout > {output} 2> {log}"

SnakeMake VCFtools From line 500 of main/Snakefile

shell:
    "cut -f1 {input} > {output}"

SnakeMake From line 511 of main/Snakefile

shell:
    "cut -f1 {input} > {output}"

SnakeMake From line 522 of main/Snakefile

shell:
    "vcftools --vcf {input.vcf} --remove {input.miss} --recode --recode-INFO-all --out {params} 2> {log}"

SnakeMake VCFtools From line 539 of main/Snakefile

shell:
    "vcftools --vcf {input.vcf} --remove {input.miss} --recode --recode-INFO-all --out {params} 2> {log}"

SnakeMake VCFtools From line 556 of main/Snakefile

shell:
    "bcftools stats {input} > {output}"

SnakeMake BCFtools From line 570 of main/Snakefile

shell:
    "bcftools stats {input} > {output}"

SnakeMake BCFtools From line 582 of main/Snakefile

shell:
    "bcftools stats {input} > {output}"

SnakeMake BCFtools From line 594 of main/Snakefile

shell:
    "bcftools stats {input} > {output}"

SnakeMake BCFtools From line 606 of main/Snakefile

shell:
    "tabix -p vcf {input} && bcftools annotate --set-id +'%CHROM\_%POS\_%REF\_%FIRST_ALT' {input} > {output} 2> {log}"

SnakeMake BCFtools tabix From line 620 of main/Snakefile
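
In the rule above, the format string passed to --set-id builds a variant ID from the chromosome, position, reference allele, and first alternate allele, and the leading + restricts this to records whose ID is missing. The Python sketch below shows the resulting naming scheme for one record; the function name set_missing_id is hypothetical and only illustrates the string that would be assigned.

```python
from typing import Sequence


def set_missing_id(
    chrom: str,
    pos: int,
    ref: str,
    alts: Sequence[str],
    current_id: str = ".",
) -> str:
    """Return CHROM_POS_REF_FIRSTALT if the record has no ID, else keep the ID."""
    if current_id not in (".", ""):
        # An existing ID is left as it is (the effect of the leading "+").
        return current_id
    return f"{chrom}_{pos}_{ref}_{alts[0]}"
```

IDs of this form give PLINK a unique name per variant, which the LD and pruning steps later in the workflow rely on when listing variants.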

shell:
    "mkdir -p {input}"

SnakeMake From line 632 of main/Snakefile

shell:
    "mkdir -p {input}"

SnakeMake From line 641 of main/Snakefile

shell:
    "plink --vcf {input} --genome 2> {log} && mv plink.genome plink/fullset/"

SnakeMake pLink From line 654 of main/Snakefile

shell:
    "plink --vcf {input[0]} --genome 2> {log} && mv plink.genome plink/noindels/"

SnakeMake pLink From line 668 of main/Snakefile

shell:
    "plink --threads 5 --vcf {input} --double-id --recode --out myplink 2> {log} && mv myplink* plink/"

SnakeMake pLink From line 682 of main/Snakefile

shell:
    "cd plink/ && plink --file {params} --recode HV --out ssr_progeny_population && mv ssr_progeny_population* fullset/"

SnakeMake pLink From line 695 of main/Snakefile

shell:
    "cd plink/ && plink --file {params} --recode HV --snps-only just-acgt --out ssr_progeny_population_noindels  && mv ssr_progeny_population_noindels* noindels/"

SnakeMake pLink From line 708 of main/Snakefile

shell:
    "cd plink/ && plink --file {params} --indep-pairphase 50 5 0.5 && mv plink.prune* fullset/"

SnakeMake pLink From line 722 of main/Snakefile

shell:
    "cd plink/ && plink --file {params} --r2 --ld-window-r2 0 && mv plink.ld fullset/"

SnakeMake pLink From line 735 of main/Snakefile

shell:
    "cd plink/ && plink --file {params} --blocks no-pheno-req && mv plink.blocks* fullset/"

SnakeMake pLink From line 749 of main/Snakefile

shell:
    "cd plink/ && plink --file {params} --indep-pairphase 50 5 0.5 && mv plink.prune* noindels/"

SnakeMake pLink From line 763 of main/Snakefile

shell:
    "cd plink/ && plink --file {params} --r2 --ld-window-r2 0 && mv plink.ld noindels/"

SnakeMake pLink From line 776 of main/Snakefile

shell:
    "cd plink/ && plink --file {params} --blocks no-pheno-req && mv plink.blocks* noindels/"

SnakeMake pLink From line 790 of main/Snakefile

__author__ = "Johannes Köster, Felix Mölder, Christopher Schröder"
__copyright__ = "Copyright 2017, Johannes Köster"
__email__ = "johannes.koester@protonmail.com, felix.moelder@uni-due.de"
__license__ = "MIT"


from snakemake.shell import shell

shell.executable("bash")

log = snakemake.log_fmt_shell(stdout=False, stderr=True)

params = snakemake.params.get("extra", "")
norm = snakemake.params.get("normalize", False)
assert norm in [True, False]

pipe = ""
if snakemake.output[0].endswith(".bcf"):
    if norm:
        pipe = "| bcftools norm -Ob -"
    else:
        pipe = "| bcftools view -Ob -"
elif norm:
    pipe = "| bcftools norm -"

if snakemake.threads == 1:
    freebayes = "freebayes"
else:
    chunksize = snakemake.params.get("chunksize", 100000)
    regions = (
        "<(fasta_generate_regions.py {snakemake.input.ref}.fai {chunksize})".format(
            snakemake=snakemake, chunksize=chunksize
        )
    )
    if snakemake.input.get("regions", ""):
        regions = (
            "<(bedtools intersect -a "
            r"<(sed 's/:\([0-9]*\)-\([0-9]*\)$/\t\1\t\2/' "
            "{regions}) -b {snakemake.input.regions} | "
            r"sed 's/\t\([0-9]*\)\t\([0-9]*\)$/:\1-\2/')"
        ).format(regions=regions, snakemake=snakemake)
    freebayes = ("freebayes-parallel {regions} {snakemake.threads}").format(
        snakemake=snakemake, regions=regions
    )

shell(
    "({freebayes} {params} -f {snakemake.input.ref}"
    " {snakemake.input.samples} {pipe} > {snakemake.output[0]}) {log}"
)

Python Snakemake BCFtools FreeBayes From line 1 of freebayes/wrapper.py
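
The branching in the FreeBayes wrapper above is easier to follow when isolated from the snakemake object. The sketch below restates the two decisions as pure functions: which bcftools stage (if any) is appended to the pipeline, and whether freebayes or freebayes-parallel is run. The function names are hypothetical; the returned strings are copied from the wrapper.

```python
def choose_pipe(output_path: str, normalize: bool = False) -> str:
    """Return the bcftools stage appended after FreeBayes, or '' for none."""
    if output_path.endswith(".bcf"):
        # BCF output always needs a conversion step (-Ob).
        return "| bcftools norm -Ob -" if normalize else "| bcftools view -Ob -"
    # VCF output is piped through bcftools only when normalization is requested.
    return "| bcftools norm -" if normalize else ""


def choose_executable(threads: int, regions_expr: str) -> str:
    """Return the FreeBayes command prefix for the given thread count."""
    if threads == 1:
        return "freebayes"
    # With several threads the wrapper splits the genome into regions.
    return f"freebayes-parallel {regions_expr} {threads}"
```

With a single thread and a .vcf output, the wrapper therefore reduces to a plain freebayes call with no bcftools stage.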

__author__ = "Jan Forster"
__copyright__ = "Copyright 2021, Jan Forster"
__email__ = "j.forster@dkfz.de"
__license__ = "MIT"


import os
from snakemake.shell import shell

log = snakemake.log_fmt_shell(stdout=True, stderr=True)

shell(
    "sambamba markdup {snakemake.params.extra} -t {snakemake.threads} "
    "{snakemake.input[0]} {snakemake.output[0]} "
    "{log}"
)

Python Snakemake Sambamba From line 1 of markdup/wrapper.py

__author__ = "Johannes Köster"
__copyright__ = "Copyright 2016, Johannes Köster"
__email__ = "koester@jimmy.harvard.edu"
__license__ = "MIT"


from snakemake.shell import shell

log = snakemake.log_fmt_shell(stdout=True, stderr=True)

shell(
    "samtools index {snakemake.params} {snakemake.input[0]} {snakemake.output[0]} {log}"
)