GBS Data Analysis Workflow: Variant Calling, Phasing, and Population Genetics
Help improve this workflow!
This workflow has been published but could be further improved with some additional meta data:- Keyword(s) in categories input, output
You can help improve this workflow by suggesting the addition or removal of keywords, suggest changes and report issues, or request to become a maintainer of the Workflow .
This workflow operates on a merged bam file of sample data, uses Freebayes to create a VCF file, performs filtering with bcftools/vcftools, performs phasing and imputation with beagle, then calculates popgen stats with Plink
Usage
Step 1: Install workflow
clone this workflow to your local computer
Step 2: Configure workflow
Configure the workflow according to your needs via editing the file
config.yaml
Step 3: Execute workflow
Test your configuration by performing a dry-run via
snakemake --use-conda -n
Execute the workflow locally via
snakemake --use-conda --cores $N
using
$N
cores or run it in a cluster environment via
snakemake --use-conda --cluster qsub --jobs 100
Code Snippets
64 65 | wrapper: "0.73.0/bio/sambamba/markdup" |
75 76 | wrapper: "0.73.0/bio/samtools/index" |
93 94 | wrapper: "0.73.0/bio/freebayes" |
111 112 | wrapper: "0.73.0/bio/freebayes" |
124 125 | shell: "bcftools stats {input} > {output}" |
137 138 | shell: "bcftools stats {input} > {output}" |
152 153 | shell: "vcftools --vcf {input} --max-missing 0.5 --mac 3 --minQ 20 --maf 0.03 --recode --recode-INFO-all --out {params} 2> {log}" |
167 168 | shell: "vcftools --vcf {input} --max-missing 0.75 --mac 3 --minQ 20 --maf 0.05 --recode --recode-INFO-all --out {params} 2> {log}" |
182 183 | shell: "vcftools --vcf {input} --max-missing 0.70 --mac 3 --minQ 20 --maf 0.05 --recode --recode-INFO-all --out {params} 2> {log}" |
197 198 | shell: "vcftools --vcf {input} --max-missing 0.25 --mac 3 --minQ 20 --maf 0.05 --recode --recode-INFO-all --out {params} 2> {log}" |
212 213 | shell: "vcftools --vcf {input} --max-missing 0.90 --mac 3 --minQ 20 --maf 0.05 --recode --recode-INFO-all --out {params} 2> {log}" |
227 228 | shell: "vcftools --vcf {input} --max-missing 0.10 --mac 3 --minQ 20 --maf 0.05 --recode --recode-INFO-all --out {params} 2> {log}" |
242 243 | shell: "vcftools --vcf {input} --max-missing 1 --mac 3 --minQ 20 --maf 0.05 --recode --recode-INFO-all --out {params} 2> {log}" |
255 256 | shell: "vcftools --vcf {input} --missing-indv --stdout > {output} 2> {log}" |
268 269 | shell: "vcftools --vcf {input} --missing-indv --stdout > {output} 2> {log}" |
281 282 | shell: "vcftools --vcf {input} --missing-indv --stdout > {output} 2> {log}" |
292 293 | shell: "mawk '$5 > 0.5' {input} | cut -f1 > {output}" |
303 304 | shell: "mawk '$5 > 0.5' {input} | cut -f1 > {output}" |
314 315 | shell: "mawk '$5 > 0.5' {input} | cut -f1 > {output}" |
330 331 | shell: "vcftools --vcf {input.vcf} --remove {input.miss} --recode --recode-INFO-all --out {params} 2> {log}" |
345 346 | shell: "vcftools --vcf {input} --remove-indels --recode --recode-INFO-all --out {params} 2> {log}" |
354 355 | shell: "grep -v super {input} | perl -pe 's/\s\.:/\t.\/.:/g' > {output}" |
363 364 | shell: "grep -v super {input} | perl -pe 's/\s\.:/\t.\/.:/g' > {output}" |
379 380 | shell: "vcftools --vcf {input.vcf} --remove {input.miss} --recode --recode-INFO-all --out {params} 2> {log}" |
395 396 | shell: "vcftools --vcf {input.vcf} --remove {input.miss} --recode --recode-INFO-all --out {params} 2> {log}" |
408 409 | shell: "grep -v super {input} | perl -pe 's/\s\.:/\t.\/.:/g' > {output}" |
421 422 | shell: "java -jar beagle/beagle.28Jun21.220.jar gt={input} out={params} > {log} 2>&1" |
430 431 | shell: "grep -v super {input} | perl -pe 's/\s\.:/\t.\/.:/g' > {output}" |
443 444 | shell: "java -jar beagle/beagle.28Jun21.220.jar gt={input} out={params} > {log} 2>&1" |
458 459 | shell: "vcftools --gzvcf {input} --remove-indels --recode --recode-INFO-all --out {params} 2> {log}" |
473 474 | shell: "java -jar beagle/beagle.28Jun21.220.jar gt={input} out={params} > {log} 2>&1" |
487 488 | shell: "vcftools --gzvcf {input} --missing-indv --stdout > {output} 2> {log}" |
500 501 | shell: "vcftools --gzvcf {input} --missing-indv --stdout > {output} 2> {log}" |
511 512 | shell: "cut -f1 {input} > {output}" |
522 523 | shell: "cut -f1 {input} > {output}" |
539 540 | shell: "vcftools --vcf {input.vcf} --remove {input.miss} --recode --recode-INFO-all --out {params} 2> {log}" |
556 557 | shell: "vcftools --vcf {input.vcf} --remove {input.miss} --recode --recode-INFO-all --out {params} 2> {log}" |
570 571 | shell: "bcftools stats {input} > {output}" |
582 583 | shell: "bcftools stats {input} > {output}" |
594 595 | shell: "bcftools stats {input} > {output}" |
606 607 | shell: "bcftools stats {input} > {output}" |
620 621 | shell: "tabix -p vcf {input} && bcftools annotate --set-id +'%CHROM\_%POS\_%REF\_%FIRST_ALT' {input} > {output} 2> {log}" |
632 633 | shell: "mkdir -p {input}" |
641 642 | shell: "mkdir -p {input}" |
654 655 | shell: "plink --vcf {input} --genome 2> {log} && mv plink.genome plink/fullset/" |
668 669 | shell: "plink --vcf {input[0]} --genome 2> {log} && mv plink.genome plink/noindels/" |
682 683 | shell: "plink --threads 5 --vcf {input} --double-id --recode --out myplink 2> {log} && mv myplink* plink/" |
695 696 | shell: "cd plink/ && plink --file {params} --recode HV --out ssr_progeny_population && mv ssr_progeny_population* fullset/" |
708 709 | shell: "cd plink/ && plink --file {params} --recode HV --snps-only just-acgt --out ssr_progeny_population_noindels && mv ssr_progeny_population_noindels* noindels/" |
722 723 | shell: "cd plink/ && plink --file {params} --indep-pairphase 50 5 0.5 && mv plink.prune* fullset/" |
735 736 | shell: "cd plink/ && plink --file {params} --r2 --ld-window-r2 0 && mv plink.ld fullset/" |
749 750 | shell: "cd plink/ && plink --file {params} --blocks no-pheno-req && mv plink.blocks* fullset/" |
763 764 | shell: "cd plink/ && plink --file {params} --indep-pairphase 50 5 0.5 && mv plink.prune* noindels/" |
776 777 | shell: "cd plink/ && plink --file {params} --r2 --ld-window-r2 0 && mv plink.ld noindels/" |
790 791 | shell: "cd plink/ && plink --file {params} --blocks no-pheno-req && mv plink.blocks* noindels/" |
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 | __author__ = "Johannes Köster, Felix Mölder, Christopher Schröder" __copyright__ = "Copyright 2017, Johannes Köster" __email__ = "[email protected], [email protected]" __license__ = "MIT" from snakemake.shell import shell shell.executable("bash") log = snakemake.log_fmt_shell(stdout=False, stderr=True) params = snakemake.params.get("extra", "") norm = snakemake.params.get("normalize", False) assert norm in [True, False] pipe = "" if snakemake.output[0].endswith(".bcf"): if norm: pipe = "| bcftools norm -Ob -" else: pipe = "| bcftools view -Ob -" elif norm: pipe = "| bcftools norm -" if snakemake.threads == 1: freebayes = "freebayes" else: chunksize = snakemake.params.get("chunksize", 100000) regions = ( "<(fasta_generate_regions.py {snakemake.input.ref}.fai {chunksize})".format( snakemake=snakemake, chunksize=chunksize ) ) if snakemake.input.get("regions", ""): regions = ( "<(bedtools intersect -a " r"<(sed 's/:\([0-9]*\)-\([0-9]*\)$/\t\1\t\2/' " "{regions}) -b {snakemake.input.regions} | " r"sed 's/\t\([0-9]*\)\t\([0-9]*\)$/:\1-\2/')" ).format(regions=regions, snakemake=snakemake) freebayes = ("freebayes-parallel {regions} {snakemake.threads}").format( snakemake=snakemake, regions=regions ) shell( "({freebayes} {params} -f {snakemake.input.ref}" " {snakemake.input.samples} {pipe} > {snakemake.output[0]}) {log}" ) |
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 | __author__ = "Jan Forster" __copyright__ = "Copyright 2021, Jan Forster" __email__ = "[email protected]" __license__ = "MIT" import os from snakemake.shell import shell log = snakemake.log_fmt_shell(stdout=True, stderr=True) shell( "sambamba markdup {snakemake.params.extra} -t {snakemake.threads} " "{snakemake.input[0]} {snakemake.output[0]} " "{log}" ) |
1 2 3 4 5 6 7 8 9 10 11 12 13 | __author__ = "Johannes Köster" __copyright__ = "Copyright 2016, Johannes Köster" __email__ = "[email protected]" __license__ = "MIT" from snakemake.shell import shell log = snakemake.log_fmt_shell(stdout=True, stderr=True) shell( "samtools index {snakemake.params} {snakemake.input[0]} {snakemake.output[0]} {log}" ) |
Support
Do you know this workflow well? If so, you can
request seller status , and start supporting this workflow.
Created: 1yr ago
Updated: 1yr ago
Maitainers:
public
URL:
https://github.com/Nevada-Bioinformatics-Center/snakemake_gbs_phasing_workflow
Name:
snakemake_gbs_phasing_workflow
Version:
1
Downloaded:
0
Copyright:
Public Domain
License:
MIT License
Keywords:
- Future updates
Related Workflows
ENCODE pipeline for histone marks developed for the psychENCODE project
psychip pipeline is an improved version of the ENCODE pipeline for histone marks developed for the psychENCODE project.
The o...
Near-real time tracking of SARS-CoV-2 in Connecticut
Repository containing scripts to perform near-real time tracking of SARS-CoV-2 in Connecticut using genomic data. This pipeli...
snakemake workflow to run cellranger on a given bucket using gke.
A Snakemake workflow for running cellranger on a given bucket using Google Kubernetes Engine. The usage of this workflow ...
ATLAS - Three commands to start analyzing your metagenome data
Metagenome-atlas is a easy-to-use metagenomic pipeline based on snakemake. It handles all steps from QC, Assembly, Binning, t...
raw sequence reads
Genome assembly
Annotation track
checkm2
gunc
prodigal
snakemake-wrapper-utils
MEGAHIT
Atlas
BBMap
Biopython
BioRuby
Bwa-mem2
cd-hit
CheckM
DAS
Diamond
eggNOG-mapper v2
MetaBAT 2
Minimap2
MMseqs
MultiQC
Pandas
Picard
pyfastx
SAMtools
SemiBin
Snakemake
SPAdes
SqueezeMeta
TADpole
VAMB
CONCOCT
ete3
gtdbtk
h5py
networkx
numpy
plotly
psutil
utils
metagenomics
RNA-seq workflow using STAR and DESeq2
This workflow performs a differential gene expression analysis with STAR and Deseq2. The usage of this workflow is described ...
This Snakemake pipeline implements the GATK best-practices workflow
This Snakemake pipeline implements the GATK best-practices workflow for calling small germline variants. The usage of thi...