A Snakemake workflow for variant calling using GATK4 best practices



Motivation

  • This repository contains a Snakemake pipeline for variant calling from Illumina-generated sequences. It is based on the GATK best practices and supports either hard or soft filtering of variants.

  • Additionally, this pipeline aims to reproduce a recently published workflow that optimized GATK4 variant calling for Plasmodium falciparum (preprint). It is, however, not limited to P. falciparum and can be used for any organism of interest.

Pipeline sections

  • The pipeline handles paired-end reads; the analysis sections in the Snakefile are listed below:

Step 1 - Compile List of Output Files

  • rule all - gather all output files

Step 2 - Gather Genome Data

  • gather_genome_data : aggregate genome data from the snpeff folder

  • gatk_genome_dict : create genome dictionary for gatk tools

  • samtools_index : index the genome fasta file

  • bedops_gff2bed : convert the genome annotation .gff to .bed file

Step 3 - Perform Fastq Quality Control

  • trim_reads : trim adapters and low quality bases using trimmomatic or fastp

Step 4 - Map Reads to Genome

  • bwa_index : generate bwa genome-index files for mapping reads

  • bwa_mem : map reads to the genome, fix mate information, convert .sam to .bam and remove alignment artifacts

  • mark_duplicates : mark duplicate reads using gatk MarkDuplicatesSpark or Samblaster

Step 5 - Generate Mapping Quality Statistics

  • samtools_idxstats : report mapped and unmapped read counts per reference sequence

  • samtools_flagstats : calculate and summarize various alignment statistics

  • samtools_depth : calculate the depth of coverage for each position in the genome

  • gatk_insert_size_metrics : collect insert size metrics

  • gatk_alignment_summary_metrics : generate a summary of alignment metrics from the BAM file

Step 6 - Perform Variant Calling

  • gatk_haplotypecaller : call snps and indels via local re-assembly of haplotypes and generate gVCFs

  • generate_sample_name_map : generate a map of sample names and their respective gVCF files (see the example map format at the end of this step)

  • gatk_genomics_db_import : merge gVCFs into one genomic database

  • gatk_genotype_gvcfs : perform joint genotyping and generate the final VCF in which all samples have been jointly genotyped
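
  • For reference, the sample-name map consumed by GenomicsDBImport is a two-column, tab-separated file: the sample name, then the path to its gVCF. A hypothetical example (the paths shown are illustrative, not the pipeline's actual output layout):

    sample_A	output/sample_A.g.vcf.gz
    sample_B	output/sample_B.g.vcf.gz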

Step 7 - Perform Variant Filtering (Hard or Soft)

  • bcftools_normalize : normalize indels, left-align variants, split multiallelic sites into multiple rows and recover multiallelics from multiple rows

  • hard_filter_variants:

    • gatk_split_variants : separate snps and indels into separate vcf files

    • gatk_filter_hard : apply hard filters to snps and indels (reference thresholds are listed at the end of this step)

    • gatk_merge_vcfs : merge snps and indels into one vcf file

  • soft_filter_variants:

    • gatk_vqsr_indels : perform variant quality score recalibration on indels

    • gatk_apply_vqsr_indels : apply variant quality score recalibration to indels

    • gatk_vqsr_snps : perform variant quality score recalibration on snps

    • gatk_apply_vqsr_snps : apply variant quality score recalibration to snps

  • gatk_filter_pass : filter out variants that do not pass the hard or soft filters
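
  • For reference, GATK's documented generic hard-filtering recommendations are listed below; the expressions this pipeline actually applies are set in config/config.yaml and may differ:

    • snps : QD < 2.0, FS > 60.0, MQ < 40.0, MQRankSum < -12.5, ReadPosRankSum < -8.0

    • indels : QD < 2.0, FS > 200.0, ReadPosRankSum < -20.0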

Step 8 - Annotate Variants and Calculate Allele Frequencies

  • snpeff_annotate_variants : variant annotation and functional effect prediction

  • gatk_variants_to_table : extract variant information into a table

  • Rscript - calculate allele frequencies and transform the summary table from wide to long format (see the worked example below)
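
  • For intuition, the allele frequency is computed from the AD (allelic depths) field as ALT / (REF + ALT). A worked example with hypothetical depths:

    AD = "10,40"  (10 reads support REF, 40 support ALT)
    alt_allele_freq = 40 / (10 + 40) = 0.8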

Project dependencies:

  • Conda - an open-source package and environment management system that runs on various platforms, including Windows, macOS, and Linux

  • Snakemake - a workflow management system that aims to reduce the complexity of creating workflows by providing a fast and comfortable execution environment, together with a clean and modern specification language in Python style.

Where to start

  • Install Conda for your operating system (the pipeline is currently tested on Linux and macOS).

  • Clone this project using the following command in your terminal:

    • git clone https://github.com/kevin-wamae/variant-calling-with-Snakemake-and-GATK.git
  • Navigate into the cloned directory using the command below. This will be the root directory of the project:

    • cd variant-calling-with-Snakemake-and-GATK
  • Note: All subsequent commands should be run from the root directory of this project. However, users are free to modify the scripts to suit their needs.

Directory structure

  • Below is the default directory structure:

    • config/ - contains the Snakemake-configuration files

    • input/ - contains input files

      • bed/ - contains the bed files for specifying the intervals of interest

      • fastq/ - contains the FastQ files

      • known_sites/ - contains the positive-training dataset for variant filtering

    • output/ - contains numbered-output directories from the analysis

    • workflow/ - contains the Snakemake workflow files

      • envs/ - contains the Conda environment-configuration files

      • scripts/ - contains the scripts used in the pipeline

.
|-- config
|-- input
|   |-- bed
|   |-- fastq
|   `-- known_sites
|-- output
`-- workflow
    |-- envs
    `-- scripts
  • This pipeline uses glob_wildcards() to match FastQ sample names and mate files in the input/fastq/ directory, using the naming convention below (a minimal matching sketch follows this list):

    • reads_R1.fastq.gz = first mate

    • reads_R2.fastq.gz = second mate

    • If your files follow a different naming convention, you can rename them by executing the Python script in the workflow/scripts/ directory:

      • python workflow/scripts/fastq_rename.py
    • Therefore, the user can deposit their FastQ files in the input/fastq/ directory or edit the config/config.yaml file to point to a directory containing FastQ files, and the pipeline will automatically match the sample names and mate files.
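
    • As an illustration, below is a minimal sketch of how glob_wildcards() performs this matching inside the Snakefile; the pattern assumes the default input/fastq/ location:

      (SAMPLES,) = glob_wildcards("input/fastq/{sample}_R1.fastq.gz")
      # e.g. reads_R1.fastq.gz / reads_R2.fastq.gz yield SAMPLES == ["reads"];
      # the second mate is matched by substituting R2 into the same pattern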

  • The configuration file ( config/config.yaml ) specifies additional resources and can be modified to suit one's needs (an illustrative sketch follows this list), such as:

    • Input files

    • Output directories

    • The option to choose between tools and methods, e.g.:

      • fastp or trimmomatic for read trimming

      • gatk MarkDuplicatesSpark or samblaster for marking duplicates

      • hard or soft filtering of variants

    • Other parameters, such as the number of threads to use
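
    • For illustration only, a hedged sketch of what such a configuration might look like; the keys below are illustrative assumptions, not necessarily the pipeline's actual key names, so consult the shipped config/config.yaml for the real structure:

      threads: 4                           # threads per step (illustrative)
      fastq_dir: input/fastq/              # location of paired-end FastQ files (illustrative)
      trimming_tool: fastp                 # or: trimmomatic
      duplicate_tool: MarkDuplicatesSpark  # or: Samblaster
      filtering: hard                      # or: soft (VQSR)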

Running the analysis

After navigating into the root directory of the project, run the analysis by executing the following commands in your terminal:

  1. Create a conda analysis environment by running the command below in your terminal. This will create a conda environment named variant-calling-gatk and install Snakemake, SnpEff, and Graphviz (for visualizing the workflow) in the environment. Note: This only needs to be done once.

    • conda env create --file workflow/envs/environment.yaml
  2. Activate the conda environment by running the command below in your terminal. Note: This needs to be done every time you exit and restart your terminal and want to re-run this pipeline.

    • conda activate variant-calling-gatk
  3. Execute the shell script below to create the SnpEff database for variant annotation. This will download the P. falciparum genome data from PlasmoDB and create a database in the output/ directory. Note: This is an important step because the genome FASTA and GFF files are required for read mapping and variant calling. The script can also be modified to suit one's needs, such as downloading genome files for your organism of interest:

    • bash workflow/scripts/create_snpeff_db.sh
  4. Finally, execute the whole Snakemake pipeline by running the following command in your terminal:

    • snakemake --use-conda --cores 2 --jobs 1

    • This will run the whole pipeline using at most two CPU cores and one job at a time. The --cores flag sets the total number of cores Snakemake may use, and the --jobs flag caps the number of jobs that run in parallel.

    • If you want to run the pipeline using more resources, you can increase the number of cores and jobs. For example, to run the pipeline using 4 cores and 2 jobs in parallel, run the following command:

      • snakemake --use-conda --cores 4 --jobs 2
    • Additionally, you can change the threads entry in line 3 of the configuration file ( config/config.yaml ) to specify the number of threads each step in the pipeline uses.

  5. Once the analysis is complete, look through the output/ directory to view the results of the analysis.

  6. Summary statistics can be generated with standalone scripts in the workflow/scripts/ directory:

    • To do this, create a conda environment with the following command:

      • conda env create --file workflow/envs/variant-calling-stats.yaml

      • Activate the conda environment by running the following command: conda activate variant-calling-stats

    • To generate a summary of the raw reads, run the following command and look through the stats_1_raw_fastq.tsv file in the project directory:

      • python workflow/scripts/get_raw_fastq_stats.py
    • To generate a summary of the trimmed reads, run the following command and look through the stats_2_trimmed_fastq.tsv file:

      • python workflow/scripts/get_trimmed_fastq_stats.py
    • To generate a summary of the mapped reads, run the following command and look through the stats_3_mapped_reads.tsv file:

      • python workflow/scripts/get_mapped_reads_stats.py
    • To generate a summary of the variants called, run the following command and look through the stats_4_variant_calling.tsv file:

      • bash workflow/scripts/get_variant_calling_stats.sh
    • Exit this conda environment by running the following command:

      • conda deactivate
  7. Finally, you can deactivate the variant calling conda environment by running the following command:

    • conda deactivate

Feedback and Issues

Report any issues or bugs by opening an issue on the GitHub repository or contact me via email at wamaekevin[at]gmail.com.

Code Snippets

# workflow/scripts/sample_vcf_map.py
import os
import sys

# Usage: python sample_vcf_map.py <vcf_dir> <output_tsv>
vcf_dir = sys.argv[1]
tsv_path = sys.argv[2]

# Collect all gzipped VCF files in the directory
vcf_files = [f for f in os.listdir(vcf_dir) if f.endswith('.vcf.gz')]

# Pair each file path with its sample name (file name minus the .vcf.gz suffix)
file_list = []
for vcf_file in vcf_files:
    sample_name = vcf_file[:-len('.vcf.gz')]
    file_path = os.path.join(vcf_dir, vcf_file)
    file_list.append((file_path, sample_name))

# Sort by sample name so the map is deterministic
file_list.sort(key=lambda x: x[1])

# Write the sample-name/VCF-path map as a TSV file
with open(tsv_path, 'w') as out:
    for file_path, sample_name in file_list:
        out.write(sample_name + '\t' + file_path + '\n')

# workflow/scripts/split_annot_column.sh
#!/usr/bin/env bash
# Split the SnpEff ANN column (field 6) of a VariantsToTable TSV into its
# sixteen sub-fields. Reads from stdin and writes to stdout, e.g.:
#   gatk VariantsToTable ... -O /dev/stdout | bash split_annot_column.sh > out.tsv

awk -F '\t' 'BEGIN {
    OFS = FS
}
NR == 1 {
    # Print the header with new column names for the ANN sub-fields
    for (i = 1; i <= 5; ++i) {
        printf("%s%s", $i, OFS)
    }
    printf("Allele%sAnnotation%sAnnotation_Impact%sGene_Name%sGene_ID%sFeature_Type%sFeature_ID%sTranscript_BioType%sRank%sHGVS.c%sHGVS.p%scDNA.pos%sCDS.pos%sAA.pos%sDistance%sErrors", OFS, OFS, OFS, OFS, OFS, OFS, OFS, OFS, OFS, OFS, OFS, OFS, OFS, OFS, OFS)
    for (i = 7; i <= NF; ++i) {
        printf("%s%s", OFS, $i)
    }
    printf("\n")
    next
}
{
    # Print each record with the ANN field split on "|" and the original ANN field dropped
    for (i = 1; i <= 5; ++i) {
        printf("%s%s", $i, OFS)
    }
    split($6, ann, "|")
    for (i = 1; i <= 16; ++i) {
        printf("%s%s", ann[i], OFS)
    }
    for (i = 7; i <= NF; ++i) {
        printf("%s%s", $i, (i == NF) ? "\n" : OFS)
    }
}'

# workflow/scripts/vcf_allele_frequency.R
library(argparse)    # For command line argument parsing
library(data.table)  # For fast data manipulation
library(tidyverse)   # For data wrangling and visualization

# Define command line arguments
parser <- argparse::ArgumentParser()

parser$add_argument("--input",
                    type = "character",
                    help = "Input file path")

parser$add_argument("--output",
                    type = "character",
                    help = "Output file path")

args <- parser$parse_args(commandArgs(TRUE))

# Read in the data using data.table::fread for fast reading of large files
vcf <- data.table::fread(args$input, nThread = 4)

# Reshape the data from wide (one column set per sample) to long (one row per
# sample per variant) and compute the alternate-allele frequency from the
# AD ("REF,ALT" allelic depths) field
vcf <- vcf %>%
  mutate(across(23:ncol(.), as.character)) %>% # convert columns to be reshaped to character type
  mutate(row_id = row_number()) %>%            # add a unique row identifier
  pivot_longer(
               cols = -c(CHROM:Errors, row_id),
               names_to = c("sample_name", "variable"),
               names_sep = "\\.") %>%
  pivot_wider(names_from = "variable", values_from = "value") %>%
  select(-row_id) %>% # drop the row_id column
  mutate(
    # alt allele frequency = ALT / (REF + ALT); zero when no ALT-supporting reads
    alt_allele_freq = if_else(str_detect(AD, ",0$"), 0,
                          as.numeric(str_extract(AD, "\\d+$")) / (as.numeric(str_extract(AD, "^[^,]+")) + as.numeric(str_extract(AD, "\\d+$")))),
    alt_allele_freq = round(alt_allele_freq, 3)
  ) %>%
  filter(alt_allele_freq != 0) %>%
  arrange(sample_name, POS)

# Write the reshaped table once, using data.table::fwrite for speed
data.table::fwrite(vcf, args$output, nThread = 4)

# rule gather_genome_data
run:
    shell(  # cp - copy genome fasta file from snpeff database location
        """
        cp -f {input.genome} {output.genome}
        """
    )
    shell(  # cp - copy annotation file from snpeff database location
        """
        cp -f {input.gff} {output.gff}
        """
    )

# rule gatk_genome_dict
shell:
    """
    gatk --java-options "{params.java_opts}" CreateSequenceDictionary \
        --REFERENCE {input.genome} \
        --OUTPUT {output.genome_dict} \
        2> {log}
    """

# rule samtools_index
shell:
    """
    samtools faidx {input.genome}
    """

# rule bedops_gff2bed
shell:
    """
    convert2bed \
        --input=gff \
        --output=bed < {input} |\
    grep -e {params.feature} > {output}
    """

# rule trim_reads
shell:
    """
    echo "##############################################"
    echo "-------    Running {params.trimmer}    -------"
    echo "##############################################"

    if [[ "{params.trimmer}" == "Fastp" ]]; then
        fastp \
            --thread {threads} \
            {params.opts_fastp} \
            --in1 {input.r1} \
            --in2 {input.r2} \
            --out1 {output.r1} \
            --out2 {output.r2} \
            --unpaired1 {output.r1_unpaired} \
            --unpaired2 {output.r2_unpaired} \
            --json {log.json} \
            --html {log.html} \
            2> {log.log}
    else
        trimmomatic PE \
            -threads {threads} \
            {input.r1} {input.r2} \
            {output.r1} {output.r1_unpaired} \
            {output.r2} {output.r2_unpaired} \
            {params.opts_trimmomatic} \
            2> {log.log}
    fi
    """

# rule bwa_index
shell:
    """
    echo "##############################################"
    echo "-----------    Running BWA Index    ----------"
    echo "##############################################"

    bwa index -p {output.index} {input.genome} 2> {log}
    """

# rule bwa_mem
shell:
    """
    echo "#####################################################"
    echo "Running BWA-> Fixmate-> SamFormatConverter-> CleanSam"
    echo "#####################################################"

    bwa mem \
        -t {threads} \
        {params.extra_bwa} \
        {params.read_groups} \
        {input.idx} \
        {input.reads} \
        2> {log.bwa} |\
    samtools fixmate \
        --threads {threads} \
        {params.extra_fixmate} \
        --output-fmt sam \
        /dev/stdin \
        /dev/stdout \
        2> {log.fixmate} |\
    gatk --java-options "{params.java_opts}" SamFormatConverter \
        --INPUT /dev/stdin \
        --OUTPUT /dev/stdout \
        2> {log.sam2bam} |\
    gatk --java-options "{params.java_opts}" CleanSam \
        -R {input.genome} \
        -I /dev/stdin \
        -O {output.bam} \
        2> {log.cleansam}
    """

# rule mark_duplicates
shell:
    """
    echo "##############################################"
    echo "---  Running {params.mark_duplicate_tool} ---"
    echo "##############################################"

    if [ "{params.mark_duplicate_tool}" == "MarkDuplicatesSpark" ]; then
        gatk --java-options "{params.java_opts}" MarkDuplicatesSpark \
            --spark-master local[{threads}] \
            -I {input.bam} \
            -O {output.bam} \
            {params.extra_gatk} \
            2> {log}
    elif [ "{params.mark_duplicate_tool}" == "Samblaster" ]; then
        samtools view \
            -h {input.bam} |\
        samblaster \
            {params.extra_samblaster} \
            2> {log} |\
        samtools sort \
            -@ {threads} \
            -o {output.bam} \
            2>> {log}
        samtools index {output.bam}
    else
        echo "Unsupported mark duplicate tool selected: {params.mark_duplicate_tool}"
        exit 1
    fi
    """

# rule samtools_idxstats
shell:
    """
    echo "##############################################"
    echo "------    Running Samtools IdxStats    ------"
    echo "##############################################"

    samtools idxstats {input.bam} > {output.idxstats}
    """

# rule samtools_flagstats
shell:
    """
    echo "##############################################"
    echo "------    Running Samtools Flagstat    ------"
    echo "##############################################"

    samtools flagstat {input.bam} > {output.flagstat}
    """

# rule samtools_depth
shell:
    """
    echo "##############################################"
    echo "------    Running Samtools Depth    ------"
    echo "##############################################"

    samtools depth {input.bam} > {output.depth}
    """

# rule gatk_insert_size_metrics
shell:
    """
    echo "##############################################"
    echo "--  Running GATK CollectInsertSizeMetrics  --"
    echo "##############################################"

    gatk --java-options "{params.java_opts}" CollectInsertSizeMetrics \
        {params.extra} \
        -R {input.genome} \
        -I {input.bam} \
        -O {output.metrics} \
        -H {output.histogram} \
        2> {log}
    """

# rule gatk_alignment_summary_metrics
shell:
    """
    echo "##############################################"
    echo " Running GATK CollectAlignmentSummaryMetrics "
    echo "##############################################"

    gatk --java-options "{params.java_opts}" CollectAlignmentSummaryMetrics \
        {params.extra} \
        -R {input.genome} \
        -I {input.bam} \
        -O {output.metrics} \
        2> {log}
    """

# rule gatk_haplotypecaller
shell:
    """
    echo "##############################################"
    echo "-----    Running GATK HaplotypeCaller    -----"
    echo "##############################################"

    gatk --java-options "{params.java_opts}" HaplotypeCaller \
        --native-pair-hmm-threads {threads} \
        {params.extra} \
        -R {input.genome} \
        -L {input.intervals} \
        -I {input.bam} \
        -O {output.vcf} \
        2> {log}
    """

# rule generate_sample_name_map
shell:
    """
    echo "##############################################"
    echo "------    Generating VCF-Sample Map    ------"
    echo "##############################################"

    python workflow/scripts/sample_vcf_map.py \
        {params.directory} \
        {output}
    """

# rule gatk_genomics_db_import
shell:
    """
    echo "##############################################"
    echo "----    Running GATK GenomicsDBImport    ----"
    echo "##############################################"

    gatk --java-options "{params.java_opts}" GenomicsDBImport \
        {params.extra} \
        --reader-threads {threads} \
        --genomicsdb-workspace-path {output.dir} \
        --sample-name-map {input.sample_map} \
        -L {input.intervals} \
        2> {log}
    """

# rule gatk_genotype_gvcfs
shell:
    """
    echo "##############################################"
    echo "-----     Running GATK GenotypeGVCFs     -----"
    echo "##############################################"

    gatk --java-options "{params.java_opts}" GenotypeGVCFs \
        {params.extra} \
        -R {input.genome} \
        -V gendb://{input.db} \
        -L {input.intervals} \
        -O {output.vcf} \
        2> {log}
    """

# rule bcftools_normalize
shell:
    """
    echo "##############################################"
    echo "--- Running BCFtools Normalize & Index VCF ---"
    echo "##############################################"

    bcftools norm \
        {params.normalise} \
        --fasta-ref {input.genome} \
        {input.vcf} \
        2> {log} |\
    bcftools annotate \
        {params.annotate} \
        2>> {log} |\
    bcftools view \
        {params.view} \
        --output-type z \
        --output-file {output.vcf} \
        2>> {log}

    tabix -p vcf {output.vcf}
    """

# rule gatk_split_variants
shell:
    """
    echo "##############################################"
    echo "-----    Running GATK SplitVariants     -----"
    echo "##############################################"

    gatk --java-options "{params.java_opts}" SelectVariants \
        -R {input.genome} \
        -V {input.vcf} \
        -O {output.snps} \
        --select-type-to-include SNP \
        2> {log.snps}

    gatk --java-options "{params.java_opts}" SelectVariants \
        -R {input.genome} \
        -V {input.vcf} \
        -O {output.indels} \
        --select-type-to-include INDEL \
        2> {log.indels}
    """

# rule gatk_filter_hard
shell:
    """
    echo "##############################################"
    echo "---  Running GATK VariantFiltration (Hard) ---"
    echo "##############################################"

    gatk --java-options "{params.java_opts}" VariantFiltration \
        {params.extra_snps} \
        -R {input.genome} \
        -V {input.snps} \
        -O {output.snps} \
        2> {log.snps}

    gatk --java-options "{params.java_opts}" VariantFiltration \
        {params.extra_indels} \
        -R {input.genome} \
        -V {input.indels} \
        -O {output.indels} \
        2> {log.indels}
    """

# rule gatk_merge_vcfs
shell:
    """
    echo "##############################################"
    echo "---------   Running GATK MergeVcfs   ---------"
    echo "##############################################"

    gatk --java-options "{params.java_opts}" MergeVcfs \
        -I {input.snps} \
        -I {input.indels} \
        -O {output.vcf} \
        2> {log}
    """

# rule gatk_vqsr_indels
shell:
    """
    echo "##############################################"
    echo "- Running GATK VariantRecalibrator (INDELS) -"
    echo "##############################################"

    gatk --java-options "{params.java_opts}" VariantRecalibrator \
        {params.extra} \
        -R {input.genome} \
        -V {input.vcf} \
        -O {output.recal} \
        -resource:{params.resource} {params.known_sites} \
        --tranches-file {output.tranches} \
        --rscript-file {output.plots} \
        2> {log}
    """

# rule gatk_apply_vqsr_indels
shell:
    """
    echo "##############################################"
    echo "----   Running GATK ApplyVQSR (INDELS)   -----"
    echo "##############################################"

    gatk --java-options "{params.java_opts}" ApplyVQSR \
        {params.extra} \
        -R {input.genome} \
        -V {input.vcf} \
        -O {output.vcf} \
        --tranches-file {input.tranches} \
        --recal-file {input.recal} \
        2> {log}
    """

# rule gatk_vqsr_snps
shell:
    """
    echo "##############################################"
    echo "--  Running GATK VariantRecalibrator (SNPs) --"
    echo "##############################################"

    gatk --java-options "{params.java_opts}" VariantRecalibrator \
        {params.extra} \
        -R {input.genome} \
        -V {input.vcf} \
        -O {output.recal} \
        -resource:{params.resource} {params.known_sites} \
        --tranches-file {output.tranches} \
        --rscript-file {output.plots} \
        2> {log}
    """

# rule gatk_apply_vqsr_snps
shell:
    """
    echo "##############################################"
    echo "------  Running GATK ApplyVQSR (SNPs)  -------"
    echo "##############################################"

    gatk --java-options "{params.java_opts}" ApplyVQSR \
        {params.extra} \
        -R {input.genome} \
        -V {input.vcf} \
        -O {output.vcf} \
        --tranches-file {input.tranches} \
        --recal-file {input.recal} \
        2> {log}
    """

# rule gatk_filter_pass
shell:
    """
    echo "##############################################"
    echo "-------   Running GATK FilterVcfPass   -------"
    echo "##############################################"

    gatk --java-options "{params.java_opts}" SelectVariants \
        -V {input.vcf} \
        -O {output.vcf} \
        --exclude-filtered \
        2> {log}
    """

# rule snpeff_annotate_variants
shell:
    """
    echo "##############################################"
    echo "--------   Running SnpEff Annotate   --------"
    echo "##############################################"

    snpEff ann \
        {params.extra} \
        -config {params.config} \
        {params.database} \
        {input.vcf} | bgzip -c > {output.vcf}

    tabix -p vcf {output.vcf}
    """

# rule gatk_variants_to_table
shell:
    """
    echo "##############################################"
    echo "------   Running GATK VariantsToTable   ------"
    echo "##############################################"  

    gatk --java-options "{params.java_opts}" VariantsToTable \
        -V {input.vcf} \
        --fields CHROM \
        --fields POS \
        --fields REF \
        --fields ALT \
        --fields TYPE \
        --fields ANN \
        --genotype-fields GT \
        --genotype-fields AD \
        --genotype-fields DP \
        --genotype-fields GQ \
        --genotype-fields PGT \
        --genotype-fields PID \
        --genotype-fields PL \
        --genotype-fields PS \
        -O /dev/stdout \
        2> {log} |\
    bash workflow/scripts/split_annot_column.sh > {output.variants}
    """

# allele-frequency calculation (runs workflow/scripts/vcf_allele_frequency.R)
shell:
    """
    echo "##############################################"
    echo "--------   Running R AlleleFrequency  --------"
    echo "##############################################"

    Rscript workflow/scripts/vcf_allele_frequency.R \
        --input {input.variants} \
        --output {output.variants} \
        2> {log}
    """