QTL Snakemake Workflow

public public 1yr ago 0 bookmarks

The QTL-Snakemake-Workflow consists of 3 main fuctions which helps to utilize the QTLseqr tool for Next Generation sequencing bulk segregant analysis from a VCF file.

.. _main-getting-started:

Getting started

To run this package first we need to install the required libraries for the workflow which can be installed by running

.. code-block:: python

conda env create --file envs/condaenv.yml

Here we can define any name we want instead of the QTL_Test. This command will make a conda environment to run this package.

Once we have created the environment we need to configure the config.yaml file with the required parameters so that it can take input for different rules. After this to run the workflow we can use the command

.. code-block:: python

snakemake -pr

This command will run the workflow and generate the output in the QTL_Plots folder.

.. _manual-Work_Flow_Rules:

Work FLow Rules

The QTL workflow consists of three rules:

  1. VCF_Homozygous_Filtering

  2. QTL_VCF_to_Table_Parser

  3. QTL_Plotting

  4. VCF_Homozygous_Filtering

This rule takes any vcf file and filters the parent for selecting only the homozygous SNPS in the vcf file.

.. code-block:: python

input:
 expand("{vcf_file_name}.vcf.gz", vcf_file_name= config["vcf_file"]),
output:
 expand("Homozygous_Filtered_VCF/Homozygous_{vcf_file_name}.vcf.gz", vcf_file_name=config["vcf_file"])
params:
 filter_data= config["vcffilter"]["filters"]
shell:
 "( bcftools view"
 " {params.filter_data}"
 " {input}"
 " -O z"
 " -o {output}"
 ")" 

The input is a vcf file which needs to be defined in the config file under the vcf_file name and the output of this rule will be saved into the Homozygous_Filtered_VCF folder with the same vcf file name just a Homozygous title will be added in the start of the name so that we can distinguish between different files if the workflow is run for multiple vcf files. The filtering is done on the parents and the shell command takes input from the config file.

.. code-block:: python

 -i "GT[0]='0/0' && GT[1]='1/1' || GT[0]='1/1' && GT[1]='0/0'"

In the above script for filtering the parents which comes from the config file it is taking the parents and from our reference vcf file the parents are on GT[0] and GT[1] , but it should be adjusted according to the given vcf file.

  1. QTL_VCF_to_Table_Parser

This rule is for converting a vcf file into a table format file which is required by the QTL tool if the VCF file is not generated from GATK . This rule runs a R scripts which parses the vcf file.

.. code-block:: python

input:
 expand("Homozygous_Filtered_VCF/Homozygous_{vcf_file_name}.vcf.gz", vcf_file_name=config["vcf_file"]),
output:
 "QTL_VCF_to_Table/QTL_Table.csv",
script:
 "Scripts/QTL_Parser.R"

It takes an input a vcf file from the Homozygous_Filtered_VCF folder with a specified name as Homozygous_{vcf_file_name}.vcf.gz the vcf_file_name comes from the config file where the name of the vf file was given. The output is saved in QTL_VCF_to_Table folder The QTL_VCF_to_Table script takes input from the config file where the names of the High Bulk, Low Bulk are defined along with that it also takes the parameter Number_of_Chromosomes from the config file.

  1. QTL_Plotting

This rule runs the QTLseqr tool and generates the plots for Gprime Analysis and QTLseq Analysis . This rules runs an R script for generating the plots.

.. code-block:: python

input:
 "QTL_VCF_to_Table/QTL_Table.csv"
output:
 "QTL_Plots/DP_Filtering Data.pdf",
 "QTL_Plots/REF Frequency for Filtering Data.pdf",
 "QTL_Plots/SNP Index for Filtering Data.pdf",
 "QTL_Plots/GPrime Distribution with Hampel Outlier Filter.pdf",
 "QTL_Plots/GPrime Distribution with deltaSNP Outlier Filter.pdf",
 "QTL_Plots/SNP Density Plot.pdf",
 "QTL_Plots/Delta SNP Index Plot with Intervals.pdf",
 "QTL_Plots/GPrime Value Plot.pdf" 
script:
 "Scripts/QTL_Plotting.R"

It takes input the csv file developed by the QTL_VCF_to_Table_Parser along with the parameters defined within the config file for filtering the SNPs for better results and the output is saved into QTL_Plots folder.

.. toctree:: :caption: Getting started :name: getting_started :hidden: :maxdepth: 1

getting_started/installation tutorial/tutorial tutorial/short

.. toctree:: :caption: Executing workflows :name: execution :hidden: :maxdepth: 1

executing/cli executing/cluster-cloud executing/caching executing/interoperability

.. toctree:: :caption: Defining workflows :name: snakefiles :hidden: :maxdepth: 1

snakefiles/writing_snakefiles
snakefiles/rules
snakefiles/configuration
snakefiles/modularization
snakefiles/remote_files
snakefiles/utils
snakefiles/deployment
snakefiles/reporting

.. toctree:: :caption: API Reference :name: api-reference :hidden: :maxdepth: 1

api_reference/snakemake
api_reference/snakemake_utils
api_reference/internal/modules

.. toctree:: :caption: Project Info :name: project-info :hidden: :maxdepth: 1

project_info/citations
project_info/more_resources
project_info/faq
project_info/contributing
project_info/authors
project_info/history
project_info/license

Code Snippets

22
23
24
25
shell:
    """
    ( printf "#Samples:con-all,D2,D2_F2_tt,D2_F2_TT\nCHROM\\tPOS\\tREF\\tALT\\tRO\\tAO\\tGT\\tGQ\\tSampleRO\\tSampleAO\n" > {output} 
      bcftools view {input} {params.samples} | bcftools query {params.data} >> {output})"""
36
37
script:
    "../Scripts/plot_allele_freqs.py"
14
15
16
17
18
19
20
shell:
    "( bcftools view"
    "   {params.filter_data}"
    "   {input}"
    "   -O z"
    "   -o {output}"
    ")"
26
27
script:
   "../Scripts/QTL_Plotting.R"
12
13
script:
    "../Scripts/QTL_Parser.R"
  7
  8
  9
 10
 11
 12
 13
 14
 15
 16
 17
 18
 19
 20
 21
 22
 23
 24
 25
 26
 27
 28
 29
 30
 31
 32
 33
 34
 35
 36
 37
 38
 39
 40
 41
 42
 43
 44
 45
 46
 47
 48
 49
 50
 51
 52
 53
 54
 55
 56
 57
 58
 59
 60
 61
 62
 63
 64
 65
 66
 67
 68
 69
 70
 71
 72
 73
 74
 75
 76
 77
 78
 79
 80
 81
 82
 83
 84
 85
 86
 87
 88
 89
 90
 91
 92
 93
 94
 95
 96
 97
 98
 99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
import os
import matplotlib
#%matplotlib inline
#get_ipython().run_line_magic('matplotlib', 'inline')
from pathlib import Path
import pandas as pd
#from IPython.display import display
import numpy as np
import seaborn as sns
import re
from collections import OrderedDict
from matplotlib import pyplot as plt
import yaml
sns.set()
pd.options.display.max_columns = None # default is 20
pd.options.display.max_rows = 60 # default is 60


# # Plotting mutant allele frequencies
# 
# the VCF sample format fields output by freebayes are:
# 
#     GT:GQ:DP:AD:RO:QR:AO:QA:GL
# 
# The fields used for allele frequency plotting are:
# 
#  - RO: Reference allele observation count
#  - AO: Alternate allele observation count
#     
# TSV tables from filtered VCFs have to be created like this:
# 
# ```sh
# printf "#Samples:con-all,D2,D2_F2_tt,D2_F2_TT\nCHROM\\tPOS\\tREF\\tALT\\tRO\\tAO\\tGT\\tGQ\\tSampleRO\\tSampleAO\n" > filtered.tsv
# bcftools view freebayes_D2.filtered.vcf -s con-all,D2,D2_F2_tt,D2_F2_TT | bcftools query -f "%CHROM\t%POS\t%REF\t%ALT\t%RO\t%AO\t[,%GT]\t[,%GQ]\t[,%RO]\t[,%AO]\n" >> freebayes_D2.filtered.tsv
# 
# printf "#Samples:con-all,D3,D3_F2_tt,D3_F2_TT\nCHROM\\tPOS\\tREF\\tALT\\tRO\\tAO\\tGT\\tGQ\\tSampleRO\\tSampleAO\n" > filtered.tsv
# bcftools view freebayes_D3.filtered.vcf -s con-all,D3,D3_F2_tt,D3_F2_TT | bcftools query -f "%CHROM\t%POS\t%REF\t%ALT\t%RO\t%AO\t[,%GT]\t[,%GQ]\t[,%RO]\t[,%AO]\n" >> freebayes_D3.filtered.tsv
# 
# ```
# 
# The tables should look like this:
# 
#     CHROM   POS     REF ALT RO  AO  GT                 GQ               SampleRO    SampleAO
#     Chr01   344698  C   T   39  43  ,0/0,1/1,0/1,0/1   ,22,19,137,155   ,6,0,14,19  ,0,2,23,18
#     Chr01   2943267 T   A   140 109 ,0/0,1/1,0/1,0/1   ,134,45,140,140  ,30,0,66,44 ,0,16,51,42
#     Chr01   3751995 T   C   27  20  ,1/1,0/0,0/0,0/0   ,21,16,77,55     ,2,2,15,8   ,6,0,10,4
# 
# 

# ### Set input 
# 
# All user input is set in this section
# 
# ---

# #### File paths

# In[5]:

# Added by Anza to get the name of the file from the config file

config = yaml.load(open('config.yaml'))
vcf_file_name=config['AllelPlots']['vcffile']
#vcf_file_name=vcf_file_name.replace('.','_')
print(vcf_file_name)
tsv_file= "Allel_Frequency_Plots_Computomics/"+vcf_file_name+".tsv"
raw_output="Allel_Frequency_Plots_Computomics/"+vcf_file_name+".pdf"
raw_output_window="Allel_Frequency_Plots_Computomics/"+vcf_file_name+"_window.pdf"
# Define input table paths
tsv_file = Path(tsv_file)
print(tsv_file)
raw_output_pdf = Path(raw_output)
window_output_pdf = Path(raw_output_window)

# Define input table paths
#tsv_file = Path('freebayes_D2.filtered.tsv')
#raw_output_pdf = Path('freebayes_allele_freq_D2.pdf')
#window_output_pdf = Path('freebayes_allele_freq_window_D2.pdf')

#tsv_file = Path('freebayes_D3.filtered.tsv')
# raw_output_pdf = Path('freebayes_allele_freq_D3.pdf')
# window_output_pdf = Path('freebayes_allele_freq_window_D3.pdf')


# #### Sample names

# In[ ]:


# Order as: control, mutant, dwarf/mutant F2 pool, WT F2 pool
samples = ['con-all', 'D2', 'D2_F2_tt', 'D2_F2_TT']
sample_F2_WT = 'D2_F2_TT'
sample_F2_mutant = 'D2_F2_tt'

# samples = ['con-all', 'D3', 'D3_F2_tt', 'D3_F2_TT']
# sample_F2_WT = 'D3_F2_TT'
# sample_F2_mutant = 'D3_F2_tt'


# #### Chromosome lengths
# 
# These lengths are from the `Sbicolor_454_v3.0.1.fa` genome

# In[6]:


# All chromosomes that should be included in the plots.
chromosome_lengths = OrderedDict([('Chr01', 80884392), 
                      ('Chr02', 77742459),
                      ('Chr03', 74386277),
                      ('Chr04', 68658214),
                      ('Chr05', 71854669),
                      ('Chr06', 61277060),
                      ('Chr07', 65505356),
                      ('Chr08', 62686529),
                      ('Chr09', 59416394),
                      ('Chr10', 61233695)])


# #### Plot window and step size

# In[18]:


# Set window six
windowsize = 5000000
stepsize = 1000000


# No user input necessary below
# 
# ---
# 
# ### Define functions for data wrangling
# 
# 
# 

# In[7]:


# This function generates a tidy/longform table, which makes is easier to plot with seaborn.
# Will have one row per sample per variant, sample name in "sample", and rows for control/mutant observations and 
# frequency of mutant alleles
# 

def getTidyTable(raw_table, samples):

    # collects new rows
    row_accumulator = []

    def splitListToRows(row):
        '''Split one row into long form, one new row per sample'''
        if row['CHROM'] not in chromosome_lengths:
            # Only keep markers on the included chromosomes 
            return

        split_gt = row['GT'].split(',')
        if '.' in split_gt:
            # Remove rows where not all samples are genotyped
            return
        split_gq = row['GQ'].split(',')
        split_ro = row['SampleRO'].split(',')
        split_ao = row['SampleAO'].split(',')

        control_ref = split_gt[0] == '0/0'
        new_rows = []

        for i in range(4):
            new_row = row.to_dict()

            # Remove rows unnecessary for plotting
            new_row.pop('REF')
            new_row.pop('ALT')
            new_row.pop('RO')
            new_row.pop('AO')

            new_row['GT'] = split_gt[i]
            new_row['GQ'] = int(split_gq[i])
            new_row['SampleRO'] = split_ro[i]
            new_row['SampleAO'] = split_ao[i]
            new_row['sample'] = samples[i]

            # Get observations of WT/mutant alleles
            if control_ref:
                new_row['controlO'] = int(new_row['SampleRO'])
                new_row['mutantO'] = int(new_row['SampleAO'])
            else:
                new_row['controlO'] = int(new_row['SampleAO'])
                new_row['mutantO'] = int(new_row['SampleRO'])

            if (new_row['mutantO'] + new_row['controlO']) == 0:
                # Remove rows where there are no observations for a sample
                break
            else:
                # Get mutant allele frequency
                new_row['mutant_freq'] = new_row['mutantO'] / (new_row['mutantO'] + new_row['controlO'])
            new_rows.append(new_row)

        if len(new_rows) == 4:
            # only keep rows if there is one for each sample
            row_accumulator.extend(new_rows)

    raw_table.apply(splitListToRows, axis=1)
    table = pd.DataFrame(row_accumulator)
    table = table[['CHROM', 'POS', 'sample', 'GT', 'GQ', 
                   'SampleRO', 'SampleAO', 'controlO', 'mutantO', 'mutant_freq']]
#     table = table[table['CHROM'].str.startswith('Chr')]
    return table



# In[15]:


# Create a table with averaged mutant frequencies for the F2 pools with a sliding window

def get_window_table(table, sample_tt, sample_TT, windowsize, stepsize):

    # positions to include in window before/after the current pos, i.e. half the window size
    w = int(windowsize/2)

    rows = []
    for chrom in chromosome_lengths:

        # get F2 pool genotypes for current chromosomes
        chrom_table = table[(table['CHROM'] == chrom) 
                               & ((table['sample'] == sample_TT) 
                                  | (table['sample'] == sample_tt))].reset_index(drop=True)

        for i in range(1, chromosome_lengths[chrom]+1, stepsize):
            # Loop over windows

            wstart = max(1, i-w)
            wend = min(chromosome_lengths[chrom], i+w)

            # Get table for window
            wtable = chrom_table[(chrom_table['POS'] >= wstart) 
                                 & (chrom_table['POS'] <= wend)] 

            freqs_tt = wtable[wtable['sample'] == sample_tt]['mutant_freq']
            freqs_TT = wtable[wtable['sample'] == sample_TT]['mutant_freq']
            varcount = int(wtable.shape[0] / 2) # because of two samples per marker
            avg_TT = np.mean(freqs_TT)
            avg_tt = np.mean(freqs_tt)
            std_TT = np.std(freqs_TT)
            std_tt = np.std(freqs_tt)

            wsize = wend - wstart
            row_tt = {'CHROM': chrom,
                      'POS': i,
                      'sample': sample_tt,
                      'avg_mutant_freq': avg_tt,
                      'std': std_tt,
                      'varcount': varcount,
                      'window_size': wsize}
            rows.append(row_tt)
            row_TT = {'CHROM': chrom,
                      'POS': i,
                      'sample': sample_TT,
                      'avg_mutant_freq': avg_TT,
                      'std': std_TT,
                      'varcount': varcount,
                      'window_size': wsize}
            rows.append(row_TT)
    window_table = pd.DataFrame(rows)
    window_table = window_table[['CHROM', 'POS', 'sample', 'avg_mutant_freq', 'std', 'varcount', 'window_size']]
    return window_table


# ### Load input

# In[9]:


# Load TSV table
raw_table = pd.read_csv(tsv_file, sep='\t', na_values=['.'], comment='#')

# Remove leading commas
for col in ['GT', 'GQ', 'SampleRO', 'SampleAO']:
    raw_table.loc[:,col] = raw_table[col].str[1:]

table = getTidyTable(raw_table, samples)


# ### Plot raw frequencies
# 
# no smoothing/averaging with a window applied, raw frequencies are plotted

# In[10]:


# Plot raw mutant allele frequencies of F2 pools
plot_table = table[(table['sample'] == sample_F2_mutant) | (table['sample'] == sample_F2_WT)]
plot = sns.relplot(data=plot_table, x='POS', y='mutant_freq', row='CHROM', style='sample',hue='sample', aspect=7.0, height=4., kind='line', markers=True, dashes=False, linewidth=0.5)


# In[11]:


plot.savefig(raw_output_pdf)


# ### Plot sliding-window averaged frequencies
# 
# use a sliding window and plot the averaged allele frequency per window, scaling the dot by the number of snps/indels in the window.
# 
# Values used (determined manually to show clear signal):
# 
#  - window size: 5 Gbp
#  - step size: 1 Gbp

# In[12]:


windowsize=5000000
stepsize=1000000


# In[16]:


wtable = get_window_table(table, sample_F2_mutant, sample_F2_WT, windowsize, stepsize)

# Plot the per-window average allele frequencies
# Create one subplot per chromosome
wgrid = sns.FacetGrid(wtable, row='CHROM',aspect=7.0, height=4., hue='sample',legend_out=True)

# Plot lines
wgrid = wgrid.map(sns.lineplot, 'POS', 'avg_mutant_freq', linewidth=0.5).add_legend()

# Overlay with scatterplot with densities per chromosome
for i, chrom in enumerate(chromosome_lengths):
    plotdata = wtable[wtable['CHROM']==chrom]
    mincount = min(plotdata['varcount'])
    maxcount = max(plotdata['varcount'])
    sizes=(np.log2(mincount+1)*100,np.log2(maxcount+1)*100)
    sns.scatterplot(data=wtable[wtable['CHROM']==chrom], 
                    x='POS', y='avg_mutant_freq', size='varcount', hue='sample', legend=False,
                    ax=wgrid.axes[i,0], sizes=sizes)


# In[17]:


wgrid.savefig(window_output_pdf)
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
install.packages('vcfR', repos='http://cran.us.r-project.org')
library(VariantAnnotation)
#library(vcfR)
library(yaml)
config <- yaml.load_file("config.yaml")
QTL_VCF_to_Table_Parser<-function(vcf_file_path,No_of_Chromosomes,HighBulk,LowBulk){

rice_vcf <- readVcf(vcf_file_path)
scan_vcf<- scanVcf(vcf_file_path)
ADREF_HighBulk <- paste0("AD_REF.",HighBulk)
ADALT_HighBulk <- paste0("AD_ALT.",HighBulk)
ADREF_LowBulk <- paste0("AD_REF.",LowBulk)
ADALT_LowBulk <- paste0("AD_ALT.",LowBulk)
GQ_HighBulk <- paste0("GQ.",HighBulk)
GQ_LowBulk <- paste0("GQ.",LowBulk)
table_colnames<- c("CHROM","POS","REF","ALT",ADREF_HighBulk,ADALT_HighBulk,GQ_HighBulk,ADREF_LowBulk,ADALT_LowBulk,GQ_LowBulk)


vcf_len<- scan_vcf$`*:*-*`$rowRanges@elementMetadata@nrows
chrom_vcf_names<- as.character.Array(droplevels(scan_vcf$`*:*-*`$rowRanges@seqnames@values[1:No_of_Chromosomes]))
QTL_Rice <- as.data.frame(matrix(ncol=10,nrow = vcf_len))
QTL_Rice$V2<-as.array(rice_vcf@rowRanges@ranges@start)
QTL_Rice$V1<-unlist(lapply(strsplit(as.array(rice_vcf@rowRanges@ranges@NAMES),':'),'[[',1))
QTL_Rice$V3<-scan_vcf$`*:*-*`$REF
QTL_Rice$V4<-scan_vcf$`*:*-*`$ALT
QTL_Rice$V7<- scan_vcf$`*:*-*`$GENO$GQ[,HighBulk]
QTL_Rice$V10<- scan_vcf$`*:*-*`$GENO$GQ[,LowBulk]
QTL_Rice$V5<-lapply(scan_vcf$`*:*-*`$GENO$AD[,HighBulk],'[[',1)
QTL_Rice$V6<-lapply(scan_vcf$`*:*-*`$GENO$AD[,HighBulk],'[[',2)
QTL_Rice$V8<-lapply(scan_vcf$`*:*-*`$GENO$AD[,LowBulk],'[[',1)
QTL_Rice$V9<-lapply(scan_vcf$`*:*-*`$GENO$AD[,LowBulk],'[[',2)
QTL_Rice$V2<- as.integer.Array(QTL_Rice$V2)
QTL_Rice$V3<- as.character.Array(QTL_Rice$V3)
QTL_Rice$V4<- as.character.Array(QTL_Rice$V4)
QTL_Rice$V5<- as.integer.Array(QTL_Rice$V5)
QTL_Rice$V6<- as.integer.Array(QTL_Rice$V6)
QTL_Rice$V7<- as.integer.Array(QTL_Rice$V7)
QTL_Rice$V8<- as.integer.Array(QTL_Rice$V8)
QTL_Rice$V9<- as.integer.Array(QTL_Rice$V9)
QTL_Rice$V10<- as.integer.Array(QTL_Rice$V10)

colnames(QTL_Rice) <- table_colnames



write.table(QTL_Rice,'QTL_VCF_to_Table/QTL_Table.csv',row.names = FALSE,col.names = TRUE,sep = '\t')
}

QTL_VCF_to_Table_Parser(config$QTL_Parser$VCF_File_Name_or_Path,config$QTL_Parser$Number_of_Chromosomes,config$QTL_Parser$HighBulk,config$QTL_Parser$LowBulk)
 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
library(yaml)
library(data.table)
library(devtools)

install_github("bmansfeld/QTLseqr",upgrade_dependencies=  TRUE)
library("QTLseqr")
library("ggplot2")

config <- yaml.load_file("config.yaml")
QTL_Plotting <- function(HighBulk,LowBulk, No_of_Chromosomes){
  options(warn=-1)
  table_file <- 'QTL_VCF_to_Table/QTL_Table.csv'
  chrom_names<-as.list(unique(fread(table_file,sep = "\t", select = c("CHROM") ))) 
  #chrom_vcf_names<- as.character.Array(droplevels(scan_vcf$`*:*-*`$rowRanges@seqnames@values[1:No_of_Chromosomes]))
  df <- importFromTable(file = table_file,
                        highBulk = HighBulk,
                        lowBulk = LowBulk,
                        chromList = chrom_names$CHROM[1:No_of_Chromosomes],sep = '\t')

  ggplot(data = df)+geom_histogram(aes(x = DP.HIGH + DP.LOW))+xlim(0,1000)
  ggsave(filename="QTL_Plots/DP_Filtering Data",device= "pdf", width=20, height=10)
  ggplot(data = df)+geom_histogram(aes(x = REF_FRQ))
  ggsave(filename="QTL_Plots/REF Frequency for Filtering Data",device = "pdf", width=20, height=10)
  ggplot(data = df)+geom_histogram(aes(x = SNPindex.HIGH))
  ggsave(filename = "QTL_Plots/SNP Index for Filtering Data",device="pdf", width=20, height=10)


  df_filt <-
    filterSNPs(
      SNPset = df,
      refAlleleFreq = config$QTL_Parser$REF_Allel_Frequency,
      minTotalDepth = config$QTL_Parser$Min_Total_Depth,
      maxTotalDepth = config$QTL_Parser$Max_Total_Depth,
      depthDifference = config$QTL_Parser$Depth_Difference,
      minSampleDepth = config$QTL_Parser$Min_Sample_Depth,
      minGQ = config$QTL_Parser$Min_GQ,
      verbose = TRUE
    )

  df_filt <- runQTLseqAnalysis(df_filt,
                               windowSize = config$QTL_Parser$WindowSize,
                               popStruc = "F2",
                               bulkSize = c(config$QTL_Parser$High_Bulk_Size, config$QTL_Parser$Low_Bulk_Size),
                               replications = 10000,
                               intervals = c(95, 99))

  df_filt <- runGprimeAnalysis(df_filt,windowSize = config$QTL_Parser$WindowSize,outlierFilter = "deltaSNP",filterThreshold = config$QTL_Parser$Filter_Threshold)

  plotGprimeDist(SNPset = df_filt, outlierFilter = "Hampel")
  ggsave(filename = "QTL_Plots/GPrime Distribution with Hampel Outlier Filter",device = "pdf", width=20, height=10)

  plotGprimeDist(SNPset =df_filt, outlierFilter = "deltaSNP", filterThreshold = config$QTL_Parser$Filter_Threshold)
  ggsave(filename = "QTL_Plots/GPrime Distribution with deltaSNP Outlier Filter",device = "pdf", width=20, height=10)

  p1 <- plotQTLStats(SNPset = df_filt, var = "nSNPs")
  ggsave(filename = "QTL_Plots/SNP Density Plot",plot=p1,device = "pdf", width=20, height=10)

  p2 <- plotQTLStats(SNPset = df_filt, var = "deltaSNP", plotIntervals = TRUE)
  ggsave(filename = "QTL_Plots/Delta SNP Index Plot with Intervals",plot=p2,device = "pdf", width=20, height=10)
  p3 <- plotQTLStats(SNPset = df_filt, var = "Gprime", plotThreshold = TRUE, q = config$QTL_Parser$FDR_q)
  ggsave(filename = "QTL_Plots/GPrime Value Plot",plot=p3,device = "pdf", width=20, height=10)

}

QTL_Plotting(config$QTL_Parser$HighBulk,config$QTL_Parser$LowBulk,config$QTL_Parser$Number_of_Chromosomes)
ShowHide 4 more snippets with no or duplicated tags.

Login to post a comment if you would like to share your experience with this workflow.

Do you know this workflow well? If so, you can request seller status , and start supporting this workflow.

Free

Created: 1yr ago
Updated: 1yr ago
Maitainers: public
URL: https://github.com/AnzaGhaffar/QTL-Snakemake-Workflow
Name: qtl-snakemake-workflow
Version: 1
Badge:
workflow icon

Insert copied code into your website to add a link to this workflow.

Downloaded: 0
Copyright: Public Domain
License: None
  • Future updates

Related Workflows

cellranger-snakemake-gke
snakemake workflow to run cellranger on a given bucket using gke.
A Snakemake workflow for running cellranger on a given bucket using Google Kubernetes Engine. The usage of this workflow ...