Detection of Dominant Genetic Associations with Gene Expression Levels

public public 1yr ago 0 bookmarks

This analysis pipeline can be used to detect loci having non-additive genetic assoications with gene expression levels.

Theory

We applied a multiple linear regression model to identify dominant associaions between SNPs and gene expression levels.

image

$G_A$ stands for the number of non-reference alleles. $G_A$ = 0 if genotype is reference homozygous; $G_A$ = 1 of genotype is heterozygous; $G_A$ = 2 if genotype is non-reference homozygous. This is the variable that captures the additive effects, which is commonly used in the analysis of gene expression quantitative loci(eQTLs).

$G_D$ captures dominant effects, where $G_D$ = 1 if genotype is heterozygous and $G_D$ = 0 if genotype is either reference or non-reference homozygous.

E denotes the observed gene expression levels. Our model assumes that the noise across samples is normally distributed with 0 mean and some variance value.

Our null hypothesis is that there is no evidence for a dominant association, which is equivalent to β2 not significantly different from zero. An ANOVA model can be used instead of a multiple linear regression to detect both additive and dominant effects of the genotype; however the F-test used in the ANOVA model aims to test if at least one beta is significantly different from zero. This does not fulfill our objective, as we specifically wish to test whether there is evidence showing that β2 is not equal to zero. Alternatively, a t-test can be used to test the effect sizes of Ga and Gd separately.

Here is an illustration of what dominant eQTL looks like. image

Workflow

image

image

Getting Started

Dependencies

  • snakemake 3.13.3

  • python2.7

  • numpy

  • R

Snakemake Workflow

Input files: Genotype file (genotype.txt) and RNA-Seq counts matrix file (gene_expr.txt) downloaded from GTEx project

  1. Run PCA analysis on the genotype matrix
Rscript --vanilla pca_genotype.R genotype genotype.txt genotype_pca.txt 
  1. Regress out sex, age, race and hidden covariates from gene expression matrix using PEER package
Rscript --vanilla peer_factor.R gene_expr.txt genotype_pca.txt phenotype_info.txt gene_expr_PEER.txt
  1. Split jobs for parallele computation:1000 genes per job
python split_into_files.py gene_expr_PEER.txt {files}_peer_expr.txt
  1. Merge SNPs that are within +/- 100kb of gene body with gene expression matrixes by column into a large matrix
python snps_counts_comb_by_chr_no_filter.py snp.h5 index.h5 genotype_pca.txt {files}_peer_expr.txt {files}_snps_counts_comb.txt
  1. Adjust p-values using beta approximation and further speed up using matrix multiplication
Rscript --vanilla beta_adjust_P_matrix_multip_corrected.R {files}_snps_counts_comb.txt {files}_output.txt
  1. Merge output files
cat {files}_output.txt > all_chr_output.txt
  1. extract out snp-gene pair that shows dominant effects
Rscript --vanilla extract_snp_gene_pair_comb_matrix.R {files}_output.txt {files}_snps_counts_comb.txt dominant_snp_gene_pair.txt

Authors

Jing Gu

Acknowledgements

  • Dr. Graham McVicker Salk Institute for Biological Studies

  • Patrick Fiaux UC San Diego

  • Arko Sen Salk Institute for Biological Studies

  • Hsiuyi Chen Salk Institute for Biological Studies

  • Selene Tyndale Salk Institute for Biological Studies

Code Snippets

75
76
77
78
shell:
    "mkdir -p {config[output_dir]}/{wildcards.sample}/filtered_SNPs ;"     
    "{config[py2]} {config[script_dir]}/get_Overlapped_SNPs_GTEx.py "                                                
    "{wildcards.chrom} {input.snp_tab} {input.snp_hapl} {input.genotypes} {input.exp_matrix} {output};"
89
90
91
shell:
    "{config[py2]} {config[script_dir]}/merge_genotype_files.py "                                                
    "{output} {input};"
100
101
102
shell:
    "mkdir -p {config[output_dir]}/{wildcards.sample}/pca ;"     
    "Rscript --vanilla {config[script_dir]}/pca_genotypes.R {input} {output};"
SnakeMake From line 100 of master/Snakefile
114
115
116
shell:
    "mkdir -p {config[output_dir]}/{wildcards.sample}/PEER ;"     
    "Rscript --vanilla {config[script_dir]}/peer_factor.R {input.exp_matrix} {input.pca_matrix} {input.env_matrix} {output};"
SnakeMake From line 114 of master/Snakefile
128
129
130
131
shell:
    "mkdir -p {config[output_dir]}/{wildcards.sample}/PEER ;"     
    "{config[py2]} {config[script_dir]}/split_into_files.py "
    "{input.expr_matrix} {input.job_nums} {output};"
SnakeMake From line 128 of master/Snakefile
144
145
146
147
shell:        
    "mkdir -p {config[output_dir]}/{wildcards.sample}/snps_counts_comb ;"                                                        
    "{config[py2]} {config[script_dir]}/snps_counts_comb_by_chr_no_filter.py "                                                
    "{input.snp_tab} {input.snp_index} {input.filtered_SNPs} {input.counts_matrix} {output};"
SnakeMake From line 144 of master/Snakefile
157
158
159
160
shell:
     "echo HOSTNAME=$HOSTNAME >&2 ; "
     "mkdir -p {config[output_dir]}/{wildcards.sample}/output ;"                                                                                          
     "Rscript --vanilla {config[script_dir]}/beta_adjust_P_matrix_multip_corrected.R {input} {output};"
SnakeMake From line 157 of master/Snakefile
170
171
shell:
    "cat {input} > {output} ;"
SnakeMake From line 170 of master/Snakefile
186
187
shell:
    "Rscript --vanilla {config[script_dir]}/prepare_GO_analysis.R {input.merged_output} {input.gene_ref} {output};"
SnakeMake From line 186 of master/Snakefile
199
200
shell:
    "{config[py2]} {config[script_dir]}/GO_analysis/go_cat_fisher_test.py -n all {input.fg_files} {input.bg_files} > {output} ;" 
SnakeMake From line 199 of master/Snakefile
212
213
shell:
     "Rscript --vanilla {config[script_dir]}/extract_snp_gene_pair_comb_matrix.R {input.summary} {input.f} {wildcards.sample} {output};"
SnakeMake From line 212 of master/Snakefile
ShowHide 11 more snippets with no or duplicated tags.

Login to post a comment if you would like to share your experience with this workflow.

Do you know this workflow well? If so, you can request seller status , and start supporting this workflow.

Free

Created: 1yr ago
Updated: 1yr ago
Maitainers: public
URL: https://github.com/j3gu/Dominant-eQTLs
Name: dominant-eqtls
Version: 1
Badge:
workflow icon

Insert copied code into your website to add a link to this workflow.

Downloaded: 0
Copyright: Public Domain
License: None
  • Future updates

Related Workflows

cellranger-snakemake-gke
snakemake workflow to run cellranger on a given bucket using gke.
A Snakemake workflow for running cellranger on a given bucket using Google Kubernetes Engine. The usage of this workflow ...