Reproducible Gene-Level Association Studies Workflow for Multi-Omic Analysis


This repository contains an end-to-end workflow to reproduce the gene-level association studies described in:

Zhou, D., Jiang, Y., Zhong, X. et al. A unified framework for joint-tissue transcriptome-wide association and Mendelian randomization analysis. Nat Genet 52 (2020). https://doi.org/10.1038/s41588-020-0706-2

The workflow is defined in and executed by Snakemake, a workflow management system for scalable, reproducible analysis pipelines.

Requirements

  • Python 3.9

  • conda - https://docs.conda.io/en/latest/

  • R >= 4.0, including the tidyverse and bookdown packages (the latter for creating summary.pdf)

Running workflow

Clone this repository to reproduce the associations and generate summary.pdf. Association results are generated by SPrediXcan.py and saved as results/{ukbb_id}-{model}_{tissue}.csv.

$ git clone https://github.com/manzt/zhou-et-al-natgen-2020 && cd zhou-et-al-natgen-2020
$ conda env create --file environment.yml
$ snakemake --cores all # downloads all data, runs associations, and generates `summary.pdf`

Running the complete workflow will download and organize all the input data necessary to reproduce the gene-level association results described in Table S7:

  • GWAS summary statistics from UKBB

  • Pretrained Transcriptome Prediction Model databases (JTI, PrediXcan, UTMOST)

  • Covariance matrices of SNPs within each gene model

The data are organized in the following directory structure, and associations are generated using SPrediXcan.py from MetaXcan.

data/
├── covariances/
├── GWAS/
├── weights/
└── supplementary_tables.xlsx
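
As a quick sanity check before running associations, the layout above can be verified with a few lines of Python. This is only a sketch: the helper name `missing_inputs` is hypothetical and not part of the repository, and it checks only the top-level entries shown in the tree, not the files inside each directory.

```python
from pathlib import Path

# Top-level entries copied from the directory tree above.
EXPECTED_DIRS = ("covariances", "GWAS", "weights")
EXPECTED_FILES = ("supplementary_tables.xlsx",)


def missing_inputs(data_root):
    """Return the expected top-level entries that are absent under data_root."""
    root = Path(data_root)
    # Directories are reported with a trailing slash to tell them apart from files.
    missing = [name + "/" for name in EXPECTED_DIRS if not (root / name).is_dir()]
    missing += [name for name in EXPECTED_FILES if not (root / name).is_file()]
    return missing


if __name__ == "__main__":
    gaps = missing_inputs("data")
    print("data/ is complete" if not gaps else "missing: " + ", ".join(gaps))
```

An empty list means every top-level entry is present; it says nothing about whether the individual downloads inside them finished.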

Individual associations

Running all the associations can take some time. To run an individual association, request a single output file whose name matches the Snakemake wildcard pattern results/{ukbb_id}-{model}_{tissue}.csv. Snakemake then runs SPrediXcan.py only for that combination of GWAS summary statistics and model/tissue-specific weights.

For example,

$ snakemake --cores all results/30740_irnt-UTMOST_Muscle_Skeletal.csv

will only perform the steps necessary to produce the gene-level association output for Glucose (30740_irnt) using the pretrained UTMOST weights for Muscle Skeletal tissue.
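
To make the naming pattern concrete, here is a small Python sketch that builds and parses target paths of the form results/{ukbb_id}-{model}_{tissue}.csv. The helpers `make_target` and `parse_target` are hypothetical, and the split rule is an assumption inferred from the example above (the UKBB id contains no hyphen and the model name contains no underscore). The Snakefile may constrain its wildcards differently.

```python
import re

# Assumption: ukbb_id contains no "-" and model contains no "_",
# so the first "-" and the following first "_" are the separators.
TARGET_RE = re.compile(
    r"^results/(?P<ukbb_id>[^-/]+)-(?P<model>[^_/]+)_(?P<tissue>[^/]+)\.csv$"
)


def make_target(ukbb_id, model, tissue):
    """Build the output path to request from snakemake."""
    return f"results/{ukbb_id}-{model}_{tissue}.csv"


def parse_target(path):
    """Split an output path back into its three wildcard values."""
    match = TARGET_RE.match(path)
    if match is None:
        raise ValueError(f"not an association target: {path!r}")
    return match.groupdict()


if __name__ == "__main__":
    target = make_target("30740_irnt", "UTMOST", "Muscle_Skeletal")
    print(target)
    print(parse_target(target))
```

The string returned by `make_target` can be passed to snakemake as the requested target, as in the command shown above.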

Code Snippets

shell:
  """
  cd notebooks
  Rscript -e "bookdown::render_book('index.Rmd', 'bookdown::pdf_book')"
  cd .. 
  mv notebooks/_book/_main.pdf {output}
  rmdir notebooks/_book
  """
SnakeMake From line 32 of main/Snakefile
shell:
  """
  wget https://github.com/hakyimlab/MetaXcan/zipball/{} -O tmp.zip
  unzip tmp.zip
  rm tmp.zip
  """.format(METAXCAN_HASH)
SnakeMake From line 44 of main/Snakefile
shell: "wget https://zenodo.org/record/3842289/files/{wildcards.sample}.db -O {output}"
SnakeMake From line 53 of main/Snakefile
shell: "wget https://zenodo.org/record/3842289/files/{wildcards.sample}.txt.gz -O {output}"
SnakeMake From line 57 of main/Snakefile
shell: "paste <(cut -f 2- {input}) <(wget -qO - {params} | gunzip) > {output}"
SnakeMake From line 63 of main/Snakefile
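
For readers less familiar with `cut` and `paste`, the column-wise join in the rule above can be sketched in Python. This is an illustration of the text manipulation only, working on in-memory lines; it does not download anything, and the function names are hypothetical. What `{input}` and `{params}` point to is not shown in the snippet, so the example rows below are made up.

```python
from itertools import zip_longest


def drop_first_column(line):
    """Mimic `cut -f 2-`: keep everything after the first tab."""
    head, sep, rest = line.rstrip("\n").partition("\t")
    # `cut` passes lines without a delimiter through unchanged.
    return rest if sep else head


def paste_columns(left_lines, right_lines):
    """Mimic `paste`: join corresponding lines with a tab.

    As with `paste`, the shorter input is padded with empty fields.
    """
    for left, right in zip_longest(left_lines, right_lines, fillvalue=""):
        yield drop_first_column(left) + "\t" + right.rstrip("\n")


if __name__ == "__main__":
    # Made-up rows, for illustration only.
    left = ["id\tchr\tpos\n", "v1\t1\t100\n"]
    right = ["beta\tse\n", "0.1\t0.02\n"]
    for row in paste_columns(left, right):
        print(row)
```

Note that the join is purely positional: row i of one file is glued to row i of the other, so both inputs must list variants in the same order.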
shell: "wget https://broad-ukb-sumstats-us-east-1.s3.amazonaws.com/round2/annotations/variants.tsv.bgz -O {output}"
SnakeMake From line 67 of main/Snakefile
shell: "wget https://static-content.springer.com/esm/art%3A10.1038%2Fs41588-020-0706-2/MediaObjects/41588_2020_706_MOESM3_ESM.xlsx -O {output}"
SnakeMake From line 72 of main/Snakefile
shell:
  """
  gunzip -c {input} | head -n 1 | cut -f -6 > {output} || true
  sort <(gunzip -c {input} | cut -f -6 | sed 1d) >> {output} || true
  """
SnakeMake From line 77 of main/Snakefile
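
The rule above keeps the first six tab-separated columns, writes the header line first, and appends the remaining rows in sorted order. A rough Python equivalent is sketched below, assuming a gzip-readable, tab-separated input. The function name is hypothetical, the whole file is held in memory (unlike `sort`, which can spill to disk), and Python orders rows by code point, which matches `sort` only under `LC_ALL=C`.

```python
import gzip


def header_and_sorted_rows(gz_path, out_path, n_cols=6):
    """Write the header, then the sorted body, keeping the first n_cols columns."""
    with gzip.open(gz_path, "rt") as src:
        # Equivalent of `cut -f -6`: keep the leading tab-separated fields.
        rows = ["\t".join(line.rstrip("\n").split("\t")[:n_cols]) for line in src]
    with open(out_path, "w") as dst:
        if not rows:
            return  # empty input produces an empty output file
        header, body = rows[0], sorted(rows[1:])
        dst.write(header + "\n")
        dst.writelines(row + "\n" for row in body)
```

Writing the header separately is what keeps it on the first line; sorting the whole file in one pass would move it into the body.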
shell:
  """
  python {input.SPrediXcan} --model_db_path {input.model_db} \
    --covariance {input.covariance} \
    --gwas_file {input.gwas_file} \
    --snp_column rsid \
    --effect_allele_column alt \
    --non_effect_allele_column ref \
    --chromosome_column chr \
    --position_column pos \
    --beta_column beta \
    --se_column se \
    --pvalue_column pval \
    --freq_column minor_AF \
    --output_file {output}
  """
SnakeMake From line 90 of main/Snakefile
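
To run the same association outside Snakemake, the command in the rule above can be assembled as an argument list. The sketch below copies the flags and column names from that rule; the function name is hypothetical and every path in the usage example is a placeholder, since the actual file locations are resolved by the Snakefile.

```python
import shlex

# Flag/column pairs copied from the rule above.
GWAS_COLUMNS = {
    "--snp_column": "rsid",
    "--effect_allele_column": "alt",
    "--non_effect_allele_column": "ref",
    "--chromosome_column": "chr",
    "--position_column": "pos",
    "--beta_column": "beta",
    "--se_column": "se",
    "--pvalue_column": "pval",
    "--freq_column": "minor_AF",
}


def build_spredixcan_cmd(script, model_db, covariance, gwas_file, output_file):
    """Return the SPrediXcan.py invocation as a list suitable for subprocess.run."""
    cmd = [
        "python", script,
        "--model_db_path", model_db,
        "--covariance", covariance,
        "--gwas_file", gwas_file,
    ]
    for flag, column in GWAS_COLUMNS.items():
        cmd += [flag, column]
    cmd += ["--output_file", output_file]
    return cmd


if __name__ == "__main__":
    # All paths below are placeholders, not the repository's real file names.
    cmd = build_spredixcan_cmd(
        "path/to/SPrediXcan.py",
        "path/to/model.db",
        "path/to/covariance.txt.gz",
        "path/to/gwas.tsv",
        "results/30740_irnt-UTMOST_Muscle_Skeletal.csv",
    )
    print(shlex.join(cmd))  # pass cmd to subprocess.run(cmd, check=True) to execute
```

Passing a list rather than a single string avoids shell quoting problems if any path contains spaces.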