DNA Methylation Analysis Pipeline with Snakemake

public public 1yr ago 0 bookmarks

Use snakefile as main pipeline and defines paths for files and directives

Conda environment

Run pipeline in Conda specified env

  1. Install the conda environment found under envs/rbio.yaml (conda has to be installed on the system):
conda env create -f envs/rbio.yaml
  1. Activate env after installation
conda activate rbio
  1. Start R terminal

  2. Install 1) devtools + MatrixEQTL and 2) annotation template

# matrixEQTL
install.packages("devtools")
devtools::install_github("andreyshabalin/MatrixEQTL")
# annotation package
BiocManager::install("IlluminaHumanMethylationEPICanno.ilm10b2.hg19")

Configuration

You may change the thresholds and cut-offs at configs/workflow.json

  • maf : the minor allele frequency threshold

Converting impute2 genotypes to dosages.

  • post_maf : the minor allele frequency threshold to be applied after matrixEQTL has been run.

Useful to define different summaries.

  • cisDist : the maximal distance between SNP-CpGs pairs until which it counts as a 'cis-pairs'.

Cis-pairs will be reported in a separate file.

  • pvThresholdCis : The P value threshold to be applied for cis-meQTLs.

Only associations with P value < pvThresholdCis will be reported

  • pvThresholdTrans : The p-value threshold to be applied for trans-meQTLs.

Only associations with P value < pvThresholdTrans will be reported

  • significant_pv_cutoff : For final results files.

P value significant threshold e.g.: 1e-14

Execution

Execute snakemake in the project directory to run the full pipeline
Manually call the all_dosage_to_matrixEQTL and the all_methylation_to_matrixEQTL rules

Run snakemake with parallel task processing

./scripts/run_pipeline.sh <num-parallel-tasks>

Configure the script above to your VM capacity with: [scripts/run_pipeline.sh](scripts/run_pipeline.sh)

Covariates

meth ~ geno + sex + age + bmi + wbc + PCs1..20 + houseman estimates (except gran) + plate + batch

Code Snippets

38
39
40
41
shell:
	"""
	cat {input.snp_pos} > {output}
	"""
SnakeMake From line 38 of main/Snakefile
72
73
74
75
76
77
shell:
	"""
	zcat {input} | ./scripts/impute2dosage.py -m {params.maf} |\
	  sed -E 's/^[a-zA-Z0-9-]+\t/{wildcards.chr}\t/' |\
	  bgzip -c > {output}
	"""
SnakeMake From line 72 of main/Snakefile
101
102
script:
	"scripts/annotate_snps_maf.R"
SnakeMake From line 101 of main/Snakefile
127
128
script:
	"scripts/prepare_covariates.R"
SnakeMake From line 127 of main/Snakefile
145
146
script:
	"scripts/methylation_to_matrixEQTL.R"
SnakeMake From line 145 of main/Snakefile
156
157
158
159
shell:
	"""
	cat {input.cpg_pos} | grep -v "start" > {output}
	"""
SnakeMake From line 156 of main/Snakefile
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
	shell:
		"""
		echo "Extracting SNP ids."
		zcat {input.dosage_file} | cut -f 2 > {output.snp_ids}

		echo "Getting SNP locations"
		zcat {input.dosage_file} | cut -d "\t" -f 2,3 | \
                  awk '{{OFS="\t"}} {{print $1,"{wildcards.chr}",$2}}' >> \
		  {output.snp_positions}

		echo "Remove additional columns and order by given ordering, add SNP ids"
		key=$(cat {input.ordering} | tr '\n' ':' | sed 's/.$//')

		zcat {input.dosage_file} | cut -f 1,2,3,4,5 --complement | \
		  awk -v key="$key" -f scripts/reorder.awk | \
		  paste -d "\t" {output.snp_ids} - > {output.genotypes_matrixEQTL}

		echo "Splitting matrixEQTL input files into chunks"
		split -l {params.chunk_size} {output.genotypes_matrixEQTL} {params.geno_split_prefix}
		split -l {params.chunk_size} {output.snp_positions} {params.snp_pos_split_prefix}

		"""
SnakeMake From line 181 of main/Snakefile
229
230
script:
	"scripts/run_matrixEQTL.R"
SnakeMake From line 229 of main/Snakefile
264
265
266
267
268
269
270
271
272
273
274
	shell:
		"""
		# reset the matrixEQTL header, change 'gene' to 'cpg'

		header='SNP\tcpg\tbeta\tt-stat\tp-value'
                pattern={params.input_dir}{wildcards.chr}

		(echo -e ${{header}} ; awk '{{if($5+0 <= {params.pv_cutoff}) print;}}' ${{pattern}}_*_cis_associations.out) > {output.cis}
		(echo -e ${{header}} ; awk '{{if($5+0 <= {params.pv_cutoff}) print;}}' ${{pattern}}_*_trans_associations.out) > {output.trans}

		"""
SnakeMake From line 264 of main/Snakefile
283
284
285
286
287
shell:
	"""
	cat {input.cis} > {output.cis}
	cat {input.trans} > {output.trans}
	"""
SnakeMake From line 283 of main/Snakefile
301
302
303
304
305
306
307
308
shell:
	"""
	header="SNP\tcpg\tbeta\tt-stat\tp-value"

	(echo -e ${{header}} ; fgrep -w -f {input.st1} {input.cis} ) > {output.cis_in_st1}
	(echo -e ${{header}} ; fgrep -w -f {input.st1} {input.trans} ) > {output.trans_in_st1}
	(echo -e ${{header}} ; grep -h -v SNP {output.cis_in_st1} {output.trans_in_st1} ) > {output.combined}
	"""
SnakeMake From line 301 of main/Snakefile
322
323
324
325
326
327
328
shell:
	"""
	header="SNP\tcpg\tbeta\tt-stat\tp-value"

	(echo -e ${{header}} ; awk '{{ if($5+0 <= {params.pv_cutoff} ) print; }}' {input.cis} {input.trans} ) > {output.combined}

	"""
SnakeMake From line 322 of main/Snakefile
337
338
339
340
341
342
shell:
	"""
	cut -f 2 {input.common_probes} > {output.temp_probes}
	header="SNP\tcpg\beta\t-stat\p-value"
	( echo -e ${{header}} ; fgrep -v -w -f {output.temp_probes} {input.combined} ) > {output.epic_specific}
	"""
SnakeMake From line 337 of main/Snakefile
368
369
script:
	"scripts/create_full_result_table.R"
SnakeMake From line 368 of main/Snakefile
381
382
script:
	"scripts/extract_common_probes.R"
SnakeMake From line 381 of main/Snakefile
409
410
script:
	"scripts/compare_450k_to_EPIC.R"
SnakeMake From line 409 of main/Snakefile
429
430
431
432
433
434
435
shell:
	"""
	fgrep -w -h -f {input.snps} {params.geno_pattern} > {output.geno}
	fgrep -w -h -f {input.snps} {params.snp_pos_pattern} > {output.snp_pos}
	fgrep -w -h -f {input.cpgs} {params.meth_pattern} > {output.meth}
	fgrep -w -h -f {input.cpgs} {params.cpg_pos_pattern} > {output.cpg_pos}
	"""
SnakeMake From line 429 of main/Snakefile
462
463
script:
	"scripts/run_matrixEQTL.R"
SnakeMake From line 462 of main/Snakefile
479
480
481
482
483
484
485
	shell:
		"""
		 (head -n 1 {input.cis} ; \
                   fgrep -w -f {input.cis_pairs} {input.cis} ; \
                   fgrep -w -f {input.longrange_pairs} {input.longrange} ; \
                   fgrep -w -f {input.trans_pairs} {input.trans} ) > {output}
		"""
SnakeMake From line 479 of main/Snakefile
512
513
script:
        "scripts/create_full_result_table.R"
SnakeMake From line 512 of main/Snakefile
533
534
script:
	"scripts/create_effect_comparison_plot.R"
SnakeMake From line 533 of main/Snakefile
545
546
547
548
549
550
shell:
	"""
	zcat {input} | sort -k1,1 -k3,3n | bgzip > {output} 
	# index using tabix, 1: chr; 3: pos
	tabix -s 1 -b 3 -e 3 {output}
	"""
567
568
script:
	"scripts/residualize_methylation.R"
SnakeMake From line 567 of main/Snakefile
593
594
script:
	"scripts/match_epic_with_sentinels.R"
SnakeMake From line 593 of main/Snakefile
609
610
611
612
613
614
615
616
	shell:
		"""
		fgrep -w -h -f {input.cosmo_pairs} {input.epic_associations_cis} {input.epic_associations_trans} > {output.matched}

		cut -f 1,2 {output.matched} > {output.matched_pairs}

                fgrep -w -v -f {output.matched_pairs} {input.cosmo_pairs} > {output.missing}
		"""
SnakeMake From line 609 of main/Snakefile
ShowHide 24 more snippets with no or duplicated tags.

Login to post a comment if you would like to share your experience with this workflow.

Do you know this workflow well? If so, you can request seller status , and start supporting this workflow.

Free

Created: 1yr ago
Updated: 1yr ago
Maitainers: public
URL: https://github.com/lintingyi2014/meQTL
Name: meqtl
Version: 1
Badge:
workflow icon

Insert copied code into your website to add a link to this workflow.

Downloaded: 0
Copyright: Public Domain
License: None
  • Future updates

Related Workflows

cellranger-snakemake-gke
snakemake workflow to run cellranger on a given bucket using gke.
A Snakemake workflow for running cellranger on a given bucket using Google Kubernetes Engine. The usage of this workflow ...