repository for UBC BIOF501 term project

public public 1yr ago Version: v1.0.0 0 bookmarks

BIOF501_term_project

Differential m6A RNA methylation with Direct RNA sequencing

1. Background

N6-methyladenosine (m6A) is the most common RNA modifications modulating functions of cellular RNA species.

xPore is a Python package identifying differentail RNA modifications from Nanopore sequencing data.

xPore can identify positions of m6A sites at single-base resolution.

2. Dataset Description

Demo samples are downloaded from link provided by xPore.

METTL3 is the most dominant m6A methylation writter.

HEK293T-METTL3-KO-rep1 is METTL3 knockout (KO) of HEK293T cell line. (m6a_test/HEK293T-METTL3-KO-rep1/fastq/)

HEK293T-WT-rep1 is wild-type (WT) of HEK293T cell line. (m6a_test/HEK293T-WT-rep1/fastq/)

By comparing two different conditions, xPore will run differnetial m6a methylation analysis.

3. Workflow Visualization

  1. Align to transcriptome

  2. Bam files indexing

  3. Resquiggle using nanopolish

  4. xPore differentail RNA modifications analysis

4. Results

Result table will be generated m6a_test/out/diffmod.table

Result table shown below:

id position kmer diff_mod_rate_KO_vs_WT pval_KO_vs_WT z_score_KO_vs_WT ... sigma2_unmod sigma2_mod conf_mu_unmod conf_mu_mod mod_assignment t-test
ENSG00000114125 141745412 GGACT -0.823318 4.241373e-115 -22.803411 ... 5.925238 18.048687 0.968689 0.195429 lower 1.768910e-19
ENSG00000159111 47824212 GGACT -0.828023 1.103790e-88 -19.965293 ... 2.686549 13.820089 0.644436 0.464059 lower 5.803242e-18
ENSG00000159111 47824138 GGGAC -0.757891 1.898161e-73 -18.128515 ... 3.965195 9.877299 0.861480 0.359984 lower 9.708552e-08

5. Cloning

Clone this repository and also submodules not manageable by conda(or pip)

git clone --recursive https://github.com/sunsetyerin/BIOF501_term_project.git

6. Python 3 environment

This repository has been developped using python3.7.3 through python3.7.6 in a miniconda environment.

Though conda is not required, it is highly recommended in order to manage the dependencies.

Using snakemake to manage the environment

From a bash terminal:

snakemake --use-conda --conda-create-envs-only --cores [cores available]

7. Snakemake usage of this repo

All scripts, steps and jobs are managed using snakemake

Input files, workflow parameters and output path are provided through config files ( configs/sample_config.yaml ).

The workflow is designed to run differentially methylated m6A RNA analysis with different conditions

# To display the jobs and commands including subworkflows
snakemake general -np
# to visualize the acyclic rulegraph
snakemake general --forceall --rulegraph | dot -Tpdf > dag.pdf
# To launch the jobs
snakemake general --use-conda

8. Perform the processing of a single sample

Config files contains the values of variables needed to perform Differential RNA modifications analysis.

Most of the data directories hosting results will be created when necessary by the snakemake manager, although it is possible to create the folders in advance.

Code Snippets

49
50
51
52
53
54
55
    shell:
        " ".join(["minimap2 -ax map-ont -uf -t {resources.cpus} --secondary=no",
                  "{params.ref} {input.fastq} > {output.aligned}",
                  "2>> {log}"])

rule sam2bam:
    """
78
79
80
81
82
    shell:
        " ".join(["samtools view -Sb {input.sam} | samtools sort -o {output.bam} - &>> {log}"])

rule bamidx:
    """
105
106
107
108
109
    shell:
        " ".join(["samtools index {input.bam_sort} > {output.bamidx} &>> {log}"])

rule nanopolish_index:
    """
134
135
136
137
138
139
140
141
    shell:
        " ".join(["nanopolish index",
                  "--directory {input.fast5_dir} {input.fastq} --verbose",
                #   "> {output.nanopolish_index}",
                  "&> >(tee {log})"])

rule nanopolish_eventalign:
    """
181
182
183
184
185
186
187
188
189
190
191
192
193
    shell:
        " ".join(["nanopolish eventalign",
                  "--reads {input.fastq}",
                  "--bam {input.bam_sort}",
                  "--genome {input.genome}",
                  "--signal-index",
                  "--scale-events",
                  "--threads {resources.cpus}",
                  "> {output.nanopolish_eventalign}",
                  "&> >(tee {log})"])

rule xpore_dataprep: 
    """
222
223
224
225
226
227
228
229
230
231
232
233
    shell:
        " ".join(["xpore dataprep", 
                  "--eventalign {input.nanopolish_eventalign}",
                  #"--gtf_or_gff {input.genomic_annotation}",
                  #"--transcript_fasta {input.transcriptome}",
                  "--out_dir {params.outdir}",
                  #"--genome",
                  "--n_processes {resources.cpus}",
                  "&> >(tee {log})"])

rule xpore_diffmode: 
    """
257
258
shell:
    "xpore diffmod --config {input.xpore_config} --n_processes {resources.cpus} 2> {log}"
ShowHide 4 more snippets with no or duplicated tags.

Login to post a comment if you would like to share your experience with this workflow.

Do you know this workflow well? If so, you can request seller status , and start supporting this workflow.

Free

Created: 1yr ago
Updated: 1yr ago
Maitainers: public
URL: https://github.com/sunsetyerin/BIOF501_term_project
Name: biof501_term_project
Version: v1.0.0
Badge:
workflow icon

Insert copied code into your website to add a link to this workflow.

Downloaded: 0
Copyright: Public Domain
License: None
  • Future updates

Related Workflows

cellranger-snakemake-gke
snakemake workflow to run cellranger on a given bucket using gke.
A Snakemake workflow for running cellranger on a given bucket using Google Kubernetes Engine. The usage of this workflow ...