Repository for the UBC BIOF501 term project

public 1yr ago Version: v1.0.0


BIOF501_term_project

Differential m6A RNA methylation with Direct RNA sequencing

1. Background

N6-methyladenosine (m6A) is the most common internal modification of eukaryotic mRNA and modulates the functions of cellular RNA species.

xPore is a Python package that identifies differential RNA modifications from Nanopore direct RNA sequencing data.

xPore can identify the positions of m6A sites at single-base resolution.

2. Dataset Description

The demo samples are downloaded from the link provided by the xPore authors.

METTL3 is the main m6A methyltransferase ("writer").

HEK293T-METTL3-KO-rep1 is a METTL3 knockout (KO) of the HEK293T cell line (m6a_test/HEK293T-METTL3-KO-rep1/fastq/).

HEK293T-WT-rep1 is the wild-type (WT) HEK293T cell line (m6a_test/HEK293T-WT-rep1/fastq/).

By comparing the two conditions, xPore performs a differential m6A methylation analysis.

3. Workflow Visualization

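Align reads to the transcriptome (minimap2)
Convert, sort and index BAM files (SAMtools)
Index reads and align raw signal events to the transcriptome ("resquiggle") using nanopolish
xPore differential RNA modification analysis (xpore dataprep, xpore diffmod)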
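Snakemake infers the order of these steps from the inputs and outputs of each rule. The sketch below mimics that dependency resolution with Python's standard `graphlib` module. It needs Python 3.9 or later, which is newer than the 3.7 environment described below.

Rule names come from the code snippets at the end of this page, with two assumptions:

- `minimap2_align` is a hypothetical name, because the alignment snippet does not show its rule name.
- The edges are inferred from the snippets, not read from the Snakefile.

The sketch also ignores the per-sample fan-out. In the real workflow, dataprep runs once per sample and diffmod combines both samples.

```python
from graphlib import TopologicalSorter

# Each rule maps to the set of rules whose outputs it needs.
# Assumption: edges inferred from the snippets; "minimap2_align" is hypothetical.
RULE_DEPENDENCIES = {
    "minimap2_align": set(),
    "sam2bam": {"minimap2_align"},
    "bamidx": {"sam2bam"},
    "nanopolish_index": set(),
    "nanopolish_eventalign": {"sam2bam", "bamidx", "nanopolish_index"},
    "xpore_dataprep": {"nanopolish_eventalign"},
    "xpore_diffmode": {"xpore_dataprep"},
}


def execution_order(deps):
    """Return one valid run order in which every rule comes after its dependencies."""
    return list(TopologicalSorter(deps).static_order())


if __name__ == "__main__":
    for step, rule in enumerate(execution_order(RULE_DEPENDENCIES), start=1):
        print(step, rule)
```

Several orders are valid; for example, `nanopolish_index` may run before or after the alignment. Snakemake exploits exactly this freedom to run independent jobs in parallel.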

4. Results

The result table is written to m6a_test/out/diffmod.table.

The first rows of the result table are shown below (middle columns elided as ...):

id position kmer diff_mod_rate_KO_vs_WT pval_KO_vs_WT z_score_KO_vs_WT ... sigma2_unmod sigma2_mod conf_mu_unmod conf_mu_mod mod_assignment t-test
ENSG00000114125 141745412 GGACT -0.823318 4.241373e-115 -22.803411 ... 5.925238 18.048687 0.968689 0.195429 lower 1.768910e-19
ENSG00000159111 47824212 GGACT -0.828023 1.103790e-88 -19.965293 ... 2.686549 13.820089 0.644436 0.464059 lower 5.803242e-18
ENSG00000159111 47824138 GGGAC -0.757891 1.898161e-73 -18.128515 ... 3.965195 9.877299 0.861480 0.359984 lower 9.708552e-08
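The sketch below pulls the most significant sites out of the table using only the standard library. It makes three assumptions:

- diffmod.table is comma-separated with a header row. Check this against the output of your xPore version.
- As in the xPore documentation, `diff_mod_rate_KO_vs_WT` is the KO modification rate minus the WT rate. A negative value therefore means less modification in the knockout, as expected at METTL3-dependent sites.
- The embedded sample holds only the first six columns of the three rows shown above.

```python
import csv
import io

# First six columns of the three rows shown above.
# Assumption: the real diffmod.table is comma-separated with a header row.
SAMPLE = """id,position,kmer,diff_mod_rate_KO_vs_WT,pval_KO_vs_WT,z_score_KO_vs_WT
ENSG00000114125,141745412,GGACT,-0.823318,4.241373e-115,-22.803411
ENSG00000159111,47824212,GGACT,-0.828023,1.103790e-88,-19.965293
ENSG00000159111,47824138,GGGAC,-0.757891,1.898161e-73,-18.128515
"""


def significant_sites(handle, alpha=0.05, contrast="KO_vs_WT"):
    """Return (id, position, kmer, diff_mod_rate, pval) tuples with pval < alpha,
    sorted from most to least significant."""
    pval_col = f"pval_{contrast}"
    rate_col = f"diff_mod_rate_{contrast}"
    hits = []
    for row in csv.DictReader(handle):
        pval = float(row[pval_col])
        if pval < alpha:
            hits.append((row["id"], int(row["position"]), row["kmer"],
                         float(row[rate_col]), pval))
    return sorted(hits, key=lambda hit: hit[4])


if __name__ == "__main__":
    # For the real output, pass open("m6a_test/out/diffmod.table", newline="") instead.
    for hit in significant_sites(io.StringIO(SAMPLE), alpha=1e-80):
        print(hit)
```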

5. Cloning

Clone this repository together with its submodules (dependencies that cannot be managed by conda or pip):

git clone --recursive https://github.com/sunsetyerin/BIOF501_term_project.git

6. Python 3 environment

This repository was developed using Python 3.7.3 through 3.7.6 in a miniconda environment.

Though conda is not required, it is highly recommended in order to manage the dependencies.

Using snakemake to manage the environment

From a bash terminal:

snakemake --use-conda --conda-create-envs-only --cores [cores available]

7. Snakemake usage of this repo

All scripts, steps and jobs are managed by Snakemake.

Input files, workflow parameters and output paths are provided through config files (configs/sample_config.yaml).

The workflow is designed to run a differential m6A RNA methylation analysis between conditions.

# To display the jobs and commands, including subworkflows (dry run)
snakemake general -np
# To visualize the rule graph (a directed acyclic graph)
snakemake general --forceall --rulegraph | dot -Tpdf > dag.pdf
# To launch the jobs
snakemake general --use-conda --cores [cores available]

8. Processing a single sample

The config file contains the values of the variables needed to perform the differential RNA modification analysis.

Most of the data directories hosting results are created by Snakemake when needed, although the folders can also be created in advance.
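Snakemake creates the parent directory of every declared output before running a job, so creating folders by hand is optional. If you want them in place anyway, the sketch below creates them.

Apart from m6a_test/ and m6a_test/out/, which appear above, the subfolder names (bam, eventalign, dataprep) are hypothetical placeholders. Replace them with the paths defined in configs/sample_config.yaml.

```python
from pathlib import Path

# Sample names come from the dataset description.
# Subfolder names are hypothetical placeholders for the paths in configs/sample_config.yaml.
SAMPLES = ["HEK293T-METTL3-KO-rep1", "HEK293T-WT-rep1"]
SUBDIRS = ["bam", "eventalign", "dataprep"]


def precreate_dirs(base, samples=SAMPLES, subdirs=SUBDIRS):
    """Create <base>/<sample>/<subdir> for every pair plus <base>/out; return the paths."""
    base = Path(base)
    paths = [base / sample / sub for sample in samples for sub in subdirs]
    paths.append(base / "out")
    for path in paths:
        # exist_ok=True makes the call safe to repeat on an existing tree.
        path.mkdir(parents=True, exist_ok=True)
    return paths


if __name__ == "__main__":
    precreate_dirs("m6a_test")
```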

Code Snippets

    shell:
        " ".join(["minimap2 -ax map-ont -uf -t {resources.cpus} --secondary=no",
                  "{params.ref} {input.fastq} > {output.aligned}",
                  "2>> {log}"])

rule sam2bam:
    """

SnakeMake Minimap2 From line 49 of snakefiles/methylation.snake

    shell:
        "samtools view -Sb {input.sam} 2>> {log} | samtools sort -o {output.bam} - 2>> {log}"

rule bamidx:
    """

SnakeMake SAMtools From line 78 of snakefiles/methylation.snake

    shell:
        "samtools index {input.bam_sort} {output.bamidx} &>> {log}"

rule nanopolish_index:
    """

SnakeMake SAMtools From line 105 of snakefiles/methylation.snake

    shell:
        " ".join(["nanopolish index",
                  "--directory {input.fast5_dir} {input.fastq} --verbose",
                #   "> {output.nanopolish_index}",
                  "&> >(tee {log})"])

rule nanopolish_eventalign:
    """

SnakeMake nanopolish From line 134 of snakefiles/methylation.snake

    shell:
        " ".join(["nanopolish eventalign",
                  "--reads {input.fastq}",
                  "--bam {input.bam_sort}",
                  "--genome {input.genome}",
                  "--signal-index",
                  "--scale-events",
                  "--threads {resources.cpus}",
                  "> {output.nanopolish_eventalign}",
                  # stdout carries the event table, so only stderr is sent to the log
                  "2> {log}"])

rule xpore_dataprep: 
    """

SnakeMake nanopolish From line 181 of snakefiles/methylation.snake
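nanopolish eventalign writes the event table to standard output and its progress messages to standard error, so the two streams have to go to separate files. In bash, an `&>` placed after `> file` re-points both streams, so the data ends up in the log instead of the output file.

The sketch below shows the same separation with Python's `subprocess` module. It rests on two assumptions:

- `run_logged` is a hypothetical helper, not part of the workflow.
- The demo command is a stand-in for nanopolish.

```python
import subprocess
import sys


def run_logged(cmd, out_path, log_path):
    """Run cmd with stdout (the data) written to out_path and stderr appended to log_path."""
    with open(out_path, "w") as out, open(log_path, "a") as log:
        # check=True raises CalledProcessError on a non-zero exit status.
        return subprocess.run(cmd, stdout=out, stderr=log, check=True).returncode


if __name__ == "__main__":
    # Stand-in for `nanopolish eventalign ...`: data on stdout, a message on stderr.
    demo = [sys.executable, "-c",
            "import sys; print('event data'); sys.stderr.write('progress\\n')"]
    run_logged(demo, "eventalign.txt", "eventalign.log")
```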

    shell:
        " ".join(["xpore dataprep", 
                  "--eventalign {input.nanopolish_eventalign}",
                  #"--gtf_or_gff {input.genomic_annotation}",
                  #"--transcript_fasta {input.transcriptome}",
                  "--out_dir {params.outdir}",
                  #"--genome",
                  "--n_processes {resources.cpus}",
                  "&> >(tee {log})"])

rule xpore_diffmode: 
    """

SnakeMake From line 222 of snakefiles/methylation.snake

    shell:
        "xpore diffmod --config {input.xpore_config} --n_processes {resources.cpus} 2> {log}"

SnakeMake From line 257 of snakefiles/methylation.snake
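The xpore_diffmode rule reads a YAML configuration through `{input.xpore_config}`. In the xPore documentation, that file lists each replicate's dataprep directory under its condition, plus an `out` directory. The sketch below writes such a file without a YAML library.

The dataprep paths are hypothetical and should point to the `--out_dir` used by xpore_dataprep. The condition names KO and WT are chosen to match the `KO_vs_WT` suffix in the result table.

```python
def xpore_config_yaml(data, out_dir):
    """Render an xPore diffmod config.

    data maps condition -> {replicate: dataprep_dir}; out_dir receives diffmod results.
    Paths are written unquoted, so they must not contain ': ' or start with YAML
    special characters.
    """
    lines = ["data:"]
    for condition, replicates in data.items():
        lines.append(f"    {condition}:")
        for replicate, path in replicates.items():
            lines.append(f"        {replicate}: {path}")
    lines.append(f"out: {out_dir}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical dataprep directories for the two demo samples.
    print(xpore_config_yaml(
        {"KO": {"rep1": "m6a_test/HEK293T-METTL3-KO-rep1/dataprep"},
         "WT": {"rep1": "m6a_test/HEK293T-WT-rep1/dataprep"}},
        "m6a_test/out",
    ))
```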



Created: 1yr ago

Updated: 1yr ago

Maintainers: public

URL: https://github.com/sunsetyerin/BIOF501_term_project

Name: biof501_term_project

Version: v1.0.0


License: None

Keywords:

Minimap2 nanopolish SAMtools Snakemake
