Integrative Pipeline for Splicing Analysis


Installation & Run

To use this program you must have a Python environment with the following programs and libraries installed:

  • snakemake

  • pysam

  • pandas

  • numpy

Such an environment may be created via conda:

  1. conda create -n ipsa

  2. conda activate ipsa

  3. conda install -c conda-forge -c bioconda snakemake pysam pandas numpy

To install the pipeline:
git clone https://github.com/Leoberium/pyIPSA.git

To run (a test run if the input directory is empty):

  1. Activate the Python environment with the required libraries

  2. Change to the directory containing the Snakefile

  3. Run the snakemake command

A test run produces an empty aggregated_junction_stats.tsv file in the output directory.

Options for the snakemake command are described in the snakemake documentation.

For Arcuda users

Just load a module with Python and install the libraries:

  1. module load ScriptLang/python/3.8.3

  2. pip3 install --user --upgrade snakemake pandas pysam

After that you can run the pipeline through the cluster engine: snakemake --cluster qsub -j <number of jobs>

Working folders

Folders in the repository:

  1. config - the folder with the config file, where you set up the pipeline

  2. deprecated - the folder with old scripts no longer used in the workflow

  3. known_SJ - the folder with annotated splice junctions

  4. workflow - the folder with the working scripts of the pipeline

Additional directories created

  1. genomes - the folder which stores all downloaded genomes

  2. annotations - the folder which stores all downloaded annotations

Code Snippets

shell:
    """
    wget -O {output.genome}.gz {params.url}
    gunzip {output.genome}.gz
    """
run:
    pysam.index(input.bam)
shell:
    "python3 -m workflow.scripts.count_junctions "
    "-i {input.bam} "
    "-k {input.known} "
    "-o {output.junctions} "
    "-l {output.library_stats} "
    "{params.primary} {params.unique} "
    "-t {params.threads}"
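The junction-counting script above extracts splice junctions from split alignments in the BAM file. As a rough sketch of the idea (a hypothetical helper, not the pipeline's actual parser, which works through pysam), junctions correspond to N operations in a read's CIGAR string:

```python
import re

CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")
REF_CONSUMING = set("MDN=X")  # operations that advance the reference position

def junctions_from_cigar(pos, cigar):
    """Return (start, end) pairs, in 0-based half-open reference coordinates,
    for every N (skipped region, i.e. intron) operation in a CIGAR string."""
    junctions = []
    ref = pos
    for length, op in CIGAR_RE.findall(cigar):
        length = int(length)
        if op == "N":
            junctions.append((ref, ref + length))
        if op in REF_CONSUMING:
            ref += length
    return junctions
```

For example, a read starting at position 100 with CIGAR 50M200N30M spans a 200 bp intron at positions 150-350.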
shell:
    "python3 -m workflow.scripts.gather_library_stats "
    "{OUTPUT_DIR}/J1 "
    "-o {output.tsv}"
shell:
    "python3 -m workflow.scripts.aggregate_junctions "
    "-i {input.junctions} "
    "-s {input.library_stats} "
    "-o {output.aggregated_junctions} "
    "--min_offset {params.min_offset} "
    "--min_intron_length {params.min_intron_length} "
    "--max_intron_length {params.max_intron_length}"
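The min_intron_length and max_intron_length parameters above drop junctions with implausible intron sizes during aggregation. A minimal sketch of that bound check (the default values shown are assumptions for illustration; the pipeline's actual thresholds are set in the config file):

```python
def keep_junction(start, end, min_intron_length=60, max_intron_length=1_000_000):
    """Keep a junction only if its intron length (half-open coordinates)
    falls within the allowed range."""
    return min_intron_length <= end - start <= max_intron_length
```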
shell:
     "python3 -m workflow.scripts.annotate_junctions "
     "-i {input.aggregated_junctions} "
     "-k {input.known_sj} "
     "-f {input.genome} "
     "-o {output.annotated_junctions}"
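Annotation compares each junction against the known splice junctions and the genome sequence; the genome is needed because canonical introns start with GT and end with AG, which also reveals the strand. A toy illustration of that motif check (not the pipeline's code; coordinates assumed 0-based half-open):

```python
# Canonical splice motif: GT..AG on the plus strand appears as CT..AC
# in the reference when the intron lies on the minus strand.
SPLICE_MOTIFS = {("GT", "AG"): "+", ("CT", "AC"): "-"}

def junction_strand(chrom_seq, start, end):
    """Infer strand from the intron's terminal dinucleotides."""
    donor = chrom_seq[start:start + 2].upper()
    acceptor = chrom_seq[end - 2:end].upper()
    return SPLICE_MOTIFS.get((donor, acceptor), ".")
```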
shell:
    "python3 -m workflow.scripts.choose_strand "
    "-i {input.annotated_junctions} "
    "-r {input.ranked_list} "
    "-o {output.stranded_junctions} "
    "-s {output.junction_stats}"
run:
    d = defaultdict(list)
    for replicate in input.junction_stats:
        p = Path(replicate)
        name = Path(p.stem).stem
        with p.open("r") as f:
            d["replicate"].append(name)
            for line in f:
                if line.startswith("-"):
                    break
                left, right = line.strip().split(": ")
                d[left].append(right)
    df = pd.DataFrame(d)
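The run block above merges per-replicate "key: value" summaries (stopping at a dashed separator line) into one table. The parsing logic can be shown as a standalone function (a hypothetical helper mirroring the loop above):

```python
def parse_stats(name, lines):
    """Collect 'key: value' lines into one record, stopping at a '-' separator."""
    record = {"replicate": name}
    for line in lines:
        if line.startswith("-"):
            break
        key, value = line.strip().split(": ")
        record[key] = value
    return record
```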
shell:
     "python3 -m workflow.scripts.filter "
     "-i {input.stranded_junctions} "
     "-e {params.entropy} "
     "-c {params.total_count} "
     "{params.gtag} "
     "-o {output.filtered_junctions}"
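The entropy filter above keeps junctions whose supporting reads are spread over many distinct offsets: a junction covered at a single offset (entropy 0) is more likely an artifact than one covered uniformly (high entropy). The thresholded quantity is the ordinary Shannon entropy of the offset counts, e.g.:

```python
import math

def offset_entropy(counts):
    """Shannon entropy (in bits) of a distribution of read-offset counts."""
    total = sum(counts)
    return -sum(c / total * math.log2(c / total) for c in counts if c)
```

Four reads at four distinct offsets give 2.0 bits, while four reads stacked at one offset give 0.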
shell:
     "python3 -m workflow.scripts.merge_junctions "
     "{input.stranded_junctions} "
     "-o {output.merged_junctions}"
shell:
    "python3 -m workflow.scripts.count_polyA "
    "-i {input.bam} "
    "-o {output.polyA} "
    "{params.primary} {params.unique} "
    "-t {params.threads}"
shell:
    "python3 -m workflow.scripts.aggregate_polyA "
    "-i {input.polyA} "
    "-s {input.library_stats} "
    "-o {output.aggregated_polyA} "
    "--min_overhang {params.min_overhang} "
shell:
    "python3 -m workflow.scripts.count_sites "
    "-i {input.bam} "
    "-j {input.junctions} "
    "-s {input.stats} "
    "-o {output.pooled_sites} "
    "{params.primary} {params.unique} "
    "-t {params.threads}"
shell:
    "python3 -m workflow.scripts.aggregate_sites "
    "-i {input.sites} "
    "-s {input.stats} "
    "-o {output.aggregated_pooled_sites} "
    "-m {params.min_offset}"
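The -m/min_offset parameter requires each supporting read to overhang the counted site by a minimum number of bases on both sides, so that sites are not supported only by read edges. A toy predicate for that check (hypothetical signature; the 0-based half-open read coordinates and default of 4 are illustrative assumptions):

```python
def passes_offset(read_start, read_end, site_pos, min_offset=4):
    """True if the read extends at least `min_offset` bases on each
    side of the site position."""
    return (site_pos - read_start >= min_offset
            and read_end - site_pos >= min_offset)
```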
shell:
     "python3 -m workflow.scripts.filter "
     "-i {input.aggregated_pooled_sites} "
     "--sites "
     "-e {params.entropy} "
     "-c {params.total_count} "
     "-o {output.filtered_pooled_sites}"
shell:
    "python3 -m workflow.scripts.count_sites "
    "-i {input.bam} "
    "-j {input.junctions} "
    "-s {input.stats} "
    "-o {output.sites} "
    "{params.primary} {params.unique} "
    "-t {params.threads}"
shell:
    "python3 -m workflow.scripts.aggregate_sites "
    "-i {input.sites} "
    "-s {input.stats} "
    "-o {output.aggregated_sites} "
    "-m {params.min_offset}"
shell:
     "python3 -m workflow.scripts.filter "
     "-i {input.aggregated_sites} "
     "--sites "
     "-e {params.entropy} "
     "-c {params.total_count} "
     "-o {output.filtered_sites}"
shell:
     "python3 -m workflow.scripts.compute_rates "
     "-j {input.filtered_junctions} "
     "-s {input.filtered_sites} "
     "-o {output.rates}"
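compute_rates combines filtered junction counts (spliced reads) with site counts (unspliced reads) into splicing rates. As a rough illustration only (the exact formulas live in workflow/scripts/compute_rates.py and may differ), a completeness-style rate is the fraction of reads at a site that are spliced:

```python
def splicing_rate(junction_count, site_count):
    """Fraction of reads supporting the spliced form at a site.
    Returns NaN when there is no coverage at all."""
    total = junction_count + site_count
    return junction_count / total if total else float("nan")
```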
shell:
     "python3 -m workflow.scripts.compute_rates "
     "-j {input.filtered_junctions} "
     "-s {input.filtered_pooled_sites} "
     "-o {output.rates}"


Name: pyipsa
Version: 2
URL: https://github.com/Leoberium/pyIPSA
Maintainers: public
Created: 1yr ago
Updated: 1yr ago
Copyright: Public Domain
License: None
