Integrative Pipeline for Splicing Analysis


Installation & Run

To use this program you must have a Python environment with the following programs and libraries installed:

  • snakemake

  • pysam

  • pandas

  • numpy

Such an environment may be created via conda:

  1. conda create -n ipsa

  2. conda activate ipsa

  3. conda install -c conda-forge -c bioconda snakemake pysam pandas numpy

To install the pipeline:
git clone https://github.com/Leoberium/pyIPSA.git

To run (a test run if the input directory is empty):

  1. Activate the Python environment with the required libraries

  2. Change to the directory containing the Snakefile

  3. Run the snakemake command

A test run produces an empty aggregated_junction_stats.tsv file in the output directory.

Options for the snakemake command are described in the snakemake documentation.

For Arcuda users

Just load a module with Python and install the libraries:

  1. module load ScriptLang/python/3.8.3

  2. pip3 install --user --upgrade snakemake pandas pysam

After that you can run the pipeline through the cluster engine: snakemake --cluster qsub -j <number of jobs>

Working folders

Folders in the repository:

  1. config - the folder with the config file, where you set up the pipeline

  2. deprecated - the folder with old scripts no longer used in the workflow

  3. known_SJ - the folder with annotated splice junctions

  4. workflow - the folder with the working scripts of the pipeline

Additional directories created

  1. genomes - the folder which stores all downloaded genomes

  2. annotations - the folder which stores all downloaded annotations

Code Snippets

shell:
    """
    wget -O {output.genome}.gz {params.url}
    gunzip {output.genome}.gz
    """
run:
    pysam.index(input.bam)
shell:
    "python3 -m workflow.scripts.count_junctions "
    "-i {input.bam} "
    "-k {input.known} "
    "-o {output.junctions} "
    "-l {output.library_stats} "
    "{params.primary} {params.unique} "
    "-t {params.threads}"
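The junction-counting script above extracts splice junctions from split alignments in the BAM file. As a rough sketch of the idea (a hypothetical helper, not the pipeline's actual parser, which works through pysam), junctions correspond to N operations in a read's CIGAR string:

```python
import re

CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")
REF_CONSUMING = set("MDN=X")  # operations that advance the reference position

def junctions_from_cigar(pos, cigar):
    """Return (start, end) pairs, in 0-based half-open reference coordinates,
    for every N (skipped region, i.e. intron) operation in a CIGAR string."""
    junctions = []
    ref = pos
    for length, op in CIGAR_RE.findall(cigar):
        length = int(length)
        if op == "N":
            junctions.append((ref, ref + length))
        if op in REF_CONSUMING:
            ref += length
    return junctions
```

For example, a read starting at position 100 with CIGAR 50M200N30M spans a 200 bp intron at positions 150-350.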
shell:
    "python3 -m workflow.scripts.gather_library_stats "
    "{OUTPUT_DIR}/J1 "
    "-o {output.tsv}"
shell:
    "python3 -m workflow.scripts.aggregate_junctions "
    "-i {input.junctions} "
    "-s {input.library_stats} "
    "-o {output.aggregated_junctions} "
    "--min_offset {params.min_offset} "
    "--min_intron_length {params.min_intron_length} "
    "--max_intron_length {params.max_intron_length}"
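The min_intron_length and max_intron_length parameters above drop junctions with implausible intron sizes during aggregation. A minimal sketch of that bound check (the default values shown are assumptions for illustration; the pipeline's actual thresholds are set in the config file):

```python
def keep_junction(start, end, min_intron_length=60, max_intron_length=1_000_000):
    """Keep a junction only if its intron length (half-open coordinates)
    falls within the allowed range."""
    return min_intron_length <= end - start <= max_intron_length
```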
shell:
     "python3 -m workflow.scripts.annotate_junctions "
     "-i {input.aggregated_junctions} "
     "-k {input.known_sj} "
     "-f {input.genome} "
     "-o {output.annotated_junctions}"
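Annotation compares each junction against the known splice junctions and the genome sequence; the genome is needed because canonical introns start with GT and end with AG, which also reveals the strand. A toy illustration of that motif check (not the pipeline's code; coordinates assumed 0-based half-open):

```python
# Canonical splice motif: GT..AG on the plus strand appears as CT..AC
# in the reference when the intron lies on the minus strand.
SPLICE_MOTIFS = {("GT", "AG"): "+", ("CT", "AC"): "-"}

def junction_strand(chrom_seq, start, end):
    """Infer strand from the intron's terminal dinucleotides."""
    donor = chrom_seq[start:start + 2].upper()
    acceptor = chrom_seq[end - 2:end].upper()
    return SPLICE_MOTIFS.get((donor, acceptor), ".")
```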
shell:
    "python3 -m workflow.scripts.choose_strand "
    "-i {input.annotated_junctions} "
    "-r {input.ranked_list} "
    "-o {output.stranded_junctions} "
    "-s {output.junction_stats}"
run:
    d = defaultdict(list)
    for replicate in input.junction_stats:
        p = Path(replicate)
        name = Path(p.stem).stem
        with p.open("r") as f:
            d["replicate"].append(name)
            for line in f:
                if line.startswith("-"):
                    break
                left, right = line.strip().split(": ")
                d[left].append(right)
    df = pd.DataFrame(d)
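The run block above merges per-replicate "key: value" summaries (stopping at a dashed separator line) into one table. The parsing logic can be shown as a standalone function (a hypothetical helper mirroring the loop above):

```python
def parse_stats(name, lines):
    """Collect 'key: value' lines into one record, stopping at a '-' separator."""
    record = {"replicate": name}
    for line in lines:
        if line.startswith("-"):
            break
        key, value = line.strip().split(": ")
        record[key] = value
    return record
```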
shell:
     "python3 -m workflow.scripts.filter "
     "-i {input.stranded_junctions} "
     "-e {params.entropy} "
     "-c {params.total_count} "
     "{params.gtag} "
     "-o {output.filtered_junctions}"
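The entropy filter above keeps junctions whose supporting reads are spread over many distinct offsets: a junction covered at a single offset (entropy 0) is more likely an artifact than one covered uniformly (high entropy). The thresholded quantity is the ordinary Shannon entropy of the offset counts, e.g.:

```python
import math

def offset_entropy(counts):
    """Shannon entropy (in bits) of a distribution of read-offset counts."""
    total = sum(counts)
    return -sum(c / total * math.log2(c / total) for c in counts if c)
```

Four reads at four distinct offsets give 2.0 bits, while four reads stacked at one offset give 0.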
shell:
     "python3 -m workflow.scripts.merge_junctions "
     "{input.stranded_junctions} "
     "-o {output.merged_junctions}"
shell:
    "python3 -m workflow.scripts.count_polyA "
    "-i {input.bam} "
    "-o {output.polyA} "
    "{params.primary} {params.unique} "
    "-t {params.threads}"
shell:
    "python3 -m workflow.scripts.aggregate_polyA "
    "-i {input.polyA} "
    "-s {input.library_stats} "
    "-o {output.aggregated_polyA} "
    "--min_overhang {params.min_overhang} "
shell:
    "python3 -m workflow.scripts.count_sites "
    "-i {input.bam} "
    "-j {input.junctions} "
    "-s {input.stats} "
    "-o {output.pooled_sites} "
    "{params.primary} {params.unique} "
    "-t {params.threads}"
shell:
    "python3 -m workflow.scripts.aggregate_sites "
    "-i {input.sites} "
    "-s {input.stats} "
    "-o {output.aggregated_pooled_sites} "
    "-m {params.min_offset}"
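The -m/min_offset parameter requires each supporting read to overhang the counted site by a minimum number of bases on both sides, so that sites are not supported only by read edges. A toy predicate for that check (hypothetical signature; the 0-based half-open read coordinates and default of 4 are illustrative assumptions):

```python
def passes_offset(read_start, read_end, site_pos, min_offset=4):
    """True if the read extends at least `min_offset` bases on each
    side of the site position."""
    return (site_pos - read_start >= min_offset
            and read_end - site_pos >= min_offset)
```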
shell:
     "python3 -m workflow.scripts.filter "
     "-i {input.aggregated_pooled_sites} "
     "--sites "
     "-e {params.entropy} "
     "-c {params.total_count} "
     "-o {output.filtered_pooled_sites}"
shell:
    "python3 -m workflow.scripts.count_sites "
    "-i {input.bam} "
    "-j {input.junctions} "
    "-s {input.stats} "
    "-o {output.sites} "
    "{params.primary} {params.unique} "
    "-t {params.threads}"
shell:
    "python3 -m workflow.scripts.aggregate_sites "
    "-i {input.sites} "
    "-s {input.stats} "
    "-o {output.aggregated_sites} "
    "-m {params.min_offset}"
shell:
     "python3 -m workflow.scripts.filter "
     "-i {input.aggregated_sites} "
     "--sites "
     "-e {params.entropy} "
     "-c {params.total_count} "
     "-o {output.filtered_sites}"
shell:
     "python3 -m workflow.scripts.compute_rates "
     "-j {input.filtered_junctions} "
     "-s {input.filtered_sites} "
     "-o {output.rates}"
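compute_rates combines filtered junction counts (spliced reads) with site counts (unspliced reads) into splicing rates. As a rough illustration only (the exact formulas live in workflow/scripts/compute_rates.py and may differ), a completeness-style rate is the fraction of reads at a site that are spliced:

```python
def splicing_rate(junction_count, site_count):
    """Fraction of reads supporting the spliced form at a site.
    Returns NaN when there is no coverage at all."""
    total = junction_count + site_count
    return junction_count / total if total else float("nan")
```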
shell:
     "python3 -m workflow.scripts.compute_rates "
     "-j {input.filtered_junctions} "
     "-s {input.filtered_pooled_sites} "
     "-o {output.rates}"


Name: pyipsa
Version: 2
URL: https://github.com/Leoberium/pyIPSA
Maintainers: public
Created: 1yr ago
Updated: 1yr ago
Copyright: Public Domain
License: None
