Sparse Signaling Pathway Sampling: MCMC for signaling pathway inference


Sparse Signaling Pathway Sampling

Code related to the manuscript Inferring signaling pathways with probabilistic programming (Merrell & Gitter, 2020), Bioinformatics, 36:Supplement_2, i822–i830.

This repository contains the following:

  • SSPS: A method that infers relationships between variables using time series data.

    • Modeling assumption: the time series data is generated by a Dynamic Bayesian Network (DBN).

    • Inference strategy: MCMC sampling over possible DBN structures (see the toy sketch after this list).

    • Implementation: written in Julia, using the Gen probabilistic programming language

  • Analysis code:

    • simulation studies;

    • convergence analyses;

    • evaluation on experimental data;

    • a Snakefile for managing all of the analyses.
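
The actual SSPS model and MCMC proposals are written in Julia with Gen (see the SSPS directory in the repository). As a loose, self-contained illustration of the general idea -- MCMC over DBN parent sets scored against time series data -- here is a toy Python sketch. Everything in it (the ridge-regression score, the crude sparsity penalty, the single-edge toggle proposal, and all names) is an assumption chosen for illustration, not SSPS's model.

    import numpy as np

    rng = np.random.default_rng(0)

    def log_score(X, child, parents, lam=1.0):
        # Gaussian log-likelihood of X[1:, child] ridge-regressed on X[:-1, parents],
        # plus a crude penalty that discourages large parent sets.
        y = X[1:, child]
        if parents:
            P = X[:-1, sorted(parents)]
            beta = np.linalg.solve(P.T @ P + lam * np.eye(len(parents)), P.T @ y)
            resid = y - P @ beta
        else:
            resid = y
        n = len(y)
        sigma2 = resid @ resid / n + 1e-8
        return -0.5 * n * np.log(sigma2) - np.log(n) * len(parents)

    def mcmc_parent_sets(X, n_steps=5000):
        # Metropolis sampling over DBN structures: each step proposes toggling
        # a single edge (i -> j) and accepts/rejects by the change in score.
        V = X.shape[1]
        parents = [set() for _ in range(V)]
        scores = [log_score(X, j, parents[j]) for j in range(V)]
        edge_freq = np.zeros((V, V))
        for _ in range(n_steps):
            j = rng.integers(V)              # child whose parent set is perturbed
            i = rng.integers(V)              # candidate parent: toggle edge i -> j
            proposal = parents[j] ^ {i}
            new_score = log_score(X, j, proposal)
            if np.log(rng.random()) < new_score - scores[j]:
                parents[j], scores[j] = proposal, new_score
            for c in range(V):               # accumulate posterior edge frequencies
                for p in parents[c]:
                    edge_freq[p, c] += 1.0
        return edge_freq / n_steps

    if __name__ == "__main__":
        # Tiny simulated time series in which variable 0 drives variable 1.
        T, V = 300, 3
        X = np.zeros((T, V))
        for t in range(1, T):
            X[t, 0] = rng.normal()
            X[t, 1] = 0.8 * X[t - 1, 0] + 0.3 * rng.normal()
            X[t, 2] = rng.normal()
        print(np.round(mcmc_parent_sets(X), 2))   # entry (0, 1) should be large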

Installation and basic setup

(If you plan to reproduce all of the analyses, then make sure you're on a host with access to plenty of CPUs. Ideally, you would have access to a cluster of some sort.)

  1. Clone this repository:

    git clone git@github.com:gitter-lab/ssps.git

  2. Install Julia 1.6 (and all Julia dependencies)

    • Download the correct Julia binary here: https://julialang.org/downloads/.
      E.g., for Linux x86_64:
    $ wget https://julialang-s3.julialang.org/bin/linux/x64/1.6/julia-1.6.7-linux-x86_64.tar.gz 
    $ tar -xvzf julia-1.6.7-linux-x86_64.tar.gz
    
    • Find additional installation instructions here: https://julialang.org/downloads/platform/.

    • Use Pkg -- Julia's package manager -- to install the project's Julia dependencies:

    $ cd ssps/SSPS
    $ julia --project=.
                   _
       _       _ _(_)_     |  Documentation: https://docs.julialang.org
      (_)     | (_) (_)    |
       _ _   _| |_  __ _   |  Type "?" for help, "]?" for Pkg help.
      | | | | | | |/ _` |  |
      | | |_| | | | (_| |  |  Version 1.6.7 (2022-07-19)
     _/ |\__'_|_|_|\__'_|  |  Official https://julialang.org/ release
    |__/                   |

    julia> using Pkg
    julia> Pkg.instantiate()
    julia> exit()

Reproducing the analyses

In order to reproduce the analyses, you will need some extra bits of software.

  • We use Snakemake -- a python package -- to manage the analysis workflow.

  • We use some other python packages to postprocess the results, produce plots, etc.

  • Some of the baseline methods are implemented in R or MATLAB.

Hence, the analyses entail some extra setup:

  1. Install python dependencies (using conda)

    • For the purposes of these instructions, we assume you have Anaconda3 or Miniconda3 installed, and have access to the conda environment manager.
      (We recommend using Miniconda; find full installation instructions on the Miniconda website.)

    • We recommend setting up a dedicated virtual environment for this project. The following will create a new environment named ssps and install the required python packages:

    $ conda create -n ssps -c conda-forge pandas matplotlib numpy bioconda::snakemake-minimal
    $ conda activate ssps
    (ssps) $
    
    • If you plan to reproduce the analyses on a cluster, then install cookiecutter and the complete version of snakemake:
    (ssps) $ conda install -c conda-forge cookiecutter bioconda::snakemake
    

    Then find the appropriate Snakemake profile in this list: https://github.com/Snakemake-Profiles/doc and install it using cookiecutter:

    (ssps) $ cookiecutter https://github.com/Snakemake-Profiles/htcondor.git
    

    replacing the htcondor example above with your desired profile.

  2. Install R packages

  3. Check whether MATLAB is installed.

After completing this additional setup, we are ready to run the analyses.

  1. Make any necessary modifications to the configuration file, analysis_config.yaml. This file controls the space of hyperparameters and datasets explored in the analyses.

  2. Run the analyses using snakemake:

    • If you're running the analyses on your local host, simply move to the directory containing the Snakefile and call snakemake:
    (ssps) $ cd ssps
    (ssps) $ snakemake

    • Since Julia is just-in-time compiled, some time will be spent on compilation the first time you run SSPS. You may see some warnings in stdout -- this is normal.

    • If you're running the analyses on a cluster, call snakemake with the Snakemake profile you installed earlier:

    (ssps) $ cd ssps
    (ssps) $ snakemake --profile YOUR_PROFILE_NAME

    (You will probably need to edit the job submission parameters in the profile's config.yaml file.)

  3. Relax. It will take tens of thousands of CPU-hours to run all of the analyses.

Running SSPS on your data

Follow these steps to run SSPS on your dataset. You will need the following inputs (a minimal sketch of their formats follows this list):

  • a CSV file (tab-separated) containing your time series data;

  • a CSV file (comma-separated) containing your prior edge confidences;

  • optional: a JSON file containing a list of variable names (i.e., node names).
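
For concreteness, here is a minimal Python sketch of what such inputs could look like. The layout is inferred from the DREAM preprocessing scripts shown below (a tab-separated time series table with timeseries/timestep columns, a headerless V x V comma-separated prior matrix, and a JSON list of node names); the file names are hypothetical, and ssps_config.yaml determines what the run_ssps workflow actually expects.

    import json
    import numpy as np
    import pandas as pd

    rng = np.random.default_rng(0)
    nodes = ["A", "B", "C"]                      # hypothetical node names

    # Tab-separated time series: one row per (time series, timestep) pair,
    # one column per variable.
    ts = pd.DataFrame({
        "timeseries": ["ts1"] * 4 + ["ts2"] * 4,
        "timestep":   [0.0, 10.0, 30.0, 60.0] * 2,
    })
    for n in nodes:
        ts[n] = rng.normal(size=len(ts))
    ts.to_csv("timeseries.tsv", sep="\t", index=False)

    # Comma-separated prior edge confidences: a headerless V x V matrix whose
    # (i, j) entry is the confidence of an edge from node i to node j.
    prior = rng.uniform(size=(len(nodes), len(nodes)))
    np.savetxt("prior.csv", prior, delimiter=",")

    # Optional JSON list of node names, in the same order as the matrix rows.
    with open("nodes.json", "w") as f:
        json.dump(nodes, f)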

  1. Install the python dependencies if you haven't already. Find detailed instructions above.

  2. cd to the run_ssps directory

  3. Configure the parameters in ssps_config.yaml as appropriate

  4. Run Snakemake: $ snakemake --cores 1. Increase the value of --cores to raise the maximum number of CPU cores used.

A note about parallelism

SSPS allows two levels of parallelism: (1) at the Markov chain level and (2) at the iteration level.

  • Chain-level parallelism is provided via Snakemake. For example, Snakemake can run 4 chains simultaneously if you specify --cores 4 at the command line: $ snakemake --cores 4. In essence, this just creates 4 instances of SSPS that run simultaneously.

  • Iteration-level parallelism is provided by Julia's multi-threading features. The number of threads available to an SSPS instance is specified by an environment variable: JULIA_NUM_THREADS.

  • The total number of CPUs used by your SSPS jobs is the product of Snakemake's --cores parameter and Julia's JULIA_NUM_THREADS environment variable. Concretely: if we run snakemake --cores 2 with JULIA_NUM_THREADS=4, then up to 8 CPUs may be used at one time by the SSPS jobs.
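
As a small illustration of how the two knobs combine, the following sketch (a hypothetical helper, not part of the repository) launches Snakemake with both levels of parallelism set explicitly; it assumes snakemake is on your PATH.

    import os
    import subprocess

    snakemake_cores = 2      # chains run simultaneously (Snakemake's --cores)
    julia_threads = 4        # threads available to each SSPS instance

    # Up to snakemake_cores * julia_threads CPUs may be busy at once.
    print("peak CPUs:", snakemake_cores * julia_threads)   # 2 * 4 = 8

    # Equivalent to: JULIA_NUM_THREADS=4 snakemake --cores 2
    env = dict(os.environ, JULIA_NUM_THREADS=str(julia_threads))
    subprocess.run(["snakemake", "--cores", str(snakemake_cores)], env=env, check=True)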

Licenses

SSPS is available under the MIT License, Copyright © 2020 David Merrell.

The MATLAB code dynamic_network_inference.m has been modified from the original version, Copyright © 2012 Steven Hill and Sach Mukherjee.

The DREAM challenge data is described in Hill et al., 2016 and is originally from Synapse.

Code Snippets

import pandas as pd
import numpy as np
import argparse
import json

def build_weighted_adj(eda_filename):
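    """
    Read a prior network .eda file and return a weighted adjacency matrix
    (filled with the EdgeScore values) along with the ordered list of
    antibody names corresponding to the matrix rows/columns.
    """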

    df = pd.read_csv(eda_filename, sep=" ")
    df.reset_index(inplace=True)
    antibodies = df["level_0"].unique()
    print("ANTIBODIES: ", antibodies)
    antibody_map = { a:i for (i,a) in enumerate(antibodies) }

    V = len(antibody_map)
    adj = np.zeros((V,V))

    for (_, row) in df.iterrows():
        a = row["level_0"]
        b = row["level_2"]
        adj[antibody_map[a],antibody_map[b]] = row["EdgeScore"]

    print(adj)

    antibody_ls = [0 for i in antibody_map]
    for (name, idx) in antibody_map.items():
        antibody_ls[idx] = name

    return adj, antibody_ls



if __name__=="__main__":

    parser = argparse.ArgumentParser(description="")
    parser.add_argument("eda_file", help="path to a DREAM challenge time series CSV file")
    parser.add_argument("output_file", help="path where the output CSV will be written")
    parser.add_argument("antibody_file", help="path to output JSON file containing the indices of antibodies")
    args = parser.parse_args()

    adj_mat, antibody_ls = build_weighted_adj(args.eda_file)

    df = pd.DataFrame(adj_mat)
    df.to_csv(args.output_file, sep=",", index=False, header=False) 

    json.dump(antibody_ls, open(args.antibody_file, "w"))
import pandas as pd
import os
import argparse
import numpy as np

EXCLUDE = {"foxo3a_ps318_s321", "taz_ps89"}


def to_minutes(timestr):
    """
    Convert a time string (e.g., '10min') to a floating point
    number of minutes (e.g., 60.0)
    """

    if timestr[-3:] == "min":
        num = float(timestr[:-3])
    elif timestr[-2:] == "hr":
        num = float(timestr[:-2]) * 60.0

    return num


def get_antibody_row(df):
    for i, row in df.iterrows():
        if "Antibody Name" in row.values:
            return i
    return -1


def get_start_idxs(df):
    for i, row in df.iterrows():
        cols = np.where(row.values == "Timepoint")
        if len(cols[0]) > 0:
            return i, cols[0][0]+1
    return -1, -1


def load_dream_ts(csv_path, keep_start=False):
    """
    Read in a DREAM challenge time series CSV file
    and return a DataFrame with appropriate columns.
    """

    # the original CSV is strangely formatted -- it has
    # an extra column and a multi-line header.
    df = pd.read_csv(csv_path)
    df.drop("Unnamed: 0", axis=1, inplace=True)

    antibody_row = get_antibody_row(df)
    data_start_row, data_start_col = get_start_idxs(df)

    df.iloc[data_start_row, data_start_col:] = df.iloc[antibody_row, data_start_col:].values
    df.columns = df.loc[data_start_row,:].values
    df = df.loc[(data_start_row + 1):,:]
    df.index = range(df.shape[0])

    # The original format doesn't give a "Stimulus" label at timepoint 0.
    # NOTE: as written, both branches below drop the rows lacking a
    # Stimulus label, so keep_start currently has no effect here.
    if keep_start:
        print("TOO BAD.") 
        df = df[df["Stimulus"].isnull() == False]
    else:
        df = df[df["Stimulus"].isnull() == False]

    return df


def create_standard_dataframe(dream_df, ignore_stim=False,
                                        ignore_inhib=False):
    """
    For each context contained in `dream_df`, create a time series
    dataframe.
    """

    context_cols = []
    if not ignore_inhib:
        context_cols.append("Inhibitor")
    if not ignore_stim:
        context_cols.append("Stimulus")

    joiner = lambda x: "_".join(x)

    dream_df["context"] = df[context_cols].apply(joiner, axis=1) 
    contexts = dream_df["context"].unique()

    dream_df.rename(columns={"Timepoint": "timestep"}, inplace=True)
    dream_df.loc[:,"timeseries"] = dream_df[["Inhibitor","Stimulus"]].apply(joiner, axis=1)

    dream_df.sort_values(["context","timeseries","timestep"], inplace=True)

    keep_cols = ["context", "timeseries", "timestep"]
    idx_cols = keep_cols+["Inhibitor", "Stimulus"]

    # IMPORTANT: standard order of variables = lexicographic
    var_cols = [c for c in dream_df.columns if c not in idx_cols]
    var_cols = [c for c in var_cols if c.lower() not in EXCLUDE] 

    dream_df = dream_df[keep_cols + var_cols]

    dream_df = dream_df.astype({v:"float64" for v in var_cols})
    dream_df[var_cols] = dream_df[var_cols].applymap(np.log)

    # Deduplicate by taking means... not sure if this is the right way to go
    gp = dream_df.groupby(keep_cols)
    dream_df = gp.mean()
    dream_df.reset_index(inplace=True)

    return dream_df 


if __name__=="__main__":

    # Get command line args
    parser = argparse.ArgumentParser(description="")
    parser.add_argument("timeseries_file", help="path to a DREAM challenge time series CSV file")
    parser.add_argument("output_dir", help="directory where the output CSVs will be written")
    parser.add_argument("--ignore-stim", help="Do NOT treat different stimuli as different contexts.",
                        action="store_true")
    parser.add_argument("--ignore-inhibitor", help="Do NOT treat different inhibitors as different contexts.", 
                        action="store_true")
    parser.add_argument("--keep-start", help="Keep the time series data at timepoint 0",
                        action="store_true")
    args = parser.parse_args()

    ts_filename = str(args.timeseries_file)
    ignore_stim = args.ignore_stim
    ignore_inhib = args.ignore_inhibitor

    # Load the DREAM challenge data
    df = load_dream_ts(ts_filename, keep_start=args.keep_start)

    # transform these columns into more useful forms
    df["Timepoint"] = df["Timepoint"].map(to_minutes)
    df.loc[df["Inhibitor"].isnull(), "Inhibitor"] = "nothing"
    df.loc[df["Stimulus"].isnull(), "Stimulus"] = "nothing"

    # Convert the data to (context-specific) time series dataframes,
    # formatted correctly for our analysis
    new_ts_df = create_standard_dataframe(df, ignore_stim=ignore_stim, 
                                              ignore_inhib=ignore_inhib)

    in_fname = os.path.basename(ts_filename)
    cell_line = in_fname.split("_")[0] 

    contexts = new_ts_df["context"].unique()

    for ctxt in contexts:
        ctxt_str = "cl={}_stim={}".format(cell_line, ctxt)
        out_df = new_ts_df[new_ts_df["context"] == ctxt]
        out_df.iloc[:,1:].to_csv(os.path.join(str(args.output_dir), ctxt_str+".csv"),
                                 sep="\t", index=False)
import pandas as pd
import numpy as np
import matplotlib as mpl

mpl.use('Agg')
mpl.rcParams['text.usetex'] = True

from matplotlib import pyplot as plt
import sys
import os
import argparse
import script_util as su



def compute_t_statistics(df, test_key_cols, sample_col, qty_cols,
                             method_col, baseline_name):

    methods = df[method_col].unique().tolist()
    baseline = df[df[method_col] == baseline_name].copy()

    key_arrs = [df[k].unique() for k in test_key_cols+[method_col]]

    df.set_index(test_key_cols+[method_col], inplace=True)
    baseline.set_index(test_key_cols, inplace=True)

    result_df = pd.DataFrame(index=pd.MultiIndex.from_product(key_arrs),
                             columns=qty_cols)

    for ks in result_df.index:

        diffs = df.loc[ks, qty_cols] - baseline.loc[ks[:-1], qty_cols].values
        diffs.reset_index(inplace=True)

        gp = diffs.groupby(by=test_key_cols)
        means = gp[qty_cols].mean()
        stds = gp[qty_cols].std()
        n = means.shape[0]

        f = lambda x: x / np.sqrt(n)
        ses = stds.apply(f)
        ts = means / ses

        result_df.loc[ks, qty_cols] = ts.values

    result_df.index.rename(test_key_cols+[method_col], inplace=True)
    result_df.reset_index(inplace=True)
    return result_df


def aggregate_scores(table, key_cols, score_cols):

    gp = table.groupby(key_cols)
    agg = gp[score_cols].mean()
    agg.reset_index(inplace=True)

    return agg 


def make_heatmap(ax, relevant, x_col, y_col, qty_col, **kwargs):

    x_vals = relevant[x_col].unique()
    y_vals = relevant[y_col].unique()

    grid = np.zeros((len(y_vals), len(x_vals)))
    for i, x in enumerate(x_vals):
        for j, y in enumerate(y_vals):
            grid[j,i] = relevant.loc[(relevant[x_col] == x) & (relevant[y_col] == y), qty_col]

    img = ax.imshow(grid, origin="lower", **kwargs)
    ax.set_xticks(list(range(len(x_vals))))
    ax.set_xticklabels(x_vals)
    ax.set_yticks(list(range(len(y_vals))))
    ax.set_yticklabels(y_vals)
    #ax.set_xlim([-0.5, len(x_vals)-0.5])
    #ax.set_ylim([-0.5, len(y_vals)-0.5])

    #ax.label_outer()
    return img


def subplot_heatmaps(qty_df, macro_x_col, macro_y_col, 
                     micro_x_col, micro_y_col, qty_col, score_str,
                     output_filename="simulation_scores.png",
                     macro_x_vals=None, macro_y_vals=None,
                     cmap="Greys", vmin=None, vmax=None):

    if macro_x_vals is None:
        macro_x_vals = qty_df[macro_x_col].unique().tolist()

    if macro_y_vals is None:
        macro_y_vals = qty_df[macro_y_col].unique().tolist()

    n_rows = len(macro_y_vals)
    n_cols = len(macro_x_vals)

    fig, axarr = plt.subplots(n_rows, n_cols, 
                              sharey=True, sharex=True, 
                              figsize=(2.0*n_cols,2.0*n_rows))


    in_macro_y_vals = lambda x: x in macro_y_vals
    relevant_scores = qty_df.loc[qty_df[macro_y_col].map(in_macro_y_vals) , qty_col]

    if vmin is None:
        vmin = relevant_scores.quantile(0.05)
    if vmax is None:
        vmax = relevant_scores.quantile(0.95)

    nrm = mpl.colors.Normalize(vmin=vmin,vmax=vmax)
    mappable = mpl.cm.ScalarMappable(norm=nrm, cmap=cmap)

    imgs = []

    # Iterate through the different subplots
    for i, myv in enumerate(macro_y_vals):
        for j, psize in enumerate(macro_x_vals):

            ax = axarr[i][j]
            relevant = qty_df.loc[(qty_df[macro_y_col] == myv) & (qty_df[macro_x_col] == psize),:]

            img = make_heatmap(ax, relevant, micro_x_col, micro_y_col, qty_col, norm=nrm, cmap=cmap)
            imgs.append(img)

            #ax.set_xlim([0,3])
            #ax.set_ylim([0,3])
            if i == len(macro_y_vals)-1:
                ax.set_xlabel("${}$\n$V$ = {:d}".format(micro_x_col, int(psize)),family='serif')
            if j == 0:
                ax.set_ylabel("{}\n${}$".format(su.NICE_NAMES[myv], micro_y_col),family='serif')


    fig.suptitle("Simulation Study: {}".format(su.NICE_NAMES[score_str]),family='serif',fontsize=16)
    plt.tight_layout(rect=[0.0,0.0,1,0.95])
    fig.colorbar(imgs[-1], ax=axarr, location="top", shrink=0.8, pad=0.05, fraction=0.05, use_gridspec=True)

    plt.savefig(output_filename, dpi=300)#, bbox_inches="tight")


if __name__=="__main__":

    args = sys.argv
    infile = args[1]
    mean_outfile = args[2]
    t_outfile = args[3]
    score_str = args[4]
    baseline_name = args[5]
    methods = args[6:]

    table = pd.read_csv(infile, sep="\t") 

    key_cols = ["v","r","a"]
    sample_col = "replicate"
    score_cols = [score_str]
    method_col = "method"

    aggregate_table = aggregate_scores(table, key_cols + [method_col], score_cols)
    #aggregate_table.to_csv("means.tsv", sep="\t")

    t_stat_table = compute_t_statistics(table, key_cols, sample_col, score_cols,
                                        method_col, baseline_name)
    #t_stat_table.to_csv("t_statistics.tsv", sep="\t")

    print(mean_outfile)
    subplot_heatmaps(aggregate_table, "v", "method", "r", "a", score_str, score_str,
                     output_filename=mean_outfile, macro_y_vals=methods+[baseline_name],
                     cmap="Greys")

    print(t_outfile)
    subplot_heatmaps(t_stat_table, "v", "method", "r", "a", score_str, "t_stat_{}".format(score_str), 
                     output_filename=t_outfile, macro_y_vals=methods,
                     cmap="RdBu", vmin=-5.0, vmax=5.0) 
import script_util as su
import pandas as pd
import sys
import numpy as np
import os

if __name__=="__main__":

    input_files = sys.argv[1:-1]
    output_file = sys.argv[-1]

    AUCPR_STR = "aucpr"
    AUCROC_STR = "aucroc"

    table = su.tabulate_results(input_files, [[AUCPR_STR],[AUCROC_STR]])

    methods = [f.split(os.path.sep)[-2] for f in input_files]
    table["method"] = methods

    table.to_csv(output_file, index=False, sep="\t")
shell:
    "python scripts/tabulate_scores.py {input.scores} {output}"
SnakeMake From line 148 of master/Snakefile
shell:
    "python scripts/tabulate_scores.py {input.mcmc} {input.baselines} {output}"
SnakeMake From line 159 of master/Snakefile
shell:
    "julia --project={JULIA_PROJ_DIR} {input.simulator} {wildcards.v} {wildcards.t} {SIM_M} {wildcards.r} {wildcards.a} {POLY_DEG} {output.ref} {output.true} {output.ts}"
SnakeMake From line 173 of master/Snakefile
shell:
    "julia --project={JULIA_PROJ_DIR} {input.scorer} --truth-file {input.tr_dg} --pred-file {input.pp_res} --output-file {output.out}"
SnakeMake From line 187 of master/Snakefile
shell:
    "python scripts/sim_heatmap.py {input} {output.mean} {output.t} {wildcards.score} prior_baseline {SIM_METHODS}" 
SnakeMake From line 200 of master/Snakefile
shell:
    "julia --project={JULIA_PROJ_DIR} {input.pp} --chain-samples {input.raw} --output-file {output.out} --burnin {CONV_BURNIN}"
SnakeMake From line 227 of master/Snakefile
shell:
    "julia --project={JULIA_PROJ_DIR} {input.method} {input.ts_file} {input.ref_dg} {output} {CONV_TIMEOUT}"\
    +" --n-steps {CONV_MAX_SAMPLES} --regression-deg {wildcards.d}"\
    +" --lambda-prop-std {wildcards.lstd}"
SnakeMake From line 242 of master/Snakefile
shell:
    "julia --project={JULIA_PROJ_DIR} {input.pp} --chain-samples {input.raw}  --output-file {output.out}"
SnakeMake From line 259 of master/Snakefile
shell:
    "julia --project={JULIA_PROJ_DIR} {input.method} {input.ts_file} {input.ref_dg} {output} {SIM_TIMEOUT}"\
    +" --regression-deg {wildcards.d} --n-steps {SIM_MAX_SAMPLES}"\
    +" --lambda-prop-std 3.0 --large-indeg 15.0"
SnakeMake From line 273 of master/Snakefile
shell:
    "julia --project={JULIA_PROJ_DIR} {input.pp} --chain-samples {input.raw}  --output-file {output.out}"
SnakeMake From line 293 of master/Snakefile
shell:
    "julia --project={JULIA_PROJ_DIR} {input.method} {input.ts_file} {input.ref_dg} {output} {SIM_TIMEOUT}"\
    +" --regression-deg 1 --n-steps {SIM_MAX_SAMPLES}"\
    +" --lambda-prop-std 3.0 --large-indeg 15.0 --proposal uniform"
SnakeMake From line 308 of master/Snakefile
shell:
    "Rscript {FUNCH_DIR}/funchisq_wrapper.R {input.ts_file} {output}"
SnakeMake From line 330 of master/Snakefile
shell:
    "matlab -nodesktop -nosplash -nojvm -singleCompThread -r \'cd(\"{HILL_DIR}\"); try, hill_dbn_wrapper(\"{input.ts_file}\", \"{input.ref_dg}\", \"{output}\", -1, \"auto\", {SIM_TIMEOUT}), catch e, quit(1), end, quit\'"
SnakeMake From line 353 of master/Snakefile
shell:
    "matlab -nodesktop -nosplash -nojvm -singleCompThread -r \'cd(\"{HILL_DIR}\"); try, hill_dbn_wrapper(\"{input.ts}\", \"{input.ref}\", \"{output}\", {wildcards.deg}, \"{wildcards.mode}\", {HILL_TIME_TIMEOUT}), catch e, quit(1), end, quit\'"
SnakeMake From line 367 of master/Snakefile
shell:
    "python {SCRIPT_DIR}/tabulate_timetest_results.py {input} {output}"
SnakeMake From line 376 of master/Snakefile
shell:
    "julia --project={JULIA_PROJ_DIR} {input.method} {input.ts} {input.ref} {output}"
SnakeMake From line 396 of master/Snakefile
shell:
    "julia --project={JULIA_PROJ_DIR} {input.method} {input.ref} {output}"
SnakeMake From line 414 of master/Snakefile
shell:
    "python {input.scorer} {input.preds} {input.tr_desc} {input.ab} {output.out}"
SnakeMake From line 444 of master/Snakefile
shell:
    "julia --project={JULIA_PROJ_DIR} {input.pp} --chain-samples {input.raw} --output-file {output.out}"\
    +" --stop-points {DREAM_STOPPOINTS}"
SnakeMake From line 459 of master/Snakefile
shell:
    "julia --project={JULIA_PROJ_DIR} {input.method} {input.ts_file} {input.ref_dg} {output} {DREAM_TIMEOUT}"\
    +" --n-steps {CONV_MAX_SAMPLES} --regression-deg {wildcards.d}"\
    +" --lambda-prop-std {wildcards.lstd} --large-indeg {MCMC_INDEG}"
SnakeMake From line 475 of master/Snakefile
shell:
    "python {input.scorer} {input.preds} {input.tr_desc} {input.ab} {output.out}"
SnakeMake From line 492 of master/Snakefile
shell:
    "Rscript {input.method} {input.ts_file} {output}"
SnakeMake From line 506 of master/Snakefile
shell:
    "matlab -nodesktop -nosplash -nojvm -singleCompThread -r \'cd(\"{HILL_DIR}\"); try, hill_dbn_wrapper(\"{input.ts_file}\", \"{input.ref_dg}\", \"{output}\", -1, \"auto\", {SIM_TIMEOUT}), catch e, quit(1), end, quit\'"
SnakeMake From line 520 of master/Snakefile
shell:
    "julia --project={JULIA_PROJ_DIR} {input.method} {input.ts} {input.ref} {output}"
SnakeMake From line 535 of master/Snakefile
shell:
    "julia --project={JULIA_PROJ_DIR} {input.method} {input.ref} {output}"
SnakeMake From line 549 of master/Snakefile
shell:
    "python scripts/preprocess_dream_ts.py {input} {DREAM_PREP_TS_DIR} --ignore-inhibitor"
SnakeMake From line 558 of master/Snakefile
shell:
    "python scripts/preprocess_dream_prior.py {input} {output.edges} {output.ab}"
SnakeMake From line 568 of master/Snakefile