Snakemake pipeline for scATAC metacell utilities such as primed and lineage specific scores, chromVAR, in-silico ChIP

public 1yr ago Version: v1.0.0 0 bookmarks

View Workflow

This repository contains a snakemake pipeline for utilities with ATAC and RNA metacells inferred using SEACells algorithm (ref-1). The following outputs are generated

In-silico ChIP (ref-2) TF targets
ChromVAR results (ref-3) using In-silico ChIP results
Primed and lineage specific peaks in single-cell data as described in (ref-4)

Overview:

Below is a DAG showing the rule dependencies in the snakemake pipeline. View the Snakefile to see details on each rule.

DAG of workflow

peak_scores generates primed and lineage-specific accessibility scores. compute_ins_chip performs in-silico ChIP analysis diff_acc can be used for differential accessibility between cell types using metacells chromvar generatees chromVAR results

Output of the pipeline

The output of the pipeline is to update the user provided anndata objects following the style in scanpy. Given the extensive changes, we recommend using copies of your anndata objects to avoid corrupting your original objects.

The input single-cell anndata ( *_sc_ad ) and metacell anndata ( *_mc_ad ) objects are updated with the following information.

atac_mc_ad.uns['gp_corrs'] : Metacell level peak accessibility and gene expression correlations computed using SEACells
atac_mc_ad.varm['<celltypeA>_<celltypeB>_diff_acc'] : Differential accessibiliy between cell type A and cell type B using edgeR
atac_sc_ad.varm['OpenPeaks'] : Matrix indicating which peak is open in each cell type
atac_sc_ad.layers[f'<Target cell type>_primed'] : ATAC peaks primed in Target cell type
atac_sc_ad.layers[f'<Target cell type>_lineage_specific'] : ATAC lineage specific peaks in Target cell type
rna_sc_ad.layers[f'<Target cell type>_primed'] : Primed accessiblity scores for each gene in the target cell type
rna_sc_ad.layers[f'<Target cell type>_lineage_specific'] : Lineage specific accessiblity scores for each gene in Target cell type
atac_sc_ad.varm['FIMO'], atac_sc_ad.uns['FIMOcolumns'] : FIMO motif enrichment for ATAC peaks
atac_sc_ad.varm['InSilicoChip'], atac_sc_ad.uns['FIMOcolumns'] : In-silico ChIP motif enrichment for ATAC peaks
atac_sc_ad.varm['FIMO'], atac_sc_ad.uns['FIMOcolumns'] : FIMO motif enrichment for ATAC peaks
atac_sc_ad.varm['InSilicoChip'], atac_sc_ad.uns['InSilicoChipcolumns'] : In-silico ChIP motif enrichment for ATAC peaks
rna_sc_ad.obsm['chromVAR_deviations'] : chromVAR results with In-silico ChIP results
rna_sc_ad.varm['geneXTF'] : Gene X TF targets inferred using SEACells gene-peaks correlations and in-silico ChIP results.

Environment Setup

In order to run this pipeline, certain Python and R packages need to be installed. Importantly, the environment.yaml / requirements.txt files include the snakemake package.

Dependencies

Before creating the environment, make sure the following dependencies are installed.

conda
- Conda is required for managing environments, whether you chose to create the environment using conda or pip.
R 4.1.0
- This is the version of R used throughout the pipeline. If you wish to use a different R version / installation, modify the renv.lock file and the R version in config.yaml
MEME (recommended: 5.1.1)
- This includes the fimo tool used to find TF motifs. If you wish to use your own installation of the MEME Suite, update the MEME version in config.yaml

Python Modules

There are two options, using conda or using pip .

With conda :

envName=atac-metacell-utils
conda env create -n "$envName" --file envs/environment.yaml
conda activate "$envName"

With pip :

In an activated, bare conda environment:

pip install -r envs/requirements.txt

R modules

To ensure the project runs smoothly, an renv.lock file is included under the envs/ directory. Using the renv package makes it easy to distribute the required packages and their correct versions.

If renv is not yet installed, run:

snakemake --cores 1 renv_install --config renv_loc=<path/to/user/R/library>

NOTE: --cores can tell snakemake how many cores are available for use. It will automatically run the jobs in parallel, if possible.

This will run the snakemake rule that will install the renv package into the library location specified above

Next, we can install all the required R packages:

snakemake --cores 1 renv_init_restore

Modifying the `config.yaml` file

For each project using this pipeline, parameters can be set in the configuration file located at:

config/config.yaml

In order for the pipeline to run, you will have to set the following parameters:

atac str : Path to the SEACell-level ATAC AnnData with normalized and log-transformed counts
rna str : Path to the SEACell-level RNA AnnData with normalized and log-transformed counts
sc_atac str : Path to the single-cell-level ATAC AnnData with matching peaks to atac AnnData. Must have nFrags annotation in .obs .
sc_rna str : Path to the single-cell-level RNA AnnData used to generate the SEACell-level RNA data.

NOTE: The RNA and ATAC SEACell AnnDatas should have matched/common obs_names .

Optional Parameters:

The following values have default values specified in the included config.yaml but can be adjusted to suit

Peak and genome params: NOTE: Make sure the in_meme file is appropriate for the chosen genome

in_meme str : Path to the .meme file for FIMO, default=data/cis-bp-tf-information.meme
- The default MEME motif file was created using the script: workflow/scripts/make_meme.py
- Motifs from other databases can also be converted to a format compatible with MEME - for example, the MEME utility jaspar2meme can convert JASPAR PFM files.
width int : Number of base pairs to resize ATAC peaks to. The summit will be the midpoint, default=150
genome str : Genome to use, will be referenced in the all_seqs , chromvar , and gp_corr rules, default=hg38
- Currently supported genomes include:
  - hg38
  - hg19
  - hg18
  - mm9
  - mm10

In silico ChIP params:

verbose str/flag : Whether to print info for each TF, default="" (False), set to "--verbose" for True
min_chip_score float : minimum in silico ChIP score to pass filtering, default=0.10
min_peak_hits int : minimum TFs passing in silico ChIP filters, default=30
vmax int : vmax for plotting, vmin = -vmax, default=3
embedding str : sc_ad.obsm field for plotting, default=umap

Gene-Peak correlation params:

n_jobs int : Number of cores for the gp_rule , default=1
test_set str/flag : Whether to generate a test set, default="" (False), set to "--test_set" for True
n_genes int : number of genes to use in test set, if applicable, default=20
min_corr float : minimum gene-peak correlation to pass filtering, default=0.0
max_pval float : max p-value for gene-peak correlation to pass filtering, default=0.1
min_peaks int : minimum number of significant peaks to pass gene filtering, default=5

Differential accessibility params:

group_variable str : the name of the column in the single-cell ATAC .obs to annotate metacells by, such as celltype.
to_compare str : A string of comma-separated pairs of levels of group_variable , separated by forward slashes. example: "EryPre1,proB/Mono,proB/EryPre1,HSC/Mono,HSC"

Peak selection params:

target list : Target lineage for identification of primed and lineage specific peaks. List of length 2: 1)Target lineage name for annotation and 2)Value of group_variable .
start str : Value of the group_variable corresponding to the stem cell population
reference dict : A dictionary of lineages and their corresponding representative cell type, defined in YAML format. Note that the comparisons should be included for each reference and target combination, and each reference and start combination
min_logFC float : the minimum log fold change for a peak to be considered to be differentially accessible in the target lineage. `default=0
max_logFC float : the max allowable log-fold change accessibility of a peak in a reference lineage relative to the starting state. default=0.25

Gene-Peak correlation params , Differential accessibility params , Peak selection params should be updated for computing primed and lineage-specific peaks and scores.

Command-line config specification

Instead of altering the config.yaml file directly, you can also pass the params using the keys in the config file on the command line.

For example if we wanted to generate a test_set:

snakemake --cores 1 all --config test_set="--test_set"

Passing in a different config file

In the Snakefile, the global variable, config , is set with the line:

configfile: "config/config.yaml"

The keys defined in the default config/config.yaml can be overwritten by passing a different config file (still .yaml ). For example:

snakemake --cores 1 all --configfile <new_config_file.yaml>

NOTE: All keys defined by the configfile: statement, the --configfile and/or --config command line arguments are part of the final config dictionary.

If two methods define the same key, command line arguments overwrite keys from the configfile: statement

Snakemake will output a statement to make it clear to the user:

Config file config/config.yaml is extended by additional config specified via the command line

Optional: Download `hg38.gtf`

For convenience, a rule is included to download the hg38.gtf file into a data directory for use in the gp_corr rule.

snakemake --cores 1 dl_hg38

If a different genome is being used, create a data/ directory and place the .gtf file inside. Modify config.yaml accordingly.

Running the pipeline

After the environment has been set up and the configuration file is set, the pipeline is ready to run!

We can always conduct a "dry run" to test that all required inputs/parameters have been set before running the pipeline for real:

snakemake -nr all

-n flag tells snakemake to make a "dry run"
-r flag will ouput the reason each rule is running

To generate all files, run:

snakemake --cores 1 all

NOTE: In the case that a rule fails, it is not necessary to rerun rules that successfully completed.

In fact, calling the all rule again will only run the rules for which their output is missing.

snakemake --cores 1 all

For more control, individual rules can also be called:

If you do this, make sure all the input dependencies have already been generated! -n can be appended for a dry run to determine if inputs are available.

snakemake --cores 1 name_of_rule

After modifications of inputs or parameters, you can force a rule to rerun using the -R flag:

snakemake --cores 1 -R name_of_rule

For even more control, the --allowed-rules argument can be used to pass a space-separated list of rules that are allowed to run. This is helpful when trying to avoid re-running time-intensive rules such as fimo and gp_corr when calling all or individual rules that get inputs from upstream rules.

For example, the following will run/re-run the chromvar rule and any of the allowed rules if necessary.

snakemake --cores 1 chromvar --allowed_rules peak_tf compute_ins_chip prep_chromvar

When running this pipeline on an HPC system which uses lmod to load software, do not load snakemake as a module - this can cause conflicts where the Python version used by snakemake is the one provided by the module, not the gene-TF conda environment.

To run using the snakemake version installed with gene-TF , make sure the gene-TF environment is active, then run the pipeline as follows:

python -m snakemake --cores 1 name-of-rule

Submitting to Slurm

To run this pipeline non-interactively on a cluster using the Slurm job manager, create a shell script calling snakemake . An template script is shown below.

$1 : name of snakemake rule to run
$2 : number of cores to use

#!/bin/bash
#SBATCH --cpus-per-task=16
#SBATCH --job-name=snakemake
#SBATCH --partition campus-new
#SBATCH --nodes=1
#SBATCH --time=2-00:00:00
#SBATCH --mem=150
snakemake --cores $2 $1

Output (stdout and stderr) captured by Slurm is written to a .log file, specified with the -o / --output flag.

After creating the shell script, and making it executable, run:

sbatch -o /path/to/log/file.log name_of_script.sh rule_name num_cores

Log Files

For the fimo rule, separate log files are created and can be accessed in the logs/ directory.

Other log files can be accessed in the .snakemake/log directory.

References

SEACells: https://www.nature.com/articles/s41587-023-01716-9
In-silico ChIP: https://www.biorxiv.org/content/10.1101/2022.06.15.496239v1
chromVAR: https://www.nature.com/articles/nmeth.4401
Mellon: https://www.biorxiv.org/content/10.1101/2023.07.09.548272v2 The custom MOTIF annotations are based off of in silico ChIP-seq , introduced by Argelaguet et al in this paper.

Code Snippets

args <-commandArgs(trailingOnly=TRUE)
install.packages("renv", lib=args[1], repos='http://cran.us.r-project.org')

R renv From line 1 of scripts/install_renv.R

import scanpy as sc
import SEACells
import pandas as pd
import numpy as np


def main():

    # Load data
    print('Loading data...')
    atac_sc_ad = sc.read(snakemake.input["sc_atac"])
    group_variable = snakemake.params['cell_type_obs']

    # Open peaks per cell type
    # Aggregate counts per cell type
    print('Aggregating counts per cell type')
    ct_ad = SEACells.core.summarize_by_SEACell(atac_sc_ad,
                                               group_variable, summarize_layer='X')
    ct_ad.obs['n_counts'] = np.ravel(ct_ad.X.sum(axis=1))
    # SVD for filler - only used for identifying itself as nearest neighbor
    sc_svd = pd.DataFrame(atac_sc_ad.obsm['X_svd'], index=atac_sc_ad.obs_names)
    ct_ad.obsm['X_svd'] = sc_svd.groupby(
        atac_sc_ad.obs[group_variable]).mean().loc[ct_ad.obs_names, :]

    # Open peaks
    # Determine open peaks in each metacell
    print('Open peaks in each cell type')
    SEACells.accessibility.determine_metacell_open_peaks(
        ct_ad, n_neighbors=1, n_jobs=1)

    # Update anndata
    print('Update and saving anndata')
    atac_sc_ad.varm['OpenPeaks'] = pd.DataFrame(ct_ad.layers['OpenPeaks'].T,
                                                columns=ct_ad.obs_names, index=atac_sc_ad.var_names)

    # Save
    atac_sc_ad.write(snakemake.input["sc_atac"])

    # CReate directory to mark completion
    import os
    os.makedirs(str(snakemake.output["out_dir"]), exist_ok=True)


if __name__ == "__main__":
    main()

Python Pandas numpy SEACells From line 1 of scripts/open_peaks.py

import scanpy as sc
import pandas as pd
import numpy as np
from tqdm.auto import tqdm


def main():

    # Load data
    print('Loading anndata...')
    atac_ad = sc.read(snakemake.input["sc_atac"])
    atac_meta_ad = sc.read(snakemake.input["meta_atac"])

    # set target lineage and cell type
    target = snakemake.params["target"]
    target_lineage = target[0]
    target_celltype = target[1]
    # Start cell type
    start_celltype = snakemake.params["start"]
    # Reference lineages
    ref_lineages = snakemake.params["reference"]

    # set filtering parameters
    min_logFC = snakemake.params["min_logFC"]
    max_logFC = snakemake.params["max_logFC"]
    max_pval = snakemake.params["max_pval"]
    min_corr = snakemake.params["min_corr"]


    print("Loading gene-peak corrs...")
    gene_peak_corrs = dict(list(atac_meta_ad.uns['gp_corrs'].groupby('gene')))

    print("Loading differential results")
    diff_peaks = dict()
    start_peaks = dict()
    for ref_lineage in ref_lineages.keys():
        ref_celltype = ref_lineages[ref_lineage]

        # Differential accessibility compared to target cell type
        diff_peaks[ref_lineage] = atac_meta_ad.varm[f'{ref_celltype}_{target_celltype}_diff_acc']
        # diff_peaks[ref_lineage]['feature'] = diff_peaks[ref_lineage].index

        # Differential accessibility compared to stem cell type
        start_peaks[ref_lineage] = atac_meta_ad.varm[f'{ref_celltype}_{start_celltype}_diff_acc']
        # start_peaks[ref_lineage]['feature'] = diff_peaks[ref_lineage].index

    # Select list of peaks which are positively correlated with gene expression and meet p value criteria
    print("Removing peaks not positively correlated with gene expression...")
    correlated_peaks = pd.Series(dtype=object)
    for gene in tqdm(gene_peak_corrs.keys()):
        if isinstance(gene_peak_corrs[gene], int):
            continue
        res = gene_peak_corrs[gene]
        iter_peaks = res.index[(res['cor'] >= min_corr) & (res['pval'] < max_pval)]
        correlated_peaks = iter_peaks.union(correlated_peaks)
    filtered_peaks = correlated_peaks.unique()
    print(f'Retained {len(filtered_peaks)} peaks')


    # Peaks that are open in the target lineage
    print('Retain peaks that are open in the target lineage')
    open_peaks = atac_ad.varm['OpenPeaks']
    filtered_peaks = filtered_peaks[open_peaks.loc[filtered_peaks, [target_celltype]].sum(axis=1) > 0]
    print(f'Retained {len(filtered_peaks)} peaks')


    print(f"Retain peaks with greater accessibility in {target_lineage} compared to other lineages")
    retain_peaks = pd.Series(False, index=filtered_peaks)
    for ref in diff_peaks.keys():
        diff = diff_peaks[ref].loc[filtered_peaks]
        retain = filtered_peaks[(diff['logFC'] > min_logFC) & ~np.isnan(diff['groupA_N'])]
        retain_peaks[retain] = True
    filtered_peaks = filtered_peaks[retain_peaks]
    print(f'Retained {len(filtered_peaks)} peaks')


    print("Remove peaks with increased accessibility in reference cell types compared to stem cells")
    retain_peaks = pd.Series(False, index=filtered_peaks)
    for ref in start_peaks.keys():
        diff = start_peaks[ref].loc[filtered_peaks]
        retain = filtered_peaks[(diff['logFC'] < max_logFC)]
        retain_peaks[retain] = True
    filtered_peaks = filtered_peaks[retain_peaks]
    print(f'Retained {len(filtered_peaks)} peaks')


    # Primed and lineage specific peaks
    primed_peaks = filtered_peaks[open_peaks.loc[filtered_peaks, start_celltype] > 0]
    lin_spec_peaks = filtered_peaks[open_peaks.loc[filtered_peaks, start_celltype] == 0]

    print("Peak selection completed! Adding results to anndata ...")
    atac_ad.var[f'{target_lineage}_primed'] = atac_ad.var_names.isin(primed_peaks)
    atac_ad.var[f'{target_lineage}_lineage_specific'] = atac_ad.var_names.isin(lin_spec_peaks)

    print('Saving')
    atac_ad.write(snakemake.input["sc_atac"])

    # CReate directory to mark completion
    import os
    os.makedirs(str(snakemake.output["out_dir"]), exist_ok=True)

    print("Completed!")

if __name__ == "__main__":
    main()

Python Pandas numpy tqdm From line 1 of scripts/peak_selection.py

import scanpy as sc
import numpy as np
import pandas as pd
from tqdm.auto import tqdm


def _dot_func(x, y):
    return x.dot(y)

def impute_data_with_weights(ad, data):
    res = _dot_func(ad.obsp['ImputeWeights'], data)
    return pd.DataFrame(res, index=ad.obs_names, columns=ad.var_names)


def main():

    # Load params
    print('Load data...')
    atac_sc_ad = sc.read_h5ad(snakemake.input["sc_atac"])
    atac_meta_ad = sc.read_h5ad(snakemake.input["meta_atac"])
    rna_sc_ad = sc.read_h5ad(snakemake.input["sc_rna"])
    target_lineage = snakemake.params["target"][0]
    min_corr = snakemake.params["min_corr"]
    max_pval = snakemake.params["max_pval"]
    min_peaks = snakemake.params["min_peaks"]

    # Reconstruct gene peak correlations
    gene_peak_corrs = dict(list(atac_meta_ad.uns['gp_corrs'].groupby('gene')))

    # Peak accessibility imputation
    print('Peak accessibility imputation...')
    lin_peaks = atac_sc_ad.var_names[
        (atac_sc_ad.var[f'{target_lineage}_primed']) |
        (atac_sc_ad.var[f'{target_lineage}_lineage_specific'])
    ]
    imputed_peak_matrix = impute_data_with_weights(atac_sc_ad[:, lin_peaks],
                                                   pd.DataFrame(atac_sc_ad[:, lin_peaks].layers['tf_idf'].todense(),
                                                                columns=lin_peaks, index=atac_sc_ad.obs_names))

    # Scores
    from scipy.sparse import csr_matrix
    acc_scores = dict()
    for key in ['primed', 'lineage_specific']:
        print(f'Computing {key} scores')

        # Imputead accessibility
        lin_peaks = atac_sc_ad.var_names[atac_sc_ad.var[f'{target_lineage}_{key}']]
        imp_peaks = imputed_peak_matrix[lin_peaks]

        # Scores per gene
        acc_scores[key] = pd.DataFrame(
            0.0, index=atac_sc_ad.obs_names, columns=rna_sc_ad.var_names)
        for gene in tqdm(rna_sc_ad.var_names):
            if gene not in gene_peak_corrs.keys():
                continue

            # Gene peak correlations
            res = gene_peak_corrs[gene]
            if isinstance(res, int):
                continue

            gene_peaks = res.index[(res['cor'] > min_corr)
                                   & (res['pval'] < max_pval)]
            if len(gene_peaks) < min_peaks:
                continue

            key_peaks = gene_peaks[gene_peaks.isin(imp_peaks.columns)]
            if len(key_peaks) == 0:
                continue

            acc_scores[key].loc[:, gene] = imp_peaks[key_peaks].dot(res.loc[key_peaks, 'cor']) \
                / np.sum(res.loc[key_peaks, 'cor'])

        # Save results to rna andata
        rna_sc_ad.layers[f'{target_lineage}_{key}'] = csr_matrix(acc_scores[key])

    # Save results
    print('Saving anndata')
    rna_sc_ad.write_h5ad(snakemake.input["sc_rna"])

    # CReate directory to mark completion
    import os
    os.makedirs(str(snakemake.output["out_dir"]), exist_ok=True)


if __name__ == "__main__":
    main()

Python Pandas numpy scipy tqdm From line 1 of scripts/primed_lin_scores.py

shell:
    "mkdir -p data && wget https://dp-lab-data-public.s3.amazonaws.com/SEACells-multiome/hg38.gtf -O data/hg38.gtf"

SnakeMake From line 17 of workflow/Snakefile

shell:
    """
    Rscript workflow/scripts/install_renv.R {input.libloc}
    """

SnakeMake From line 24 of workflow/Snakefile

shell:
    """
    R -e 'renv::init(bare=TRUE)'
    R -e 'renv::restore(lockfile=\"envs/renv.lock\")'
    """

SnakeMake From line 30 of workflow/Snakefile

shell:
    "python {input.script} --atac={input.atac} --outdir={params.out_dir}"

SnakeMake From line 46 of workflow/Snakefile

shell:
    """
    Rscript {input.script} {input.peaks} {output.outfile} {params.span} {params.genome}
    """

SnakeMake From line 58 of workflow/Snakefile

shell:
    """
    # fimo -oc {output.fimo_dir} {input.meme} {input.seqs} 2> {log.out}
    # rm -f {output.fimo_dir}/cisml.xml

    """

SnakeMake FiMO From line 71 of workflow/Snakefile

shell:
    "python {input.script} --peak_file={input.peaks} --fimo_res={input.fimo_dir} -o={output.out_dir} --sc_atac={input.sc_atac}"

SnakeMake From line 90 of workflow/Snakefile

shell:
    """
    mkdir -p {output.out_dir}
    python {input.metacell_script} --atac={input.atac} --sc_atac={input.sc_atac} --cell_type_obs={params.cell_type_obs} -o={output.out_dir}
    Rscript {input.script} {output.out_dir} {params.to_compare} {params.cell_type_obs}
    python {input.pyscript} --atac={input.atac} --to_compare={params.to_compare} --data_dir={output.out_dir}
    """

SnakeMake From line 106 of workflow/Snakefile

shell:
    """
    mkdir -p {output.gp_dir}
    python {input.script} --atac={input.atac} --rna={input.rna} -o={output.gp_dir} --n_jobs={params.n_jobs} \\
        --gtf_file={input.gtf_file} {params.test_set} --n_genes={params.n_genes} --min_corr={params.min_corr} \\
        --max_pval={params.max_pval}
    """

SnakeMake From line 130 of workflow/Snakefile

script:
    "scripts/open_peaks.py"

SnakeMake From line 149 of workflow/Snakefile

script:
    "scripts/peak_selection.py"

SnakeMake From line 174 of workflow/Snakefile

script:
    "scripts/primed_lin_scores.py"

SnakeMake From line 196 of workflow/Snakefile

shell:
    """
    mkdir -p {output.insc_dir}
    python {input.script} --atac={input.atac} --rna={input.rna} --sc_atac={input.sc_atac} -o={output.insc_dir} {params.verbose}
    touch "{output.insc_dir}/.ins_chip"
    """

SnakeMake From line 212 of workflow/Snakefile

shell:
    """
    python {input.script} --atac={input.atac} --rna={input.rna}  --datadir={input.data_dir}
    """

SnakeMake From line 226 of workflow/Snakefile

shell:
    """
    mkdir -p {output.chromvar_indir}
    python {input.script} --sc_atac={input.sc_atac} --ins_chip_dir={input.ins_chip_out} --min_chip={params.min_chip} \\
        --min_peak_hits={params.min_peak_hits} -o={output.chromvar_indir}
    """

SnakeMake From line 241 of workflow/Snakefile

shell:
    """
    mkdir -p {output.chromvar_outdir}
    Rscript {input.script} {input.peak_file} {input.chromvar_indir} {output.chromvar_outdir} {params.genome}
    python {input.pyscript} --sc_rna {input.sc_rna} --input {output.chromvar_outdir}
    """

SnakeMake From line 261 of workflow/Snakefile

shell:
    """
    mkdir -p {output.gtf_dir}
    python {input.script} --atac={input.atac} --sc_atac={input.sc_atac} --sc_rna={input.sc_rna}\\
     --min_corr={params.min_corr} --max_pval={params.max_pval} \\
     --min_peaks={params.min_peaks} -o={output.gtf_dir}
    """

SnakeMake From line 283 of workflow/Snakefile

run:
    if config["interactive_clean"]:
        shell("rm -ri {params.out_dir}")
    else:
        shell("rm -r {params.out_dir}")

SnakeMake From line 296 of workflow/Snakefile

ShowHide 17 more snippets with no or duplicated tags.

Comments

Support

Do you know this workflow well? If so, you can request seller status , and start supporting this workflow.

Created: 1yr ago

Updated: 1yr ago

Maitainers: public

URL: https://github.com/settylab/atac_metacell_utilities

Name: atac_metacell_utilities

Version: v1.0.0

Badge:

Insert copied code into your website to add a link to this workflow.

License: None

Keywords:

Sample ID Peak calling Gene expression matrix FiMO Pandas SCANPY Snakemake renv numpy scipy SEACells tqdm ChIP-seq

Future updates

Related Workflows

psychip_snakemake — Show Details View Workflow

ENCODE pipeline for histone marks developed for the psychENCODE project

public

psychip pipeline is an improved version of the ENCODE pipeline for histone marks developed for the psychENCODE project. The o...

raw sequence reads Alignment Sequence alignment report macs2 ucsc-bedclip bedGraphToBigWig BEDTools BWA Picard SAMtools Snakemake

Free

Near-real time tracking of SARS-CoV-2 in Connecticut

public

Repository containing scripts to perform near-real time tracking of SARS-CoV-2 in Connecticut using genomic data. This pipeli...

JSON nextclade Augur Biopython FOCUS Pandas Snakemake bs4 epiweeks geopy matplotlib numpy pycountry pycountry-convert uszipcode

Free

cellranger-snakemake-gke — Show Details View Workflow

snakemake workflow to run cellranger on a given bucket using gke.

public

A Snakemake workflow for running cellranger on a given bucket using Google Kubernetes Engine. The usage of this workflow ...

macs2 ucsc-bedclip bedGraphToBigWig BEDTools BWA Picard SAMtools Snakemake

Free

ATLAS - Three commands to start analyzing your metagenome data

public

Metagenome-atlas is a easy-to-use metagenomic pipeline based on snakemake. It handles all steps from QC, Assembly, Binning, t...

raw sequence reads Genome assembly Annotation track checkm2 gunc prodigal snakemake-wrapper-utils MEGAHIT Atlas BBMap Biopython BioRuby Bwa-mem2 cd-hit CheckM DAS Diamond eggNOG-mapper v2 MetaBAT 2 Minimap2 MMseqs MultiQC Pandas Picard pyfastx SAMtools SemiBin Snakemake SPAdes SqueezeMeta TADpole VAMB CONCOCT ete3 gtdbtk h5py networkx numpy plotly psutil utils metagenomics

Free

175

rna-seq-star-deseq2 — Show Details View Workflow

RNA-seq workflow using STAR and DESeq2

public

This workflow performs a differential gene expression analysis with STAR and Deseq2. The usage of this workflow is described ...

Free

dna-seq-gatk-variant-calling — Show Details View Workflow

This Snakemake pipeline implements the GATK best-practices workflow

public

This Snakemake pipeline implements the GATK best-practices workflow for calling small germline variants. The usage of thi...

VCF raw sequence reads Variant calling genetic variants gatk rust-bio-tools snakemake-wrapper-utils tabix BCFtools BWA FastQC MultiQC Pandas Picard SAMtools Snakemake Trimmomatic Variant Effect Predictor (VEP) common matplotlib numpy seaborn DNA

Free

Snakemake pipeline for scATAC metacell utilities such as primed and lineage specific scores, chromVAR, in-silico ChIP

Overview:

Output of the pipeline

Environment Setup

Dependencies

Python Modules

R modules

Modifying the config.yaml file

Optional Parameters:

Command-line config specification

Passing in a different config file

Optional: Download hg38.gtf

Running the pipeline

Submitting to Slurm

Log Files

References

Code Snippets

Comments

Support

Free

Related Workflows

public

public

public

public

public

public

Modifying the `config.yaml` file

Optional: Download `hg38.gtf`