Pipeline for Cut&Tag analysis


cutTag-pipeline

Snakemake pipeline for Cut&Tag analysis

Setup

1. Configure the project directory

# cd into a project directory
# type the following to get a copy of the pipeline
git clone https://github.com/maxsonBraunLab/cutTag-pipeline.git
cd cutTag-pipeline
# create a directory for your fastq files
mkdir -p data/raw
# link fastqs to data/raw
ln -s /path/to/fastq/files/sample1_R1.fastq.gz data/raw
ln -s /path/to/fastq/files/sample1_R2.fastq.gz data/raw
ln -s /path/to/fastq/files/sample2_R1.fastq.gz data/raw
ln -s /path/to/fastq/files/sample2_R2.fastq.gz data/raw
...
# make scripts executable
chmod +x src/*.py src/*.sh *.sh

IMPORTANT

After creating the symlinks, rename all the symlinks in data/raw to the following format: {condition}_{replicate}_{mark}_{R1|R2}.fastq.gz

For example, a file originally named LIB200706TB_M6Q3_RBP1_S93_L001_R1_001.fastq.gz would be renamed to M6Q_3_RBP1_R1.fastq.gz (condition M6Q, replicate 3, mark RBP1, read R1).
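
A hypothetical rename loop for names like the example above might look like the following. The sed pattern is only illustrative and assumes your sequencing core's naming scheme, so review the printed mapping before uncommenting the mv:

# illustrative only: adjust the pattern to your core's naming scheme
cd data/raw
for f in *_001.fastq.gz; do
    new=$(echo "$f" | sed -E 's/^[^_]+_([A-Za-z0-9]*[A-Za-z])([0-9]+)_([^_]+)_S[0-9]+_L[0-9]+_(R[12])_001\.fastq\.gz$/\1_\2_\3_\4.fastq.gz/')
    echo "$f -> $new"   # review the mapping first
    # mv "$f" "$new"    # uncomment once the mapping looks correct
done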

2. Make the sample sheet and deseq2 metadata.

Activate an environment containing Snakemake, then run the make_sample_sheet.py script from the root of the project directory.

$ python src/make_sample_sheet.py data/raw

This will create a sample sheet for the experiment called samplesheet.tsv in the root of the directory, as well as the file src/deseq2_metadata.csv. The contents of the samplesheet will be structured like the following example:

sample	R1	R2	mark	condition	igg
HoxE_1_IgG	data/raw/HoxE_1_IgG_R1.fastq.gz	data/raw/HoxE_1_IgG_R2.fastq.gz	IgG	HoxE	HoxE_1_IgG
HoxE_1_Rbp1	data/raw/HoxE_1_Rbp1_R1.fastq.gz	data/raw/HoxE_1_Rbp1_R2.fastq.gz	Rbp1	HoxE	HoxE_1_Rbp1
HoxE_2_Rbp1	data/raw/HoxE_2_Rbp1_R1.fastq.gz	data/raw/HoxE_2_Rbp1_R2.fastq.gz	Rbp1	HoxE	HoxE_2_Rbp1
HoxE_3_Rbp1	data/raw/HoxE_3_Rbp1_R1.fastq.gz	data/raw/HoxE_3_Rbp1_R2.fastq.gz	Rbp1	HoxE	HoxE_3_Rbp1
HoxM_1_IgG	data/raw/HoxM_1_IgG_R1.fastq.gz	data/raw/HoxM_1_IgG_R2.fastq.gz	IgG	HoxM	HoxM_1_IgG
HoxM_1_Rbp1	data/raw/HoxM_1_Rbp1_R1.fastq.gz	data/raw/HoxM_1_Rbp1_R2.fastq.gz	Rbp1	HoxM	HoxM_1_Rbp1
HoxM_2_Rbp1	data/raw/HoxM_2_Rbp1_R1.fastq.gz	data/raw/HoxM_2_Rbp1_R2.fastq.gz	Rbp1	HoxM	HoxM_2_Rbp1
HoxM_3_Rbp1	data/raw/HoxM_3_Rbp1_R1.fastq.gz	data/raw/HoxM_3_Rbp1_R2.fastq.gz	Rbp1	HoxM	HoxM_3_Rbp1
HoxW_1_IgG	data/raw/HoxW_1_IgG_R1.fastq.gz	data/raw/HoxW_1_IgG_R2.fastq.gz	IgG	HoxW	HoxW_1_IgG
HoxW_1_Rbp1	data/raw/HoxW_1_Rbp1_R1.fastq.gz	data/raw/HoxW_1_Rbp1_R2.fastq.gz	Rbp1	HoxW	HoxW_1_Rbp1
HoxW_2_Rbp1	data/raw/HoxW_2_Rbp1_R1.fastq.gz	data/raw/HoxW_2_Rbp1_R2.fastq.gz	Rbp1	HoxW	HoxW_2_Rbp1
HoxW_3_Rbp1	data/raw/HoxW_3_Rbp1_R1.fastq.gz	data/raw/HoxW_3_Rbp1_R2.fastq.gz	Rbp1	HoxW	HoxW_3_Rbp1

The script splits each file name on '_' and uses the first field for the condition and the third field for the mark. The 'igg' column is initially a copy of the 'sample' column and should be manually replaced with the sample name of the IgG (or other control) you would like to use for that sample. If the sample itself is an IgG, its own name can be left in place; this won't affect peak calling.

So a fixed version of the table above would look like this:

sample	R1	R2	mark	condition	igg
HoxE_1_IgG	data/raw/HoxE_1_IgG_R1.fastq.gz	data/raw/HoxE_1_IgG_R2.fastq.gz	IgG	HoxE	HoxE_1_IgG
HoxE_1_Rbp1	data/raw/HoxE_1_Rbp1_R1.fastq.gz	data/raw/HoxE_1_Rbp1_R2.fastq.gz	Rbp1	HoxE	HoxE_1_IgG
HoxE_2_Rbp1	data/raw/HoxE_2_Rbp1_R1.fastq.gz	data/raw/HoxE_2_Rbp1_R2.fastq.gz	Rbp1	HoxE	HoxE_1_IgG
HoxE_3_Rbp1	data/raw/HoxE_3_Rbp1_R1.fastq.gz	data/raw/HoxE_3_Rbp1_R2.fastq.gz	Rbp1	HoxE	HoxE_1_IgG
HoxM_1_IgG	data/raw/HoxM_1_IgG_R1.fastq.gz	data/raw/HoxM_1_IgG_R2.fastq.gz	IgG	HoxM	HoxM_1_IgG
HoxM_1_Rbp1	data/raw/HoxM_1_Rbp1_R1.fastq.gz	data/raw/HoxM_1_Rbp1_R2.fastq.gz	Rbp1	HoxM	HoxM_1_IgG
HoxM_2_Rbp1	data/raw/HoxM_2_Rbp1_R1.fastq.gz	data/raw/HoxM_2_Rbp1_R2.fastq.gz	Rbp1	HoxM	HoxM_1_IgG
HoxM_3_Rbp1	data/raw/HoxM_3_Rbp1_R1.fastq.gz	data/raw/HoxM_3_Rbp1_R2.fastq.gz	Rbp1	HoxM	HoxM_1_IgG
HoxW_1_IgG	data/raw/HoxW_1_IgG_R1.fastq.gz	data/raw/HoxW_1_IgG_R2.fastq.gz	IgG	HoxW	HoxW_1_IgG
HoxW_1_Rbp1	data/raw/HoxW_1_Rbp1_R1.fastq.gz	data/raw/HoxW_1_Rbp1_R2.fastq.gz	Rbp1	HoxW	HoxW_1_IgG
HoxW_2_Rbp1	data/raw/HoxW_2_Rbp1_R1.fastq.gz	data/raw/HoxW_2_Rbp1_R2.fastq.gz	Rbp1	HoxW	HoxW_1_IgG
HoxW_3_Rbp1	data/raw/HoxW_3_Rbp1_R1.fastq.gz	data/raw/HoxW_3_Rbp1_R2.fastq.gz	Rbp1	HoxW	HoxW_1_IgG

In this example there was only one IgG per condition, so the sample name of that IgG was used for every sample in the condition. If each sample had its own control file, each entry would instead point to the IgG for that sample. If only one IgG was used in the whole experiment, its sample name could be used in every row. If you are not using IgG at all, set config['USEIGG'] to false and leave the samplesheet unmodified.

3. Edit configuration files

  1. Edit runtime configuration in the file config.yml (an example snippet follows this list):

    • Specify whether or not to trim adapters from raw reads to improve downstream alignment rate. You can decide based on adapter content in the sequencing core's FastQC reports.

    • Specify whether or not to use IgG controls for peak calling.

    • Specify the path to the bowtie2 index for the genome you are aligning to.

    • Specify the path to the fastq screen configuration file.

    • Specify the path to the genome FASTA file.

    • Specify the path to the chromosome sizes.
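
    For reference, a minimal config.yml might look like the sketch below. The keys GENOME, FASTQ_SCREEN, FASTA, CSIZES, USEIGG, IGG, and N_INTERSECTS are referenced elsewhere in the pipeline; the adapter-trimming key name and all paths here are placeholders, so verify them against the config.yml shipped with the repository.

    # illustrative values only - verify key names against the repository's config.yml
    TRIM: True                                 # adapter trimming on/off (key name assumed)
    USEIGG: True                               # use IgG controls during peak calling
    IGG: IgG                                   # substring that identifies IgG samples
    GENOME: /path/to/bowtie2_index/hg38        # bowtie2 index prefix
    FASTQ_SCREEN: /path/to/fastq_screen.conf   # fastq_screen configuration file
    FASTA: /path/to/hg38.fa                    # genome FASTA file
    CSIZES: /path/to/hg38.chrom.sizes          # chromosome sizes file
    N_INTERSECTS: 2                            # replicates a peak must appear in for the consensus set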

  2. The pipeline detects samples in the subdirectory data/raw with the following assumptions:

    • Paired end reads

    • Read 1 and 2 are designated using "_R1" and "_R2"

    • Samples are designated in the following convention: {condition}_{replicate}_{mark}_{R1|R2}.fastq.gz. This format affects the output files and ensures that bigwig files from the same mark and condition are merged.

  3. Make sure the src/deseq2_metadata.csv looks right. The file is created when you run make_sample_sheet.py and should have the following properties:

  • Should have two columns labeled "sample" and "condition".

  • The "sample" column corresponds to the individual biological replicates (the full sample name, i.e. all fields around the "_" delimiter).

  • The "condition" column is the condition for each sample (the first "_"-delimited field).

  • If you have multiple conditions and marks to analyze, you can introduce more columns into this file and adjust the deseq2.R file to account for extra covariates (see the sketch after the example data below).

The file src/deseq2_metadata.csv is populated with the following example data:

sample,condition
HoxE_1_IgG,HoxE
HoxE_1_Rbp1,HoxE
HoxE_2_Rbp1,HoxE
HoxE_3_Rbp1,HoxE
HoxM_1_IgG,HoxM
HoxM_1_Rbp1,HoxM
HoxM_2_Rbp1,HoxM
HoxM_3_Rbp1,HoxM
HoxW_1_IgG,HoxW
HoxW_1_Rbp1,HoxW
HoxW_2_Rbp1,HoxW
HoxW_3_Rbp1,HoxW
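
As a hypothetical illustration of an extra covariate (see the bullet above), the metadata could gain a third column such as "batch", and the design formula in src/deseq2.R would then include it. The column name and values below are made up:

# deseq2_metadata.csv would gain an extra column, e.g.:
#   sample,condition,batch
#   HoxE_1_Rbp1,HoxE,b1
#   HoxM_1_Rbp1,HoxM,b2
# and in src/deseq2.R the design formula picks up the covariate:
ddsMat <- DESeqDataSetFromMatrix(countData = counts, colData = meta,
                                 design = ~ batch + condition)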

4. Set up SLURM integration (for batch jobs)

Do this step if you are running the pipeline as a batch job and don't yet have a SLURM profile set up.

Download the slurm folder from the maxsonBraunLab repository and copy the entire folder to ~/.config/snakemake.

Your file configuration for SLURM should be as follows:

~/.config/snakemake/slurm/<files>

Change the file permissions for the scripts in the slurm folder so that they are executable. To do this, run:

chmod +x ~/.config/snakemake/slurm/slurm*
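
A minimal sketch of this setup, assuming the profile lives in a maxsonBraunLab repository named slurm (check the lab's GitHub page for the actual repository):

# clone the SLURM profile and install it as a snakemake profile
git clone https://github.com/maxsonBraunLab/slurm.git
mkdir -p ~/.config/snakemake
cp -r slurm ~/.config/snakemake/
chmod +x ~/.config/snakemake/slurm/slurm*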

Execution

A "dry-run" can be accomplished to see what and how files will be generated by using the command:

snakemake -nrp

To invoke the pipeline, please use either of the two options below:

# run in interactive mode. Recommended for running light jobs.
snakemake -j <n cores> --use-conda --conda-prefix $CONDA_PREFIX_1/envs
# run in batch mode. Recommended for running intensive jobs.
sbatch run_pipeline_conda.sh

For users running the pipeline in batch mode, run_pipeline_conda.sh is a wrapper script that contains the following command:

snakemake -j <n jobs> --use-conda --conda-prefix $CONDA_PREFIX_1/envs --profile slurm --cluster-config cluster.yaml

Additional setup instructions are provided in the wrapper script.

You can standardize further arguments for running the pipeline in batch mode by using a SnakeMake profile. The maxsonBraunLab slurm repository contains further instructions for setting up a SnakeMake SLURM profile.

Reproducible results with SnakeMake + Singularity

To ensure the reproducibility of your results, we recommend running the SnakeMake workflow with Singularity containers. These containers standardize the underlying operating system of the workflow (e.g. Ubuntu, CentOS), while conda tracks the installation of bioinformatics software (e.g. bowtie2, samtools, DESeq2). To use Singularity in your analysis, log in to an interactive node and load the module first like this:

# request an interactive node
srun -p light --time=36:00:00 --pty bash
# re-activate your environment with snakemake
conda activate <snakemake-env>
# load the singularity program
module load /etc/modulefiles/singularity/current

By default, Singularity will create and use a cache directory in your personal user root folder (i.e. /home/users/<username>). This may create problems, as there is limited space in a user's root folder on Exacloud. To avoid space issues in your root folder, you can set the Singularity cache directory path to a folder in your lab group directory like this:

# make a cache folder inside your lab user folder 
mkdir /home/groups/MaxsonLab/<your_user_folder>/singularity_cache
# make the path to the cache folder accessible to other processes
export SINGULARITY_CACHEDIR=/home/groups/MaxsonLab/<your_user_folder>/singularity_cache

If you are an experienced user, you can add the export SINGULARITY_CACHEDIR=... line to your .bashrc file. Otherwise, run the export SINGULARITY_CACHEDIR=... command before doing the steps below.
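
For example, persisting the cache location could look like this (the path is the placeholder from above):

echo 'export SINGULARITY_CACHEDIR=/home/groups/MaxsonLab/<your_user_folder>/singularity_cache' >> ~/.bashrc
source ~/.bashrc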

Additional Singularity documentation for Exacloud is available from the cluster documentation. If it is your first time running the pipeline, and especially when using Singularity, you must first install all the conda environments using the following command:

indices_folder="/home/groups/MaxsonLab/indices"
conda_folder="${CONDA_PREFIX_1}/envs"
fastq_folder="/home/groups/MaxsonLab/input-data2/path/to/FASTQ/folder/"
snakemake -j 1 \
	--verbose \
	--use-conda \
	--conda-prefix $conda_folder \
	--use-singularity \
	--singularity-args "--bind $indices_folder,$conda_folder,$fastq_folder" \
	--conda-create-envs-only

The above code snippet will take about an hour or more to complete, but it is a one-time installation. After creating the conda environments, symlinks, samplesheet.tsv, and src/deseq2_metadata.csv, we can invoke the pipeline in the same shell like this:

# Singularity + interactive run
snakemake -j <n cores> \
	--use-conda \
	--conda-prefix $conda_folder \
	--use-singularity \
	--singularity-args "--bind $indices_folder,$conda_folder,$fastq_folder"
# Singularity + slurm (batch) run
sbatch run_pipeline_singularity.sh

For users running the Singularity version of the pipeline in batch mode, run_pipeline_singularity.sh is a wrapper script for the pipeline. You will need to add the appropriate FASTQ folder path to the script prior to running. Additional instructions are provided in the wrapper script.

NOTE: make sure to use double quotes around the --singularity-args value, and supply an integer for the -j flag.

The above command installs the pipeline's conda environments into the conda-prefix directory - this means the conda environments are not stored INSIDE the container. The --bind argument binds host (Exacloud) paths into the container so it can access the genome indices, the conda prefix, and the raw sequencing files. The --profile slurm option in the wrapper script configures default settings for SnakeMake to interact with SLURM (see the SLURM profile setup above). Feel free to create another SnakeMake profile with its own set of Singularity arguments for added convenience.
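
As a sketch, you could copy the existing slurm profile (e.g. to ~/.config/snakemake/slurm-singularity/) and append the Singularity options to its config.yaml; the keys mirror standard Snakemake command-line flags, and every path below is a placeholder:

# appended to ~/.config/snakemake/slurm-singularity/config.yaml (illustrative)
use-conda: true
conda-prefix: /path/to/conda/envs
use-singularity: true
singularity-args: "--bind /home/groups/MaxsonLab/indices,/path/to/conda/envs,/path/to/fastq"

Invoking snakemake --profile slurm-singularity would then apply these flags automatically.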

Method

Without adapter trimming: (workflow rule graph figure)

With adapter trimming enabled: (workflow rule graph figure)

Output

Below is an explanation of each output directory:

aligned - sorted and aligned sample bam files
callpeaks - the output of callpeaks.py with peak files and normalized bigwigs for each sample.
counts - the raw counts for each mark over a consensus peak list
deseq2 - the results of deseq2 for each counts table (by mark)
dtools - fingerprint plot data for multiqc to use
fastp - adapter-trimmed FASTQ files (if adapter-trimming option is enabled in `config.yml`)
fastqc - fastqc results
fastq_screen - fastq_screen results
logs - runtime logs for each snakemake rule
markd - duplicate marked bam files
multiqc - contains the file multiqc_report.html with a lot of QC info about the experiment.
plotEnrichment - FRIP statistics for each mark
preseq - library complexity data for multiqc report

Deseq2 outputs

Each mark should have the following output files:

"data/deseq2/{mark}/{mark}-rld-pca.png" - PCA of counts after reguarlized log transformation. 
"data/deseq2/{mark}/{mark}-vsd-pca.png" - PCA of counts after variance stabilizing transformation.
"data/deseq2/{mark}/{mark}-normcounts.csv" - normalized count for each sample in each consensus peak.
"data/deseq2/{mark}/{mark}-lognormcounts.csv" - log2 normalized counts for each sample in each consensus peak.
"data/deseq2/{mark}/{mark}-rld.png", - the sdVsMean plot using regularized log transformation.
"data/deseq2/{mark}/{mark}-vsd.png" - the sdVsMean plot using variance stabilizing transformation.
"data/deseq2/{mark}/{mark}-vsd-dist.png" - the sample distance matrix after variance stabilizing transformation.
"data/deseq2/{mark}/{mark}-rld-dist.png" - the sample distance matrix using regularized log transformation.
"data/deseq2/{mark}/{mark}-dds.rds" - the R object with the results of running the DEseq() function.

For each contrast, the differential peaks are written to a file ending in -diffexp.tsv, and those with an adjusted p-value less than 0.05 are written to a file with the extension -sig05-diffexp.tsv. A summary of the results using an alpha of 0.05 is also written to a file with the extension -sig05-diffexp-summary.txt. Additionally, two MA plots are written to the file ending in plotMA.png, with differential peaks at an adjusted p-value less than 0.1 highlighted.
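
If you want to explore a contrast beyond the files written by the pipeline, the saved DESeq2 object can be reloaded in R. A minimal sketch, using the example mark and conditions from above:

library(DESeq2)

# load the fitted object written by the pipeline (example mark: Rbp1)
dds <- readRDS("data/deseq2/Rbp1/Rbp1-dds.rds")

# extract one contrast between the example conditions and inspect top peaks
res <- results(dds, contrast = c("condition", "HoxE", "HoxM"), alpha = 0.05)
summary(res)
head(res[order(res$padj), ])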

See the following paper for further explanations of the above plots and data transforms: https://bioconductor.org/packages/release/bioc/vignettes/DESeq2/inst/doc/DESeq2.html#exporting-results-to-csv-files

Code Snippets

shell:
    "fastp -i {input.r1} -I {input.r2} -o {output.r1} -O {output.r2} --detect_adapter_for_pe --thread {threads} -j {log} -h /dev/null"
shell:
    "fastqc -t {threads} --outdir data/fastqc {input} > {log} 2>&1"
shell:
    "fastq_screen --aligner bowtie2 --threads {threads} --outdir data/fastq_screen "
    "--conf {config[FASTQ_SCREEN]} --force {input} > {log} 2>&1"
shell:
    "bowtie2 --local --very-sensitive-local "
    "--no-unal --no-mixed --threads {threads} "
    "--no-discordant --phred33 "
    "-I 10 -X 700 -x {config[GENOME]} "
    "-1 {input.r1} -2 {input.r2} 2>{log.err} | samtools view -@ {threads} -Sbh - > {output}"
shell:
    "sambamba sort --tmpdir=data/aligned -t {threads} -o {output} {input} > {log} 2>&1"
shell:
    "sambamba markdup --tmpdir=data/markd -t {threads} {input} {output} > {log} 2>&1"
shell:
    "sambamba index -t {threads} {input} > {log} 2>&1"
shell:
    "bamCoverage -b {input[0]} -o {output} --binSize 10 --smoothLength 50 --normalizeUsing CPM -p {threads} "
shell:
    "bash src/mergebw.sh -c {config[CSIZES]} -o {output} {input}"
SnakeMake From line 206 of main/Snakefile
shell:
    "cat {input} | sort -k1,1 -k2,2n | "
    "bedtools merge | "
    "bedtools intersect -a - -b {input} -c | "
    "awk -v OFS='\t' '$4>=2 {{print}}' > {output}"
shell:
    "src/fraglen-dist.sh {input} {output}"
SnakeMake From line 232 of main/Snakefile
run:
    pd.options.plotting.backend = "plotly"
    dfs = []
    for i in input:
        cond_marker = [os.path.basename(i).split(".")[0]]
        temp_df = pd.read_csv(i, sep = "\t", index_col = 0, names = cond_marker)
        dfs.append(temp_df)
    df = pd.concat(dfs, axis = 1)
    fraglen = df.plot()
    fraglen.update_layout( 
        title='Fragment Length Distribution', 
        xaxis_title='Fragment Length (bp)', 
        yaxis_title='Counts', 
        legend_title_text='Samples')
    fraglen.write_html(str(output))
shell:
    "preseq c_curve -B {resources.defect_mode} -l 1000000000 -o {output} {input} > {log} 2>&1"
shell:
    "preseq lc_extrap -B {resources.defect_mode} -l 1000000000 -e 1000000000 -o {output} {input} > {log} 2>&1"
shell:
    "plotFingerprint -b {input[0]} --smartLabels --outRawCounts {output} > {log} 2>&1"
shell:
    "gopeaks -b {input[0]} {params.igg} -o data/callpeaks/{wildcards.sample} {params.params} > {log} 2>&1"
SnakeMake From line 308 of main/Snakefile
shell:
   "bash src/consensus_peaks.sh {wildcards.mark} {config[N_INTERSECTS]} {output}"
SnakeMake From line 319 of main/Snakefile
shell:
    "bash src/skip_frip.sh {input[0]} {input[1]} {output[0]} {output[1]} {log}"
SnakeMake From line 331 of main/Snakefile
run:
    pd.options.plotting.backend = "plotly"
    dfs = []
    for i in sorted(input):
        cond_marker = "_".join(i.split("_")[1:3])
        temp_df = pd.read_csv(i, sep = "\t", usecols=["percent"]).rename(columns = {'percent': cond_marker})
        dfs.append(temp_df)
    frip_df = pd.concat(dfs, axis = 1)
    frip_df = frip_df.rename(index={0: 'inside'})
    frip_df.loc["outside"] = 100 - frip_df.loc['inside']
    fig = go.Figure(data=[
        go.Bar(name="inside_peaks", x=frip_df.columns, y=frip_df.loc['inside'], marker_color='rgb(255, 201, 57)'),
        go.Bar(name='outside_peaks', x=frip_df.columns, y=frip_df.loc['outside'], marker_color='rgb(0,39, 118)')
    ])
    fig.update_layout(barmode='stack', 
        title='Fraction of Reads in Peaks by Sample', 
        xaxis_tickfont_size=14, yaxis=dict(title='Fraction of reads in peaks', 
        titlefont_size=16, tickfont_size=14), xaxis=dict(title='Samples'))
    fig.write_html(str(output))
script:
    "src/deseq2.R"
shell:
    "bash src/homer.sh -m {wildcards.mark} -s 1 -p 8 -g {config[FASTA]}"
SnakeMake From line 390 of main/Snakefile

Pipeline helper functions (mapping samples to rule inputs):

import pandas as pd

# map samples to fastqs
def get_samples():
    """
    return list of samples from samplesheet.tsv
    """
    return list(st.index)

def get_marks():
    """
    return list of marks from samplesheet.tsv
    """
    return list(st['mark'])

def get_mark_conditions():
    """
    return list of samples by condition
    """
    st['mark_condition']=st['mark'].astype(str)+"_"+st['condition']
    return st['mark_condition'].unique().tolist()

def get_tracks_by_mark_condition(wildcards):
    """
    return list of tracks by mark_condition
    """
    st['mark_condition']=st['mark'].astype(str)+"_"+st['condition']
    samps=st.groupby(["mark_condition"])['sample'].apply(list)[wildcards.mark_condition]
    return [f"data/tracks/{s}.bw" for s in samps]

def get_peaks_by_mark_condition(wildcards):
    """
    return list of peaks by mark_condition
    """
    st['mark_condition']=st['mark'].astype(str)+"_"+st['condition']
    samps=st.groupby(["mark_condition"])['sample'].apply(list)[wildcards.mark_condition]
    return [f"data/callpeaks/{s}_peaks.bed" for s in samps]

def get_bowtie2_input(wildcards):
    """
    returns reads associated with a sample
    """
    r1=st.loc[wildcards.sample]['R1']
    r2=st.loc[wildcards.sample]['R2']
    return r1,r2

def get_reads():
    """
    get list of all reads
    """
    rlist=list(st['R1'])+list(st['R2'])
    rlist=[os.path.basename(f).split('.')[0] for f in rlist]
    return rlist

def get_igg(wildcards):
    """
    Returns the igg file for the sample unless
    the sample is IgG then no control file is used.
    """ 
    if config['USEIGG']:
        igg=st.loc[wildcards.sample]['igg']
        iggbam=f'data/markd/{igg}.sorted.markd.bam'
        isigg=config['IGG'] in wildcards.sample
        if not isigg:
            return f'-c {iggbam}'
        else:
            return ""
    else:
        return ""

def get_callpeaks(wildcards):
    """
    Returns the callpeaks input files
    """
    bam=f"data/markd/{wildcards.sample}.sorted.markd.bam"
    bai=f"data/markd/{wildcards.sample}.sorted.markd.bam.bai"
    if config["USEIGG"]:
        igg=st.loc[wildcards.sample]['igg']
        iggbam=f'data/markd/{igg}.sorted.markd.bam'
        iggbai=f'data/markd/{igg}.sorted.markd.bam.bai'
        isigg=config['IGG'] in wildcards.sample
        if not isigg:
            # include the IgG control bam and its index as inputs
            return [bam,bai,iggbam,iggbai]
        else:
            return [bam,bai]
    else:
        return [bam,bai]

def callpeaks_params(wildcards):
    """
    Returns callpeaks parameters specified by the user in the samplesheet
    """
    params = st.loc[wildcards.sample]['gopeaks']
    if params == "-":
        return ""
    else:
        return params

def defect_mode(wildcards, attempt):
    if attempt == 1:
        return ""
    elif attempt > 1:
        return "-D"

src/consensus_peaks.sh:

if ! command -v bedtools &> /dev/null
then
    echo "bedtools could not be found"
    exit
fi

# check args
if [[ $# -eq 0 ]]
then
    echo "no arguments supplied"
    exit 1
fi

MARK=$1
N_INTERSECTS=$2
OUTFILE=$3

# array of replicates per mark
declare -a mpks=(data/callpeaks/*${MARK}*_peaks.bed)

all_samples=$(echo ${mpks[@]} | tr ' ' '\n' | cut -d/ -f3 | cut -d_ -f1-3)
all_conditions=$(echo ${mpks[@]} | tr ' ' '\n' | cut -d/ -f3 | cut -d_ -f1)

for condition in $all_conditions; do

    # file I/O
    tmp_output="data/counts/$MARK.$condition.tmp.bed"

    # list all replicates in one condition
    all_replicates=$(find data/callpeaks/ -name "*$condition*$MARK*.bed" | sort | tr '\n' ' ')

    # find widest peak that appear in at least n replicates + export to tmp file.
    cat $all_replicates | cut -f1-3 | sort -k1,1 -k2,2n | bedtools merge | \
    bedtools intersect -c -a stdin -b $all_replicates | \
    awk -v n=$N_INTERSECTS '$4 >= n' | awk -v OFS='\t' '{print $1,$2,$3}' > $tmp_output

done

# merge intervals for all tmp files and export as consensus peak.
all_temp_files=$(find data/counts -name "*$MARK*.tmp.bed" | sort | tr '\n' ' ')
cat $all_temp_files | sort -k1,1 -k2,2n | bedtools merge > data/counts/${MARK}_consensus.bed
rm $all_temp_files

# count the number of reads per union peak
bedtools multicov -bams data/markd/*${MARK}*.bam -bed data/counts/${MARK}_consensus.bed -D > ${OUTFILE}_tmp

# label the counts table
ls data/markd/*${MARK}*.bam  | sed 's!.*/!!' | cut -d. -f1 | xargs |  tr ' ' '\t' | awk '{print "chrom\tstart\tend\t" $0}' | cat - ${OUTFILE}_tmp > ${OUTFILE}

# remove tmp 
rm ${OUTFILE}_tmp

src/deseq2.R:

library("tidyverse")
library("vsn")
library("pheatmap")
library("ggplot2")
library("RColorBrewer")
library("DESeq2")
library("PoiClaClu")
library("gplots")
library("plyranges")
library("tidyr")
library("dendextend")
library("ggrepel")
library("tools")


message("setting snakemake parameters")

exp_prefix = snakemake@params[["mark"]]
outdir = snakemake@params[["outdir"]]
numclusters = snakemake@params[["numclusters"]]
message(paste0("using numclusters = ", numclusters))
stopifnot(is.numeric(numclusters))

input = snakemake@input[["counts"]]
meta = snakemake@input[["meta"]]
genes = snakemake@input[["genes"]]
pcaPlot = snakemake@output[["pcaPlot"]]
pcaPlotVsd = snakemake@output[["pcaPlotVsd"]]
normCount = snakemake@output[["normCounts"]]
lnormCount = snakemake@output[["lnormCounts"]]
sdMeanRld = snakemake@output[["sdMeanRld"]]
sdMeanVsd = snakemake@output[["sdMeanVsd"]]
sampleDistPlotVsd = snakemake@output[["sampleDistVsd"]]
sampleDistPlotRld = snakemake@output[["sampleDistRld"]]
rds = snakemake@output[["rds"]]

if (!dir.exists(outdir)) {
    dir.create(outdir)
}

# read in genes file
genestab = read_tsv(genes, col_names=c("seqnames", "start", "end", "name", "score", "strand")) %>% GRanges()

# counts join
peaks = read_tsv(input) %>% 
    mutate(row=row_number()) %>%
    mutate(name=paste0("peak", as.character(row)))


# read in counts table
counts = read.delim(input, header=T, stringsAsFactors = F, check.names=F) %>%
    mutate(row=row_number()) %>%
    mutate(peak=paste0("peak", as.character(row)))
rownames(counts) = counts$peak
counts = counts[,4:ncol(counts)]

# read in metadata
meta <- read.csv(meta, header = T, stringsAsFactors = F)

# make the meta reflect the available sample names
meta = meta[meta$sample %in% colnames(counts),]

# make sure meta rownames and counts colnames in the same order.
rownames(meta) <- meta$sample
counts <- counts[, rownames(meta)]
stopifnot(rownames(meta) == colnames(counts))

message('calculating deseq...')
print(head(counts))
print(head(meta))
# make deseqdataset
ddsMat <- DESeqDataSetFromMatrix(countData = counts, colData = meta, design = ~condition)
dds <- DESeq(ddsMat)

# write normalized counts tables
normCounts <- counts(dds, normalized=TRUE)
normCounts <- 0.00001+(normCounts) # ensure nonzero 
lnormCounts <- log2(normCounts)
lnormCounts <- as.data.frame(lnormCounts)
lnormCounts$ID <- counts$peak
print(head(lnormCounts))
write.csv(normCounts, normCount)
write.csv(lnormCounts, lnormCount) 

# calculate rlog transform
rld <- rlog(dds, blind=FALSE)
vsd <- varianceStabilizingTransformation(dds, blind = TRUE,
  fitType = "parametric")
ntd <- normTransform(dds)

message('plotting count transforms...')
save.image()
# plot the transform for the first two samples
pdf(sdMeanRld)
rldCounts <- meanSdPlot(assay(rld))
rldCounts$gg + labs(title="meanSdPlot - rlog transformed")
dev.off()

pdf(sdMeanVsd)
vsdCounts <- meanSdPlot(assay(vsd))
vsdCounts$gg + labs(title="meanSdPlot - vsd transformed")
dev.off()

message('plotting poisson distance sample cross correlation...')
# plot sample distances 
pdf(sampleDistPlotVsd)
sampleDists <- dist(t(assay(vsd)))
sampleDistMatrix <- as.matrix(sampleDists)
rownames(sampleDistMatrix) <- paste(meta$sample, meta$condition, sep="-")
colnames(sampleDistMatrix) <- NULL
colors <- colorRampPalette( rev(brewer.pal(9, "Blues")) )(255)
pheatmap(sampleDistMatrix,
         clustering_distance_rows=sampleDists,
         clustering_distance_cols=sampleDists,
         col=colors,
         main=paste(exp_prefix, "vsd sample distance matrix"))
dev.off()

pdf(sampleDistPlotRld)
sampleDists <- dist(t(assay(rld)))
sampleDistMatrix <- as.matrix(sampleDists)
rownames(sampleDistMatrix) <- paste(meta$sample, meta$condition, sep="-")
colnames(sampleDistMatrix) <- NULL
colors <- colorRampPalette( rev(brewer.pal(9, "Blues")) )(255)
pheatmap(sampleDistMatrix,
         clustering_distance_rows=sampleDists,
         clustering_distance_cols=sampleDists,
         col=colors,
         main=paste(exp_prefix, "rld sample distance matrix"))
dev.off()

# plot PCA rld
pdf(pcaPlot)
plotPCA(rld, intgroup = c("condition")) + labs(title=paste0(exp_prefix,"-rld")) + geom_text_repel(aes(label=name))
dev.off()

# plot PCA vsd
pdf(pcaPlotVsd)
plotPCA(vsd, intgroup = c("condition")) + labs(title=paste0(exp_prefix,"-vsd")) + geom_text_repel(aes(label=name))
dev.off()

# make contrasts
c = combn(unique(meta$condition), 2)
lsc=split(c, rep(1:ncol(c), each = nrow(c)))
message('writing contrast results...')
# write differential results for each contrast
for (k in 1:length(lsc)) {
    cl=lsc[[k]]
    rname=paste0(cl[1],"vs",cl[2])
    res=results(dds, 
            independentFiltering = TRUE,
            cooksCutoff = FALSE,
            contrast = c("condition", cl[1], cl[2]),
            alpha=0.05)

    # estimate shrinkage 
    resLFC <- lfcShrink(dds, coef=2, type="apeglm")

    #plotMA
    maplot=paste0(outdir,"/",exp_prefix,"/",exp_prefix,"-",rname,"plotMA.pdf")
    pdf(maplot)
    par(mfrow=c(1,2), mar=c(4,4,2,1))
    xlim <- c(1,1e5); ylim <- c(-2,2)
    plotMA(resLFC, xlim=xlim, ylim=ylim, alpha=0.05, main=paste(rname, "apeglm"))
    plotMA(res, xlim=xlim, ylim=ylim, alpha=0.05, main=paste(rname, "normal"))
    dev.off()

    tableName=paste0(outdir,"/",exp_prefix,"/",cl[1],'-',cl[2],'-',exp_prefix,'-','peaks-annot.tsv')

    # annotate peaks
    diffexp=res %>% 
        as_tibble(rownames='name') %>%
        left_join(peaks) %>%
        GRanges() %>%
        join_nearest(genestab, suffix=c(".peak", ".gene")) %>%
        as_tibble() 
    write_tsv(diffexp, path=tableName)

    # differential peak files lfc > 0 | lfc > 0  padj < 0.05 | padj < 0.01
    deup05=paste0(outdir,"/",exp_prefix,"/",cl[1],'-',cl[2],'-',exp_prefix,'-','differential-up-05.bed')
    dedown05=paste0(outdir,"/",exp_prefix,"/",cl[1],'-',cl[2],'-',exp_prefix,'-','differential-down-05.bed')
    deup01=paste0(outdir,"/",exp_prefix,"/",cl[1],'-',cl[2],'-',exp_prefix,'-','differential-up-01.bed')
    dedown01=paste0(outdir,"/",exp_prefix,"/",cl[1],'-',cl[2],'-',exp_prefix,'-','differential-down-01.bed')
    # sig up peaks
    diffexp %>%
        filter(log2FoldChange > 0, padj < 0.05) %>%
        select(seqnames, start, end, name.peak,name.gene,score,strand) %>%
        write_tsv(deup05, col_names=FALSE)

    # sig down peaks
    diffexp %>%
        filter(log2FoldChange < 0, padj < 0.05) %>%
        select(seqnames, start, end, name.peak,name.gene,score,strand) %>%
        write_tsv(dedown05, col_names=FALSE)

    # sig up peaks
    diffexp %>%
        filter(log2FoldChange > 0, padj < 0.01) %>%
        select(seqnames, start, end, name.peak,name.gene,score,strand) %>%
        write_tsv(deup01, col_names=FALSE)

    # sig down peaks
    diffexp %>%
        filter(log2FoldChange < 0, padj < 0.01) %>%
        select(seqnames, start, end, name.peak,name.gene,score,strand) %>%
        write_tsv(dedown01, col_names=FALSE)

    summaryName=paste0(outdir,"/",exp_prefix,"/",cl[1],"-",cl[2],"-",exp_prefix,"-diffexp-summary.txt")
    diffexp %>%
        summarise(Condition=paste0(exp_prefix,"-",rname),
                  DP05=nrow(filter(.,padj<0.05)),
                  DP01=nrow(filter(.,padj<0.01))) %>%
        write_tsv(summaryName)

    message("creating heatmap...")

    sig = diffexp %>% filter(padj < 0.05)

    # calculate heatmap if there are more than 10 sig peaks
    if (nrow(sig) > 10) {
        annot_filename=gsub(".tsv", "-05-clust.tsv", basename(tableName))

        # make direction annotation df 
        direction <- sig %>%
            select(name.peak,log2FoldChange) %>%
            mutate(direction=ifelse(log2FoldChange>0,"up","down")) %>%
            data.frame()

        rownames(direction) = direction[,1]
        direction[,1]=NULL

        # subset counts by sig peaks
        print(head(lnormCounts))
        counts <- rownames_to_column(lnormCounts, var="peak")  %>% as_tibble()
        print(head(counts))
        sigcounts <- counts %>% filter(peak %in% sig$name.peak)

        # organize samples by condition                               
        meta_conditions <- meta %>% group_by(condition) %>% 
            summarise(sample=paste(sample, collapse = ","))

        # rowMean counts for each condition
        for (c in meta_conditions$condition) {
            samples = filter(meta_conditions, condition==c)$sample
            s=unlist(strsplit(samples, ","))
            sigcounts <- sigcounts %>% mutate(!!c := rowMeans(sigcounts[s]))
        }

        # retain mean counts per condition
        sigcounts <- sigcounts[append("peak", meta_conditions$condition)]

        # scale the counts
        scaled <- t(apply(as.matrix(sigcounts[,-1]), 1, scale))
        colnames(scaled) = colnames(sigcounts[-1])

        rownames(scaled) <- sigcounts$peak
        # generate clusters
        clust=hclust(dist(scaled), method="ward.D2")
        clustdf=data.frame(cluster=cutree(as.dendrogram(clust), k = numclusters))
        annots=merge.data.frame(clustdf, direction, by=0, all = T) %>% select(-log2FoldChange)
        rownames(annots)=annots[,1]
        annots[,1]=NULL

        # get a number of unique colors
        cols=RColorBrewer::brewer.pal(numclusters, name="Set1")

        # name the colors by cluster
        names(cols) = unique(clustdf$cluster)

        # set cluster colors
        colors = list(
        cluster=cols
        )	

        heattitle=paste0(exp_prefix,"-",rname)
        # create heatmap
        heat <- pheatmap(scaled,
            main=heattitle,
            show_rownames = F,
            show_colnames = T,
            cluster_rows = T,
            cluster_method = "ward.D2",
            annotation_row = annots,
            annotation_colors = colors)

        save_pheatmap <- function(x, filename, width=7, height=7) {
            if (tolower(tools::file_ext(filename)) == "png") {
                png(filename = filename, width = width, height = height, units = "in", res = 300)
            }
            else {
                pdf(filename, width = width, height = height)
            }
            grid::grid.newpage()
            grid::grid.draw(x$gtable)
            dev.off()
        }

        heatmap_filename=paste0(outdir,"/",exp_prefix,"/",cl[1],"-",cl[2],"-",exp_prefix,"-heatmap.pdf")
        save_pheatmap(heat, heatmap_filename)

        annot_filename=gsub(".tsv", "-05-clust.tsv", tableName)
        # annotate cluster and direction
        annots[["name.peak"]] = rownames(annots)
        sig %>% left_join(annots) %>% write_delim(annot_filename, delim="\t")
    }
}

# save RDS of deseq object
saveRDS(dds, rds)

src/fraglen-dist.sh:

samtools view $1 | awk '$9>0' | cut -f 9 | sort | uniq -c | sort -b -k2,2n | sed -e 's/^[ \t]*//' | sed 's/ /\t/g' | awk -v OFS='\t' '{print $2,$1}' > $2

src/homer.sh:

while getopts "m:g:s:p:" op
do
	case "$op" in
		m)  mark="$OPTARG";;
		g)  genome="$OPTARG";;
		s)  slurm="$OPTARG";;
		p)  p="$OPTARG";;
		\?) exit 1;;
	esac
done

if [ -f $genome ];
then
	echo "I found the FASTA file!"
else
	echo -e "$g does not exist. Exiting program."
	exit
fi

if [[ $slurm == 1 || $slurm == 0 ]]; then
	true
else
	echo "ERROR: -s must be either 0 or 1 for SLURM submission."
	exit 1
fi

log_file_exists() {

	if [ -f $1 ]; then
		rm $1
		touch $1
	fi

	if [ ! -f $1 ]; then
		touch $1
	fi

}

# detect contrast combinations per mark
contrasts=$(find "data/deseq2/${mark}" -name "*.bed" | cut -d/ -f4 | sed -E "s/([a-zA-Z0-9.-]+)-($mark)-([a-zA-Z0-9-]+)(.bed)/\1/" | sort | uniq)

# HOMER analysis per contrast -----------------------------------------------------------

for contrast in $contrasts; do

	echo "assessing contrast $contrast in mark $mark"

	up_05="data/deseq2/${mark}/${contrast}-${mark}-differential-up-05.bed"
	up_01="data/deseq2/${mark}/${contrast}-${mark}-differential-up-01.bed"
	down_05="data/deseq2/${mark}/${contrast}-${mark}-differential-down-05.bed"
	down_01="data/deseq2/${mark}/${contrast}-${mark}-differential-down-01.bed"

	declare -a differential_peaks=($up_05 $up_01 $down_05 $down_01)

	echo -e "Differential peaks files:\n${differential_peaks[@]}"

	# assess and run HOMER per bed file
	for file in ${differential_peaks[@]}; do

		# check if file exists
		if [ -f "$file" ]; then

			echo "$file detected"
			peak_count=$(cat $file | wc -l)

			echo "Peak count: $peak_count"

			# run HOMER if >= 10 peaks
			if [ "$peak_count" -ge 10 ]; then

				# extract dir,sig info for the run.
				echo "Running HOMER for $peak_count peaks in $file"
				directionality=$(basename $file | sed -E "s/([a-zA-Z0-9._-]+)-(down|up)-([0-9]+)(.bed)/\2/") # "up" or "down"
				sig=$(basename $file | sed -E "s/([a-zA-Z0-9._-]+)-(down|up)-([0-9]+)(.bed)/\3/") # e.g. significance level (01 or 05)

				# file I/O
				output_dir="data/homer/${contrast}-${mark}-${directionality}-${sig}"
				log="data/logs/homer_${contrast}-${mark}-${directionality}-${sig}.log"
				log_file_exists $log

				# HOMER local run/submission to SLURM
				if [ $slurm == 0 ]; then
					findMotifsGenome.pl $file $genome $output_dir -size 200 -p $p > $log 2>&1 &
				fi

				if [ $slurm == 1 ]; then
					if [ ! -d "jobs/homer" ]; then
						mkdir -p jobs/homer
					fi
					job_out="jobs/homer/homer-${contrast}-${mark}_%j.out"
					job_err="jobs/homer/homer-${contrast}-${mark}_%j.err"
					sbatch -e $job_err -o $job_out --job-name "HOMER" --time "02:00:00" --mem="8G" --cpus-per-task=$p --wait --wrap="findMotifsGenome.pl $file $genome $output_dir -size 200 -p $p > $log 2>&1" &
				fi
			fi
		fi
	done
done

wait
touch data/homer/$mark.done

exit
Shell From line 3 of src/homer.sh

src/mergebw.sh:

usage(){
	echo "Usage: " 
        echo "  $0 -c chrom_sizes_file -o output_file rep1.bw rep2.bw rep3.bw"
	exit 1
}

file_exits(){
	local f="$1"
	[[ -f "$f" ]] && return 0 || return 1
}

cleanup() {
    rm ${out_name}.tmp.bdg &> /dev/null
}

[[ $# -eq 0 ]] && usage

script_args=()
while [ $OPTIND -le "$#" ]
do
    if getopts c:o: option
    then
        case $option
        in
            c) chrom_size="$OPTARG";;
            o) out_name="$OPTARG";;
        esac
    else
        script_args+=("${!OPTIND}")
        ((OPTIND++))
    fi
done


if [ -z "$chrom_size" ] 
then 
    echo "Please supply the chrom.sizes file." 
    usage
else 
    if ( ! file_exits "$chrom_size" )
    then 
        echo "error: file $chrom_size not found"
        usage 
    fi
fi


if [ -z "$out_name" ] 
then 
    echo "Please supply an output file name."
    usage
fi

echo "Using chrom.sizes file: $chrom_size" 
echo "Output file name will be: $out_name" 

# require wiggletools
if ! command -v wiggletools &> /dev/null
then
    echo "wiggletools not found"
    exit
fi

# require bigWigToBedGraph
if ! command -v bigWigToBedGraph &> /dev/null
then 
    echo "bigWigToBedGraph no found"
    exit
fi


echo "merging the following files..."
echo ${script_args[@]}

cmd="wiggletools mean ${script_args[@]} | grep -vE '_|EBV|GL|JH' | awk 'NF==4{print}' | sort -S 8G -k1,1 -k2,2n > ${out_name}.tmp.bdg"
echo $cmd
eval $cmd

if [ $? -eq 0 ]; then
    echo "..."
else
    echo "Command wiggletools failed to complete..."
    echo "quitting.."
    cleanup
    exit 1
fi

# convert to bigwig
bedGraphToBigWig ${out_name}.tmp.bdg $chrom_size $out_name

if [ $? -eq 0 ]; then
    echo "..."
else
    echo "Command bedGraphToBigWig failed..."
    echo "quitting.."
    cleanup
    exit 1
fi

cleanup
echo "done"

src/skip_frip.sh:

input_bed=$1
input_bam=$2
output_png=$3
output_tsv=$4
logfile=$5

if [ -s ${input_bed} ]
then
	plotEnrichment -b ${input_bam} --BED ${input_bed} --regionLabels 'frip' --outRawCounts ${output_tsv} -o ${output_png} > ${logfile} 2>&1
else
	total_read_count=$(samtools view ${input_bam} | wc -l)
	echo "file	featureType	percent	featureReadCount	totalReadCount" > ${output_tsv} #creates output file with column headers
	echo "${input_bam}	frip	0	0	${total_read_count}" >> ${output_tsv} #appends to existing output file with name of sample and 0 counts
	touch ${output_png}
fi
Shell From line 6 of src/skip_frip.sh