Multimodal chromatin profiling using nanobody-based single-cell CUT&Tag

public public 1yr ago 0 bookmarks

Multimodal chromatin profiling using nanobody-based single-cell CUT&Tag

Marek Bartosovic, Goncalo Castelo-Branco

Code repository related to preprint.

Multimodal chromatin profiling using nanobody-based single-cell CUT&Tag Marek Bartosovic, Gonçalo Castelo-Branco bioRxiv 2022.03.08.483459; doi: https://doi.org/10.1101/2022.03.08.483459

Data availability

Processed files - Seurat objects, fragments file (cellranger), bigwig tracks per cluster and .h5 matrices are available as supplementary files in the GEO repository https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE198467

Reproducing the analysis

Step 1: prepare environment

conda environment is provided in env/environment.yaml

conda env create -f env/environment.yaml

Additional package dependencies that need to be installed:

R
install.packages(c('argparse','ggplot2','funr','Signac','scales','Seurat','rmarkdown','mclust','GGally','BiocManager','patchwork','markdown','UpSetR','pheatmap','viridis','purrr','Rmagic','devtools','raster'))
BiocManager::install(c('ensembldb','EnsDb.Mmusculus.v79','GenomeInfoDb', 'GenomicRanges', 'IRanges', 'Rsamtools','BiocGenerics','rtracklayer','limma','slingshot','BiocGenerics', 'DelayedArray', 'DelayedMatrixStats','limma', 'S4Vectors', 'SingleCellExperiment','SummarizedExperiment', 'batchelor', 'Matrix.utils')))

Have seqtk installed in $PATH

https://github.com/lh3/seqtk

Install papermill for cli for jupyter notebooks

python3 -m pip install papermill

Step 2: Download the data

use fasterq-dump or alternative to download the fastq files

https://github.com/ncbi/sra-tools/wiki/HowTo:-fasterq-dump

GEO repository

https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE198467

Step 3: Clone the github repo with analysis code

git clone https://github.com/mardzix/bcd_nano_CUTnTag

Step 4: Modify config

Change config/config.yaml and specify path to

  1. Absolute path to fastq files

  2. Specific path to the tmp folder (Create any folder, e.g. in home)

  3. Specify conda environment to use

  4. Modify path to cellranger reference

Step 5: Run the pipeline

Pipeline is implemented in workflow management software Snakemake

Change the cluster-specific profile to your preference (e.g. slurm, condor etc.), or run without profile

For some example profiles see:

  • https://snakemake.readthedocs.io/en/stable/executing/cli.html#profiles

  • https://github.com/Snakemake-Profiles/slurm

  • https://github.com/Snakemake-Profiles/htcondor

snakemake --snakefile code/workflow/Snakefile_single_modality.smk --cores 16 --profile htcondor -p

Johannes Köster, Sven Rahmann, Snakemake—a scalable bioinformatics workflow engine, Bioinformatics, Volume 28, Issue 19, 1 October 2012, Pages 2520–2522, https://doi.org/10.1093/bioinformatics/bts480

Code Snippets

25
26
shell:
    'Rscript {input.script} -i {input.seurat} -m {wildcards.combination} -o {output}'
39
40
41
42
43
shell:
    "Rscript -e \"rmarkdown::render(input='{input.notebook}', "
    "                                output_file = '{params.report}', "
    "                                params=list(seurat = '{params.seurat_in}', "
    "                                           output = '{params.seurat_out}'))\" "
52
53
shell:
    "Rscript {input.script} --seurat {input.seurat} --clusters OPC MOL --reduction wnn.umap --out {output.seurat}"
68
69
shell:
    'Rscript {input.script} -i {input.seurat} -o {output.seurat} {params.clusters} -d {params.idents} -m {params.modalities} -a {params.assay} -t {input.seurat_pt}'
39
40
shell:
    "fasterq-dump -t {params.tmp} -f -e {threads} --split-files --include-technical -o {params.out} {wildcards.SRA}"
47
48
shell:
    "gzip {input}"
55
56
shell:
    "mv {input} {output}"
70
71
72
73
shell:
    'rm -r results/nbiotech_data/cellranger/{wildcards.sample}/; '
    'cd results/nbiotech_data/cellranger/; '
    '/data/bin/cellranger-atac count --id {wildcards.sample} --sample {params.sample} --reference {params.cellranger_ref} --fastqs {params.fastq_dir}'
82
83
84
shell:
    'bamCoverage -b {input.cellranger_bam} -o {output.bigwig} -p {threads} --minMappingQuality 5 '
    ' --binSize 50 --centerReads --smoothLength 250 --normalizeUsing RPKM --ignoreDuplicates --extendReads'
93
94
95
shell:
    'macs2 callpeak -t {input.cellranger_bam} -g mm -f BAMPE -n {wildcards.sample} '
    '--outdir {params.macs_outdir} --keep-dup=1 --llocal 100000 --cutoff-analysis --min-length 1000 --max-gap 1000  2>&1 '
104
105
106
shell:
    'macs2 callpeak -t {input.cellranger_bam} -g mm -f BAMPE -n {wildcards.sample} '
    '--outdir {params.macs_outdir} --keep-dup=1 --llocal {wildcards.llocal} 2>&1 '
115
116
117
118
shell:
    'macs2 callpeak -t {input} -g mm -f BAMPE -n {wildcards.sample} '
    '--outdir {params.macs_outdir} --llocal 100000 --keep-dup=1 --broad-cutoff=0.1 ' 
    '--min-length 1000 --max-gap 1000 --broad 2>&1 '
125
126
shell:
    "wget -O {output} {params.url}"
134
135
shell:
    "bedtools genomecov -bg -g {input.genome} -i {input.fragments} > {output}"
144
145
shell:
    "~/bin/SEACR/SEACR_1.3.sh {input} 0.01 norm relaxed {params.out_prefix}"
155
156
157
shell:
    'python3 {params.script} {input.fragments} {wildcards.sample} | bgzip > {output.fragments}; '
    'tabix -p bed {output.fragments}'
169
170
171
172
shell:
    'bedtools intersect -abam {input.bam} -b {input.peaks} -u | samtools view -f2 | '
    'awk -f {params.get_cell_barcode} | sed "s/CB:Z://g" | python3 {params.add_sample_to_list} {wildcards.sample} | '
    'sort -T {params.tmpdir} | uniq -c > {output.overlap} && [[ -s {output.overlap} ]] ; '
183
184
185
186
shell:
  ' samtools view -f2 {input.bam}| '
  'awk -f {params.get_cell_barcode} | sed "s/CB:Z://g" | python3 {params.add_sample_to_list} {wildcards.sample} | '
  'sort -T {params.tmpdir} | uniq -c > {output.all_bcd} && [[ -s {output.all_bcd} ]] ; '
205
206
shell:
    "Rscript {params.script} --metadata {input.metadata} --fragments {input.fragments} --bcd_all {input.bcd_all} --bcd_peak {input.bcd_peak} --sample {wildcards.sample} --sample {wildcards.sample} --out_prefix {params.out_prefix}"
220
221
shell:
    "Rscript {input.script} --sample {wildcards.sample} --antibody {params.antibody} --metadata {input.metadata}  --fragments {input.fragments} --peaks {input.peaks} --out_prefix {params.out_prefix} --window {wildcards.binwidth} --genome_version {params.genome}"
228
229
shell:
    'wget -O {output} {params.url}'
238
239
240
shell:
    'cd {params.dirname}; '
    'tar -xvzf `basename {input.archive}`'
247
248
shell:
    'wget -O {output.bw_tar} {params.url}'
287
288
289
shell:
    'cd {params.dirname}; '
    'tar -xvzf `basename {input.bw_tar}`'
64
65
shell:
    "python3 {input.script} -i {input.fastq} -o {params.out_folder} --single_cell --barcode {wildcards.barcode} 2>&1"
80
81
82
83
shell:
    'rm -rf results/multimodal_data/{wildcards.sample}/cellranger/{wildcards.sample}_{wildcards.antibody}_{wildcards.barcode}/; '
    'cd results/multimodal_data/{wildcards.sample}/cellranger/; '
    '/data/bin/cellranger-atac count --id {wildcards.sample}_{wildcards.antibody}_{wildcards.barcode} --reference {params.cellranger_ref} --fastqs {params.fastq_folder}'
91
92
93
shell:
    'bamCoverage -b {input.cellranger_bam} -o {output.bigwig} -p {threads} --minMappingQuality 5 '
    ' --binSize 50 --centerReads --smoothLength 250 --normalizeUsing RPKM --ignoreDuplicates --extendReads'
102
103
104
shell:
    'macs2 callpeak -t {input.cellranger_bam} -g mm -f BAMPE -n {wildcards.antibody} '
    '--outdir {params.macs_outdir} --keep-dup=1 --llocal 100000 --cutoff-analysis --min-length 1000 --max-gap 1000  2>&1 '
113
114
115
shell:
    'macs2 callpeak -t {input.cellranger_bam} -g mm -f BAMPE -n {wildcards.antibody} '
    '--outdir {params.macs_outdir} --keep-dup=1 --llocal {wildcards.llocal} 2>&1 '
124
125
126
127
shell:
    'macs2 callpeak -t {input} -g mm -f BAMPE -n {wildcards.antibody} '
    '--outdir {params.macs_outdir} --llocal 100000 --keep-dup=1 --broad-cutoff=0.1 ' 
    '--min-length 1000 --max-gap 1000 --broad 2>&1 '
134
135
shell:
    'cut -f1,2 {params.faidx} > {output}'
143
144
shell:
    "bedtools genomecov -bg -g {input.genome} -i {input.fragments} > {output}"
153
154
shell:
    "~/bin/SEACR/SEACR_1.3.sh {input} 0.01 norm relaxed {params.out_prefix}"
164
165
166
shell:
    'python3 {params.script} {input.fragments} {wildcards.sample} | bgzip > {output.fragments}; '
    'tabix -p bed {output.fragments}'
178
179
180
181
shell:
    'bedtools intersect -abam {input.bam} -b {input.peaks} -u | samtools view -f2 | '
    'awk -f {params.get_cell_barcode} | sed "s/CB:Z://g" | python3 {params.add_sample_to_list} {wildcards.sample} | '
    'sort -T {params.tmpdir} | uniq -c > {output.overlap} && [[ -s {output.overlap} ]] ; '
192
193
194
195
shell:
  ' samtools view -f2 {input.bam}| '
  'awk -f {params.get_cell_barcode} | sed "s/CB:Z://g" | python3 {params.add_sample_to_list} {wildcards.sample} | '
  'sort -T {params.tmpdir} | uniq -c > {output.all_bcd} && [[ -s {output.all_bcd} ]] ; '
214
215
shell:
    "Rscript {params.script} --metadata {input.metadata} --fragments {input.fragments} --bcd_all {input.bcd_all} --bcd_peak {input.bcd_peak} --antibody {wildcards.modality} --sample {wildcards.sample} --out_prefix {params.out_prefix}"
227
228
shell:
    "Rscript {input.script} --sample {wildcards.sample}   --antibody {wildcards.antibody} --metadata {input.metadata} --fragments {input.fragments} --out_prefix {params.out_prefix} --window {wildcards.binwidth} --genome_version {params.genome}"
242
243
244
shell:
    "Rscript {input.script} --sample {wildcards.sample}   --antibody {wildcards.modality} --metadata {input.metadata} --fragments {input.fragments} " \ 
    " --peaks {input.peaks} --out_prefix {params.out_prefix} --genome_version {params.genome}"
255
256
shell:
    'zcat {input.fragments} | sort -T {params.tmpdir} -k1,1 -k2,2n | bgzip > {output.fragments_merged} && tabix -p bed {output.fragments_merged}'
268
269
270
271
272
shell:
    "bedToBam -i {input.fragments} -g {input.chrom_sizes} > {output.bam} && "
    "samtools sort -@ {threads} -o {output.bam_sorted} {output.bam} &&"
    "samtools index {output.bam_sorted} && "
    "bamCoverage -b {output.bam_sorted} -o {output.bigwig} -p {threads} --minMappingQuality 5 --binSize 50 --smoothLength 250 --normalizeUsing RPKM --ignoreDuplicates"
281
282
283
284
shell:
    'macs2 callpeak -t {input} -g mm -f BED -n {wildcards.modality} '
    '--outdir {params.macs_outdir} --llocal 100000 --keep-dup=1 --broad-cutoff=0.1 ' 
    '--min-length 1000 --max-gap 1000 --broad --nomodel 2>&1 '
293
294
shell:
    'Rscript {input.script} -i {input.seurat} -o {output.seurat}'
307
308
shell:
    'Rscript {params.script} -i {input.seurat} -o {output.seurat} -a {wildcards.feature} -d 40 -g {params.plot_group} '
54
55
shell:
    "Rscript {input.script} -i {input.seurat} -r {input.rna} -o {output}"
68
69
70
71
72
73
74
75
76
shell:
    "Rscript -e \"rmarkdown::render(input='{input.notebook}', "
    "                                output_file = '{params.report}', "
    "                                params=list(out_prefix = '{params.out_prefix}', "
    "                                           modality = '{wildcards.modality}', "
    "                                           feature = '{wildcards.feature}', "
    "                                           input = '{params.out_prefix}{input.seurat}', "
    "                                           integrated = '{params.out_prefix}{input.integrated}', "                # TODO - fix absolute paths integration                         
    "                                           output = '{params.out_prefix}{output.seurat}'))\" "
85
86
shell:
    'Rscript {input.script} --input {input.seurat} --output {output.seurat}'
95
96
shell:
    "Rscript {input.script}  --input {input.seurat} --fragments {input.fragments} --output_folder {output.bigwig} --idents {wildcards.idents} "
105
106
shell:
    "Rscript {input.script} -i {input.seurat} -o {output.markers} --idents {wildcards.idents}"
114
115
shell:
    'Rscript {input.script} -i {input.seurat} -o {output}'
123
124
shell:
    "Rscript {input.script} -i {input.seurat} -o {output.csv} -d {wildcards.ident}"
132
133
shell:
    "python3 {input.script} {input.bam} {wildcards.sample} {output.bam}"
151
152
shell:
    "samtools merge -@ {threads} {output.bam} {input.bam}"
160
161
162
shell:
    'samtools index {input.bam}; '
    'bamCoverage -b {input.bam} -o {output.bw} -p {threads} --normalizeUsing RPKM'
172
173
shell:
    "python3 {input.script} {input.bam} {input.table} NA {output.bam_files}"
183
184
shell:
    'sh {input.script} {input.bam} {output.bw} {threads}'
194
195
196
shell:
    'cat {input.csv} | grep {wildcards.sample} > {output.csv_per_sample}; '
    'python3 {input.script} {input.bam} {output.csv_per_sample} NA {output.bam_per_cluster}'
205
206
shell:
    'sh {input.script} {input.bam} {output.bw} {threads}'
214
215
shell:
    'sh {input.script}  {input.bam} {output.peaks}'
225
226
shell:
    'Rscript {input.script} --input {input.csv} --nmarkers {params.nmarkers} --output {output.bed}'
236
237
shell:
    'python3 {params.script} {input} > {output}'
ShowHide 56 more snippets with no or duplicated tags.

Login to post a comment if you would like to share your experience with this workflow.

Do you know this workflow well? If so, you can request seller status , and start supporting this workflow.

Free

Created: 1yr ago
Updated: 1yr ago
Maitainers: public
URL: https://github.com/mardzix/bcd_nano_CUTnTag
Name: bcd_nano_cutntag
Version: 1
Badge:
workflow icon

Insert copied code into your website to add a link to this workflow.

Downloaded: 0
Copyright: Public Domain
License: None
  • Future updates

Related Workflows

cellranger-snakemake-gke
snakemake workflow to run cellranger on a given bucket using gke.
A Snakemake workflow for running cellranger on a given bucket using Google Kubernetes Engine. The usage of this workflow ...