SpaceMicrobe Snakemake Workflow for 10X Visium Spatial Gene Expression Data

The SpaceMicrobe Snakemake workflow is part of the SpaceMicrobe computational framework to detect microbial reads in 10X Visium Spatial Gene Expression data.

The Snakemake workflow requires that spaceranger count has already been run on the spatial transcriptomics dataset. The required input is the possorted_genome_bam.bam file from the spaceranger count outs folder.

The Snakemake workflow outputs the taxonomic classifications of the reads (a modified Kraken2 output file), which must then be processed further with the R package microbiome10XVisium.

Graph

Overview of the Snakemake workflow:

Reads that did not align to the host transcriptome/genome are extracted (samtools_view). The molecular (UMI) and spatial (10X barcode) information of each read is preserved in read2 (umi), and quality control is performed on read2 (cutadapt and fastp) to remove adapters and poly-A tails, trim low-quality bases, and enforce a minimum read length. The metagenomic profiler Kraken2 then performs taxonomic classification of the reads (classify).

Snakemake rule graph


Code Snippets

shell:
    """
    samtools view \
        --output-fmt BAM \
        --bam \
        --require-flags 4 \
        --threads {threads} \
        {input} > {output}
    """
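The --require-flags 4 option keeps only records whose SAM FLAG has the "unmapped" bit (0x4) set. The following minimal Python sketch illustrates that bit test. The helper name and the example FLAG values are hypothetical and not part of the workflow.

```python
# SAM flag bit 0x4 marks a read as unmapped (per the SAM specification).
UNMAPPED = 0x4


def is_unmapped(flag: int) -> bool:
    """Return True when the unmapped bit is set in a SAM FLAG value."""
    return bool(flag & UNMAPPED)


# Hypothetical FLAG values, for illustration only.
example_flags = [0, 4, 16, 20, 256]

# Mirrors the effect of --require-flags 4: keep records with bit 0x4 set.
kept = [flag for flag in example_flags if is_unmapped(flag)]
print(kept)
```

Other bits may be set alongside 0x4 (for example 20 = 16 + 4); the filter only requires that the unmapped bit is present.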
shell:
    """
    samtools sort -n \
        --output-fmt BAM \
        --threads {threads} \
        {input} > {output}
    """
shell:
    """
    rm -rf {params.dir}/fastq && \
    mkdir -p {params.dir} && \
    bamtofastq \
        --nthreads={threads} \
        {input} \
        {params.dir}/fastq \
    && touch {output[0]}
    """
SnakeMake From line 118 of main/Snakefile
shell:
    """
    rm -f {output}
    for name in {params.dir}/*/*_R1_*.fastq.gz; do
       cat "$name" >> {output}
    done
    """
SnakeMake From line 144 of main/Snakefile
shell:
    """
    rm -f {output}
    for name in {params.dir}/*/*_R2_*.fastq.gz; do
       cat "$name" >> {output}
    done
    """        
SnakeMake From line 166 of main/Snakefile
shell:
    """
    umi_tools extract \
    --bc-pattern={params.bc} \
    --stdin {input.fq1} \
    --stdout {output.fq1} \
    --read2-in {input.fq2} \
    --read2-out={output.fq2}
    """
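To illustrate what this step does to the reads, here is a rough Python sketch of how a barcode and UMI taken from read1 end up in the name of read2. The value of params.bc is not shown in the snippet, so the layout used here (16 barcode bases followed by 12 UMI bases) is an assumption, as are the function name and the example read. umi_tools appends the extracted cell barcode and UMI to the read name, separated by underscores.

```python
def extract_barcode_umi(read1_seq: str, read2_name: str,
                        bc_len: int = 16, umi_len: int = 12) -> str:
    """Return the read2 name with barcode and UMI appended as NAME_BARCODE_UMI.

    bc_len and umi_len are assumed values; the real pattern comes from
    params.bc in the Snakefile, which is not shown here.
    """
    if len(read1_seq) < bc_len + umi_len:
        raise ValueError("read1 is shorter than barcode + UMI")
    barcode = read1_seq[:bc_len]
    umi = read1_seq[bc_len:bc_len + umi_len]
    return f"{read2_name}_{barcode}_{umi}"


# Hypothetical read1 sequence: 16 nt barcode, 12 nt UMI, then trailing bases.
read1 = "ACGTACGTACGTACGT" + "TTTTGGGGCCCC" + "TTTT"
new_name = extract_barcode_umi(read1, "READ0001")
print(new_name)
```

Carrying the barcode and UMI in the read name is what allows the later post-processing step to recover them from the Kraken2 output.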
shell:
    """
    cutadapt \
        --front {params.adaptor} \
        --front {params.polyA} \
        --times 2 \
        --minimum-length 31 \
        --cores {threads} \
        --output {output.fq} \
        {input} > {output.report}
    """
shell:
    """
     fastp --in1 {input} \
         --cut_right_window_size 4 \
         --cut_right_mean_quality 20 \
         --disable_adapter_trimming \
         --length_required 31 \
         --thread {threads} \
         --dont_eval_duplication \
         --trim_poly_x \
         --out1 {output.fq} \
         --html {output.html} \
         --json {output.json}
    """
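The sliding-window trimming configured above can be sketched in Python as follows. This is a simplified illustration based on the documented behaviour of fastp's cut_right option (slide a window from the front of the read; at the first window whose mean quality is below the threshold, drop that window and everything to its right). It is not fastp's actual implementation, which may handle edge cases differently, and the function name and quality values are hypothetical.

```python
def cut_right(quals, window=4, min_mean=20):
    """Return the number of leading bases kept by a cut_right style trim.

    quals is a list of per-base Phred quality scores.
    """
    for start in range(0, len(quals) - window + 1):
        win = quals[start:start + window]
        if sum(win) / window < min_mean:
            # Drop this window and everything to its right.
            return start
    return len(quals)


# Hypothetical read: five good bases followed by five poor ones.
quals = [30, 30, 30, 30, 30, 10, 10, 10, 10, 10]
kept = cut_right(quals)

# A read is retained only if enough bases survive (--length_required 31).
passes_length_filter = kept >= 31
print(kept, passes_length_filter)
```

In this toy example the first window with a mean quality below 20 starts at position 4, so four bases are kept and the read would fail the minimum length of 31.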
shell:
    """
    kraken2 --db {params.db} \
        --memory-mapping \
        --confidence 0.1 \
        --threads {threads} \
        --use-names \
        --gzip-compressed \
        --report-minimizer-data \
        --output {output.txt} \
        --report {output.report} \
        {input}
    """
shell:
    """
    ### extract only classified reads & remove all human reads
    sed -e '/^U/d' -e '/sapiens/d' {input} > {output.f}
    ### extract spatial BC & UMIs into separate tabs
    sed -i 's/\_/\t/g' {output.f}
    ### extract taxid into separate tab
    sed -i -e 's/ (taxid /\t/g' -e 's/)//g' {output.f}
    ### extract only CB, UMI, taxid columns
    awk -F \"\t\" '{{ print $3,\"\t\",$4,\"\t\",$6 }}' {output.f} > {output.p}
    """
SnakeMake From line 318 of main/Snakefile
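The sed and awk steps above can be mirrored in a few lines of Python, which may make the column logic easier to follow. This is a sketch under assumptions: the example lines are invented, the read names are assumed to have the form NAME_BARCODE_UMI, and the function name is hypothetical. It returns the three fields as a tuple instead of writing them to a file.

```python
def parse_kraken_line(line: str):
    """Return (barcode, umi, taxid) for a classified non-human read, else None."""
    # Same filters as the first sed call: drop unclassified and human reads.
    if line.startswith("U") or "sapiens" in line:
        return None
    # Split NAME_BARCODE_UMI into separate tab-delimited columns.
    line = line.rstrip("\n").replace("_", "\t")
    # Move the taxid out of "name (taxid N)" into its own column.
    line = line.replace(" (taxid ", "\t").replace(")", "")
    fields = line.split("\t")
    # awk columns $3, $4, $6 are indices 2, 3, 5 here.
    return fields[2], fields[3], fields[5]


# Hypothetical Kraken2 output lines (produced with --use-names).
classified = "C\tREAD0001_ACGTACGTACGTACGT_TTTTGGGGCCCC\tEscherichia coli (taxid 562)\t90\t562:56"
human = "C\tREAD0002_ACGTACGTACGTACGT_AAAACCCCGGGG\tHomo sapiens (taxid 9606)\t90\t9606:56"
unclassified = "U\tREAD0003_ACGTACGTACGTACGT_GGGGAAAATTTT\tunclassified (taxid 0)\t90\t0:56"

print(parse_kraken_line(classified))
```

As in the shell version, every underscore in the line is replaced, so the column positions only hold when the read name contains exactly the two underscores added during barcode extraction.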
shell:
    """
    multiqc \
        --force \
        --module fastp \
        --module kraken \
        --module cutadapt \
        --outdir {params.outdir} \
        {params.indir} \
    && rm -rf {BAMDIR}
    """

Created: 1yr ago
Updated: 1yr ago
Maintainers: public
URL: https://github.com/bedapub/space-microbe
Name: space-microbe
Version: 1
Copyright: Public Domain
License: None