Single-Cell Transposable Element Analysis Workflow (scTE-snakemake)

Alec Pankow 2022-10-04

A Snakemake workflow that runs scTE (single-cell transposable element) quantification on scRNA-seq data with a single command. The scTE pipeline is described in He et al. 2021 (Nature Communications).

Quick start

Clone the repository and create the conda environment that provides Snakemake:

git clone https://github.com/alecpnkw/starsolo-scte-snakemake.git
cd starsolo-scte-snakemake
conda env create --file environment.yaml

Modify the configuration file (config/config.yaml) to suit your run:

# options include hg38 for human, mm10 for mouse
# for others see https://github.com/JiekaiLab/scTE
genome: 
 name: "hg38"
 fasta: "resources/genome.fa"
 gtf: "resources/annot.gtf"
# by default, assumes R2 fastq contains the cell barcode / UMI
R1_fastqs: "resources/<path-to-R1.fq>"
R2_fastqs: "resources/<path-to-R2.fq>"
# scte params
scte_min_counts: 3000
scte_expect_cells: 30000
# starsolo cell barcode / UMI configuration
soloCBstart: 1
soloCBlen: 16
soloUMIstart: 17
soloUMIlen: 12
umi_whitelist: "<path-to-umi-whitelist>"
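
The barcode coordinates in the configuration above are 1-based start positions and lengths, as STAR expects for --soloCBstart and related options. A minimal sketch of a sanity check that could be run before launching the workflow follows; the function names are hypothetical and not part of this repository, and the dictionary simply mirrors the keys shown above.

```python
# Sketch only: sanity-check the 1-based CB/UMI coordinates from config/config.yaml.
# Function names are hypothetical; they are not provided by the workflow.

def barcode_layout_problems(cfg):
    """Return a list of problems found in the CB/UMI settings (empty if none)."""
    problems = []
    keys = ("soloCBstart", "soloCBlen", "soloUMIstart", "soloUMIlen")
    for key in keys:
        value = cfg.get(key)
        if not isinstance(value, int) or value < 1:
            problems.append(f"{key} must be a positive integer, got {value!r}")
    if problems:
        return problems

    # Convert (start, length) to inclusive 1-based end positions.
    cb_end = cfg["soloCBstart"] + cfg["soloCBlen"] - 1
    umi_end = cfg["soloUMIstart"] + cfg["soloUMIlen"] - 1
    # Two inclusive ranges overlap when each one starts before the other ends.
    if cfg["soloCBstart"] <= umi_end and cfg["soloUMIstart"] <= cb_end:
        problems.append("cell barcode and UMI ranges overlap")
    return problems


def min_barcode_read_length(cfg):
    """Shortest barcode read that still covers both the CB and the UMI."""
    cb_end = cfg["soloCBstart"] + cfg["soloCBlen"] - 1
    umi_end = cfg["soloUMIstart"] + cfg["soloUMIlen"] - 1
    return max(cb_end, umi_end)


example = {"soloCBstart": 1, "soloCBlen": 16, "soloUMIstart": 17, "soloUMIlen": 12}
print(barcode_layout_problems(example))
print(min_barcode_read_length(example))
```

With the example values, the cell barcode occupies bases 1 to 16 and the UMI bases 17 to 28, so the barcode read needs to be at least 28 bases long.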

Preview and run the workflow (see the Snakemake documentation for the full list of options):

# preview
snakemake --dry-run
# run on cluster using --profile with conda envs
snakemake \
 --jobs <n> \
 --use-conda \
 --profile <cluster-profile> \
 --keep-going \
 --conda-prefix <path-to-conda-envs-dir>

Code Snippets

shell:
    """
    git clone https://github.com/alecpnkw/scTE.git "resources/scTE"
    cd resources/scTE
    python setup.py install
    """
shell:
    """
    scTE_build \
        -g {wildcards.genome} \
        -m {wildcards.mode} \
        -o {params.prefix}
    """
shell:
    """
    samtools view {input} -h | awk '/^@/ || /CB:/' | samtools view -h -b > {output}
    """
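
The filter above keeps SAM header lines and any alignment line that contains the text "CB:". As an illustration, the same rule can be sketched in Python, alongside a stricter variant that only looks at the optional tag fields (column 12 onward). Both function names are hypothetical, and the stricter variant is an alternative shown for comparison, not what the workflow runs.

```python
# Sketch only: the awk rule '/^@/ || /CB:/' expressed in Python,
# plus a tag-aware alternative. Neither function is part of the workflow.

def keep_like_awk(line):
    """Mirror the awk filter: header lines, or any line containing 'CB:'."""
    return line.startswith("@") or "CB:" in line


def has_cb_tag(line):
    """Stricter check: header lines, or a CB tag among the optional fields.

    SAM alignment lines have 11 mandatory tab-separated fields;
    optional TAG:TYPE:VALUE fields start at column 12.
    """
    if line.startswith("@"):
        return True
    fields = line.rstrip("\n").split("\t")
    return any(field.startswith("CB:") for field in fields[11:])


sam_lines = [
    "@HD\tVN:1.6\tSO:coordinate",
    "r1\t0\tchr1\t100\t255\t4M\t*\t0\t0\tACGT\tFFFF\tCB:Z:AAAC\tUB:Z:GGG",
    "r2\t0\tchr1\t200\t255\t4M\t*\t0\t0\tACGT\tFFFF\tNH:i:1",
]
kept = [line for line in sam_lines if keep_like_awk(line)]
print(len(kept))
```

The substring match is simple and fast, but it would also keep a read whose name or another field happens to contain "CB:"; the tag-aware variant avoids that at the cost of splitting each line.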
shell:
    """
    scTE -i {input.bam} \
        -o {params.prefix} \
        -x {input.index} \
        -p 96 \
        --keeptmp True \
        --expect-cells {params.expect} \
        --min_counts {params.min_counts}
    """
shell:
    """
    STAR --runMode genomeGenerate \
        --runThreadN {threads} \
        --genomeDir {output} \
        --genomeFastaFiles {input.fasta} \
        --sjdbGTFfile {input.gtf} \
        --genomeSAsparseD 3
    """
shell:
    """
    STAR --runThreadN 48 \
        --soloType CB_UMI_Simple \
        --soloCBwhitelist {params.whitelist} \
        --soloCBstart {params.CBstart} \
        --soloCBlen {params.CBlen} \
        --soloUMIstart {params.UMIstart} \
        --soloUMIlen {params.UMIlen} \
        --genomeDir {input.genome} \
        --readFilesIn {params.fastq_str} \
        --readFilesCommand zcat \
        --outSAMattributes NH HI nM AS CR CY UR UY CB UB GX GN sS sQ sM \
        --outSAMtype BAM SortedByCoordinate \
        --soloUMIfiltering MultiGeneUMI \
        --soloCBmatchWLtype 1MM_multi_pseudocounts \
        --limitBAMsortRAM 16111457846 \
        --outFilterMultimapNmax 100 \
        --winAnchorMultimapNmax 100 \
        --outSAMmultNmax 1 \
        --twopassMode Basic \
        --runRNGseed 42 \
        --outFileNamePrefix {params.prefix}
    """
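
The snippet above passes {params.fastq_str} to --readFilesIn, but how that string is built is not shown here. STAR expects the files for each read as a comma-separated list, with the two reads separated by a space, and for STARsolo the barcode read comes last. A minimal sketch of one way such a string could be assembled follows, assuming R1 holds the cDNA and R2 the cell barcode / UMI (as the configuration comment states). The function name and file names are hypothetical.

```python
# Sketch only: assemble a --readFilesIn argument for STARsolo.
# Assumes R1 = cDNA read and R2 = barcode/UMI read; names are illustrative.

def build_fastq_str(cdna_fastqs, barcode_fastqs):
    """Join per-lane FASTQ paths into 'cDNA1,cDNA2 barcode1,barcode2'."""
    if not cdna_fastqs:
        raise ValueError("at least one FASTQ pair is required")
    if len(cdna_fastqs) != len(barcode_fastqs):
        raise ValueError("cDNA and barcode FASTQ lists must have the same length")
    # The lists must be in the same lane order so that reads stay paired.
    return ",".join(cdna_fastqs) + " " + ",".join(barcode_fastqs)


fastq_str = build_fastq_str(
    ["resources/lane1_R1.fq.gz", "resources/lane2_R1.fq.gz"],
    ["resources/lane1_R2.fq.gz", "resources/lane2_R2.fq.gz"],
)
print(fastq_str)
```

Keeping the two lists in matching lane order matters because STAR pairs the files by position within each comma-separated list.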

URL: https://github.com/alecpnkw/starsolo-scte-snakemake
Name: starsolo-scte-snakemake
License: MIT License