Single-Cell RNA Sequencing Analysis Workflow Using Drop-seq Tools on SGE Cluster Environment

public public 1yr ago 0 bookmarks

This pipeline is based on snakemake and the dropseq tools provided by the McCarroll Lab . It starts from raw sequencing data and outputs digital gene expression matrices (DGEs).

WE ARE CURRENTLY USING AN UPDATED VERSION HERE

Updates

We have since moved on from using this method and now use an adapted version of dropSeqPipe which we have found to be much more sophisticated and flexible than our snakemake implementation of Drop-seq Tools .

We have made a number of changes to the original workflow:

1). We have adapted the dropSeqPipe workflow to work on a SGE Cluster Environment , specifically the UCLA Hoffman2 Cluster .

2). We have made the workflow compatible with the new pipeline dropEST which is capable of handling reads aligning to intronic regions and results in a large boost in cell number and genes/UMIs per cell in our hands.

If you would like to use this updated snakemake workflow, please go here .

Requirements

snakemake is the workflow management which wraps Drop-seq Tools which runs everything

Drop-seq Tools can be downloaded from here and a manual of the various functions is available here

STAR is used to map the reads.

Changes to config file

In order to run the workflow, you must modify the config.yaml file with your sample information. Simply provide the "SampleName" for each sample for read1 of your paired end fastq files.

Changes to Snakefile

The following changes must be made to the Snakefile

1). If you are running in a SGE Cluster Environment and you would like email notifications of your job status, add your email to the email parameter.

2). If you would like to give your cluster jobs a specific prefix, you can specify that with job_label

3). Provide a path to your STAR Index for your reference genome with: STARREFDIR

4). Give the paths to your reference genome fasta file: REF_SEQ and gtf file: GTF_FILE

5). Provide the path to the Drop-seq Tools directory: DropSeqTools

6). Provide the path to your STAR executable: STAR

7). If necessary, provide the path to your particular java version executable: JAVA_distro

Running the Workflow

Run the snakemake workflow with the following command:

snakemake -j 100 --cluster "qsub {params.sge_opts}"

Code Snippets

52
53
54
55
56
57
58
59
60
shell:
    """
    {JAVA_distro} -jar {picard_jar} FastqToSam \
    F1={input.fastq1} \
    F2={input.fastq2} \
    O={output.bam} \
    SM={wildcards.sample} \
    SORT_ORDER=queryname
    """
71
72
73
74
75
shell:
    """
    {TagBamWithReadSequenceExtended} BASE_RANGE=1-12 BASE_QUALITY=10 BARCODED_READ=1 DISCARD_READ=false TAG_NAME=XC NUM_BASES_BELOW_QUALITY=1 \
    INPUT={input.bam} SUMMARY={output.summary} OUTPUT={output.tagged_cell}
    """
85
86
87
88
89
shell:
    """
    {TagBamWithReadSequenceExtended} BASE_RANGE=13-20 BASE_QUALITY=10 BARCODED_READ=1 DISCARD_READ=true TAG_NAME=XM NUM_BASES_BELOW_QUALITY=1 \
    INPUT={input.tagged_cell} SUMMARY={output.summary} OUTPUT={output.tagged_molecule} 
    """
 98
 99
100
101
102
shell:
    """
    {FilterBAM} TAG_REJECT=XQ \
    INPUT={input.tagged_molecule} OUTPUT={output.tagged_filtered}
    """
112
113
114
115
116
shell:
    """
    {TrimStartingSequence} SEQUENCE=AAGCAGTGGTATCAACGCAGAGTGAATGGG MISMATCHES=0 NUM_BASES=5 \
    INPUT={input.tagged_filtered} OUTPUT_SUMMARY={output.summary} OUTPUT={output.trimmed_smart}
    """
SnakeMake From line 112 of master/Snakefile
126
127
128
129
130
shell:
    """
    {PolyATrimmer} MISMATCHES=0 NUM_BASES=6 \
    INPUT={input.trimmed_smart} OUTPUT_SUMMARY={output.summary} OUTPUT={output.unmapped_bam}
    """
SnakeMake From line 126 of master/Snakefile
140
141
142
143
144
shell:
    """
    java -Xmx500m -jar {picard_jar} SamToFastq \
    INPUT={input.unmapped_bam} FASTQ={output.fastq}
    """
SnakeMake From line 140 of master/Snakefile
154
155
156
157
158
shell:
    """
    {STAR} --genomeDir {input.starref} --runThreadN 6 \
    --readFilesIn {input.fastq} --outFileNamePrefix Output/{wildcards.sample}/star.
    """
SnakeMake From line 154 of master/Snakefile
169
170
171
172
173
shell:
    """
    java -Dsamjdk.buffer_size=131072 -XX:GCTimeLimit=50 -XX:GCHeapFreeLimit=10 -Xmx4000m -jar {picard_jar} SortSam \
    INPUT={input.aligned_sam} OUTPUT={output.aligned_sorted_bam} SORT_ORDER=queryname TMP_DIR={output.temp_dir}
    """
SnakeMake From line 169 of master/Snakefile
185
186
187
188
189
shell:
    """
    java -Xmx4000m -jar {picard_jar} MergeBamAlignment INCLUDE_SECONDARY_ALIGNMENTS=false PAIRED_RUN=false \
    REFERENCE_SEQUENCE={input.reference_seq} UNMAPPED_BAM={input.unmapped_bam} ALIGNED_BAM={input.aligned_sorted_bam} OUTPUT={output.merged}
    """
SnakeMake From line 185 of master/Snakefile
199
200
201
202
203
shell:
    """
    {TagReadWithGeneExon} TAG=GE CREATE_INDEX=true \
    INPUT={input.merged} ANNOTATIONS_FILE={input.gtf} O={output.gene_exon_tagged}
    """
SnakeMake From line 199 of master/Snakefile
214
215
216
217
218
shell:
    """
    {DetectBeadSynthesisErrors} NUM_BARCODES={Number_Barcodes} PRIMER_SEQUENCE=AAGCAGTGGTATCAACGCAGAGTAC \
    I={input.gene_exon_tagged} O={output.gene_exon_tagged_clean} OUTPUT_STATS={output.stats} SUMMARY={output.summary}
    """
SnakeMake From line 214 of master/Snakefile
227
228
229
230
231
shell:
    """
    {GatherMolecularBarcodeDistributionByGene} MIN_NUM_GENES_PER_CELL=500 \
    I={input.gene_exon_tagged_clean} O={output.duplicateUMI}
    """
SnakeMake From line 227 of master/Snakefile
241
242
243
244
245
shell:
    """
    {BAMTagHistogram} TAG=XC \
    I={input.gene_exon_tagged_clean} O={output.cell_readcounts}
    """
SnakeMake From line 241 of master/Snakefile
258
259
260
261
262
shell:
    """
    {DigitalExpression} NUM_CORE_BARCODES={Number_Core_Barcodes} \
    INPUT={input.gene_exon_tagged_clean} OUTPUT={output.dge} SUMMARY={output.summary}
    """
SnakeMake From line 258 of master/Snakefile
ShowHide 15 more snippets with no or duplicated tags.

Login to post a comment if you would like to share your experience with this workflow.

Do you know this workflow well? If so, you can request seller status , and start supporting this workflow.

Free

Created: 1yr ago
Updated: 1yr ago
Maitainers: public
URL: https://github.com/darneson/DropSeq
Name: dropseq
Version: 1
Badge:
workflow icon

Insert copied code into your website to add a link to this workflow.

Downloaded: 0
Copyright: Public Domain
License: None
  • Future updates

Related Workflows

cellranger-snakemake-gke
snakemake workflow to run cellranger on a given bucket using gke.
A Snakemake workflow for running cellranger on a given bucket using Google Kubernetes Engine. The usage of this workflow ...