scRNA-Seq pipelines
Here we forge the tools to analyze single cell RNA-Seq experiments. The analysis workflow is based on the Bioconductor packages scater and scran as well as the Bioconductor workflows by Lun ATL, McCarthy DJ, & Marioni JC A step-by-step workflow for low-level analysis of single-cell RNA-seq data. F1000Res. 2016 Aug 31 [revised 2016 Oct 31];5:2122 and Amezquita RA, Lun ATL et al. Orchestrating Single-Cell Analysis with Bioconductor Nat Methods. 2020 Feb;17(2):137-145.
Pipeline Workflow
All analysis steps are illustrated in the pipeline flowchart . Specify desired analysis details for your data in the respective essential.vars.groovy file (see below) and run the selected pipeline marsseq.pipeline.groovy or smartsseq.pipeline.groovy as described here . The analysis allows further parameter fine-tuning subsequent the initial analysis e.g. for plotting and QC thresholding. Therefore, a customisable sc.report.Rmd file will be generated in the output reports folder after running the pipeline. Go through the steps and modify the default settings where appropriate. Subsequently, the sc.report.Rmd file can be converted to a final html report using the knitr R-package.
The pipelines includes:
- FastQC, MultiQC and other tools for rawdata quality control
- Adapter trimming with Cutadapt
- Mapping to the genome using STAR
- generation of bigWig tracks for visualisation of alignment
- Quantification with featureCounts (Subread) and UMI-tools (if UMIs are used for deduplication)
- Downstream analysis in R using a pre-designed markdown report file ( sc.report.Rmd ). Modify this file to fit your custom parameter and thresholds and render it to your final html report. The Rmd file uses, among others, the following tools and methods:
Code Snippets
20 21 22 23 24 25 | exec """ ${TOOL_ENV} && ${PREAMBLE} && bamCoverage $BAMCOVERAGE_FLAGS --bam $input -o ${output}; ""","bamCoverage" |
15 16 17 18 19 20 | exec """ ${TOOL_ENV} && ${PREAMBLE} && samtools index $input ""","BAMindexer" |
50 51 52 53 54 55 56 57 58 | exec """ ${TOOL_ENV} && ${PREAMBLE} && cutadapt $CUTADAPT_FLAGS $CUTADAPT_FLAGS_PAIRED $CUTADAPT_FLAGS_TOOLONG --too-short-output=\${TMP}/${SAMPLENAME_BASE}.cutadapt_discarded.fastq.gz --output=\${TMP}/${SAMPLENAME_BASE}.cutadapt.fastq.gz $input1 $CUTADAPT_INPUT_SECOND 1> ${CUTADAPT_STATSDIR}/${SAMPLENAME_BASE_PRUNED}.cutadapt.log && mv -t ${CUTADAPT_DISCARDED_DIR}/ \${TMP}/${SAMPLENAME_BASE_PRUNED}*cutadapt_discarded*.fastq.gz && mv -t $output.dir \${TMP}/${SAMPLENAME_BASE_PRUNED}*cutadapt.fastq.gz ""","Cutadapt" |
18 19 20 21 22 23 | exec """ ${TOOL_ENV} && ${PREAMBLE} && fastqc --extract $FASTQC_FLAGS -o $output.dir $inputs ""","FastQC" |
18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 | exec """ ${TOOL_ENV} && ${PREAMBLE} && if [ ! -e "$output.prefix" ]; then mkdir $output.prefix; fi && fastqreference=${FastqScreen_vars.conf}; references=(\${fastqreference//,/ }); for i in "\${!references[@]}"; do reference=(\${references[i]//::/ }); echo -e "DATABASE\t\${reference[0]}\t\${reference[1]}" >> $output.prefix/fastqscreen.conf; done; fastq_screen $FASTQSCREEN_FLAGS --conf $output.prefix/fastqscreen.conf --outdir $output.prefix $inputs; touch $outputs ""","FastqScreen" |
14 15 16 17 18 19 | exec """ ${TOOL_ENV} && ${PREAMBLE} && bam dedup --in $input --out $output --log ${input.prefix}_dupmetrics.log --noPhoneHome ""","MarkDups2" |
16 17 18 19 20 21 | exec """ ${TOOL_ENV} && ${PREAMBLE} && multiqc $ESSENTIAL_PROJECT $MultiQC_FLAGS -o $output.dir ""","MULTIQC" |
25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 | exec """ ${TOOL_ENV} && ${PREAMBLE} && base=`basename $input` && bamdupmark=\${TMP}/\${base%.bam}.dupmarked.bam && bam dedup --in $input --out \${bamdupmark} --log ${output.dir}/\${base%.bam}_dupmetrics.log --noPhoneHome && samtools index \${bamdupmark} && if [[ "${dupRadar_vars.paired}" == "yes" ]]; then echo "We are resorting and doing the repair\n" && bamrepair=\${TMP}/\${base%.bam}.dupmarked.repair.bam && repair -i \${bamdupmark} -T ${dupRadar_vars.threads} -o \${bamrepair} && Rscript ${PIPELINE_ROOT}/tools/dupRadar/dupRadar.R bam=\${bamrepair} $DUPRADAR_FLAGS; else Rscript ${PIPELINE_ROOT}/tools/dupRadar/dupRadar.R bam=\${bamdupmark} $DUPRADAR_FLAGS; fi ""","dupRadar" |
22 23 24 25 26 27 28 29 30 31 | exec """ ${TOOL_ENV} && ${PREAMBLE} && if [[ ! -e "$output.dir" ]]; then mkdir -p "$output.dir"; fi && Rscript ${PIPELINE_ROOT}/tools/geneBodyCov/geneBodyCov.R bam=$input $GENEBODYCOV2_FLAGS ""","geneBodyCov2" |
19 20 21 22 23 24 | exec """ ${TOOL_ENV} && ${PREAMBLE} && infer_experiment.py -i $input $INFEREXPERIMENT_FLAGS > $output ""","inferexperiment" |
25 26 27 28 29 30 31 32 | exec """ ${TOOL_ENV} && ${PREAMBLE} && unset DISPLAY; echo $output.prefix; qualimap rnaseq -bam $input -outdir ${output.prefix}_qualimap -outformat html -gtf ${qualimap_vars.genesgtf} -oc $output -p ${qualimap_vars.protocol} ${qualimap_vars.extra} ""","qualimap" |
54 55 56 57 58 59 60 61 62 63 64 65 66 67 | exec """ ${TOOL_ENV} && ${PREAMBLE} && if [ -e \${TMP}/$OUTPUTFILE ]; then echo 'removing old STAR tmp folder'; rm -r \${TMP}/$OUTPUTFILE*; fi && STAR $STAR_FLAGS --outTmpDir \${TMP}/$OUTPUTFILE --readFilesIn $inputs | samtools view $SAMTOOLS_VIEW_FLAGS - | samtools sort $SAMTOOLS_SORT_FLAGS -T \${TMP}/${OUTPUTFILE}_sort - > $output1 && mv ${STAR_vars.logdir}/${OUTPUTFILE}SJ.out.tab $output.dir && ln -s ${STAR_vars.logdir}/${OUTPUTFILE}Log.final.out $output.dir ""","STAR" |
25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 | exec """ ${TOOL_ENV} && ${PREAMBLE} && base=`basename $input` && if [[ "${subread2rnatypes_vars.paired}" == "true" ]]; then echo "We are resorting and doing the repair\n" && repair -i $input -T ${subread2rnatypes_vars.threads} -o \${TMP}/\${base} && featureCounts $RNATYPES_FLAGS -o \${TMP}/\${base} \${TMP}/\${base} 2> ${output.prefix}_rnatypeslog.stderr; else featureCounts $RNATYPES_FLAGS -o \${TMP}/\${base} $input 2> ${output.prefix}_rnatypeslog.stderr; fi && cut -f1,6,7 \${TMP}/\${base} > $output && rm \${TMP}/\${base} ""","subread2rnatypes" |
22 23 24 25 26 27 28 29 30 | exec """ ${TOOL_ENV} && ${PREAMBLE} && umi_tools extract $umi_tools_FLAGS -I $input2 --stdout \${TMP}/\$(basename ${input2.prefix}).barcode.fastq.gz --read2-in $input1 --read2-out=\${TMP}/\$(basename ${OUTPUTFILE}).umibarcode.fastq.gz && rm \${TMP}/\$(basename ${input2.prefix}).barcode.fastq.gz && mv \${TMP}/\$(basename ${OUTPUTFILE}).umibarcode.fastq.gz $output ""","AddUMIBarcodeToFastq" |
22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 | exec """ ${TOOL_ENV} && ${PREAMBLE} && bamarray=($inputs.bam) && echo "sample_id,molecule_h5" > aggr.csv && for file in "\${bamarray[@]}"; do echo \$(echo \$(basename \${file}) | sed 's#.bam##g'),\$(echo \${file} | sed 's#.bam#/outs/molecule_info.h5#g') >> aggr.csv; done && cellranger aggr --csv=aggr.csv $CELLRANGER_AGGR_FLAGS && mv $cellranger_aggr_vars.id $output.dir && touch $output ""","cellranger_aggr" |
31 32 33 34 35 36 37 38 39 40 | exec """ ${TOOL_ENV} && ${PREAMBLE} && cellranger count $CELLRANGER_FLAGS && mv ${SAMPLENAME_BASE}/outs/possorted_genome_bam.bam $output && mv ${SAMPLENAME_BASE}/outs/possorted_genome_bam.bam.bai ${output.dir}/${SAMPLENAME_BASE}.bam.bai && mv ${SAMPLENAME_BASE} $output.dir ""","cellranger_count" |
13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 | exec """ ${PREAMBLE} && cp ${PIPELINE_ROOT}/tools/reports/shiny_scrnaseq_reporting_tool/server.R ${REPORTS} && cp ${PIPELINE_ROOT}/tools/reports/shiny_scrnaseq_reporting_tool/ui.R ${REPORTS} && cp ${PIPELINE_ROOT}/tools/reports/shiny_scrnaseq_reporting_tool/sc.shinyrep.helpers.R ${REPORTS} && cp ${PIPELINE_ROOT}/tools/reports/shiny_scrnaseq_reporting_tool/styles.css ${REPORTS} && cp ${PIPELINE_ROOT}/tools/reports/shiny_scrnaseq_reporting_tool/app.R ${REPORTS} && if [ -e "${REPORTS}/${shinyReports_vars.report}" ]; then echo "${shinyReports_vars.report} already exists. Older copy will be kept and not overwritten"; else cp ${PIPELINE_ROOT}/tools/reports/shiny_scrnaseq_reporting_tool/${shinyReports_vars.report} ${REPORTS}; if [ "${shinyReports_vars.seqtype}" == "tenXmultiome" ]; then cp ${PIPELINE_ROOT}/tools/reports/shiny_scrnaseq_reporting_tool/sc.report.Rmd ${REPORTS}; cp ${PIPELINE_ROOT}/tools/reports/shiny_scrnaseq_reporting_tool/scatac.report.Rmd ${REPORTS}; fi fi && PROJECT=\$(basename ${shinyReports_vars.project}) && sed -i "2,2s/SHINYREPS_PROJECT/\${PROJECT}/" ${REPORTS}/${shinyReports_vars.report} && echo "SHINYREPS_PROJECT=${shinyReports_vars.project}" > $output && echo "SHINYREPS_ORG=${shinyReports_vars.org}" >> $output && echo "SHINYREPS_DB=${shinyReports_vars.db}" >> $output && echo "SHINYREPS_LOG=${shinyReports_vars.log}" >> $output && echo "SHINYREPS_PAIRED=${shinyReports_vars.paired}" >> $output && echo "SHINYREPS_QC=${shinyReports_vars.qc}" >> $output && echo "SHINYREPS_RES=${shinyReports_vars.res}" >> $output && echo "SHINYREPS_CELLRANGERAGGR_ID=${shinyReports_vars.cellranger_aggr_id}" >> $output && echo "SHINYREPS_RUN_DEMUX=${shinyReports_vars.run_demux}" >> $output && echo "SHINYREPS_DEMUX_OUT=${shinyReports_vars.demux_out}" >> $output && echo "SHINYREPS_DEMUXCLUSTER_OUT=${shinyReports_vars.demuxCluster_out}" >> $output && echo "SHINYREPS_PREFIX=${shinyReports_vars.prefix}" >> $output && echo "SHINYREPS_STRANDEDNESS=${shinyReports_vars.strandedness}" >> $output && echo "SHINYREPS_STAR_LOG=${shinyReports_vars.star_log}" >> $output && echo "SHINYREPS_STAR_SUFFIX=${shinyReports_vars.star_suffix}" >> $output && echo "SHINYREPS_STARparms_SUFFIX=${shinyReports_vars.starparms_suffix}" >> $output && echo "SHINYREPS_FASTQC_OUT=${shinyReports_vars.fastqc_out}" >> $output && echo "SHINYREPS_FASTQC_LOG=${shinyReports_vars.fastqc_log}" >> $output && echo "SHINYREPS_FASTQC_SUMMARIZED=${shinyReports_vars.fastqc_summarized}" >> $output && echo "SHINYREPS_FASTQSCREEN_OUT=${shinyReports_vars.fastqscreen_out}" >> $output && echo "SHINYREPS_FASTQSCREEN_LOG=${shinyReports_vars.fastqscreen_log}" >> $output && echo "SHINYREPS_FASTQSCREEN_PERC=${shinyReports_vars.fastqscreen_perc}" >> $output && echo "SHINYREPS_BAMINDEX_LOG=${shinyReports_vars.bamindex_log}" >> $output && echo "SHINYREPS_DUPRADAR_LOG=${shinyReports_vars.dupradar_log}" >> $output && echo "SHINYREPS_RNATYPES_LOG=${shinyReports_vars.rnatypes_log}" >> $output && echo "SHINYREPS_RNATYPES=${shinyReports_vars.rnatypes}" >> $output && echo "SHINYREPS_RNATYPES_SUFFIX=${shinyReports_vars.rnatypes_suffix}" >> $output && echo "SHINYREPS_GENEBODYCOV_LOG=${shinyReports_vars.genebodycov_log}" >> $output && echo "SHINYREPS_BUSTARD=${shinyReports_vars.bustard}" >> $output && echo "SHINYREPS_SUBREAD=${shinyReports_vars.subread}" >> $output && echo "SHINYREPS_SUBREAD_SUFFIX=${shinyReports_vars.subread_suffix}" >> $output && echo "SHINYREPS_SUBREAD_LOG=${shinyReports_vars.subread_log}" >> $output && echo "SHINYREPS_UMICOUNT=${shinyReports_vars.umicount}" >> $output && echo "SHINYREPS_UMICOUNT_LOG=${shinyReports_vars.umicount_log}" >> $output && echo "SHINYREPS_BAM2BW_LOG=${shinyReports_vars.bam2bw_log}" >> $output && echo "SHINYREPS_MARKDUPS_LOG=${shinyReports_vars.markdups_log}" >> $output && echo "SHINYREPS_PLOTS_COLUMN=${shinyReports_vars.plots_column}" >> $output && echo "SHINYREPS_SORT_ALPHA=${shinyReports_vars.sort_alpha}" >> $output && echo "SHINYREPS_INFEREXPERIMENT_LOGS=${shinyReports_vars.inferexperiment_logs}" >> $output && echo "SHINYREPS_QUALIMAP_LOGS=${shinyReports_vars.qualimap_logs}" >> $output && echo "SHINYREPS_INSERTSIZE=${shinyReports_vars.insertsize}" >> $output && echo "SHINYREPS_TRACKHUB_DONE=${shinyReports_vars.trackhub_done}" >> $output && echo "SHINYREPS_TOOL_VERSIONS=${shinyReports_vars.tool_versions}" >> $output && echo "SHINYREPS_RUN_CUTADAPT=${shinyReports_vars.run_cutadapt}" >> $output && echo "SHINYREPS_CUTADAPT_STATS=${shinyReports_vars.cutadapt_stats}" >> $output && echo "SHINYREPS_GTF=${shinyReports_vars.gtf}" >> $output && echo "SHINYREPS_TARGET=${shinyReports_vars.target}" >> $output && echo "SHINYREPS_CONTRASTS=${shinyReports_vars.contrasts}" >> $output && echo "SHINYREPS_MTGENES=${shinyReports_vars.mtgenes}" >> $output && echo "SHINYREPS_SAMPLEPATTERN1=${shinyReports_vars.samplepattern1}" >> $output && echo "SHINYREPS_SAMPLEPATTERN2=${shinyReports_vars.samplepattern2}" >> $output && echo "SHINYREPS_COLORBYFACTOR=${shinyReports_vars.colorByFactor}" >> $output && echo "SHINYREPS_MAXNO=${shinyReports_vars.maxno}" >> $output && echo "SHINYREPS_SEQTYPE=${shinyReports_vars.seqtype}" >> $output && echo "SHINYREPS_TYPE_OF_THRESHOLD=${shinyReports_vars.type_of_threshold}" >> $output && echo "SHINYREPS_THRESHOLD_TOTAL_COUNTS_MIN=${shinyReports_vars.threshold_total_counts_min}" >> $output && echo "SHINYREPS_THRESHOLD_TOTAL_COUNTS_MAX=${shinyReports_vars.threshold_total_counts_max}" >> $output && echo "SHINYREPS_THRESHOLD_TOTAL_FEATURES_DETECTED=${shinyReports_vars.threshold_total_features_detected}" >> $output && echo "SHINYREPS_THRESHOLD_PCT_COUNTS_MT=${shinyReports_vars.threshold_pct_counts_Mt}" >> $output && echo "SHINYREPS_NMADS=${shinyReports_vars.nmads}" >> $output && echo "SHINYREPS_APPLY_QCFILTER_BY_FACTOR=${shinyReports_vars.apply_QCfilter_by_factor}" >> $output ""","shinyReports" |
24 25 26 27 28 29 30 31 32 | exec """ ${TOOL_ENV} && ${PREAMBLE} && base=`basename ${input}` && featureCounts $SUBREAD_FLAGS --Rpath \${TMP} --tmpDir \${TMP} -o $output2 $input 2> ${output1.prefix}_subreadlog.stderr && samtools sort -T \${TMP}/\${base}_tmp \${TMP}/\${base}.featureCounts.bam > $output1 && rm \${TMP}/\${base}* ""","subread_count" |
28 29 30 31 32 33 34 | exec """ ${TOOL_ENV} && ${PREAMBLE} && SAMPLENAME_BASE=\$(basename ${SAMPLENAME}) && umi_tools count $umicount_FLAGS -I $input -S $output -L ${umicount_LOGDIR}/\${SAMPLENAME_BASE}.umicount.log -E ${umicount_LOGDIR}/\${SAMPLENAME_BASE}.umicount.error ""","umicount" |
20 21 22 23 24 25 | exec """ ${TOOL_ENV} && ${PREAMBLE} && umi_tools dedup $umidedup_FLAGS -I $input -S $output --output-stats=${output.prefix}.stats ""","umidedup" |
Support
- Future updates
Related Workflows





