scRNA-seq MARS-seq pipeline

public public 1yr ago Version: Version 1 0 bookmarks

scRNA-Seq pipelines

Here we forge the tools to analyze single cell RNA-Seq experiments. The analysis workflow is based on the Bioconductor packages scater and scran as well as the Bioconductor workflows by Lun ATL, McCarthy DJ, & Marioni JC A step-by-step workflow for low-level analysis of single-cell RNA-seq data. F1000Res. 2016 Aug 31 [revised 2016 Oct 31];5:2122 and Amezquita RA, Lun ATL et al. Orchestrating Single-Cell Analysis with Bioconductor Nat Methods. 2020 Feb;17(2):137-145.

Pipeline Workflow

All analysis steps are illustrated in the pipeline flowchart . Specify desired analysis details for your data in the respective essential.vars.groovy file (see below) and run the selected pipeline marsseq.pipeline.groovy or smartsseq.pipeline.groovy as described here . The analysis allows further parameter fine-tuning subsequent the initial analysis e.g. for plotting and QC thresholding. Therefore, a customisable sc.report.Rmd file will be generated in the output reports folder after running the pipeline. Go through the steps and modify the default settings where appropriate. Subsequently, the sc.report.Rmd file can be converted to a final html report using the knitr R-package.

The pipelines includes:

  • FastQC, MultiQC and other tools for rawdata quality control
  • Adapter trimming with Cutadapt
  • Mapping to the genome using STAR
  • generation of bigWig tracks for visualisation of alignment
  • Quantification with featureCounts (Subread) and UMI-tools (if UMIs are used for deduplication)
  • Downstream analysis in R using a pre-designed markdown report file ( sc.report.Rmd ). Modify this file to fit your custom parameter and thresholds and render it to your final html report. The Rmd file uses, among others, the following tools and methods:
    • QC: the scater package.
    • Normalization: the scran package.
    • Differential expression analysis: the scde package.
    • Trajectory analysis (pseudotime): the monocle package.

Code Snippets

20
21
22
23
24
25
exec """
    ${TOOL_ENV} &&
    ${PREAMBLE} &&

    bamCoverage $BAMCOVERAGE_FLAGS --bam $input -o ${output};
""","bamCoverage"
15
16
17
18
19
20
exec """
    ${TOOL_ENV} &&
    ${PREAMBLE} &&

    samtools index $input
""","BAMindexer"
50
51
52
53
54
55
56
57
58
exec """
    ${TOOL_ENV} &&
    ${PREAMBLE} &&

    cutadapt $CUTADAPT_FLAGS $CUTADAPT_FLAGS_PAIRED $CUTADAPT_FLAGS_TOOLONG --too-short-output=\${TMP}/${SAMPLENAME_BASE}.cutadapt_discarded.fastq.gz --output=\${TMP}/${SAMPLENAME_BASE}.cutadapt.fastq.gz $input1 $CUTADAPT_INPUT_SECOND 1> ${CUTADAPT_STATSDIR}/${SAMPLENAME_BASE_PRUNED}.cutadapt.log &&

    mv -t ${CUTADAPT_DISCARDED_DIR}/ \${TMP}/${SAMPLENAME_BASE_PRUNED}*cutadapt_discarded*.fastq.gz &&
    mv -t $output.dir \${TMP}/${SAMPLENAME_BASE_PRUNED}*cutadapt.fastq.gz
""","Cutadapt"
18
19
20
21
22
23
exec """
    ${TOOL_ENV} &&
    ${PREAMBLE} &&

    fastqc --extract $FASTQC_FLAGS -o $output.dir $inputs
""","FastQC"
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
exec """
    ${TOOL_ENV} &&
    ${PREAMBLE} &&

    if [ ! -e "$output.prefix" ]; then
        mkdir $output.prefix;
    fi &&
    fastqreference=${FastqScreen_vars.conf};
    references=(\${fastqreference//,/ });
    for i in "\${!references[@]}"; do
        reference=(\${references[i]//::/ });
        echo -e "DATABASE\t\${reference[0]}\t\${reference[1]}" >> $output.prefix/fastqscreen.conf;
    done;
    fastq_screen $FASTQSCREEN_FLAGS --conf $output.prefix/fastqscreen.conf --outdir $output.prefix $inputs;
    touch $outputs
""","FastqScreen"
14
15
16
17
18
19
exec """
    ${TOOL_ENV} &&
    ${PREAMBLE} &&

    bam dedup --in $input --out $output --log ${input.prefix}_dupmetrics.log --noPhoneHome
""","MarkDups2"
16
17
18
19
20
21
exec """
    ${TOOL_ENV} &&
    ${PREAMBLE} &&

    multiqc $ESSENTIAL_PROJECT $MultiQC_FLAGS -o $output.dir
""","MULTIQC"
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
exec """
    ${TOOL_ENV} &&
    ${PREAMBLE} &&

    base=`basename $input` &&
    bamdupmark=\${TMP}/\${base%.bam}.dupmarked.bam &&

    bam dedup --in $input --out \${bamdupmark} --log ${output.dir}/\${base%.bam}_dupmetrics.log --noPhoneHome &&
    samtools index \${bamdupmark} &&

    if [[ "${dupRadar_vars.paired}" == "yes" ]]; then
        echo "We are resorting and doing the repair\n" &&
        bamrepair=\${TMP}/\${base%.bam}.dupmarked.repair.bam &&
        repair -i \${bamdupmark} -T ${dupRadar_vars.threads} -o \${bamrepair} &&
        Rscript ${PIPELINE_ROOT}/tools/dupRadar/dupRadar.R bam=\${bamrepair} $DUPRADAR_FLAGS;
    else
        Rscript ${PIPELINE_ROOT}/tools/dupRadar/dupRadar.R bam=\${bamdupmark} $DUPRADAR_FLAGS;
    fi
""","dupRadar"
22
23
24
25
26
27
28
29
30
31
exec """
    ${TOOL_ENV} &&
    ${PREAMBLE} &&

    if [[ ! -e "$output.dir" ]]; then
        mkdir -p "$output.dir";
    fi &&

    Rscript ${PIPELINE_ROOT}/tools/geneBodyCov/geneBodyCov.R bam=$input $GENEBODYCOV2_FLAGS
""","geneBodyCov2"
19
20
21
22
23
24
exec """
    ${TOOL_ENV} &&
    ${PREAMBLE} &&

    infer_experiment.py -i $input $INFEREXPERIMENT_FLAGS > $output
""","inferexperiment"
25
26
27
28
29
30
31
32
exec """
    ${TOOL_ENV} &&
    ${PREAMBLE} &&

    unset DISPLAY;
    echo $output.prefix;
    qualimap rnaseq -bam $input -outdir ${output.prefix}_qualimap -outformat html -gtf ${qualimap_vars.genesgtf} -oc $output -p ${qualimap_vars.protocol} ${qualimap_vars.extra}
""","qualimap"
54
55
56
57
58
59
60
61
62
63
64
65
66
67
exec """
    ${TOOL_ENV} &&
    ${PREAMBLE} &&

    if [ -e \${TMP}/$OUTPUTFILE ]; then
        echo 'removing old STAR tmp folder';
        rm -r \${TMP}/$OUTPUTFILE*;
    fi &&

    STAR $STAR_FLAGS --outTmpDir \${TMP}/$OUTPUTFILE --readFilesIn $inputs | samtools view $SAMTOOLS_VIEW_FLAGS - | samtools sort $SAMTOOLS_SORT_FLAGS -T \${TMP}/${OUTPUTFILE}_sort - > $output1 &&

    mv ${STAR_vars.logdir}/${OUTPUTFILE}SJ.out.tab $output.dir &&
    ln -s ${STAR_vars.logdir}/${OUTPUTFILE}Log.final.out $output.dir
""","STAR"
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
exec """
    ${TOOL_ENV} &&
    ${PREAMBLE} &&

    base=`basename $input` &&
    if [[ "${subread2rnatypes_vars.paired}" == "true" ]]; then            
        echo "We are resorting and doing the repair\n" &&
        repair -i $input -T ${subread2rnatypes_vars.threads} -o \${TMP}/\${base} &&
        featureCounts $RNATYPES_FLAGS -o \${TMP}/\${base} \${TMP}/\${base} 2> ${output.prefix}_rnatypeslog.stderr;
    else
        featureCounts $RNATYPES_FLAGS -o \${TMP}/\${base} $input 2> ${output.prefix}_rnatypeslog.stderr;
    fi &&
    cut -f1,6,7 \${TMP}/\${base} > $output &&
    rm \${TMP}/\${base}
""","subread2rnatypes"
22
23
24
25
26
27
28
29
30
exec """
    ${TOOL_ENV} &&
    ${PREAMBLE} &&

    umi_tools extract $umi_tools_FLAGS -I $input2 --stdout \${TMP}/\$(basename ${input2.prefix}).barcode.fastq.gz --read2-in $input1 --read2-out=\${TMP}/\$(basename ${OUTPUTFILE}).umibarcode.fastq.gz &&

    rm \${TMP}/\$(basename ${input2.prefix}).barcode.fastq.gz &&
    mv \${TMP}/\$(basename ${OUTPUTFILE}).umibarcode.fastq.gz $output
""","AddUMIBarcodeToFastq"
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
exec """
    ${TOOL_ENV} &&
    ${PREAMBLE} &&

    bamarray=($inputs.bam) &&
    echo "sample_id,molecule_h5" > aggr.csv &&
    for file in "\${bamarray[@]}"; do
     echo \$(echo \$(basename \${file}) | sed 's#.bam##g'),\$(echo \${file} | sed 's#.bam#/outs/molecule_info.h5#g') >> aggr.csv;
    done &&

    cellranger aggr --csv=aggr.csv $CELLRANGER_AGGR_FLAGS &&
    mv $cellranger_aggr_vars.id $output.dir &&
    touch $output

""","cellranger_aggr"
31
32
33
34
35
36
37
38
39
40
exec """
    ${TOOL_ENV} &&
    ${PREAMBLE} &&

    cellranger count $CELLRANGER_FLAGS &&
    mv ${SAMPLENAME_BASE}/outs/possorted_genome_bam.bam $output && 
    mv ${SAMPLENAME_BASE}/outs/possorted_genome_bam.bam.bai ${output.dir}/${SAMPLENAME_BASE}.bam.bai &&
    mv ${SAMPLENAME_BASE} $output.dir 

""","cellranger_count"
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
exec """
    ${PREAMBLE} &&

    cp ${PIPELINE_ROOT}/tools/reports/shiny_scrnaseq_reporting_tool/server.R ${REPORTS}                &&
    cp ${PIPELINE_ROOT}/tools/reports/shiny_scrnaseq_reporting_tool/ui.R ${REPORTS}                    &&
    cp ${PIPELINE_ROOT}/tools/reports/shiny_scrnaseq_reporting_tool/sc.shinyrep.helpers.R ${REPORTS}   &&
    cp ${PIPELINE_ROOT}/tools/reports/shiny_scrnaseq_reporting_tool/styles.css ${REPORTS}              &&
    cp ${PIPELINE_ROOT}/tools/reports/shiny_scrnaseq_reporting_tool/app.R ${REPORTS}                   &&

    if [ -e "${REPORTS}/${shinyReports_vars.report}" ]; then
        echo "${shinyReports_vars.report} already exists. Older copy will be kept and not overwritten";
    else
        cp ${PIPELINE_ROOT}/tools/reports/shiny_scrnaseq_reporting_tool/${shinyReports_vars.report} ${REPORTS};
        if [ "${shinyReports_vars.seqtype}" == "tenXmultiome" ]; then
            cp ${PIPELINE_ROOT}/tools/reports/shiny_scrnaseq_reporting_tool/sc.report.Rmd ${REPORTS};
            cp ${PIPELINE_ROOT}/tools/reports/shiny_scrnaseq_reporting_tool/scatac.report.Rmd ${REPORTS};
        fi
    fi &&

    PROJECT=\$(basename ${shinyReports_vars.project})                     &&
    sed -i "2,2s/SHINYREPS_PROJECT/\${PROJECT}/" ${REPORTS}/${shinyReports_vars.report} &&

    echo "SHINYREPS_PROJECT=${shinyReports_vars.project}" >  $output &&
    echo "SHINYREPS_ORG=${shinyReports_vars.org}"         >> $output &&
    echo "SHINYREPS_DB=${shinyReports_vars.db}"           >> $output &&
    echo "SHINYREPS_LOG=${shinyReports_vars.log}"         >> $output &&
    echo "SHINYREPS_PAIRED=${shinyReports_vars.paired}"   >> $output &&
    echo "SHINYREPS_QC=${shinyReports_vars.qc}"           >> $output &&
    echo "SHINYREPS_RES=${shinyReports_vars.res}"         >> $output &&
    echo "SHINYREPS_CELLRANGERAGGR_ID=${shinyReports_vars.cellranger_aggr_id}"  >> $output &&
    echo "SHINYREPS_RUN_DEMUX=${shinyReports_vars.run_demux}" >> $output &&
    echo "SHINYREPS_DEMUX_OUT=${shinyReports_vars.demux_out}" >> $output &&
    echo "SHINYREPS_DEMUXCLUSTER_OUT=${shinyReports_vars.demuxCluster_out}" >> $output &&
    echo "SHINYREPS_PREFIX=${shinyReports_vars.prefix}"   >> $output &&
    echo "SHINYREPS_STRANDEDNESS=${shinyReports_vars.strandedness}" >> $output &&
    echo "SHINYREPS_STAR_LOG=${shinyReports_vars.star_log}"       >> $output &&
    echo "SHINYREPS_STAR_SUFFIX=${shinyReports_vars.star_suffix}" >> $output &&
    echo "SHINYREPS_STARparms_SUFFIX=${shinyReports_vars.starparms_suffix}" >> $output &&
    echo "SHINYREPS_FASTQC_OUT=${shinyReports_vars.fastqc_out}"     >> $output &&
    echo "SHINYREPS_FASTQC_LOG=${shinyReports_vars.fastqc_log}"     >> $output &&
    echo "SHINYREPS_FASTQC_SUMMARIZED=${shinyReports_vars.fastqc_summarized}" >> $output &&
    echo "SHINYREPS_FASTQSCREEN_OUT=${shinyReports_vars.fastqscreen_out}" >> $output &&
    echo "SHINYREPS_FASTQSCREEN_LOG=${shinyReports_vars.fastqscreen_log}" >> $output &&
    echo "SHINYREPS_FASTQSCREEN_PERC=${shinyReports_vars.fastqscreen_perc}" >> $output &&
    echo "SHINYREPS_BAMINDEX_LOG=${shinyReports_vars.bamindex_log}" >> $output &&
    echo "SHINYREPS_DUPRADAR_LOG=${shinyReports_vars.dupradar_log}" >> $output &&
    echo "SHINYREPS_RNATYPES_LOG=${shinyReports_vars.rnatypes_log}" >> $output &&
    echo "SHINYREPS_RNATYPES=${shinyReports_vars.rnatypes}" >> $output &&
    echo "SHINYREPS_RNATYPES_SUFFIX=${shinyReports_vars.rnatypes_suffix}" >> $output &&
    echo "SHINYREPS_GENEBODYCOV_LOG=${shinyReports_vars.genebodycov_log}" >> $output &&
    echo "SHINYREPS_BUSTARD=${shinyReports_vars.bustard}"         >> $output &&
    echo "SHINYREPS_SUBREAD=${shinyReports_vars.subread}"         >> $output &&
    echo "SHINYREPS_SUBREAD_SUFFIX=${shinyReports_vars.subread_suffix}" >> $output &&
    echo "SHINYREPS_SUBREAD_LOG=${shinyReports_vars.subread_log}"       >> $output &&
    echo "SHINYREPS_UMICOUNT=${shinyReports_vars.umicount}"             >> $output &&
    echo "SHINYREPS_UMICOUNT_LOG=${shinyReports_vars.umicount_log}"     >> $output &&
    echo "SHINYREPS_BAM2BW_LOG=${shinyReports_vars.bam2bw_log}"         >> $output &&
    echo "SHINYREPS_MARKDUPS_LOG=${shinyReports_vars.markdups_log}"     >> $output &&
    echo "SHINYREPS_PLOTS_COLUMN=${shinyReports_vars.plots_column}"     >> $output &&
    echo "SHINYREPS_SORT_ALPHA=${shinyReports_vars.sort_alpha}"         >> $output &&
    echo "SHINYREPS_INFEREXPERIMENT_LOGS=${shinyReports_vars.inferexperiment_logs}" >> $output &&
    echo "SHINYREPS_QUALIMAP_LOGS=${shinyReports_vars.qualimap_logs}" >> $output &&
    echo "SHINYREPS_INSERTSIZE=${shinyReports_vars.insertsize}" >> $output &&
    echo "SHINYREPS_TRACKHUB_DONE=${shinyReports_vars.trackhub_done}" >> $output &&
    echo "SHINYREPS_TOOL_VERSIONS=${shinyReports_vars.tool_versions}" >> $output &&
    echo "SHINYREPS_RUN_CUTADAPT=${shinyReports_vars.run_cutadapt}"  >> $output &&
    echo "SHINYREPS_CUTADAPT_STATS=${shinyReports_vars.cutadapt_stats}" >> $output &&
    echo "SHINYREPS_GTF=${shinyReports_vars.gtf}"         >> $output &&
    echo "SHINYREPS_TARGET=${shinyReports_vars.target}"   >> $output &&
    echo "SHINYREPS_CONTRASTS=${shinyReports_vars.contrasts}"   >> $output &&
    echo "SHINYREPS_MTGENES=${shinyReports_vars.mtgenes}" >> $output &&
    echo "SHINYREPS_SAMPLEPATTERN1=${shinyReports_vars.samplepattern1}" >> $output &&
    echo "SHINYREPS_SAMPLEPATTERN2=${shinyReports_vars.samplepattern2}" >> $output &&
    echo "SHINYREPS_COLORBYFACTOR=${shinyReports_vars.colorByFactor}" >> $output &&
    echo "SHINYREPS_MAXNO=${shinyReports_vars.maxno}" >> $output &&
    echo "SHINYREPS_SEQTYPE=${shinyReports_vars.seqtype}" >> $output &&
    echo "SHINYREPS_TYPE_OF_THRESHOLD=${shinyReports_vars.type_of_threshold}" >> $output &&
    echo "SHINYREPS_THRESHOLD_TOTAL_COUNTS_MIN=${shinyReports_vars.threshold_total_counts_min}" >> $output &&
    echo "SHINYREPS_THRESHOLD_TOTAL_COUNTS_MAX=${shinyReports_vars.threshold_total_counts_max}" >> $output &&
    echo "SHINYREPS_THRESHOLD_TOTAL_FEATURES_DETECTED=${shinyReports_vars.threshold_total_features_detected}" >> $output &&
    echo "SHINYREPS_THRESHOLD_PCT_COUNTS_MT=${shinyReports_vars.threshold_pct_counts_Mt}" >> $output &&
    echo "SHINYREPS_NMADS=${shinyReports_vars.nmads}" >> $output &&
    echo "SHINYREPS_APPLY_QCFILTER_BY_FACTOR=${shinyReports_vars.apply_QCfilter_by_factor}" >> $output
""","shinyReports"
24
25
26
27
28
29
30
31
32
exec """
    ${TOOL_ENV} &&
    ${PREAMBLE} &&

    base=`basename ${input}` &&
    featureCounts $SUBREAD_FLAGS --Rpath \${TMP} --tmpDir \${TMP} -o $output2 $input 2> ${output1.prefix}_subreadlog.stderr &&
    samtools sort -T \${TMP}/\${base}_tmp \${TMP}/\${base}.featureCounts.bam > $output1 &&
    rm \${TMP}/\${base}*
""","subread_count"
28
29
30
31
32
33
34
exec """
    ${TOOL_ENV} &&
    ${PREAMBLE} &&

    SAMPLENAME_BASE=\$(basename ${SAMPLENAME}) &&
    umi_tools count $umicount_FLAGS -I $input -S $output -L ${umicount_LOGDIR}/\${SAMPLENAME_BASE}.umicount.log -E ${umicount_LOGDIR}/\${SAMPLENAME_BASE}.umicount.error 
""","umicount"
20
21
22
23
24
25
exec """
    ${TOOL_ENV} &&
    ${PREAMBLE} &&

    umi_tools dedup $umidedup_FLAGS -I $input -S $output --output-stats=${output.prefix}.stats
""","umidedup"
ShowHide 11 more snippets with no or duplicated tags.

Login to post a comment if you would like to share your experience with this workflow.

Do you know this workflow well? If so, you can request seller status , and start supporting this workflow.

Free

Created: 1yr ago
Updated: 1yr ago
Maitainers: public
URL: https://gitlab.rlp.net/imbforge/NGSpipe2go/-/tree/master/pipelines/scRNAseq
Name: scrna-seq-mars-seq
Version: Version 1
Badge:
workflow icon

Insert copied code into your website to add a link to this workflow.

Downloaded: 0
Copyright: Public Domain
License: None
  • Future updates

Related Workflows

cellranger-snakemake-gke
snakemake workflow to run cellranger on a given bucket using gke.
A Snakemake workflow for running cellranger on a given bucket using Google Kubernetes Engine. The usage of this workflow ...