A single-cell RNAseq pipeline for 10X genomics data

public public 1yr ago Version: 2.3.2 0 bookmarks

Introduction

nf-core/scrnaseq is a bioinformatics best-practice analysis pipeline for processing 10x Genomics single-cell RNA-seq data.

This is a community effort in building a pipeline capable to support:

  • Alevin-Fry + AlevinQC

  • STARSolo

  • Kallisto + BUStools

  • Cellranger

  • UniverSC

Documentation

The nf-core/scrnaseq pipeline comes with documentation about the pipeline usage , parameters and output .

scrnaseq workflow

Usage

Note If you are new to Nextflow and nf-core, please refer to this page on how to set-up Nextflow. Make sure to test your setup with -profile test before running the workflow on actual data.

First, prepare a samplesheet with your input data that looks as follows:

samplesheet.csv :

sample,fastq_1,fastq_2,expected_cells
pbmc8k,pbmc8k_S1_L007_R1_001.fastq.gz,pbmc8k_S1_L007_R2_001.fastq.gz,"10000"
pbmc8k,pbmc8k_S1_L008_R1_001.fastq.gz,pbmc8k_S1_L008_R2_001.fastq.gz,"10000"

Each row represents a fastq file (single-end) or a pair of fastq files (paired end).

Now, you can run the pipeline using:

nextflow run nf-core/scrnaseq \
 -profile <docker/singularity/.../institute> \
 --input samplesheet.csv \
 --genome_fasta GRCm38.p6.genome.chr19.fa \
 --gtf gencode.vM19.annotation.chr19.gtf \
 --protocol 10XV2 \
 --aligner <alevin/kallisto/star/cellranger/universc> \
 --outdir <OUTDIR>

Warning: Please provide pipeline parameters via the CLI or Nextflow -params-file option. Custom config files including those provided by the -c Nextflow option can be used to provide any configuration except for parameters ; see docs .

For more details and further functionality, please refer to the usage documentation and the parameter documentation .

Decision Tree for users

The nf-core/scrnaseq pipeline features several paths to analyze your single cell data. Future additions will also be done soon, e.g. the addition of multi-ome analysis types. To aid users in analyzing their data, we have added a decision tree to help people decide on what type of analysis they want to run and how to choose appropriate parameters for that.

graph TD
 A[sc RNA] -->|alevin-fry| B(h5ad/seurat/mtx matrices)
 A[sc RNA] -->|CellRanger| B(h5ad/seurat/mtx matrices)
 A[sc RNA] -->|kbpython| B(h5ad/seurat/mtx matrices)
 A[sc RNA] -->|STARsolo| B(h5ad/seurat/mtx matrices)
 A[sc RNA] -->|Universc| B(h5ad/seurat/mtx matrices)

Options for the respective alignment method can be found here to choose between methods.

Pipeline output

To see the results of an example test run with a full size dataset refer to the results tab on the nf-core website pipeline page. For more details about the output files and reports, please refer to the output documentation .

Credits

nf-core/scrnaseq was originally written by Bailey PJ, Botvinnik O, Marques de Almeida F, Gabernet G, Peltzer A, Sturm G.

We thank the following people for their extensive assistance in the development of this pipeline:

  • @heylf

  • @KevinMenden

  • @FloWuenne

  • @rob-p

Contributions and Support

If you would like to contribute to this pipeline, please see the contributing guidelines .

For further information or help, don't hesitate to get in touch on the Slack #scrnaseq channel (you can join with this invite ).

Citations

If you use nf-core/scrnaseq for your analysis, please cite it using the following doi: 10.5281/zenodo.3568187

The basic benchmarks that were used as motivation for incorporating the three available modular workflows can be found in this publication .

We offer all three paths for the processing of scRNAseq data so it remains up to the user to decide which pipeline workflow is chosen for a particular analysis question.

An extensive list of references for the tools used by the pipeline can be found in the CITATIONS.md file.

Code Snippets

23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
"""
#!/usr/bin/env Rscript
require(alevinQC)
alevinFryQCReport(
    mapDir = "${alevin_results}/af_map",
    quantDir = "${alevin_results}/af_quant",
    permitDir= "${alevin_results}/af_quant",
    sampleId = "${prefix}",
    outputFile = "alevin_report_${meta.id}.html",
    outputFormat = "html_document",
    outputDir = "./",
    forceOverwrite = TRUE
)

yaml::write_yaml(
    list(
        '${task.process}'=list(
            'alevinqc' = paste(packageVersion('alevinQC'), collapse='.')
        )
    ),
    "versions.yml"
)
"""
20
21
22
23
24
25
"""
concat_h5ad.py \\
    --input $samplesheet \\
    --out combined_matrix.h5ad \\
    --suffix "_matrix.h5ad"
"""
28
29
30
"""
touch combined_matrix.h5ad
"""
30
31
32
33
"""
$unzip
cat $name | t2g.py --use_version > transcripts_to_genes.txt
"""
22
23
24
25
26
27
28
29
"""
gffread -F $gtf -w "${genome_fasta}.transcriptome.fa" -g $genome_fasta

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    gffread: \$(gffread --version 2>&1)
END_VERSIONS
"""
22
23
24
25
26
27
28
29
30
31
"""
filter_gtf_for_genes_in_genome.py \\
    --gtf $gtf \\
    --fasta $fasta \\
    -o ${fasta.baseName}_genes.gtf
cat <<-END_VERSIONS > versions.yml
"${task.process}":
    python: \$(python --version | sed 's/Python //g')
END_VERSIONS
"""
45
46
47
48
49
50
51
52
"""
# convert file types
mtx_to_h5ad.py \\
    --aligner ${params.aligner} \\
    --input filtered_feature_bc_matrix.h5 \\
    --sample ${meta.id} \\
    --out ${meta.id}/${meta.id}_matrix.h5ad
"""
55
56
57
58
59
60
61
62
63
64
65
66
67
68
"""
# convert file types
for input_type in spliced unspliced ; do
    mtx_to_h5ad.py \\
        --aligner ${params.aligner} \\
        --sample ${meta.id} \\
        --input *count/counts_unfiltered/\${input_type}.mtx \\
        --barcode *count/counts_unfiltered/\${input_type}.barcodes.txt \\
        --feature *count/counts_unfiltered/\${input_type}.genes.txt \\
        --txp2gene ${txp2gene} \\
        --star_index ${star_index} \\
        --out ${meta.id}/${meta.id}_\${input_type}_matrix.h5ad ;
done
"""
71
72
73
74
75
76
77
78
79
80
81
82
83
"""
# convert file types
mtx_to_h5ad.py \\
    --task_process ${task.process} \\
    --aligner ${params.aligner} \\
    --sample ${meta.id} \\
    --input $mtx_matrix \\
    --barcode $barcodes_tsv \\
    --feature $features_tsv \\
    --txp2gene ${txp2gene} \\
    --star_index ${star_index} \\
    --out ${meta.id}/${meta.id}_matrix.h5ad
"""
86
87
88
89
90
"""
mkdir ${meta.id}
touch ${meta.id}/${meta.id}_matrix.h5ad
touch versions.yml
"""
39
40
41
"""
mkdir ${meta.id}
"""
44
45
46
47
48
49
50
51
52
53
54
"""
# convert file types
for input_type in spliced unspliced ; do
    mtx_to_seurat.R \\
        *count/counts_unfiltered/\${input_type}.mtx \\
        *count/counts_unfiltered/\${input_type}.barcodes.txt \\
        *count/counts_unfiltered/\${input_type}.genes.txt \\
        ${meta.id}/${meta.id}_\${input_type}_matrix.rds \\
        ${aligner}
done
"""
57
58
59
60
61
62
63
64
"""
mtx_to_seurat.R \\
    $matrix \\
    $barcodes \\
    $features \\
    ${meta.id}/${meta.id}_matrix.rds \\
    ${aligner}
"""
67
68
69
70
71
"""
mkdir ${meta.id}
touch ${meta.id}/${meta.id}_matrix.rds
touch versions.yml
"""
21
22
23
24
25
26
27
28
29
30
"""
check_samplesheet.py \\
    $samplesheet \\
    samplesheet.valid.csv

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    python: \$(python --version | sed 's/Python //g')
END_VERSIONS
"""
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
"""
# export required var
export ALEVIN_FRY_HOME=.

# prep simpleaf
simpleaf set-paths

# run simpleaf index
simpleaf \\
    index \\
    --threads $task.cpus \\
    --fasta $genome_fasta \\
    $seq_inputs \\
    $args \\
    -o salmon

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    simpleaf: \$(simpleaf -V | tr -d '\\n' | cut -d ' ' -f 2)
    salmon: \$(salmon --version | sed -e "s/salmon //g")
END_VERSIONS
"""
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
"""
# export required var
export ALEVIN_FRY_HOME=.

# prep simpleaf
simpleaf set-paths

# run simpleaf quant
gzip -dcf $whitelist > whitelist.uncompressed.txt
simpleaf quant \\
    -1 ${forward.join( "," )} \\
    -2 ${reverse.join( "," )} \\
    -i ${index} \\
    -o ${prefix}_alevin_results \\
    -m $txp2gene \\
    -t $task.cpus \\
    -c $protocol \\
    $expect_cells \\
    $unfiltered_command \\
    $args

$save_whitelist
[[ ! -f ${prefix}_alevin_results/af_quant/all_freq.bin ]] && cp ${prefix}_alevin_results/af_quant/permit_freq.bin ${prefix}_alevin_results/af_quant/all_freq.bin

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    simpleaf: \$(simpleaf -V | tr -d '\\n' | cut -d ' ' -f 2)
    salmon: \$(salmon --version | sed -e "s/salmon //g")
END_VERSIONS
"""
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
"""
STAR \\
    --genomeDir $index \\
    --readFilesIn ${reverse.join( "," )} ${forward.join( "," )} \\
    --runThreadN $task.cpus \\
    --outFileNamePrefix $prefix. \\
    --soloCBwhitelist <(gzip -cdf $whitelist) \\
    --soloType $protocol \\
    --soloFeatures $star_feature \\
    $other_10x_parameters \\
    $out_sam_type \\
    $ignore_gtf \\
    $seq_center \\
    $cell_filter \\
    $args \\

$mv_unsorted_bam

if [ -f ${prefix}.Unmapped.out.mate1 ]; then
    mv ${prefix}.Unmapped.out.mate1 ${prefix}.unmapped_1.fastq
    gzip ${prefix}.unmapped_1.fastq
fi
if [ -f ${prefix}.Unmapped.out.mate2 ]; then
    mv ${prefix}.Unmapped.out.mate2 ${prefix}.unmapped_2.fastq
    gzip ${prefix}.unmapped_2.fastq
fi

if [ -d ${prefix}.Solo.out ]; then
    # Backslashes still need to be escaped (https://github.com/nextflow-io/nextflow/issues/67)
    find ${prefix}.Solo.out \\( -name "*.tsv" -o -name "*.mtx" \\) -exec gzip {} \\;
fi

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    star: \$(STAR --version | sed -e "s/STAR_//g")
END_VERSIONS
"""
33
34
35
36
37
38
39
40
41
"""
mkdir -p "${prefix}/outs/"
touch ${prefix}/outs/fake_file.txt

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    cellranger: \$(echo \$( cellranger --version 2>&1) | sed 's/^.*[^0-9]\\([0-9]*\\.[0-9]*\\.[0-9]*\\).*\$/\\1/' )
END_VERSIONS
"""
NextFlow From line 33 of count/main.nf
24
25
26
27
28
29
30
31
32
33
34
35
"""
cellranger \\
    mkgtf \\
    $gtf \\
    ${prefix}.gtf \\
    $args

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    cellranger: \$(echo \$( cellranger --version 2>&1) | sed 's/^.*[^0-9]\\([0-9]*\\.[0-9]*\\.[0-9]*\\).*\$/\\1/' )
END_VERSIONS
"""
NextFlow From line 24 of mkgtf/main.nf
25
26
27
28
29
30
31
32
33
34
35
36
37
"""
cellranger \\
    mkref \\
    --genome=$reference_name \\
    --fasta=$fasta \\
    --genes=$gtf \\
    $args

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    cellranger: \$(echo \$( cellranger --version 2>&1) | sed 's/^.*[^0-9]\\([0-9]*\\.[0-9]*\\.[0-9]*\\).*\$/\\1/' )
END_VERSIONS
"""
NextFlow From line 25 of mkref/main.nf
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
"""
printf "%s %s\\n" $rename_to | while read old_name new_name; do
    [ -f "\${new_name}" ] || ln -s \$old_name \$new_name
done

fastqc \\
    $args \\
    --threads $task.cpus \\
    $renamed_files

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    fastqc: \$( fastqc --version | sed -e "s/FastQC v//g" )
END_VERSIONS
"""
46
47
48
49
50
51
52
53
54
"""
touch ${prefix}.html
touch ${prefix}.zip

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    fastqc: \$( fastqc --version | sed -e "s/FastQC v//g" )
END_VERSIONS
"""
23
24
25
26
27
28
29
30
31
32
"""
gffread \\
    $gff \\
    $args \\
    -o ${prefix}.gtf
cat <<-END_VERSIONS > versions.yml
"${task.process}":
    gffread: \$(gffread --version 2>&1)
END_VERSIONS
"""
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
"""
# Not calling gunzip itself because it creates files
# with the original group ownership rather than the
# default one for that user / the work directory
gzip \\
    -cd \\
    $args \\
    $archive \\
    > $gunzip

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    gunzip: \$(echo \$(gunzip --version 2>&1) | sed 's/^.*(gzip) //; s/ Copyright.*\$//')
END_VERSIONS
"""
NextFlow From line 23 of gunzip/main.nf
41
42
43
44
45
46
47
"""
touch $gunzip
cat <<-END_VERSIONS > versions.yml
"${task.process}":
    gunzip: \$(echo \$(gunzip --version 2>&1) | sed 's/^.*(gzip) //; s/ Copyright.*\$//')
END_VERSIONS
"""
NextFlow From line 41 of gunzip/main.nf
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
"""
kb \\
    count \\
    -t $task.cpus \\
    -i $index \\
    -g $t2g \\
    $cdna \\
    $introns \\
    -x $technology \\
    $args \\
    -o ${prefix}.count \\
    ${reads.join( " " )} \\
    -m ${memory}G

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    kallistobustools: \$(echo \$(kb --version 2>&1) | sed 's/^.*kb_python //;s/positional arguments.*\$//')
END_VERSIONS
"""
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
"""
kb \\
    ref \\
    -i kb_ref_out.idx \\
    -g t2g.txt \\
    -f1 cdna.fa \\
    --workflow $workflow_mode \\
    $fasta \\
    $gtf

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    kallistobustools: \$(echo \$(kb --version 2>&1) | sed 's/^.*kb_python //;s/positional arguments.*\$//')
END_VERSIONS
"""
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
"""
kb \\
    ref \\
    -i kb_ref_out.idx \\
    -g t2g.txt \\
    -f1 cdna.fa \\
    -f2 intron.fa \\
    -c1 cdna_t2c.txt \\
    -c2 intron_t2c.txt \\
    --workflow $workflow_mode \\
    $fasta \\
    $gtf

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    kallistobustools: \$(echo \$(kb --version 2>&1) | sed 's/^.*kb_python //;s/positional arguments.*\$//')
END_VERSIONS
"""
28
29
30
31
32
33
34
35
36
37
38
39
40
"""
multiqc \\
    --force \\
    $args \\
    $config \\
    $extra_config \\
    .

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    multiqc: \$( multiqc --version | sed -e "s/multiqc, version //g" )
END_VERSIONS
"""
43
44
45
46
47
48
49
50
51
52
"""
touch multiqc_data
touch multiqc_plots
touch multiqc_report.html

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    multiqc: \$( multiqc --version | sed -e "s/multiqc, version //g" )
END_VERSIONS
"""
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
"""
mkdir star
STAR \\
    --runMode genomeGenerate \\
    --genomeDir star/ \\
    --genomeFastaFiles $fasta \\
    --sjdbGTFfile $gtf \\
    --runThreadN $task.cpus \\
    $memory \\
    $args

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    star: \$(STAR --version | sed -e "s/STAR_//g")
    samtools: \$(echo \$(samtools --version 2>&1) | sed 's/^.*samtools //; s/Using.*\$//')
    gawk: \$(echo \$(gawk --version 2>&1) | sed 's/^.*GNU Awk //; s/, .*\$//')
END_VERSIONS
"""
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
"""
samtools faidx $fasta
NUM_BASES=`gawk '{sum = sum + \$2}END{if ((log(sum)/log(2))/2 - 1 > 14) {printf "%.0f", 14} else {printf "%.0f", (log(sum)/log(2))/2 - 1}}' ${fasta}.fai`

mkdir star
STAR \\
    --runMode genomeGenerate \\
    --genomeDir star/ \\
    --genomeFastaFiles $fasta \\
    --sjdbGTFfile $gtf \\
    --runThreadN $task.cpus \\
    --genomeSAindexNbases \$NUM_BASES \\
    $memory \\
    $args

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    star: \$(STAR --version | sed -e "s/STAR_//g")
    samtools: \$(echo \$(samtools --version 2>&1) | sed 's/^.*samtools //; s/Using.*\$//')
    gawk: \$(echo \$(gawk --version 2>&1) | sed 's/^.*GNU Awk //; s/, .*\$//')
END_VERSIONS
"""
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
"""
mkdir star
touch star/Genome
touch star/Log.out
touch star/SA
touch star/SAindex
touch star/chrLength.txt
touch star/chrName.txt
touch star/chrNameLength.txt
touch star/chrStart.txt
touch star/exonGeTrInfo.tab
touch star/exonInfo.tab
touch star/geneInfo.tab
touch star/genomeParameters.txt
touch star/sjdbInfo.txt
touch star/sjdbList.fromGTF.out.tab
touch star/sjdbList.out.tab
touch star/transcriptInfo.tab

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    star: \$(STAR --version | sed -e "s/STAR_//g")
    samtools: \$(echo \$(samtools --version 2>&1) | sed 's/^.*samtools //; s/Using.*\$//')
    gawk: \$(echo \$(gawk --version 2>&1) | sed 's/^.*GNU Awk //; s/, .*\$//')
END_VERSIONS
"""
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
"""
universc \\
    --id 'sample-${meta.id}' \\
    ${input_reads} \\
    --technology '${meta.technology}' \\
    --chemistry '${meta.chemistry}' \\
    --reference ${reference_name} \\
    --description ${sample_arg} \\
    --jobmode "local" \\
    --localcores ${task.cpus} \\
    --localmem ${task.memory.toGiga()} \\
    --per-cell-data \\
    $args 1> _log 2> _err

# save log files
echo !! > sample-${meta.id}/outs/_invocation
cp _log sample-${meta.id}/outs/_log
cp _err sample-${meta.id}/outs/_err

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    cellranger:  \$(echo \$(cellranger count --version 2>&1 | head -n 2 | tail -n 1 | sed 's/^.* //g' | sed 's/(//g' | sed 's/)//g' ))
    universc:  \$(echo \$(bash /universc/launch_universc.sh --version | grep version | grep universc  | sed 's/^.* //g' ))
END_VERSIONS
"""
66
67
68
69
70
71
72
73
74
75
"""
mkdir -p "sample-${meta.id}/outs/"
touch sample-${meta.id}/outs/fake_file.txt

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    cellranger:  \$(echo \$(cellranger count --version 2>&1 | head -n 2 | tail -n 1 | sed 's/^.* //g' | sed 's/(//g' | sed 's/)//g' ))
    universc:  \$(echo \$(bash /universc/launch_universc.sh --version | grep version | grep universc | sed 's/^.* //g' ))
END_VERSIONS
"""
ShowHide 26 more snippets with no or duplicated tags.

Login to post a comment if you would like to share your experience with this workflow.

Do you know this workflow well? If so, you can request seller status , and start supporting this workflow.

Free

Created: 1yr ago
Updated: 1yr ago
Maitainers: public
URL: https://nf-co.re/scrnaseq
Name: scrnaseq
Version: 2.3.2
Badge:
workflow icon

Insert copied code into your website to add a link to this workflow.

Downloaded: 0
Copyright: Public Domain
License: None
  • Future updates

Related Workflows

cellranger-snakemake-gke
snakemake workflow to run cellranger on a given bucket using gke.
A Snakemake workflow for running cellranger on a given bucket using Google Kubernetes Engine. The usage of this workflow ...