MARS-seq v2 pre-processing pipeline with velocity

Version: 1.0.1

Introduction

nf-core/marsseq is a bioinformatics single-cell preprocessing pipeline for MARS-seq v2.0 experiments. MARS-seq is a plate-based technique that can be combined with FACS to study rare populations of cells. On top of the pre-existing pipeline, we have developed an RNA velocity workflow that can be used to study cell dynamics using STARsolo. We do so by converting the raw FASTQ reads into the 10X v2 format.

Workflow

Usage

Note: If you are new to Nextflow and nf-core, please refer to this page on how to set up Nextflow. Make sure to test your setup with -profile test before running the workflow on actual data.

To run the pipeline, you have to create the experiment metadata files (describing the amplification batches, sequencing batches, and wells/cells) and a samplesheet (samplesheet.csv). We provide a test example here.

Next, you have to generate genome references that incorporate the ERCC spike-ins. References are downloaded from the GENCODE database.

nextflow run nf-core/marsseq \
 -profile <docker/singularity/.../institute> \
 --genome <mm10,mm9,GRCh38_v43> \
 --build_references \
 --input samplesheet.csv \
 --outdir <OUTDIR>

Now, you can run the pipeline using:

nextflow run nf-core/marsseq \
 -profile <docker/singularity/.../institute> \
 --genome <mm10,mm9,GRCh38_v43> \
 --input samplesheet.csv \
 --outdir <OUTDIR>

Warning: Please provide pipeline parameters via the CLI or the Nextflow -params-file option. Custom config files, including those provided by the -c Nextflow option, can be used to provide any configuration except for parameters; see docs.
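
As a minimal sketch of the -params-file route, the Python snippet below writes the parameters used in the commands above into a JSON file (Nextflow accepts YAML or JSON for this option). The file name params.json, the helper write_params, and the placeholder values are assumptions for illustration, not part of the pipeline.

```python
import json
from pathlib import Path


def write_params(path, **params):
    """Write pipeline parameters to a JSON file usable with -params-file."""
    Path(path).write_text(json.dumps(params, indent=2) + "\n")
    return Path(path)


# Hypothetical values; replace them with your own genome, samplesheet and output directory.
params_file = write_params(
    "params.json",
    genome="mm10",
    input="samplesheet.csv",
    outdir="results",
)
print(params_file.read_text())
```

The resulting file could then be passed as: nextflow run nf-core/marsseq -profile <docker/singularity/.../institute> -params-file params.json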

For more details and further functionality, please refer to the usage documentation and the parameter documentation.

Pipeline output

To see the results of an example test run with a full-size dataset, refer to the results tab on the nf-core website pipeline page. For more details about the output files and reports, please refer to the output documentation.

Credits

nf-core/marsseq was originally written by Martin Proks.

We thank the following people for their extensive assistance in the development of this pipeline:

Jose Alejandro Romero Herrera (@joseale2310)
Maxime Garcia (@maxulysse)

Keren-Shaul, H., Kenigsberg, E., Jaitin, D.A. et al. MARS-seq2.0: an experimental and analytical pipeline for indexed sorting combined with single-cell RNA sequencing. Nat Protoc 14, 1841–1862 (2019). https://doi.org/10.1038/s41596-019-0164-4

Contributions and Support

If you would like to contribute to this pipeline, please see the contributing guidelines.

For further information or help, don't hesitate to get in touch on the Slack #marsseq channel (you can join with this invite).

Citations

If you use nf-core/marsseq for your analysis, please cite it using the following doi: 10.5281/zenodo.8063539

An extensive list of references for the tools used by the pipeline can be found in the CITATIONS.md file.

You can cite the nf-core publication as follows:

The nf-core framework for community-curated bioinformatics pipelines.

Philip Ewels, Alexander Peltzer, Sven Fillinger, Harshil Patel, Johannes Alneberg, Andreas Wilm, Maxime Ulysse Garcia, Paolo Di Tommaso & Sven Nahnsen.

Nat Biotechnol. 2020 Feb 13. doi: 10.1038/s41587-020-0439-x.

Code Snippets

"""
cut -f1-9,12- $read > $filename

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    cut: \$( cut --version 2>&1 | sed -n 1p | sed 's/cut (GNU coreutils) //g' )
END_VERSIONS
"""

NextFlow From line 25 of sam/main.nf
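
The cut -f1-9,12- call above keeps tab-separated fields 1 to 9 and 12 onwards, which drops columns 10 and 11. In the SAM format those two columns hold the read sequence (SEQ) and the base qualities (QUAL). The Python sketch below mimics that behaviour on a made-up record; the function name drop_seq_and_qual and the example line are assumptions for illustration.

```python
def drop_seq_and_qual(sam_line):
    """Mimic `cut -f1-9,12-`: keep tab-separated fields 1-9 and 12 onwards."""
    fields = sam_line.rstrip("\n").split("\t")
    # Lines with fewer than ten fields (e.g. SAM headers) pass through unchanged.
    return "\t".join(fields[:9] + fields[11:])


# A made-up alignment record, for illustration only.
record = "\t".join([
    "read1", "0", "chr1", "3143480", "42", "4M", "*", "0", "0",
    "ACGT", "FFFF", "NM:i:0", "AS:i:8",
])
print(drop_seq_and_qual(record))
```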

"""
touch ${filename}

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    cut: \$( cut --version 2>&1 | sed -n 1p | sed 's/cut (GNU coreutils) //g' )
END_VERSIONS
"""

NextFlow From line 36 of sam/main.nf

"""
mkdir -p output/umi.tab/
mkdir -p output/offset.tab/
mkdir -p output/singleton_offset.tab/
mkdir -p output/QC/read_stats/
mkdir -p output/QC/read_stats_amp_batch/
mkdir -p output/QC/umi_stats/
mkdir -p output/QC/noffsets_per_umi_distrib/
mkdir -p output/QC/nreads_per_umi_distrib/
mkdir -p output/QC/umi_nuc_per_pos/
mkdir -p _debug/${meta.amp_batch}/

demultiplex.pl \\
    ${meta.amp_batch} \\
    ${meta.pool_barcode} \\
    $wells_cells \\
    $gene_intervals \\
    $spike_seq \\
    $oligos \\
    $read \\
    . \\
    $args

mv _debug output/
ln -s output output_tmp

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    demultiplex.pl: \$( demultiplex.pl --version )
END_VERSIONS
"""

NextFlow demultiplexer From line 40 of demultiplex/main.nf

"""
mkdir -p output/umi.tab/
mkdir -p output/offset.tab/
mkdir -p output/singleton_offset.tab/
mkdir -p output/QC/{read_stats,read_stats_amp_batch,umi_stats,noffsets_per_umi_distrib,nreads_per_umi_distrib,umi_nuc_per_pos}

touch output/umi.tab/${meta.amp_batch}.txt
touch output/offset.tab/${meta.amp_batch}.txt
touch output/singleton_offset.tab/${meta.amp_batch}.txt
touch output/QC/{read_stats,read_stats_amp_batch,umi_stats,noffsets_per_umi_distrib,nreads_per_umi_distrib,umi_nuc_per_pos}/${meta.amp_batch}.txt

mkdir -p output/_debug/${meta.amp_batch}/
touch output/_debug/${meta.amp_batch}/{offsets,UMIs}.txt

ln -s output output_tmp

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    demultiplex.pl: \$( demultiplex.pl --version )
END_VERSIONS
"""

NextFlow demultiplexer From line 73 of demultiplex/main.nf

"""
create_ercc_fasta.py --input $spikeins --output ercc.fa

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    ERCC_CREATE: \$( create_ercc_fasta.py --version )
END_VERSIONS
"""

NextFlow From line 22 of ercc/main.nf

"""
touch ercc.fa

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    ERCC_CREATE: \$( create_ercc_fasta.py --version )
END_VERSIONS
"""

NextFlow From line 32 of ercc/main.nf

"""
gunzip -f $reads
mkdir labeled_reads

extract_labels.pl \\
    $r1 \\
    $r2 \\
    $meta.id \\
    $seq_batches \\
    $oligos \\
    $amp_batches \\
    labeled_reads/$r1 \\
    labeled_reads/$qc \\
    .

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    extract_labels.pl: \$( extract_labels.pl --version )
END_VERSIONS
"""

NextFlow From line 30 of extract/main.nf

"""
mkdir labeled_reads
touch labeled_reads/${r1}
touch labeled_reads/${qc}

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    extract_labels.pl: \$( extract_labels.pl --version )
END_VERSIONS
"""

NextFlow From line 55 of extract/main.nf

"""
mkdir raw_reads/
fastp \\
    -i ${reads[0]} \\
    -I ${reads[1]} \\
    -o raw_reads/${reads[0]} \\
    -O raw_reads/${reads[1]} \\
    --thread $task.cpus \\
    --disable_quality_filtering \\
    --disable_length_filtering \\
    --disable_adapter_trimming \\
    --disable_trim_poly_g \\
    --json ${meta.id}.fastp.json \\
    $args \\
    2> raw_reads/${meta.id}.fastp.log

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    fastp: \$(fastp --version 2>&1 | sed -e "s/fastp //g")
END_VERSIONS
"""

NextFlow fastp From line 30 of split/main.nf

"""
touch ${meta.id}.fastp.json
mkdir raw_reads/
touch raw_reads/000{1..3}.${reads[0]}
touch raw_reads/000{1..3}.${reads[1]}
touch raw_reads/${meta.id}.fastp.log

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    fastp: \$(fastp --version 2>&1 | sed -e "s/fastp //g")
END_VERSIONS
"""

NextFlow fastp From line 53 of split/main.nf

"""
prepare_pipeline.py \\
    --batch ${meta.id} \\
    --amp_batches $amp_batches \\
    --seq_batches $seq_batches \\
    --well_cells $well_cells \\
    --gtf $gtf \\
    --output .
cat $ercc_regions >> gene_intervals.txt
validate_data.py --input .

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    prepare_pipeline.py: \$( prepare_pipeline.py --version )
    validate_data.py: \$( validate_data.py --version )
END_VERSIONS
"""

NextFlow From line 38 of prepare/main.nf

"""
cat <<AMP_BATCH > amp_batches.txt
Amp_batch_ID\tSeq_batch_ID\tPool_barcode\tSpike_type\tSpike_dilution\tSpike_volume_ul\tExperiment_ID\tOwner\tDescription
AB339\tSB26\tTGAT\tERCC_mix1\t2.5e-05\t0.01\tTECH_ES\tHadas\tES#7_poolA
AMP_BATCH

cat <<SEQ_BATCHES > seq_batches.txt
Seq_batch_ID\tRun_name\tDate\tR1_design\tI5_design\tR2_design\tNotes
SB26\tsc_v3_Hadas_Diego_05042015\t150405\t5I.4P.51M\t7W.8R\t\tmm10
SEQ_BATCHES

cat <<WELLS_CELLS > wells_cells.txt
Well_ID\tWell_coordinates\tplate_ID\tSubject_ID\tAmp_batch_ID\tCell_barcode\tNumber_of_cells
TW1\tA1\t154\t35\tAB339\tCTATTCG\t1
WELLS_CELLS

cat <<GENE_INTERVALS > gene_intervals.txt
chrom\tstart\tend\tstrand\tgene_name
chr1\t3143476\t3144545\t1\t4933401J01Rik
GENE_INTERVALS

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    prepare_pipeline.py: \$( prepare_pipeline.py --version )
    validate_data.py: \$( validate_data.py --version )
END_VERSIONS
"""

NextFlow From line 57 of prepare/main.nf
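
The stub above also documents the layout of the metadata tables: tab-separated text with a header row, where wells_cells.txt and amp_batches.txt share the Amp_batch_ID column. The Python sketch below parses two of the example tables and looks up the pool barcode for each well. Joining on Amp_batch_ID is an assumption drawn from the shared column name, not a description of what prepare_pipeline.py does internally, and read_table is a hypothetical helper.

```python
import csv
import io

# Example rows copied from the stub block above.
AMP_BATCHES = (
    "Amp_batch_ID\tSeq_batch_ID\tPool_barcode\tSpike_type\tSpike_dilution\t"
    "Spike_volume_ul\tExperiment_ID\tOwner\tDescription\n"
    "AB339\tSB26\tTGAT\tERCC_mix1\t2.5e-05\t0.01\tTECH_ES\tHadas\tES#7_poolA\n"
)
WELLS_CELLS = (
    "Well_ID\tWell_coordinates\tplate_ID\tSubject_ID\tAmp_batch_ID\t"
    "Cell_barcode\tNumber_of_cells\n"
    "TW1\tA1\t154\t35\tAB339\tCTATTCG\t1\n"
)


def read_table(text):
    """Parse a tab-separated table with a header row into a list of dicts."""
    return list(csv.DictReader(io.StringIO(text), delimiter="\t"))


amp_batches = {row["Amp_batch_ID"]: row for row in read_table(AMP_BATCHES)}

# Attach the pool barcode of its amplification batch to every well.
for well in read_table(WELLS_CELLS):
    batch = amp_batches[well["Amp_batch_ID"]]
    print(well["Well_ID"], batch["Pool_barcode"], well["Cell_barcode"])
```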

"""
qc_align.r \\
    $sam \\
    $labeled_qc

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    qc_align.r: \$( qc_align.r --version )
END_VERSIONS
"""

NextFlow From line 26 of align/main.nf

"""
touch _${labeled_qc.baseName}.txt

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    qc_align.r: \$( qc_align.r --version )
END_VERSIONS
"""

NextFlow From line 38 of align/main.nf

"""
mkdir report_per_amp_batch/ rd/

qc_batch.r \\
    $meta.amp_batch \\
    $wells_cells \\
    $amp_batches \\
    $seq_batches \\
    $spike_concentrations \\
    $folder

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    qc_batch.r: \$( qc_batch.r --version )
END_VERSIONS
"""

NextFlow From line 25 of batch/main.nf

"""
mkdir report_per_amp_batch/ rd/
touch report_per_amp_batch/${meta.amp_batch}.pdf
touch rd/${meta.amp_batch}.rd

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    qc_batch.r: \$( qc_batch.r --version )
END_VERSIONS
"""

NextFlow From line 43 of batch/main.nf

"""
mkdir -p output/QC_reports
mkdir _temp/

export TMPDIR=/tmp
qc_report.r \\
    $wells_cells \\
    $amp_batches \\
    .

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    qc_report.r: \$( qc_report.r --version )
END_VERSIONS
"""

NextFlow From line 29 of report/main.nf

"""
touch amp_batches_summary.txt
touch amp_batches_stats.txt
mkdir -p output/QC_reports/
touch output/QC_reports/qc_${meta.id}.pdf

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    qc_report.r: \$( qc_report.r --version )
END_VERSIONS
"""

NextFlow From line 46 of report/main.nf

"""
check_samplesheet.py \\
    $samplesheet \\
    samplesheet.valid.csv

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    python: \$(python --version | sed 's/Python //g')
END_VERSIONS
"""

NextFlow From line 21 of local/samplesheet_check.nf

"""
velocity.py convert \\
    --input $reads \\
    --output _temp/ \\
    --threads $task.cpus

for f in _temp/*R1*.fastq.gz; do cat \$f >> Undetermined_S0_R1_001.fastq.gz; done
for f in _temp/*R2*.fastq.gz; do cat \$f >> Undetermined_S0_R2_001.fastq.gz; done

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    velocity.py: \$( velocity.py --version )
END_VERSIONS
"""

NextFlow From line 24 of convert/main.nf
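
The two for loops above append compressed files to each other with cat. This works because a gzip stream may consist of several members, and decompressing the concatenation yields the concatenated payloads. The Python sketch below demonstrates that property on two made-up FASTQ records; concat_gzip is a hypothetical helper, not part of velocity.py.

```python
import gzip
import io


def concat_gzip(chunks):
    """Byte-wise concatenation of already-compressed gzip chunks (like `cat a.gz >> out.gz`)."""
    return b"".join(chunks)


# Two made-up FASTQ records, compressed separately.
part1 = gzip.compress(b"@r1\nACGT\n+\nFFFF\n")
part2 = gzip.compress(b"@r2\nTTGA\n+\nFFFF\n")

merged = concat_gzip([part1, part2])

# A multi-member gzip stream decompresses to the concatenated payloads.
with gzip.open(io.BytesIO(merged), "rb") as handle:
    print(handle.read().decode())
```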

"""
touch Undetermined_S0_R{1,2}_001.fastq.gz

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    velocity.py: \$( velocity.py --version )
END_VERSIONS
"""

NextFlow From line 40 of convert/main.nf

"""
velocity.py whitelist \\
    --batch $meta.id \\
    --amp_batches $amp_batches \\
    --well_cells $well_cells

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    velocity.py: \$( velocity.py --version )
END_VERSIONS
"""

NextFlow From line 26 of whitelist/main.nf

"""
touch whitelist.txt

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    velocity.py: \$( velocity.py --version )
END_VERSIONS
"""

NextFlow From line 39 of whitelist/main.nf

"""
wget $args $url -O $outfile

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    wget: \$( wget -V 2>&1 | grep "GNU Wget" | cut -d" " -f3 )
END_VERSIONS
"""

NextFlow From line 27 of wget/main.nf

"""
touch ${outfile}

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    wget: \$( wget -V 2>&1 | grep "GNU Wget" | cut -d" " -f3 )
END_VERSIONS
"""

NextFlow From line 42 of wget/main.nf

"""
INDEX=`find -L ./ -name "*.rev.1.bt2" | sed "s/\\.rev.1.bt2\$//"`
[ -z "\$INDEX" ] && INDEX=`find -L ./ -name "*.rev.1.bt2l" | sed "s/\\.rev.1.bt2l\$//"`
[ -z "\$INDEX" ] && echo "Bowtie2 index files not found" 1>&2 && exit 1

bowtie2 \\
    -x \$INDEX \\
    $reads_args \\
    --threads $task.cpus \\
    $unaligned \\
    $args \\
    2> ${prefix}.bowtie2.log \\
    | samtools $samtools_command $args2 --threads $task.cpus -o ${prefix}.${extension} -

if [ -f ${prefix}.unmapped.fastq.1.gz ]; then
    mv ${prefix}.unmapped.fastq.1.gz ${prefix}.unmapped_1.fastq.gz
fi

if [ -f ${prefix}.unmapped.fastq.2.gz ]; then
    mv ${prefix}.unmapped.fastq.2.gz ${prefix}.unmapped_2.fastq.gz
fi

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    bowtie2: \$(echo \$(bowtie2 --version 2>&1) | sed 's/^.*bowtie2-align-s version //; s/ .*\$//')
    samtools: \$(echo \$(samtools --version 2>&1) | sed 's/^.*samtools //; s/Using.*\$//')
    pigz: \$( pigz --version 2>&1 | sed 's/pigz //g' )
END_VERSIONS
"""

NextFlow SAMtools Bowtie 2 From line 44 of align/main.nf
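
The first three lines of the block above locate the Bowtie 2 index prefix: they search for a *.rev.1.bt2 file, fall back to the large-index suffix *.rev.1.bt2l, strip the suffix, and abort if nothing is found. The Python sketch below follows the same lookup order. It is an approximation under stated assumptions: find_bowtie2_index is a hypothetical helper, it returns only the first match, and unlike find -L it may not follow symlinked directories.

```python
from pathlib import Path
import tempfile


def find_bowtie2_index(root="."):
    """Return the index prefix under `root`, preferring small (.bt2) over large (.bt2l) indexes."""
    for suffix in (".rev.1.bt2", ".rev.1.bt2l"):
        hits = sorted(Path(root).rglob(f"*{suffix}"))
        if hits:
            # Strip the suffix to obtain the prefix expected by `bowtie2 -x`.
            return str(hits[0])[: -len(suffix)]
    raise FileNotFoundError("Bowtie2 index files not found")


# Demonstration on a throwaway directory with an empty placeholder index file.
with tempfile.TemporaryDirectory() as tmp:
    index_dir = Path(tmp) / "bowtie2"
    index_dir.mkdir()
    (index_dir / "genome.rev.1.bt2").touch()
    print(Path(find_bowtie2_index(tmp)).name)
```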

"""
touch ${prefix}.${extension}
touch ${prefix}.bowtie2.log
touch ${prefix}.unmapped_1.fastq.gz
touch ${prefix}.unmapped_2.fastq.gz

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    bowtie2: \$(echo \$(bowtie2 --version 2>&1) | sed 's/^.*bowtie2-align-s version //; s/ .*\$//')
    samtools: \$(echo \$(samtools --version 2>&1) | sed 's/^.*samtools //; s/Using.*\$//')
    pigz: \$( pigz --version 2>&1 | sed 's/pigz //g' )
END_VERSIONS
"""

NextFlow From line 80 of align/main.nf

"""
mkdir bowtie2
bowtie2-build $args --threads $task.cpus $fasta bowtie2/${fasta.baseName}
cat <<-END_VERSIONS > versions.yml
"${task.process}":
    bowtie2: \$(echo \$(bowtie2 --version 2>&1) | sed 's/^.*bowtie2-align-s version //; s/ .*\$//')
END_VERSIONS
"""

NextFlow Bowtie 2 From line 22 of build/main.nf

"""
mkdir bowtie2
touch bowtie2/${fasta.baseName}.{1..4}.bt2
touch bowtie2/${fasta.baseName}.rev.{1,2}.bt2

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    bowtie2: \$(echo \$(bowtie2 --version 2>&1) | sed 's/^.*bowtie2-align-s version //; s/ .*\$//')
END_VERSIONS
"""

NextFlow Bowtie 2 From line 32 of build/main.nf

"""
$command1 \\
    $args \\
    ${file_list.join(' ')} \\
    $command2 \\
    > ${prefix}

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    pigz: \$( pigz --version 2>&1 | sed 's/pigz //g' )
END_VERSIONS
"""

NextFlow From line 38 of cat/main.nf

"""
touch $prefix

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    pigz: \$( pigz --version 2>&1 | sed 's/pigz //g' )
END_VERSIONS
"""

NextFlow From line 54 of cat/main.nf

"""
cutadapt \\
    --cores $task.cpus \\
    $args \\
    $trimmed \\
    $reads \\
    > ${prefix}.cutadapt.log
cat <<-END_VERSIONS > versions.yml
"${task.process}":
    cutadapt: \$(cutadapt --version)
END_VERSIONS
"""

NextFlow Cutadapt From line 25 of cutadapt/main.nf

"""
touch ${prefix}.cutadapt.log
touch ${trimmed}

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    cutadapt: \$(cutadapt --version)
END_VERSIONS
"""

NextFlow From line 41 of cutadapt/main.nf

"""
printf "%s %s\\n" $rename_to | while read old_name new_name; do
    [ -f "\${new_name}" ] || ln -s \$old_name \$new_name
done

fastqc \\
    $args \\
    --threads $task.cpus \\
    $renamed_files

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    fastqc: \$( fastqc --version | sed -e "s/FastQC v//g" )
END_VERSIONS
"""

NextFlow FastQC From line 28 of fastqc/main.nf

"""
touch ${prefix}.html
touch ${prefix}.zip

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    fastqc: \$( fastqc --version | sed -e "s/FastQC v//g" )
END_VERSIONS
"""

NextFlow FastQC From line 46 of fastqc/main.nf

"""
gunzip \\
    -f \\
    $args \\
    $archive

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    gunzip: \$(echo \$(gunzip --version 2>&1) | sed 's/^.*(gzip) //; s/ Copyright.*\$//')
END_VERSIONS
"""

NextFlow From line 23 of gunzip/main.nf

"""
touch $gunzip
cat <<-END_VERSIONS > versions.yml
"${task.process}":
    gunzip: \$(echo \$(gunzip --version 2>&1) | sed 's/^.*(gzip) //; s/ Copyright.*\$//')
END_VERSIONS
"""

NextFlow From line 37 of gunzip/main.nf

"""
multiqc \\
    --force \\
    $args \\
    $config \\
    $extra_config \\
    .

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    multiqc: \$( multiqc --version | sed -e "s/multiqc, version //g" )
END_VERSIONS
"""

NextFlow MultiQC From line 28 of multiqc/main.nf

"""
touch multiqc_data
touch multiqc_plots
touch multiqc_report.html

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    multiqc: \$( multiqc --version | sed -e "s/multiqc, version //g" )
END_VERSIONS
"""

NextFlow MultiQC From line 43 of multiqc/main.nf

"""
STAR \\
    --genomeDir $index \\
    --readFilesIn $in_reads \\
    --runThreadN $task.cpus \\
    --outFileNamePrefix $prefix. \\
    $out_sam_type \\
    $ignore_gtf \\
    $solo_whitelist \\
    $attrRG \\
    $args

$mv_unsorted_bam

if [ -f ${prefix}.Unmapped.out.mate1 ]; then
    mv ${prefix}.Unmapped.out.mate1 ${prefix}.unmapped_1.fastq
    gzip ${prefix}.unmapped_1.fastq
fi
if [ -f ${prefix}.Unmapped.out.mate2 ]; then
    mv ${prefix}.Unmapped.out.mate2 ${prefix}.unmapped_2.fastq
    gzip ${prefix}.unmapped_2.fastq
fi

if [ -d ${prefix}.Solo.out ]; then
    # Backslashes still need to be escaped (https://github.com/nextflow-io/nextflow/issues/67)
    find ${prefix}.Solo.out \\( -name "*.tsv" -o -name "*.mtx" \\) -exec gzip {} \\;
fi

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    star: \$(STAR --version | sed -e "s/STAR_//g")
    samtools: \$(echo \$(samtools --version 2>&1) | sed 's/^.*samtools //; s/Using.*\$//')
    gawk: \$(echo \$(gawk --version 2>&1) | sed 's/^.*GNU Awk //; s/, .*\$//')
END_VERSIONS
"""

NextFlow STAR From line 57 of align/main.nf
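
The last conditional in the block above compresses every *.tsv and *.mtx file inside the Solo.out directory and leaves other files untouched. The Python sketch below reproduces that post-processing step on a throwaway directory. The helper gzip_matrices and the placeholder file contents are assumptions for illustration.

```python
import gzip
import shutil
import tempfile
from pathlib import Path


def gzip_matrices(solo_dir):
    """Compress every *.tsv and *.mtx file below solo_dir and remove the originals."""
    compressed = []
    for path in sorted(Path(solo_dir).rglob("*")):
        if path.is_file() and path.suffix in {".tsv", ".mtx"}:
            target = path.with_name(path.name + ".gz")
            with path.open("rb") as src, gzip.open(target, "wb") as dst:
                shutil.copyfileobj(src, dst)
            path.unlink()
            compressed.append(target)
    return compressed


# Placeholder directory layout and contents, for illustration only.
with tempfile.TemporaryDirectory() as tmp:
    raw = Path(tmp) / "sample.Solo.out" / "Gene" / "raw"
    raw.mkdir(parents=True)
    for name in ("matrix.mtx", "barcodes.tsv", "Summary.csv"):
        (raw / name).write_text("placeholder\n")
    gzip_matrices(Path(tmp) / "sample.Solo.out")
    print(sorted(p.name for p in raw.iterdir()))
```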

"""
touch ${prefix}Xd.out.bam
touch ${prefix}.Log.final.out
touch ${prefix}.Log.out
touch ${prefix}.Log.progress.out
touch ${prefix}.sortedByCoord.out.bam
mkdir ${prefix}.Solo.out
touch ${prefix}.toTranscriptome.out.bam
touch ${prefix}.Aligned.unsort.out.bam
touch ${prefix}.Aligned.sortedByCoord.out.bam
touch ${prefix}.unmapped_1.fastq.gz
touch ${prefix}.unmapped_2.fastq.gz
touch ${prefix}.tab
touch ${prefix}.SJ.out.tab
touch ${prefix}.ReadsPerGene.out.tab
touch ${prefix}.Chimeric.out.junction
touch ${prefix}.out.sam
touch ${prefix}.Signal.UniqueMultiple.str1.out.wig
touch ${prefix}.Signal.UniqueMultiple.str1.out.bg

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    star: \$(STAR --version | sed -e "s/STAR_//g")
    samtools: \$(echo \$(samtools --version 2>&1) | sed 's/^.*samtools //; s/Using.*\$//')
    gawk: \$(echo \$(gawk --version 2>&1) | sed 's/^.*GNU Awk //; s/, .*\$//')
END_VERSIONS
"""

NextFlow From line 95 of align/main.nf

"""
mkdir star
STAR \\
    --runMode genomeGenerate \\
    --genomeDir star/ \\
    --genomeFastaFiles $fasta \\
    --sjdbGTFfile $gtf \\
    --runThreadN $task.cpus \\
    $memory \\
    $args

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    star: \$(STAR --version | sed -e "s/STAR_//g")
    samtools: \$(echo \$(samtools --version 2>&1) | sed 's/^.*samtools //; s/Using.*\$//')
    gawk: \$(echo \$(gawk --version 2>&1) | sed 's/^.*GNU Awk //; s/, .*\$//')
END_VERSIONS
"""

NextFlow STAR From line 26 of genomegenerate/main.nf

"""
samtools faidx $fasta
NUM_BASES=`gawk '{sum = sum + \$2}END{if ((log(sum)/log(2))/2 - 1 > 14) {printf "%.0f", 14} else {printf "%.0f", (log(sum)/log(2))/2 - 1}}' ${fasta}.fai`

mkdir star
STAR \\
    --runMode genomeGenerate \\
    --genomeDir star/ \\
    --genomeFastaFiles $fasta \\
    --sjdbGTFfile $gtf \\
    --runThreadN $task.cpus \\
    --genomeSAindexNbases \$NUM_BASES \\
    $memory \\
    $args

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    star: \$(STAR --version | sed -e "s/STAR_//g")
    samtools: \$(echo \$(samtools --version 2>&1) | sed 's/^.*samtools //; s/Using.*\$//')
    gawk: \$(echo \$(gawk --version 2>&1) | sed 's/^.*GNU Awk //; s/, .*\$//')
END_VERSIONS
"""

NextFlow SAMtools STAR From line 45 of genomegenerate/main.nf
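
The gawk one-liner above sums the second column of the .fai index (the sequence lengths) and sets --genomeSAindexNbases to log2(genome length) / 2 - 1, capped at 14. The Python sketch below restates that arithmetic. The function name sa_index_nbases and the example lengths are assumptions for illustration.

```python
import math


def sa_index_nbases(contig_lengths):
    """min(14, log2(genome length) / 2 - 1), rounded like awk's printf "%.0f"."""
    total = sum(contig_lengths)
    value = math.log2(total) / 2 - 1
    return int(f"{min(value, 14):.0f}")


# Illustrative lengths, not taken from the pipeline.
print(sa_index_nbases([100_000]))        # small test genome -> 7
print(sa_index_nbases([2_700_000_000]))  # mammalian-sized genome -> 14 (capped)
```

Small genomes therefore get a shorter suffix-array index, while anything large enough simply receives the cap of 14.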

"""
mkdir star
touch star/Genome
touch star/Log.out
touch star/SA
touch star/SAindex
touch star/chrLength.txt
touch star/chrName.txt
touch star/chrNameLength.txt
touch star/chrStart.txt
touch star/exonGeTrInfo.tab
touch star/exonInfo.tab
touch star/geneInfo.tab
touch star/genomeParameters.txt
touch star/sjdbInfo.txt
touch star/sjdbList.fromGTF.out.tab
touch star/sjdbList.out.tab
touch star/transcriptInfo.tab

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    star: \$(STAR --version | sed -e "s/STAR_//g")
    samtools: \$(echo \$(samtools --version 2>&1) | sed 's/^.*samtools //; s/Using.*\$//')
    gawk: \$(echo \$(gawk --version 2>&1) | sed 's/^.*GNU Awk //; s/, .*\$//')
END_VERSIONS
"""