rnasplice is a bioinformatics pipeline for RNA-seq alternative splicing analysis

Version: dev


Introduction

nf-core/rnasplice is a bioinformatics pipeline for alternative splicing analysis of RNA sequencing data obtained from organisms with a reference genome and annotation.

nf-core/rnasplice metro map

Merge re-sequenced FastQ files (cat)
Read QC (FastQC)
Adapter and quality trimming (TrimGalore)
Alignment with STAR
Choice of quantification depending on analysis type:
1. STAR -> Salmon
2. STAR -> featureCounts
3. STAR -> HTSeq (DEXSeq count)
Sort and index alignments (SAMtools)
Create bigWig coverage files (BEDTools, bedGraphToBigWig)
Pseudo-alignment and quantification (Salmon; optional)
Summarize QC (MultiQC)
Differential Exon Usage (DEU):
1. HTSeq -> DEXSeq
2. featureCounts -> edgeR
3. Quantification with featureCounts or HTSeq
4. Differential exon usage with DEXSeq or edgeR
Differential Transcript Usage (DTU):
1. Salmon -> DRIMSeq -> DEXSeq
2. Filtering with DRIMSeq
3. Differential transcript usage with DEXSeq
Event-based splicing analysis:
1. STAR -> rMATS
2. Salmon -> SUPPA2

Usage

Note: If you are new to Nextflow and nf-core, please refer to this page on how to set up Nextflow. Make sure to test your setup with -profile test before running the workflow on actual data.

First, prepare a samplesheet with your input data that looks as follows:

samplesheet.csv :

sample,fastq_1,fastq_2,strandedness,condition
CONTROL_REP1,AEG588A1_S1_L002_R1_001.fastq.gz,AEG588A1_S1_L002_R2_001.fastq.gz,forward,CONTROL
CONTROL_REP1,AEG588A1_S1_L003_R1_001.fastq.gz,AEG588A1_S1_L003_R2_001.fastq.gz,forward,CONTROL
CONTROL_REP1,AEG588A1_S1_L004_R1_001.fastq.gz,AEG588A1_S1_L004_R2_001.fastq.gz,forward,CONTROL

Each row represents a FASTQ file (single-end) or a pair of FASTQ files (paired-end). Rows with the same sample identifier are considered technical replicates and are merged automatically. The strandedness column describes the library preparation and must be specified by the user.
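As a rough illustration of the merging rule described above, the sketch below groups samplesheet rows by their sample identifier. It is not the pipeline's own validation code: the helper name `group_runs_by_sample` is hypothetical, and treating an empty `fastq_2` field as a single-end run is an assumption.

```python
import csv
import io
from collections import defaultdict

# Samplesheet rows copied from the example above.
SAMPLESHEET = """\
sample,fastq_1,fastq_2,strandedness,condition
CONTROL_REP1,AEG588A1_S1_L002_R1_001.fastq.gz,AEG588A1_S1_L002_R2_001.fastq.gz,forward,CONTROL
CONTROL_REP1,AEG588A1_S1_L003_R1_001.fastq.gz,AEG588A1_S1_L003_R2_001.fastq.gz,forward,CONTROL
CONTROL_REP1,AEG588A1_S1_L004_R1_001.fastq.gz,AEG588A1_S1_L004_R2_001.fastq.gz,forward,CONTROL
"""


def group_runs_by_sample(text):
    """Hypothetical helper: collect (fastq_1, fastq_2) pairs per sample identifier."""
    groups = defaultdict(list)
    for row in csv.DictReader(io.StringIO(text)):
        # Assumption: an empty fastq_2 field marks a single-end run.
        groups[row["sample"]].append((row["fastq_1"], row["fastq_2"] or None))
    return dict(groups)


groups = group_runs_by_sample(SAMPLESHEET)
for sample, runs in groups.items():
    print(f"{sample}: {len(runs)} run(s) to merge")
```

Here the three lanes of CONTROL_REP1 end up in one group, which is the set of files that would be concatenated into a single merged sample.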

Warning: Please provide pipeline parameters via the CLI or the Nextflow -params-file option. Custom config files, including those provided by the -c Nextflow option, can be used to provide any configuration except for parameters; see docs.

Now, you can run the pipeline using:

nextflow run nf-core/rnasplice \
 --input samplesheet.csv \
 --contrasts contrastsheet.csv \
 --genome GRCh37 \
 --outdir <OUTDIR> \
 -profile <docker/singularity/.../institute>
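
The same parameters can also be supplied through the -params-file option mentioned in the warning above. The sketch below writes them to a JSON file and assembles the matching command line without running it. The helper names, the `results` output directory, and the `docker` profile are placeholders chosen for illustration, not values prescribed by the pipeline.

```python
import json
import tempfile
from pathlib import Path

# Keys mirror the double-dash options in the command above.
params = {
    "input": "samplesheet.csv",
    "contrasts": "contrastsheet.csv",
    "genome": "GRCh37",
    "outdir": "results",  # placeholder output directory
}


def write_params_file(params, path):
    """Write pipeline parameters as JSON for Nextflow's -params-file option."""
    Path(path).write_text(json.dumps(params, indent=2) + "\n")
    return str(path)


def build_command(params_file, profile="docker"):
    """Assemble the launch command as an argument list (it is not executed here)."""
    return ["nextflow", "run", "nf-core/rnasplice",
            "-params-file", params_file, "-profile", profile]


params_file = write_params_file(params, Path(tempfile.mkdtemp()) / "params.json")
print(" ".join(build_command(params_file)))
```

Keeping the parameters in a file makes a run easier to repeat, since the file can be archived next to the results.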

For more details and further functionality, please refer to the usage documentation and the parameter documentation.

Pipeline output

To see the results of an example test run with a full-size dataset, refer to the results tab on the nf-core website pipeline page. For more details about the output files and reports, please refer to the output documentation.

Online videos

You can find numerous talks on the nf-core events page covering topics such as writing pipelines and modules in Nextflow DSL2, using nf-core tooling, and running nf-core pipelines, as well as more general content like contributing to GitHub. Please check them out!

Credits

nf-core/rnasplice was originally written by the bioinformatics team from Zifo RnD Solutions.

We thank Harshil Patel (@drpatelh) and Seqera Labs (seqeralabs) for their assistance in the development of this pipeline.

Contributions and Support

If you would like to contribute to this pipeline, please see the contributing guidelines.

For further information or help, don't hesitate to get in touch on the Slack #rnasplice channel (you can join with this invite).

Citations

An extensive list of references for the tools used by the pipeline can be found in the CITATIONS.md file.

You can cite the nf-core publication as follows:

The nf-core framework for community-curated bioinformatics pipelines.

Philip Ewels, Alexander Peltzer, Sven Fillinger, Harshil Patel, Johannes Alneberg, Andreas Wilm, Maxime Ulysse Garcia, Paolo Di Tommaso & Sven Nahnsen.

Nat Biotechnol. 2020 Feb 13. doi: 10.1038/s41587-020-0439-x.

Code Snippets

"""
bedtools \\
    genomecov \\
    -ibam $bam \\
    -bg \\
    -strand + \\
    $args \\
    | bedtools sort > ${prefix_forward}.bedGraph
bedtools \\
    genomecov \\
    -ibam $bam \\
    -bg \\
    -strand - \\
    $args \\
    | bedtools sort > ${prefix_reverse}.bedGraph
cat <<-END_VERSIONS > versions.yml
"${task.process}":
    bedtools: \$(bedtools --version | sed -e "s/bedtools v//g")
END_VERSIONS
"""

NextFlow BEDTools From line 31 of local/bedtools_genomecov.nf

"""
dexseq_prepare_annotation.py $gtf ${prefix}.gff $aggregation

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    htseq: \$(python -c "import pkg_resources; print(pkg_resources.get_distribution('htseq').version)")
END_VERSIONS
"""

NextFlow HTSeq DESeq From line 28 of local/dexseq_annotation.nf

"""
dexseq_count.py $gff $read_type -f bam $bam -r pos ${prefix}.clean.count.txt $alignment_quality $strandedness

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    htseq: \$(python -c "import pkg_resources; print(pkg_resources.get_distribution('htseq').version)")
END_VERSIONS
"""

NextFlow HTSeq From line 39 of local/dexseq_count.nf

"""
run_dexseq_dtu.R $drimseq_sample_data $drimseq_contrast_data $drimseq_d_counts $ntop

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    r-base: \$(echo \$(R --version 2>&1) | sed 's/^.*R version //; s/ .*\$//')
    bioconductor-dexseq:  \$(Rscript -e "library(DEXSeq); cat(as.character(packageVersion('DEXSeq')))")
END_VERSIONS
"""

NextFlow From line 27 of local/dexseq_dtu.nf

"""
run_drimseq_filter.R $txi $tximport_tx2gene $samplesheet \\
    $min_samps_gene_expr \\
    $min_samps_feature_expr \\
    $min_samps_feature_prop \\
    $min_feature_expr \\
    $min_feature_prop \\
    $min_gene_expr

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    r-base: \$(echo \$(R --version 2>&1) | sed 's/^.*R version //; s/ .*\$//')
    bioconductor-drimseq: \$(Rscript -e "library(DRIMSeq); cat(as.character(packageVersion('DRIMSeq')))")
END_VERSIONS
"""

NextFlow DRIMSeq From line 32 of local/drimseq_filter.nf

"""
run_edger_exon.R featurecounts $samplesheet $contrastsheet $n_edger_plot

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    r-base: \$(echo \$(R --version 2>&1) | sed 's/^.*R version //; s/ .*\$//')
    bioconductor-edger:  \$(Rscript -e "library(edgeR); cat(as.character(packageVersion('edgeR')))")
END_VERSIONS
"""

NextFlow From line 28 of local/edger_exon.nf

"""
flattenGTF $args -a $annotation -o annotation.saf

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    subread: \$( echo \$(flattenGTF -v 2>&1) | sed -e "s/flattenGTF v//g")
END_VERSIONS
"""

NextFlow From line 22 of local/flattengtf.nf

"""
gffread $args $gtf | sort -u 1> ${prefix}.tx2gene.tsv

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    gffread: \$(gffread --version 2>&1)
END_VERSIONS
"""

NextFlow gffread From line 23 of local/gffread_tx2gene.nf

"""
gffread $gtf -L --keep-genes | awk -F'\\t' -vOFS='\\t' '{ gsub("transcript", "mRNA", \$3); print}' > ${gtf.baseName}_genes.gff3

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    gffread: \$(gffread --version 2>&1)
END_VERSIONS
"""

NextFlow gffread From line 20 of local/gtf_2_gff3.nf
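
The awk step in the snippet above rewrites the feature type in the third column so that "transcript" records become "mRNA" records. A minimal Python sketch of that column rewrite follows. The function name and the example records are illustrative, and lines with fewer than three tab-separated fields (such as header comments) are passed through unchanged.

```python
def rename_transcript_features(lines):
    """Replace 'transcript' with 'mRNA' in column 3 of tab-separated GFF lines."""
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 3:
            # Same substring substitution as gsub("transcript", "mRNA", $3) in awk.
            fields[2] = fields[2].replace("transcript", "mRNA")
        yield "\t".join(fields)


# Made-up records, used only to show the effect on column 3.
example = [
    "##gff-version 3",
    "chr1\tsrc\tgene\t1\t900\t.\t+\t.\tID=g1",
    "chr1\tsrc\ttranscript\t1\t900\t.\t+\t.\tID=t1;Parent=g1",
]
converted = list(rename_transcript_features(example))
for record in converted:
    print(record)
```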

"""
filter_gtf_for_genes_in_genome.py \\
    --gtf $gtf \\
    --fasta $fasta \\
    -o ${fasta.baseName}_genes.gtf
cat <<-END_VERSIONS > versions.yml
"${task.process}":
    python: \$(python --version | sed 's/Python //g')
END_VERSIONS
"""

NextFlow From line 22 of local/gtf_gene_filter.nf

"""
index_gff --index $gff3 $index
parse_miso_index.py -p $index

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    python: \$(python --version | sed "s/Python //g")
    misopy: \$(python -c "import pkg_resources; print(pkg_resources.get_distribution('misopy').version)")
END_VERSIONS
"""

NextFlow misopy From line 21 of local/miso_index.nf

"""
miso --run ${miso_index} $bams --output-dir miso_data/${meta.id} --read-len $miso_read_len

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    misopy: \$(python -c "import pkg_resources; print(pkg_resources.get_distribution('misopy').version)")
END_VERSIONS
"""

NextFlow misopy From line 22 of local/miso_run.nf

"""
sashimi_plot --plot-event $miso_gene $index_path $miso_settings --output-dir sashimi

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    python: \$(python --version | sed "s/Python //g")
    misopy: \$(python -c "import pkg_resources; print(pkg_resources.get_distribution('misopy').version)")
END_VERSIONS
"""

NextFlow misopy From line 23 of local/miso_sashimi.nf

"""
create_miso_settings.py \\
    $args \\
    --bams $bams \\
    --name $miso \\
    --width $fig_width \\
    --height $fig_height \\
    --output 'miso_settings.txt'

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    parsimonious: \$(python -c "import pkg_resources; print(pkg_resources.get_distribution('parsimonious').version)")
END_VERSIONS
"""

NextFlow From line 24 of local/miso_settings.nf

"""
$command $fasta | cut -d "|" -f1 > ${outfile}.fixed.fa

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    sed: \$(echo \$(sed --version 2>&1) | sed 's/^.*GNU sed) //; s/ .*\$//')
END_VERSIONS
"""

NextFlow From line 23 of local/preprocess_transcripts_fasta_gencode.nf
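
The `cut -d "|" -f1` step in the snippet above keeps only the text before the first pipe character on each line, which shortens pipe-delimited FASTA headers to their leading identifier and leaves sequence lines untouched. A small sketch of the same trimming, using a made-up header (the identifiers below are not real records):

```python
def trim_at_first_pipe(lines):
    """Keep the text before the first '|' on each line, like `cut -d '|' -f1`."""
    for line in lines:
        # Lines without a '|' (for example sequence lines) are returned whole.
        yield line.rstrip("\n").split("|", 1)[0]


# Hypothetical pipe-delimited header followed by a sequence line.
fasta = [
    ">TX0001.1|GENE0001.1|-|-|TX-201|GENE|1000|",
    "ACGTACGTACGT",
]
fixed = list(trim_at_first_pipe(fasta))
print("\n".join(fixed))
```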

"""
mkdir -p $prefix/rmats_post

rmats.py \\
    --gtf $gtf \\
    --b1 $bam1_text \\
    --b2 $bam2_text \\
    --od $prefix/rmats_post \\
    --tmp $prefix/rmats_temp \\
    -t $read_type \\
    --libType $strandedness \\
    --readLength $rmats_read_len \\
    --variable-read-length \\
    --nthread $task.cpus \\
    --tstat $task.cpus \\
    --cstat $rmats_splice_diff_cutoff \\
    --task post \\
    $paired_stats \\
    $novel_splice_sites \\
    $min_intron_len \\
    $max_exon_len \\
    --allow-clipping \\
    1> $prefix/rmats_post.log

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    rmats: \$(echo \$(rmats.py --version) | sed -e "s/v//g")
END_VERSIONS
"""

NextFlow rmats From line 65 of local/rmats_post.nf

"""
rmats.py \\
    --b1 $bam1 \\
    -t $read_type \\
    --libType $strandedness \\
    --nthread $task.cpus \\
    --gtf $gtf \\
    --allow-clipping \\
    --readLength $rmats_read_len \\
    --variable-read-length \\
    --cstat $rmats_splice_diff_cutoff \\
    --task post \\
    $paired_stats \\
    $novel_splice_sites \\
    $min_intron_len \\
    $max_exon_len \\
    --tmp rmats_temp \\
    --od rmats_post 1> rmats_post.log

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    rmats: \$(echo \$(rmats.py --version) | sed -e "s/v//g")
END_VERSIONS
"""

NextFlow rmats From line 63 of local/rmats_post_single.nf

"""
mkdir -p $prefix/rmats_temp

mkdir -p $prefix/rmats_prep

rmats.py \\
    --gtf $gtf \\
    --b1 $bam1_text \\
    --b2 $bam2_text \\
    --od $prefix/rmats_prep \\
    --tmp $prefix/rmats_temp \\
    -t $read_type \\
    --libType $strandedness \\
    --readLength $rmats_read_len \\
    --variable-read-length \\
    --nthread $task.cpus \\
    --tstat $task.cpus \\
    --cstat $rmats_splice_diff_cutoff \\
    --task prep \\
    $novel_splice_sites \\
    $min_intron_len \\
    $max_exon_len \\
    --allow-clipping \\
    1> $prefix/rmats_prep.log

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    rmats: \$(echo \$(rmats.py --version) | sed -e "s/v//g")
END_VERSIONS
"""

NextFlow rmats From line 59 of local/rmats_prep.nf

"""
rmats.py \\
    --b1 $bam_group1 \\
    -t $read_type \\
    --libType $strandedness \\
    --nthread $task.cpus \\
    --gtf $gtf \\
    --allow-clipping \\
    --readLength $rmats_read_len \\
    --variable-read-length \\
    --cstat $rmats_splice_diff_cutoff \\
    --task prep \\
    $novel_splice_sites \\
    $min_intron_len \\
    $max_exon_len \\
    --tmp rmats_temp \\
    --od rmats_prep 1> rmats_prep.log

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    rmats: \$(echo \$(rmats.py --version) | sed -e "s/v//g")
END_VERSIONS
"""

NextFlow rmats From line 58 of local/rmats_prep_single.nf

"""
check_samplesheet_fastq.py $samplesheet samplesheet.valid.csv
cat <<-END_VERSIONS > versions.yml
"${task.process}":
    python: \$(python --version | sed 's/Python //g')
END_VERSIONS
"""

NextFlow From line 25 of local/samplesheet_check.nf

"""
check_samplesheet_genome_bam.py $samplesheet samplesheet.valid.csv
cat <<-END_VERSIONS > versions.yml
"${task.process}":
    python: \$(python --version | sed 's/Python //g')
END_VERSIONS
"""

NextFlow From line 34 of local/samplesheet_check.nf

"""
check_samplesheet_transcriptome_bam.py $samplesheet samplesheet.valid.csv
cat <<-END_VERSIONS > versions.yml
"${task.process}":
    python: \$(python --version | sed 's/Python //g')
END_VERSIONS
"""

NextFlow From line 43 of local/samplesheet_check.nf

"""
check_samplesheet_salmon_results.py $samplesheet samplesheet.valid.csv
cat <<-END_VERSIONS > versions.yml
"${task.process}":
    python: \$(python --version | sed 's/Python //g')
END_VERSIONS
"""

NextFlow From line 52 of local/samplesheet_check.nf

"""
run_stager.R $contrast $feature_tsv $gene_tsv $analysis_type

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    r-base: \$(echo \$(R --version 2>&1) | sed 's/^.*R version //; s/ .*\$//')
    bioconductor-stager: \$(Rscript -e "library(stageR); cat(as.character(packageVersion('stageR')))")
END_VERSIONS
"""

NextFlow stageR From line 24 of local/stager.nf

"""
STAR \\
    --genomeDir $index \\
    --readFilesIn $reads  \\
    --runThreadN $task.cpus \\
    --outFileNamePrefix $prefix. \\
    $out_sam_type \\
    $ignore_gtf \\
    $seq_center \\
    $args
$mv_unsorted_bam
if [ -f ${prefix}.Unmapped.out.mate1 ]; then
    mv ${prefix}.Unmapped.out.mate1 ${prefix}.unmapped_1.fastq
    gzip ${prefix}.unmapped_1.fastq
fi
if [ -f ${prefix}.Unmapped.out.mate2 ]; then
    mv ${prefix}.Unmapped.out.mate2 ${prefix}.unmapped_2.fastq
    gzip ${prefix}.unmapped_2.fastq
fi
cat <<-END_VERSIONS > versions.yml
"${task.process}":
    star: \$(STAR --version | sed -e "s/STAR_//g")
    samtools: \$(echo \$(samtools --version 2>&1) | sed 's/^.*samtools //; s/Using.*\$//')
    gawk: \$(echo \$(gawk --version 2>&1) | sed 's/^.*GNU Awk //; s/, .*\$//')
END_VERSIONS
"""

NextFlow STAR From line 44 of local/star_align_igenomes.nf

"""
mkdir star
STAR \\
    --runMode genomeGenerate \\
    --genomeDir star/ \\
    --genomeFastaFiles $fasta \\
    --sjdbGTFfile $gtf \\
    --runThreadN $task.cpus \\
    $memory \\
    $args
cat <<-END_VERSIONS > versions.yml
"${task.process}":
    star: \$(STAR --version | sed -e "s/STAR_//g")
    samtools: \$(echo \$(samtools --version 2>&1) | sed 's/^.*samtools //; s/Using.*\$//')
    gawk: \$(echo \$(gawk --version 2>&1) | sed 's/^.*GNU Awk //; s/, .*\$//')
END_VERSIONS
"""

NextFlow STAR From line 28 of local/star_genomegenerate_igenomes.nf

"""
samtools faidx $fasta
NUM_BASES=`gawk '{sum = sum + \$2}END{if ((log(sum)/log(2))/2 - 1 > 14) {printf "%.0f", 14} else {printf "%.0f", (log(sum)/log(2))/2 - 1}}' ${fasta}.fai`
mkdir star
STAR \\
    --runMode genomeGenerate \\
    --genomeDir star/ \\
    --genomeFastaFiles $fasta \\
    --sjdbGTFfile $gtf \\
    --runThreadN $task.cpus \\
    --genomeSAindexNbases \$NUM_BASES \\
    $memory \\
    $args
cat <<-END_VERSIONS > versions.yml
"${task.process}":
    star: \$(STAR --version | sed -e "s/STAR_//g")
    samtools: \$(echo \$(samtools --version 2>&1) | sed 's/^.*samtools //; s/Using.*\$//')
    gawk: \$(echo \$(gawk --version 2>&1) | sed 's/^.*GNU Awk //; s/, .*\$//')
END_VERSIONS
"""

NextFlow SAMtools STAR From line 46 of local/star_genomegenerate_igenomes.nf
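
The gawk one-liner in the snippet above derives the value passed to --genomeSAindexNbases from the total sequence length in the .fai index: it computes log2(length)/2 - 1 and caps the result at 14. The sketch below restates that arithmetic in Python. The function names are illustrative, and the .fai content is a made-up single-sequence example.

```python
import math


def fai_lengths(fai_text):
    """Read sequence lengths from the second column of a .fai index."""
    return [int(line.split("\t")[1]) for line in fai_text.splitlines() if line.strip()]


def sa_index_nbases(lengths, cap=14):
    """Return min(cap, log2(total length)/2 - 1), rounded like printf '%.0f'."""
    value = math.log2(sum(lengths)) / 2 - 1
    return int(format(min(value, cap), ".0f"))


# Made-up index with one 1,048,576 bp sequence (2**20 bases).
small_fai = "chr1\t1048576\t6\t60\t61\n"
print(sa_index_nbases(fai_lengths(small_fai)))  # small genome: value below the cap
print(sa_index_nbases([3_000_000_000]))         # large genome: capped at 14
```

For small genomes the computed value drops well below 14, which is the case this branch of the process is meant to handle.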

"""
suppa.py \\
    clusterEvents \\
    --dpsi $dpsi \\
    --psivec $psivec \\
    --dpsi-threshold $clusterevents_dpsithreshold \\
    --eps $clusterevents_eps \\
    --metric $clusterevents_metric \\
    --min-pts $clusterevents_min_pts \\
    --groups $group_ranges \\
    --clustering $clusterevents_method \\
    $clusterevents_sigthreshold $clusterevents_separation -o ${cond1}-${cond2}_${prefix}_cluster

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    suppa: \$(python -c "import pkg_resources; print(pkg_resources.get_distribution('suppa').version)")
END_VERSIONS
"""

NextFlow SUPPA From line 37 of local/suppa_clusterevents.nf

"""
suppa_groups.py $psivec
"""

NextFlow From line 20 of local/suppa_clustergroups.nf

"""
suppa.py \\
    diffSplice \\
    -m $diffsplice_method \\
    $gc $pa -s -c $median \\
    -a $diffsplice_area \\
    -l $diffsplice_lower_bound \\
    -al $diffsplice_alpha \\
    -th $diffsplice_tpm_threshold \\
    -nan $diffsplice_nan_threshold \\
    -i $events \\
    -p $psi1 $psi2 \\
    -e $tpm1 $tpm2 \\
    -o ${cond1}-${cond2}_${prefix}_diffsplice

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    suppa: \$(python -c "import pkg_resources; print(pkg_resources.get_distribution('suppa').version)")
END_VERSIONS
"""

NextFlow SUPPA From line 39 of local/suppa_diffsplice.nf

"""
suppa.py \\
    generateEvents \\
    -i $gtf \\
    -f $file_type \\
    -o events \\
    -e $generateevents_event_type \\
    -b $generateevents_boundary \\
    -t $generateevents_threshold \\
    -l $generateevents_exon_length \\
    $poolgenes

awk 'FNR==1 && NR!=1 { while (/^seqname/) getline; }  1 {print}' *.ioe > events.ioe

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    suppa: \$(pip show suppa| sed -e '/Version/!d'| sed 's/Version: //g')
END_VERSIONS
"""

NextFlow SUPPA From line 35 of local/suppa_generateevents.nf
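
The awk command in the snippet above concatenates the per-event .ioe files into a single events.ioe while dropping the repeated header line (the one starting with "seqname") from every file after the first. A rough Python equivalent is sketched below. The file names and the shortened header are illustrative, and edge cases such as header-only files are not handled identically to awk.

```python
import tempfile
from pathlib import Path


def merge_event_files(paths, header_prefix="seqname"):
    """Concatenate files, keeping the header line of the first file only."""
    merged = []
    for index, path in enumerate(sorted(paths)):
        lines = Path(path).read_text().splitlines()
        if index > 0:
            # Drop leading header lines from every file after the first.
            while lines and lines[0].startswith(header_prefix):
                lines.pop(0)
        merged.extend(lines)
    return merged


# Two small made-up event files with a shortened, illustrative header.
workdir = Path(tempfile.mkdtemp())
(workdir / "events_A3.ioe").write_text("seqname\tgene_id\tevent_id\nchr1\tg1\tg1;A3:1\n")
(workdir / "events_SE.ioe").write_text("seqname\tgene_id\tevent_id\nchr2\tg2\tg2;SE:1\n")

merged = merge_event_files(workdir.glob("*.ioe"))
print("\n".join(merged))
```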

"""
suppa.py \\
    generateEvents \\
    -i $gtf \\
    -f $file_type \\
    -o events \\
    $poolgenes

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    suppa: \$(python -c "import pkg_resources; print(pkg_resources.get_distribution('suppa').version)")
END_VERSIONS
"""

NextFlow SUPPA From line 57 of local/suppa_generateevents.nf

"""
suppa.py \\
    psiPerEvent \\
    -i $ioe \\
    -e $tpm \\
    -f $psiperevent_total_filter \\
    -o suppa_local

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    suppa: \$(python -c "import pkg_resources; print(pkg_resources.get_distribution('suppa').version)")
END_VERSIONS
"""

NextFlow SUPPA From line 24 of local/suppa_psiperevent.nf

"""
suppa.py \\
    psiPerIsoform \\
    -g $gtf \\
    -e $tpm \\
    -o suppa

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    suppa: \$(python -c "import pkg_resources; print(pkg_resources.get_distribution('suppa').version)")
END_VERSIONS
"""

NextFlow SUPPA From line 22 of local/suppa_psiperisoform.nf

"""
suppa_split_file.R \\
    $tpm_psi \\
    $samplesheet \\
    $output_type \\
    $calc_ranges \\
    $prefix

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    r-base: \$(echo \$(R --version 2>&1) | sed 's/^.*R version //; s/ .*\$//')
END_VERSIONS
"""

NextFlow From line 27 of local/suppa_split_files.nf

"""
tximport.R $tx2gene salmon salmon.merged

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    r-base: \$(echo \$(R --version 2>&1) | sed 's/^.*R version //; s/ .*\$//')
    bioconductor-tximeta: \$(Rscript -e "library(tximeta); cat(as.character(packageVersion('tximeta')))")
END_VERSIONS
"""

NextFlow Salmon bioconductor-tximeta From line 52 of local/tximport.nf

"""
cat ${readList.join(' ')} > ${prefix}.merged.fastq.gz

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    cat: \$(echo \$(cat --version 2>&1) | sed 's/^.*coreutils) //; s/ .*\$//')
END_VERSIONS
"""

NextFlow From line 26 of fastq/main.nf

"""
cat ${read1.join(' ')} > ${prefix}_1.merged.fastq.gz
cat ${read2.join(' ')} > ${prefix}_2.merged.fastq.gz

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    cat: \$(echo \$(cat --version 2>&1) | sed 's/^.*coreutils) //; s/ .*\$//')
END_VERSIONS
"""

NextFlow From line 40 of fastq/main.nf
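
The merge steps above rely on plain `cat`: gzip files concatenated byte for byte form a valid multi-member gzip stream, so re-sequenced lanes can be merged without decompressing them. The sketch below demonstrates this with two tiny made-up FASTQ files; the file names and reads are illustrative only.

```python
import gzip
import shutil
import tempfile
from pathlib import Path


def concatenate_files(sources, destination):
    """Byte-wise concatenation, the same operation `cat a b > out` performs."""
    with open(destination, "wb") as out:
        for source in sources:
            with open(source, "rb") as handle:
                shutil.copyfileobj(handle, out)
    return destination


workdir = Path(tempfile.mkdtemp())
lane1 = workdir / "lane1.fastq.gz"
lane2 = workdir / "lane2.fastq.gz"
with gzip.open(lane1, "wt") as handle:
    handle.write("@read1\nACGT\n+\nIIII\n")
with gzip.open(lane2, "wt") as handle:
    handle.write("@read2\nTTGA\n+\nIIII\n")

merged = concatenate_files([lane1, lane2], workdir / "sample.merged.fastq.gz")
with gzip.open(merged, "rt") as handle:
    records = handle.read()  # both members are decompressed in order
print(records, end="")
```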

"""
touch ${prefix}.merged.fastq.gz

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    cat: \$(echo \$(cat --version 2>&1) | sed 's/^.*coreutils) //; s/ .*\$//')
END_VERSIONS
"""

NextFlow From line 57 of fastq/main.nf

"""
touch ${prefix}_1.merged.fastq.gz
touch ${prefix}_2.merged.fastq.gz

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    cat: \$(echo \$(cat --version 2>&1) | sed 's/^.*coreutils) //; s/ .*\$//')
END_VERSIONS
"""

NextFlow From line 68 of fastq/main.nf

"""
samtools faidx $fasta
cut -f 1,2 ${fasta}.fai > ${fasta}.sizes

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    getchromsizes: \$(echo \$(samtools --version 2>&1) | sed 's/^.*samtools //; s/Using.*\$//')
END_VERSIONS
"""

NextFlow SAMtools From line 24 of getchromsizes/main.nf

"""
touch ${fasta}.fai
touch ${fasta}.sizes

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    getchromsizes: \$(echo \$(samtools --version 2>&1) | sed 's/^.*samtools //; s/Using.*\$//')
END_VERSIONS
"""

NextFlow From line 35 of getchromsizes/main.nf

"""
printf "%s %s\\n" $rename_to | while read old_name new_name; do
    [ -f "\${new_name}" ] || ln -s \$old_name \$new_name
done
fastqc $args --threads $task.cpus $renamed_files

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    fastqc: \$( fastqc --version | sed -e "s/FastQC v//g" )
END_VERSIONS
"""

NextFlow FastQC From line 28 of fastqc/main.nf

"""
touch ${prefix}.html
touch ${prefix}.zip

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    fastqc: \$( fastqc --version | sed -e "s/FastQC v//g" )
END_VERSIONS
"""

NextFlow FastQC From line 42 of fastqc/main.nf

"""
gffread \\
    $gff \\
    $args \\
    -o ${prefix}.gtf
cat <<-END_VERSIONS > versions.yml
"${task.process}":
    gffread: \$(gffread --version 2>&1)
END_VERSIONS
"""

NextFlow gffread From line 23 of gffread/main.nf

"""
gunzip \\
    -f \\
    $args \\
    $archive

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    gunzip: \$(echo \$(gunzip --version 2>&1) | sed 's/^.*(gzip) //; s/ Copyright.*\$//')
END_VERSIONS
"""

NextFlow From line 23 of gunzip/main.nf

"""
touch $gunzip
cat <<-END_VERSIONS > versions.yml
"${task.process}":
    gunzip: \$(echo \$(gunzip --version 2>&1) | sed 's/^.*(gzip) //; s/ Copyright.*\$//')
END_VERSIONS
"""

NextFlow From line 37 of gunzip/main.nf

"""
multiqc \\
    --force \\
    $args \\
    $config \\
    $extra_config \\
    .

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    multiqc: \$( multiqc --version | sed -e "s/multiqc, version //g" )
END_VERSIONS
"""

NextFlow MultiQC From line 28 of multiqc/main.nf

"""
touch multiqc_data
touch multiqc_plots
touch multiqc_report.html

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    multiqc: \$( multiqc --version | sed -e "s/multiqc, version //g" )
END_VERSIONS
"""

NextFlow MultiQC From line 43 of multiqc/main.nf

"""
STAR \\
    --runMode genomeGenerate \\
    --genomeDir rsem/ \\
    --genomeFastaFiles $fasta \\
    --sjdbGTFfile $gtf \\
    --runThreadN $task.cpus \\
    $memory \\
    $args2

rsem-prepare-reference \\
    --gtf $gtf \\
    --num-threads $task.cpus \\
    ${args_list.join(' ')} \\
    $fasta \\
    rsem/genome

cp rsem/genome.transcripts.fa .

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    rsem: \$(rsem-calculate-expression --version | sed -e "s/Current version: RSEM v//g")
    star: \$(STAR --version | sed -e "s/STAR_//g")
END_VERSIONS
"""

NextFlow STAR RSEM From line 29 of preparereference/main.nf

"""
rsem-prepare-reference \\
    --gtf $gtf \\
    --num-threads $task.cpus \\
    $args \\
    $fasta \\
    rsem/genome

cp rsem/genome.transcripts.fa .

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    rsem: \$(rsem-calculate-expression --version | sed -e "s/Current version: RSEM v//g")
    star: \$(STAR --version | sed -e "s/STAR_//g")
END_VERSIONS
"""

NextFlow RSEM From line 55 of preparereference/main.nf

"""
$get_decoy_ids
sed -i.bak -e 's/>//g' decoys.txt
cat $transcript_fasta $genome_fasta > $gentrome

salmon \\
    index \\
    --threads $task.cpus \\
    -t $gentrome \\
    -d decoys.txt \\
    $args \\
    -i salmon

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    salmon: \$(echo \$(salmon --version) | sed -e "s/salmon //g")
END_VERSIONS
"""

NextFlow Salmon From line 29 of index/main.nf

"""
salmon quant \\
    --geneMap $gtf \\
    --threads $task.cpus \\
    --libType=$strandedness \\
    $reference \\
    $input_reads \\
    $args \\
    -o $prefix

if [ -f $prefix/aux_info/meta_info.json ]; then
    cp $prefix/aux_info/meta_info.json "${prefix}_meta_info.json"
fi

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    salmon: \$(echo \$(salmon --version) | sed -e "s/salmon //g")
END_VERSIONS
"""

NextFlow Quant Salmon From line 58 of quant/main.nf

"""
samtools \\
    flagstat \\
    --threads ${task.cpus} \\
    $bam \\
    > ${prefix}.flagstat

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    samtools: \$(echo \$(samtools --version 2>&1) | sed 's/^.*samtools //; s/Using.*\$//')
END_VERSIONS
"""

NextFlow SAMtools From line 23 of flagstat/main.nf

"""
samtools \\
    idxstats \\
    --threads ${task.cpus-1} \\
    $bam \\
    > ${prefix}.idxstats

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    samtools: \$(echo \$(samtools --version 2>&1) | sed 's/^.*samtools //; s/Using.*\$//')
END_VERSIONS
"""

NextFlow SAMtools From line 24 of idxstats/main.nf

"""
samtools \\
    index \\
    -@ ${task.cpus-1} \\
    $args \\
    $input

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    samtools: \$(echo \$(samtools --version 2>&1) | sed 's/^.*samtools //; s/Using.*\$//')
END_VERSIONS
"""

NextFlow SAMtools From line 24 of index/main.nf

"""
touch ${input}.bai
touch ${input}.crai
touch ${input}.csi

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    samtools: \$(echo \$(samtools --version 2>&1) | sed 's/^.*samtools //; s/Using.*\$//')
END_VERSIONS
"""

NextFlow From line 38 of index/main.nf

"""
samtools sort \\
    $args \\
    -@ $task.cpus \\
    -m ${sort_memory}M \\
    -o ${prefix}.bam \\
    -T $prefix \\
    $bam

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    samtools: \$(echo \$(samtools --version 2>&1) | sed 's/^.*samtools //; s/Using.*\$//')
END_VERSIONS
"""

NextFlow SAMtools From line 26 of sort/main.nf

"""
touch ${prefix}.bam

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    samtools: \$(echo \$(samtools --version 2>&1) | sed 's/^.*samtools //; s/Using.*\$//')
END_VERSIONS
"""

NextFlow From line 43 of sort/main.nf

"""
samtools \\
    stats \\
    --threads ${task.cpus} \\
    ${reference} \\
    ${input} \\
    > ${prefix}.stats

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    samtools: \$(echo \$(samtools --version 2>&1) | sed 's/^.*samtools //; s/Using.*\$//')
END_VERSIONS
"""

NextFlow SAMtools From line 25 of stats/main.nf

"""
touch ${prefix}.stats

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    samtools: \$(echo \$(samtools --version 2>&1) | sed 's/^.*samtools //; s/Using.*\$//')
END_VERSIONS
"""

NextFlow From line 41 of stats/main.nf

"""
STAR \\
    --genomeDir $index \\
    --readFilesIn ${reads1.join(",")} ${reads2.join(",")} \\
    --runThreadN $task.cpus \\
    --outFileNamePrefix $prefix. \\
    $out_sam_type \\
    $ignore_gtf \\
    $attrRG \\
    $args

$mv_unsorted_bam

if [ -f ${prefix}.Unmapped.out.mate1 ]; then
    mv ${prefix}.Unmapped.out.mate1 ${prefix}.unmapped_1.fastq
    gzip ${prefix}.unmapped_1.fastq
fi
if [ -f ${prefix}.Unmapped.out.mate2 ]; then
    mv ${prefix}.Unmapped.out.mate2 ${prefix}.unmapped_2.fastq
    gzip ${prefix}.unmapped_2.fastq
fi

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    star: \$(STAR --version | sed -e "s/STAR_//g")
    samtools: \$(echo \$(samtools --version 2>&1) | sed 's/^.*samtools //; s/Using.*\$//')
    gawk: \$(echo \$(gawk --version 2>&1) | sed 's/^.*GNU Awk //; s/, .*\$//')
END_VERSIONS
"""

NextFlow STAR From line 51 of align/main.nf

"""
touch ${prefix}Xd.out.bam
touch ${prefix}.Log.final.out
touch ${prefix}.Log.out
touch ${prefix}.Log.progress.out
touch ${prefix}.sortedByCoord.out.bam
touch ${prefix}.toTranscriptome.out.bam
touch ${prefix}.Aligned.unsort.out.bam
touch ${prefix}.Aligned.sortedByCoord.out.bam
touch ${prefix}.unmapped_1.fastq.gz
touch ${prefix}.unmapped_2.fastq.gz
touch ${prefix}.tab
touch ${prefix}.SJ.out.tab
touch ${prefix}.ReadsPerGene.out.tab
touch ${prefix}.Chimeric.out.junction
touch ${prefix}.out.sam
touch ${prefix}.Signal.UniqueMultiple.str1.out.wig
touch ${prefix}.Signal.UniqueMultiple.str1.out.bg

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    star: \$(STAR --version | sed -e "s/STAR_//g")
    samtools: \$(echo \$(samtools --version 2>&1) | sed 's/^.*samtools //; s/Using.*\$//')
    gawk: \$(echo \$(gawk --version 2>&1) | sed 's/^.*GNU Awk //; s/, .*\$//')
END_VERSIONS
"""

NextFlow From line 83 of align/main.nf

"""
mkdir star
STAR \\
    --runMode genomeGenerate \\
    --genomeDir star/ \\
    --genomeFastaFiles $fasta \\
    --sjdbGTFfile $gtf \\
    --runThreadN $task.cpus \\
    $memory \\
    $args

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    star: \$(STAR --version | sed -e "s/STAR_//g")
    samtools: \$(echo \$(samtools --version 2>&1) | sed 's/^.*samtools //; s/Using.*\$//')
    gawk: \$(echo \$(gawk --version 2>&1) | sed 's/^.*GNU Awk //; s/, .*\$//')
END_VERSIONS
"""

NextFlow STAR From line 26 of genomegenerate/main.nf

"""
samtools faidx $fasta
NUM_BASES=`gawk '{sum = sum + \$2}END{if ((log(sum)/log(2))/2 - 1 > 14) {printf "%.0f", 14} else {printf "%.0f", (log(sum)/log(2))/2 - 1}}' ${fasta}.fai`

mkdir star
STAR \\
    --runMode genomeGenerate \\
    --genomeDir star/ \\
    --genomeFastaFiles $fasta \\
    --sjdbGTFfile $gtf \\
    --runThreadN $task.cpus \\
    --genomeSAindexNbases \$NUM_BASES \\
    $memory \\
    $args

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    star: \$(STAR --version | sed -e "s/STAR_//g")
    samtools: \$(echo \$(samtools --version 2>&1) | sed 's/^.*samtools //; s/Using.*\$//')
    gawk: \$(echo \$(gawk --version 2>&1) | sed 's/^.*GNU Awk //; s/, .*\$//')
END_VERSIONS
"""

NextFlow SAMtools STAR From line 45 of genomegenerate/main.nf

"""
mkdir star
touch star/Genome
touch star/Log.out
touch star/SA
touch star/SAindex
touch star/chrLength.txt
touch star/chrName.txt
touch star/chrNameLength.txt
touch star/chrStart.txt
touch star/exonGeTrInfo.tab
touch star/exonInfo.tab
touch star/geneInfo.tab
touch star/genomeParameters.txt
touch star/sjdbInfo.txt
touch star/sjdbList.fromGTF.out.tab
touch star/sjdbList.out.tab
touch star/transcriptInfo.tab

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    star: \$(STAR --version | sed -e "s/STAR_//g")
    samtools: \$(echo \$(samtools --version 2>&1) | sed 's/^.*samtools //; s/Using.*\$//')
    gawk: \$(echo \$(gawk --version 2>&1) | sed 's/^.*GNU Awk //; s/, .*\$//')
END_VERSIONS
"""

NextFlow From line 70 of genomegenerate/main.nf

"""
featureCounts \\
    $args \\
    $paired_end \\
    -T $task.cpus \\
    -a $annotation \\
    -s $strandedness \\
    -o ${prefix}.featureCounts.txt \\
    ${bams.join(' ')}

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    subread: \$( echo \$(featureCounts -v 2>&1) | sed -e "s/featureCounts v//g")
END_VERSIONS
"""

NextFlow FeatureCounts From line 32 of featurecounts/main.nf

"""
[ ! -f  ${prefix}.fastq.gz ] && ln -s $reads ${prefix}.fastq.gz
trim_galore \\
    ${args_list.join(' ')} \\
    --cores $cores \\
    --gzip \\
    ${prefix}.fastq.gz

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    trimgalore: \$(echo \$(trim_galore --version 2>&1) | sed 's/^.*version //; s/Last.*\$//')
    cutadapt: \$(cutadapt --version)
END_VERSIONS
"""

NextFlow Trim_Galore From line 42 of trimgalore/main.nf

"""
[ ! -f  ${prefix}_1.fastq.gz ] && ln -s ${reads[0]} ${prefix}_1.fastq.gz
[ ! -f  ${prefix}_2.fastq.gz ] && ln -s ${reads[1]} ${prefix}_2.fastq.gz
trim_galore \\
    $args \\
    --cores $cores \\
    --paired \\
    --gzip \\
    ${prefix}_1.fastq.gz \\
    ${prefix}_2.fastq.gz

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    trimgalore: \$(echo \$(trim_galore --version 2>&1) | sed 's/^.*version //; s/Last.*\$//')
    cutadapt: \$(cutadapt --version)
END_VERSIONS
"""

NextFlow Trim_Galore From line 57 of trimgalore/main.nf

"""
bedClip \\
    $bedgraph \\
    $sizes \\
    ${prefix}.bedGraph

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    ucsc: $VERSION
END_VERSIONS
"""

NextFlow ucsc-bedclip From line 26 of bedclip/main.nf

"""
bedGraphToBigWig \\
    $bedgraph \\
    $sizes \\
    ${prefix}.bigWig

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    ucsc: $VERSION
END_VERSIONS
"""

NextFlow bedGraphToBigWig From line 26 of bedgraphtobigwig/main.nf

"""
umi_tools \\
    extract \\
    -I $reads \\
    -S ${prefix}.umi_extract.fastq.gz \\
    $args \\
    > ${prefix}.umi_extract.log

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    umitools: \$(umi_tools --version 2>&1 | sed 's/^.*UMI-tools version://; s/ *\$//')
END_VERSIONS
"""

NextFlow umi_tools From line 26 of extract/main.nf

"""
umi_tools \\
    extract \\
    -I ${reads[0]} \\
    --read2-in=${reads[1]} \\
    -S ${prefix}.umi_extract_1.fastq.gz \\
    --read2-out=${prefix}.umi_extract_2.fastq.gz \\
    $args \\
    > ${prefix}.umi_extract.log

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    umitools: \$(umi_tools --version 2>&1 | sed 's/^.*UMI-tools version://; s/ *\$//')
END_VERSIONS
"""

NextFlow umi_tools From line 40 of extract/main.nf

"""
mkdir $prefix

## Ensures --strip-components only applied when top level of tar contents is a directory
## If just files or multiple directories, place all in prefix
if [[ \$(tar -taf ${archive} | grep -o -P "^.*?\\/" | uniq | wc -l) -eq 1 ]]; then
    tar \\
        -C $prefix --strip-components 1 \\
        -xavf \\
        $args \\
        $archive \\
        $args2
else
    tar \\
        -C $prefix \\
        -xavf \\
        $args \\
        $archive \\
        $args2
fi

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    untar: \$(echo \$(tar --version 2>&1) | sed 's/^.*(GNU tar) //; s/ Copyright.*\$//')
END_VERSIONS
"""

NextFlow From line 25 of untar/main.nf
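
The shell test in the snippet above decides whether --strip-components 1 is safe by counting the distinct leading directory prefixes in the archive listing. The sketch below mirrors that check on a list of member names written the way `tar -t` prints them (directories end with a slash). The function name is illustrative; like `uniq`, it collapses only adjacent duplicates.

```python
from itertools import groupby


def has_single_top_level_dir(member_names):
    """True when the listing yields exactly one run of a leading 'dir/' prefix."""
    # Equivalent of grep -o -P "^.*?\/": shortest prefix up to the first slash.
    prefixes = [name.split("/", 1)[0] + "/" for name in member_names if "/" in name]
    # Equivalent of `uniq | wc -l`: count runs of identical adjacent prefixes.
    runs = [prefix for prefix, _ in groupby(prefixes)]
    return len(runs) == 1


print(has_single_top_level_dir(["pkg/", "pkg/a.txt", "pkg/sub/b.txt"]))  # one directory
print(has_single_top_level_dir(["a.txt", "b.txt"]))                      # files only
print(has_single_top_level_dir(["d1/x.txt", "d2/y.txt"]))                # two directories
```

When the check is false, the process extracts everything directly into the prefix directory instead of stripping a path component.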

"""
mkdir $prefix
touch ${prefix}/file.txt

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    untar: \$(echo \$(tar --version 2>&1) | sed 's/^.*(GNU tar) //; s/ Copyright.*\$//')
END_VERSIONS
"""

NextFlow From line 54 of untar/main.nf

"""
check_samplesheet.py \\
    $samplesheet \\
    samplesheet.valid.csv

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    python: \$(python --version | sed 's/Python //g')
END_VERSIONS
"""

NextFlow From line 17 of local/samplesheet_check.nf

"""
[ ! -f  ${prefix}.fastq.gz ] && ln -s $reads ${prefix}.fastq.gz
fastqc $args --threads $task.cpus ${prefix}.fastq.gz

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    fastqc: \$( fastqc --version | sed -e "s/FastQC v//g" )
END_VERSIONS
"""

NextFlow FastQC From line 26 of fastqc/main.nf

"""
[ ! -f  ${prefix}_1.fastq.gz ] && ln -s ${reads[0]} ${prefix}_1.fastq.gz
[ ! -f  ${prefix}_2.fastq.gz ] && ln -s ${reads[1]} ${prefix}_2.fastq.gz
fastqc $args --threads $task.cpus ${prefix}_1.fastq.gz ${prefix}_2.fastq.gz

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    fastqc: \$( fastqc --version | sed -e "s/FastQC v//g" )
END_VERSIONS
"""

NextFlow FastQC From line 36 of fastqc/main.nf

"""
multiqc -f $args .

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    multiqc: \$( multiqc --version | sed -e "s/multiqc, version //g" )
END_VERSIONS
"""