circRNA quantification, differential expression analysis and miRNA target prediction of RNA-Seq data

Version 1

nf-core/circrna is a best-practice analysis pipeline for the quantification, miRNA target prediction and differential expression analysis of circular RNAs in paired-end RNA sequencing data.

The pipeline is built using Nextflow, a workflow tool that runs tasks across multiple compute infrastructures in a portable manner. It comes with Docker containers, making installation trivial and results highly reproducible.

1. Install Nextflow (>=21.04.0).
2. Install any of Docker, Singularity, Podman, Shifter or Charliecloud for full pipeline reproducibility (please only use Conda as a last resort; see docs).
3. Download the pipeline and test it on a minimal dataset with a single command:
   nextflow run nf-core/circrna -profile test,<docker/singularity/podman/shifter/charliecloud/conda>
4. Start running your own analysis:
   nextflow run nf-core/circrna -profile <docker/singularity/podman/shifter/charliecloud/conda> --module 'circrna_discovery, mirna_prediction, differential_expression' --tool 'circexplorer2' --input 'samples.csv' --input_type 'fastq' --phenotype 'phenotype.csv'

Code Snippets

"""
grep -vf ${workflow.projectDir}/bin/unwanted_biotypes.txt $gtf > filt.gtf
mv $bed circs.bed

annotate_outputs.sh $exon_boundary &> ${prefix}.log
mv master_bed12.bed ${prefix}.bed.tmp

awk -v FS="\t" '{print \$11}' ${prefix}.bed.tmp > mature_len.tmp
awk -v FS="," '{for(i=t=0;i<NF;) t+=\$++i; \$0=t}1' mature_len.tmp > mature_length

paste ${prefix}.bed.tmp mature_length > ${prefix}.bed

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    bedtools: \$(bedtools --version | sed -e "s/bedtools v//g")
    ucsc: $VERSION
END_VERSIONS
"""

NextFlow BEDTools From line 27 of full_annotation/main.nf
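
The two awk calls in the snippet above take column 11 of the BED12 file (the comma-separated exon block sizes) and sum it per row to obtain the mature spliced length. The following is a minimal standalone sketch of that summation, not the pipeline's own code: the function name `sum_block_sizes` and the input values are made up for illustration, and because it runs outside a Nextflow script block the `$` signs are not escaped.

```shell
# Sum the comma-separated block sizes found on each input line.
# A trailing comma (permitted in BED12) gives an empty last field,
# which awk treats as 0 in arithmetic.
sum_block_sizes() {
    awk -F',' '{ total = 0; for (i = 1; i <= NF; i++) total += $i; print total }'
}

# Toy blockSizes values for two circRNAs (illustrative only).
printf '120,85,230\n64,\n' | sum_block_sizes
```

For the toy input this prints 435 and 64, one value per line, which is what the pipeline then pastes onto the BED file as an extra column.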

"""
# remove redundant biotypes from GTF.
grep -vf ${workflow.projectDir}/bin/unwanted_biotypes.txt $gtf > filt.gtf

# generate circrna BED file.
tail -n +2 $circrna_matrix | awk '{print \$1}' > IDs.txt
ID_to_BED.sh IDs.txt
cat *.bed > merged.txt && rm IDs.txt && rm *.bed && mv merged.txt circs.bed

# Re-use annotation script to identify the host gene.
annotate_outputs.sh $exon_boundary &> annotation.log
awk -v OFS="\t" '{print \$4, \$14}' master_bed12.bed > circrna_host-gene.txt

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    bedtools: \$(bedtools --version | sed -e "s/bedtools v//g")
    ucsc: $VERSION
END_VERSIONS
"""

NextFlow BEDTools From line 23 of parent_gene/main.nf

"""
awk '{if(\$13 >= ${bsj_reads}) print \$0}' ${prefix}.txt | awk -v OFS="\t" '{print \$1,\$2,\$3,\$6,\$13}' > ${prefix}_${meta.tool}.bed

awk -v OFS="\t" '{print \$1, \$2, \$3, \$1":"\$2"-"\$3":"\$4, \$5, \$4}' ${prefix}_${meta.tool}.bed > ${prefix}_${meta.tool}_circs.bed
"""

NextFlow From line 19 of filter/main.nf
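
Most of the per-tool filter steps on this page follow the same two-stage pattern: drop candidates with fewer than `bsj_reads` back-splice junction reads, then build a `chrom:start-end:strand` identifier and emit a six-column BED record. A hedged sketch of that pattern on an already reduced five-column table (chrom, start, end, strand, count) is shown here; `filter_and_name` is a hypothetical helper and the input rows are invented.

```shell
# Keep rows whose read count (column 5) is at least the given threshold,
# then print: chrom, start, end, "chrom:start-end:strand", count, strand.
filter_and_name() {
    min_reads="$1"
    awk -v OFS='\t' -v min="$min_reads" \
        '$5 >= min { print $1, $2, $3, $1 ":" $2 "-" $3 ":" $4, $5, $4 }'
}

# Two toy candidates; only the first has >= 2 supporting reads.
printf 'chr1\t100\t500\t+\t7\nchr2\t300\t900\t-\t1\n' | filter_and_name 2
```

With a threshold of 2 only the `chr1` row survives, and its fourth column becomes `chr1:100-500:+`.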

"""
gtfToGenePred \
    $args \
    $gtf \
    ${prefix}.genepred

awk -v OFS="\t" '{print \$12, \$1, \$2, \$3, \$4, \$5, \$6, \$7, \$8, \$9, \$10}' ${prefix}.genepred > ${prefix}.txt

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    ucsc: $VERSION
END_VERSIONS
"""

NextFlow gtftogenepred From line 24 of reference/main.nf

"""
mkdir -p star_dir && mv *.tab *.junction *.sam star_dir
postProcessStarAlignment.pl --starDir star_dir/ --outDir ./

awk '{if(\$5 >= ${bsj_reads}) print \$0}' ${prefix}.filteredJunctions.bed | awk  -v OFS="\t" -F"\t" '{print \$1,\$2,\$3,\$6,\$5}' > ${prefix}_circrna_finder.bed

awk -v OFS="\t" '{print \$1, \$2, \$3, \$1":"\$2"-"\$3":"\$4, \$5, \$4}' ${prefix}_circrna_finder.bed > ${prefix}_circrna_finder_circs.bed

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    circRNA_finder: $VERSION
END_VERSIONS
"""

NextFlow From line 27 of filter/main.nf

"""
prepare_circ_test.R

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    r-base: \$(echo \$(R --version 2>&1) | sed 's/^.*R version //; s/ .*\$//')
END_VERSIONS
"""

NextFlow From line 24 of prepare/main.nf

"""
circ_test.R $circ_csv $linear_csv $phenotype

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    r-base: \$(echo \$(R --version 2>&1) | sed 's/^.*R version //; s/ .*\$//')
    aod: \$(Rscript -e "library(aod); cat(as.character(packageVersion('aod')))")
    ggplot2: \$(Rscript -e "library(ggplot2); cat(as.character(packageVersion('ggplot2')))")
    plyr: \$(Rscript -e "library(plyr); cat(as.character(packageVersion('plyr')))")
END_VERSIONS
"""

NextFlow ggplot2 From line 22 of test/main.nf

"""
CIRIquant \\
    -t ${task.cpus} \\
    -1 ${reads[0]} \\
    -2 ${reads[1]} \\
    --config $yml \\
    --no-gene \\
    -o ${prefix} \\
    -p ${prefix}

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    bwa: \$(echo \$(bwa 2>&1) | sed 's/^.*Version: //; s/Contact:.*\$//')
    ciriquant: \$(echo \$(CIRIquant --version 2>&1) | sed 's/CIRIquant //g' )
    samtools: \$(echo \$(samtools --version 2>&1) | sed 's/^.*samtools //; s/Using.*\$//')
    stringtie: \$(stringtie --version 2>&1)
    hisat2: $VERSION
END_VERSIONS
"""

NextFlow ciriquant From line 26 of ciriquant/main.nf

"""
grep -v "#" ${prefix}.gtf | awk '{print \$14}' | cut -d '.' -f1 > counts
grep -v "#" ${prefix}.gtf | awk -v OFS="\t" '{print \$1,\$4,\$5,\$7}' > ${prefix}.tmp
paste ${prefix}.tmp counts > ${prefix}_unfilt.bed

awk '{if(\$5 >= ${bsj_reads}) print \$0}' ${prefix}_unfilt.bed > ${prefix}_filt.bed
grep -v '^\$' ${prefix}_filt.bed > ${prefix}_ciriquant

awk -v OFS="\t" '{\$2-=1;print}' ${prefix}_ciriquant > ${prefix}_ciriquant.bed
rm ${prefix}.gtf

awk -v OFS="\t" '{print \$1, \$2, \$3, \$1":"\$2"-"\$3":"\$4, \$5, \$4}' ${prefix}_ciriquant.bed > ${prefix}_ciriquant_circs.bed
"""

NextFlow From line 18 of filter/main.nf
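
The `\$2-=1` step in the snippet above converts coordinates taken from a GTF file (1-based, inclusive start) into BED convention (0-based start, end unchanged). A small standalone illustration, with a hypothetical function name and an invented record:

```shell
# Decrement the start coordinate (column 2) by one and re-emit the row
# tab-separated; assigning to a field makes awk rebuild the record with OFS.
to_zero_based() {
    awk -v OFS='\t' '{ $2 -= 1; print }'
}

# A GTF-style interval 101..500 becomes the BED interval 100..500.
printf 'chr1\t101\t500\t+\t7\n' | to_zero_based
```

Only the start changes; the end coordinate is the same number in both conventions because BED ends are exclusive.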

"""
BWA=`which bwa`
HISAT2=`which hisat2`
STRINGTIE=`which stringtie`
SAMTOOLS=`which samtools`

touch travis.yml
printf "name: ciriquant\ntools:\n  bwa: \$BWA\n  hisat2: \$HISAT2\n  stringtie: \$STRINGTIE\n  samtools: \$SAMTOOLS\n\nreference:\n  fasta: ${fasta_path}\n  gtf: ${gtf_path}\n  bwa_index: ${bwa_path}/${bwa_prefix}\n  hisat_index: ${hisat2_path}/${hisat2_prefix}" >> travis.yml
"""

NextFlow From line 28 of yml/main.nf

"""
python ${workflow.projectDir}/bin/circRNA_counts_matrix.py > matrix.txt
## handle non-canon chromosomes here (https://stackoverflow.com/questions/71479919/joining-columns-based-on-number-of-fields)
n_samps=\$(ls *.bed | wc -l)
canon=\$(awk -v a="\$n_samps" 'BEGIN {print a + 4}')
awk -v n="\$canon" '{ for (i = 2; i <= NF - n + 1; ++i) { \$1 = \$1"-"\$i; \$i=""; } } 1' matrix.txt | awk -v OFS="\t" '\$1=\$1' > circRNA_matrix.txt
Rscript ${workflow.projectDir}/bin/reformat_count_matrix.R

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    r-base: \$(echo \$(R --version 2>&1) | sed 's/^.*R version //; s/ .*\$//')
    argparser: \$(Rscript -e "library(argparser); cat(as.character(packageVersion('argparser')))")
    dplyr: \$(Rscript -e "library(dplyr); cat(as.character(packageVersion('dplyr')))")
END_VERSIONS
"""

NextFlow From line 21 of combined/main.nf
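
The awk pair in the snippet above repairs rows in which a non-canonical chromosome name has been split into extra leading fields: every row should have `n_samps + 4` fields, and any surplus fields at the front are glued back onto the first one with `-`. The sketch below reproduces that logic on toy input. It assumes whitespace-separated fields (the exact layout of `matrix.txt` is not shown on this page), and `merge_extra_id_fields` is a hypothetical name.

```shell
# $1 = expected number of fields (samples + 4).
# Surplus leading fields are appended to field 1 with "-" and blanked;
# the second awk re-splits on whitespace and rewrites the row with tabs.
merge_extra_id_fields() {
    n_fields="$1"
    awk -v n="$n_fields" \
        '{ for (i = 2; i <= NF - n + 1; ++i) { $1 = $1 "-" $i; $i = "" } } 1' |
        awk -v OFS='\t' '{ $1 = $1; print }'
}

# Two samples, so 6 fields are expected; the second row has 7.
printf 'chr1 100 200 + 5 7\nHLA DRB1 100 200 + 5 7\n' | merge_extra_id_fields 6
```

The first row passes through unchanged apart from the tab separators, while the second row's first field becomes `HLA-DRB1`. Unlike the original `'\$1=\$1'` pattern, this sketch prints every row unconditionally.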

"""
## make list of files for R to read
ls *.bed > samples.csv

## Add catch for empty bed file and delete
bash ${workflow.projectDir}/bin/check_empty.sh

## Use intersection of "n" (params.tool_filter) circRNAs called by tools
## remove duplicate IDs, keep highest count.
Rscript ${workflow.projectDir}/bin/consolidate_algorithms_intersection.R samples.csv $tool_filter $duplicates_fun
mv combined_counts.bed ${prefix}.bed

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    r-base: \$(echo \$(R --version 2>&1) | sed 's/^.*R version //; s/ .*\$//')
    argparser: \$(Rscript -e "library(argparser); cat(as.character(packageVersion('argparser')))")
    dplyr: \$(Rscript -e "library(dplyr); cat(as.character(packageVersion('dplyr')))")
END_VERSIONS
"""

NextFlow From line 24 of merge_tools/main.nf

"""
# Strip tool name from BED files (no consolidation prior to this step for 1 tool)
for b in *.bed; do
    basename=\${b%".bed"};
    sample_name=\${basename%"_${tool_name}"};
    mv \$b \${sample_name}.bed
done

python ${workflow.projectDir}/bin/circRNA_counts_matrix.py > matrix.txt
## handle non-canon chromosomes here (https://stackoverflow.com/questions/71479919/joining-columns-based-on-number-of-fields)
n_samps=\$(ls *.bed | wc -l)
canon=\$(awk -v a="\$n_samps" 'BEGIN {print a + 4}')
awk -v n="\$canon" '{ for (i = 2; i <= NF - n + 1; ++i) { \$1 = \$1"-"\$i; \$i=""; } } 1' matrix.txt | awk -v OFS="\t" '\$1=\$1' > circRNA_matrix.txt
Rscript ${workflow.projectDir}/bin/reformat_count_matrix.R

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    r-base: \$(echo \$(R --version 2>&1) | sed 's/^.*R version //; s/ .*\$//')
    argparser: \$(Rscript -e "library(argparser); cat(as.character(packageVersion('argparser')))")
    dplyr: \$(Rscript -e "library(dplyr); cat(as.character(packageVersion('dplyr')))")
END_VERSIONS
"""

NextFlow From line 23 of single/main.nf
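
The renaming loop at the top of the snippet above relies on shell suffix removal (`${var%pattern}`) to turn `<sample>_<tool>.bed` into `<sample>.bed`. A minimal sketch of the two expansions in isolation; the function name and the file name are illustrative assumptions:

```shell
# Strip ".bed", then strip "_<tool>", and print what is left.
strip_tool_suffix() {
    file="$1"
    tool="$2"
    base=${file%.bed}          # sampleA_circexplorer2
    sample=${base%"_$tool"}    # sampleA
    printf '%s\n' "$sample"
}

strip_tool_suffix sampleA_circexplorer2.bed circexplorer2
```

`%` removes the shortest matching suffix, so a sample name that itself contains an underscore is left intact as long as it does not end with the tool name.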

"""
sed -i 's/^chr//g' $gtf

mkdir ${prefix} && mv ${prefix}.Chimeric.out.junction ${prefix} && printf "${prefix}/${prefix}.Chimeric.out.junction" > samplesheet

DCC @samplesheet -D -an $gtf -Pi -ss -F -M -Nr 1 1 -fg -A $fasta -N -T ${task.cpus}

awk '{print \$6}' CircCoordinates >> strand
paste CircRNACount strand | tail -n +2 | awk -v OFS="\t" '{print \$1,\$2,\$3,\$5,\$4}' >> ${prefix}.txt

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    dcc: \$(DCC --version)
END_VERSIONS
"""

NextFlow dcc From line 26 of dcc/main.nf

"""
sed -i 's/^chr//g' $gtf

mkdir ${prefix} && mv ${prefix}.Chimeric.out.junction ${prefix} && printf "${prefix}/${prefix}.Chimeric.out.junction" > samplesheet
mkdir ${prefix}_mate1 && mv ${prefix}_mate1.Chimeric.out.junction ${prefix}_mate1 && printf "${prefix}_mate1/${prefix}_mate1.Chimeric.out.junction" > mate1file
mkdir ${prefix}_mate2 && mv ${prefix}_mate2.Chimeric.out.junction ${prefix}_mate2 && printf "${prefix}_mate2/${prefix}_mate2.Chimeric.out.junction" > mate2file

DCC @samplesheet -mt1 @mate1file -mt2 @mate2file -D -an $gtf -Pi -ss -F -M -Nr 1 1 -fg -A $fasta -N -T ${task.cpus}

awk '{print \$6}' CircCoordinates >> strand
paste CircRNACount strand | tail -n +2 | awk -v OFS="\t" '{print \$1,\$2,\$3,\$5,\$4}' >> ${prefix}.txt

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    dcc: \$(DCC --version)
END_VERSIONS
"""

NextFlow dcc From line 42 of dcc/main.nf

"""
awk '{if(\$5 >= ${bsj_reads}) print \$0}' ${prefix}.txt > ${prefix}_dcc.filtered
awk -v OFS="\t" '{\$2-=1;print}' ${prefix}_dcc.filtered > ${prefix}_dcc.bed
awk -v OFS="\t" '{print \$1, \$2, \$3, \$1":"\$2"-"\$3":"\$4, \$5, \$4}' ${prefix}_dcc.bed > ${prefix}_dcc_circs.bed
"""

NextFlow From line 18 of filter/main.nf

"""
## prepDE and circRNA count headers are sorted such that uppercase precedes lowercase, i.e. Z before a.
## reformat the phenotype file to match the order of the samples.
head -n 1 $phenotype > header
tail -n +2 $phenotype | LC_COLLATE=C sort > sorted_pheno
cat header sorted_pheno > tmp && rm phenotype.csv && mv tmp phenotype.csv

DEA.R $gene_matrix $phenotype $circrna_matrix $species ensembl_database_map.txt
mv boxplots/ circRNA/

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    r-base: \$(echo \$(R --version 2>&1) | sed 's/^.*R version //; s/ .*\$//')
    argparser: \$(Rscript -e "library(argparser); cat(as.character(packageVersion('argparser')))")
    biomart: \$(Rscript -e "library(biomaRt); cat(as.character(packageVersion('biomaRt')))")
    deseq2: \$(Rscript -e "library(DESeq2); cat(as.character(packageVersion('DESeq2')))")
    dplyr: \$(Rscript -e "library(dplyr); cat(as.character(packageVersion('dplyr')))")
    enhancedvolcano: \$(Rscript -e "library(EnhancedVolcano); cat(as.character(packageVersion('EnhancedVolcano')))")
    gplots: \$(Rscript -e "library(gplots); cat(as.character(packageVersion('gplots')))")
    ggplot2: \$(Rscript -e "library(ggplot2); cat(as.character(packageVersion('ggplot2')))")
    ggpubr: \$(Rscript -e "library(ggpubr); cat(as.character(packageVersion('ggpubr')))")
    ihw: \$(Rscript -e "library(IHW); cat(as.character(packageVersion('IHW')))")
    pvclust: \$(Rscript -e "library(pvclust); cat(as.character(packageVersion('pvclust')))")
    pcatools: \$(Rscript -e "library(PCAtools); cat(as.character(packageVersion('PCAtools')))")
    pheatmap: \$(Rscript -e "library(pheatmap); cat(as.character(packageVersion('pheatmap')))")
    rcolorbrewer: \$(Rscript -e "library(RColorBrewer); cat(as.character(packageVersion('RColorBrewer')))")
END_VERSIONS
"""

NextFlow ggplot2 DESeq2 pheatmap From line 27 of differential_expression/main.nf
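
The first lines of the snippet above re-order the phenotype rows with a byte-wise (C locale) sort so that they line up with count-matrix headers in which uppercase sorts before lowercase. A sketch of that header-preserving sort on an invented phenotype table follows. It uses `LC_ALL=C` rather than `LC_COLLATE=C`, a deliberate difference from the snippet, because `LC_ALL` takes precedence if it happens to be set in the environment; `sort_body_c_locale` is a hypothetical name.

```shell
# Print the header line unchanged, then the remaining lines in C-locale order.
sort_body_c_locale() {
    file="$1"
    head -n 1 "$file"
    tail -n +2 "$file" | LC_ALL=C sort
}

# Toy phenotype file (sample names and conditions are illustrative).
pheno=$(mktemp)
printf 'sample,condition\nb_ctrl,control\nZ_treat,treated\na_ctrl,control\n' > "$pheno"
sort_body_c_locale "$pheno"
rm -f "$pheno"
```

In the output `Z_treat` comes before `a_ctrl` and `b_ctrl`, which a locale-aware sort would typically not do.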

"""
## FASTA sequences (bedtools does not like the extra annotation info - split will not work properly)
cut -d\$'\t' -f1-12 ${prefix}.bed > bed12.tmp
bedtools getfasta -fi $fasta -bed bed12.tmp -s -split -name > circ_seq.tmp

## clean fasta header
grep -A 1 '>' circ_seq.tmp | cut -d: -f1,2,3 > ${prefix}.fa && rm circ_seq.tmp

## add backsplice sequence for miRanda and TargetScan, publish canonical FASTA to results.
rm $fasta
bash ${workflow.projectDir}/bin/backsplice_gen.sh ${prefix}.fa

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    bedtools: \$(bedtools --version | sed -e "s/bedtools v//g")
END_VERSIONS
"""

NextFlow BEDTools From line 26 of fasta/main.nf

"""
unmapped2anchors.py $bam | gzip > ${prefix}_anchors.qfa.gz

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    find_circ: $VERSION
END_VERSIONS
"""

NextFlow From line 23 of anchors/main.nf

"""
grep CIRCULAR $bed | \
    grep -v chrM | \
    awk '\$5>=${bsj_reads}' | \
    grep UNAMBIGUOUS_BP | grep ANCHOR_UNIQUE | \
    maxlength.py 100000 \
    > ${prefix}.txt

tail -n +2 ${prefix}.txt | awk -v OFS="\t" '{print \$1,\$2,\$3,\$6,\$5}' > ${prefix}_find_circ.bed

awk -v OFS="\t" '{print \$1, \$2, \$3, \$1":"\$2"-"\$3":"\$4, \$5, \$4}' ${prefix}_find_circ.bed > ${prefix}_find_circ_circs.bed

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    find_circ: $VERSION
END_VERSIONS
"""

NextFlow From line 25 of filter/main.nf

"""
INDEX=`find -L ./ -name "*.rev.1.bt2" | sed "s/.rev.1.bt2//"`
[ -z "\$INDEX" ] && INDEX=`find -L ./ -name "*.rev.1.bt2l" | sed "s/.rev.1.bt2l//"`
[ -z "\$INDEX" ] && echo "Bowtie2 index files not found" 1>&2 && exit 1

bowtie2 \\
    --threads $task.cpus \\
    --reorder \\
    --mm \\
    -D 20 \\
    --score-min=C,-15,0 \\
    -q \\
    -x \$INDEX \\
    -U $anchors | \\
    find_circ.py  --genome=$fasta --prefix=${prefix} --stats=${prefix}.sites.log --reads=${prefix}.sites.reads > ${prefix}.sites.bed

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    bowtie2: \$(echo \$(bowtie2 --version 2>&1) | sed 's/^.*bowtie2-align-s version //; s/ .*\$//')
    find_circ: $VERSION
END_VERSIONS
"""

NextFlow Bowtie 2 From line 27 of find_circ/main.nf

"""
$handleGzip_R1

mapsplice.py \\
    -c $chromosomes \\
    -x $gtf_prefix \\
    -1 ${read1} \\
    -p ${task.cpus} \\
    --bam \\
    --gene-gtf $gtf \\
    -o $prefix \\
    $args

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    mapsplice: $VERSION
END_VERSIONS
"""

NextFlow MapSplice From line 31 of align/main.nf

"""
$handleGzip_R1
$handleGzip_R2

mapsplice.py \\
    -c $chromosomes \\
    -x $gtf_prefix \\
    -1 ${read1} \\
    -2 ${read2} \\
    -p ${task.cpus} \\
    --bam \\
    --gene-gtf $gtf \\
    -o $prefix \\
    $args

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    mapsplice: $VERSION
END_VERSIONS
"""

NextFlow MapSplice From line 54 of align/main.nf

"""
## reformat and sort miRanda, TargetScan outputs, convert to BED for overlaps.
tail -n +2 $targetscan | sort -k1,1 -k4n | awk -v OFS="\t" '{print \$1, \$2, \$4, \$5, \$9}' | awk -v OFS="\t" '{print \$2, \$3, \$4, \$1, "0", \$5}' > targetscan.bed
tail -n +2 $miranda | sort -k2,2 -k7n | awk -v OFS="\t" '{print \$2, \$1, \$3, \$4, \$7, \$8}' | awk -v OFS="\t" '{print \$2, \$5, \$6, \$1, \$3, \$4}' | sed 's/^[^-]*-//g' > miranda.bed

## intersect, consolidate miRanda, TargetScan information about miRs.
## -wa to output miRanda hits - targetscan makes it difficult to resolve duplicate miRNAs at MRE sites.
bedtools intersect -a miranda.bed -b targetscan.bed -wa > ${prefix}.mirnas.tmp
bedtools intersect -a targetscan.bed -b miranda.bed | awk '{print \$6}' > mirna_type

## remove duplicate miRNA entries at MRE sites.
## strategy: sort by circs, sort by start position, sort by site type - the goal is to take the best site type (i.e. rank the site types found at each MRE site).
paste ${prefix}.mirnas.tmp mirna_type | sort -k3,3 -k2n -k7r | awk -v OFS="\t" '{print \$4,\$1,\$2,\$3,\$5,\$6,\$7}' | awk -F "\t" '{if (!seen[\$1,\$2,\$3,\$4,\$5,\$6]++)print}' | sort -k1,1 -k3n > ${prefix}.mirna_targets.tmp
echo -e "circRNA\tmiRNA\tStart\tEnd\tScore\tEnergy_KcalMol\tSite_type" | cat - ${prefix}.mirna_targets.tmp > ${prefix}.mirna_targets.txt

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    bedtools: \$(bedtools --version | sed -e "s/bedtools v//g")
END_VERSIONS
"""

NextFlow BEDTools From line 23 of mirna_targets/main.nf
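
The de-duplication in the snippet above combines a sort that places the preferred site type first with the awk idiom `!seen[key]++`, which keeps only the first row for each key. The sketch below shows the idea on a reduced three-column table (circRNA, miRNA, site type). The column layout, the helper name `best_site_per_pair` and the rows are assumptions made for illustration; the pipeline keys on more columns than this.

```shell
# Sort by circRNA, then miRNA, then site type in reverse order,
# and keep the first row seen for each (circRNA, miRNA) pair.
best_site_per_pair() {
    tab=$(printf '\t')
    LC_ALL=C sort -t "$tab" -k1,1 -k2,2 -k3,3r |
        awk -F'\t' '!seen[$1, $2]++'
}

# Toy hits: miR-1 is reported twice on circ1 with different site types.
printf 'circ1\tmiR-1\t7mer-m8\ncirc1\tmiR-2\t7mer-1a\ncirc1\tmiR-1\t8mer-1a\n' |
    best_site_per_pair
```

For `miR-1` the `8mer-1a` row is kept and the `7mer-m8` row is dropped, because the reverse sort puts `8mer-1a` first.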

"""
check_samplesheet.py \\
    $samplesheet \\
    samplesheet.valid.csv

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    python: \$(python --version | sed 's/Python //g')
END_VERSIONS
"""

NextFlow From line 21 of local/samplesheet_check.nf

"""
grep ';C;' ${prefix}.sngl.bed | awk -v OFS="\t" '{print \$1,\$2,\$3,\$6}' | sort | uniq -c | awk -v OFS="\t" '{print \$2,\$3,\$4,\$5,\$1}' > ${prefix}_collapsed.bed

awk -v OFS="\t" -v BSJ=${bsj_reads} '{if(\$5>=BSJ) print \$0}' ${prefix}_collapsed.bed > ${prefix}_segemehl.bed

awk -v OFS="\t" '{print \$1, \$2, \$3, \$1":"\$2"-"\$3":"\$4, \$5, \$4}' ${prefix}_segemehl.bed > ${prefix}_segemehl_circs.bed
"""

NextFlow From line 19 of filter/main.nf
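
The first command in the snippet above counts supporting reads by collapsing identical junction rows with `sort | uniq -c`, then moves the count from the front of the line to the last column. A standalone sketch on invented junction rows (chrom, start, end, strand); `collapse_junctions` is a hypothetical name:

```shell
# Count identical rows, then reorder "count chrom start end strand"
# into "chrom start end strand count" (tab-separated).
collapse_junctions() {
    LC_ALL=C sort | uniq -c |
        awk -v OFS='\t' '{ print $2, $3, $4, $5, $1 }'
}

# The chr1 junction is supported by two reads, the chr2 junction by one.
printf 'chr1\t100\t500\t+\nchr2\t300\t900\t-\nchr1\t100\t500\t+\n' | collapse_junctions
```

`uniq -c` pads its count with leading spaces, which awk's default field splitting ignores, so the count is always available as `$1`.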

"""
cat *.tab | awk -v BSJ=${bsj_reads} '(\$7 >= BSJ && \$6==0)' | cut -f1-6 | sort | uniq > dataset.SJ.out.tab
"""

NextFlow From line 15 of sjdb/main.nf

"""
for file in \$(ls *.gtf); do sample_id=\${file%".transcripts.gtf"}; touch samples.txt; printf "\$sample_id\t\$file\n" >> samples.txt ; done

prepDE.py -i samples.txt
"""

NextFlow From line 20 of prepde/main.nf

"""
bash ${workflow.projectDir}/bin/targetscan_format.sh $mature
"""

NextFlow From line 15 of database/main.nf

"""
##format for targetscan
cat $fasta | grep ">" | sed 's/>//g' > id
cat $fasta | grep -v ">" > seq
paste id seq | awk -v OFS="\t" '{print \$1, "0000", \$2}' > ${prefix}_ts.txt
# run targetscan
targetscan_70.pl mature.txt ${prefix}_ts.txt ${prefix}.txt

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    targetscan: $VERSION
END_VERSIONS
"""

NextFlow From line 24 of predict/main.nf

"""
INDEX=`find -L ./ -name "*.rev.1.bt2" | sed "s/\\.rev.1.bt2\$//"`
[ -z "\$INDEX" ] && INDEX=`find -L ./ -name "*.rev.1.bt2l" | sed "s/\\.rev.1.bt2l\$//"`
[ -z "\$INDEX" ] && echo "Bowtie2 index files not found" 1>&2 && exit 1

bowtie2 \\
    -x \$INDEX \\
    $reads_args \\
    --threads $task.cpus \\
    $unaligned \\
    $args \\
    2> ${prefix}.bowtie2.log \\
    | samtools $samtools_command $args2 --threads $task.cpus -o ${prefix}.bam -

if [ -f ${prefix}.unmapped.fastq.1.gz ]; then
    mv ${prefix}.unmapped.fastq.1.gz ${prefix}.unmapped_1.fastq.gz
fi

if [ -f ${prefix}.unmapped.fastq.2.gz ]; then
    mv ${prefix}.unmapped.fastq.2.gz ${prefix}.unmapped_2.fastq.gz
fi

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    bowtie2: \$(echo \$(bowtie2 --version 2>&1) | sed 's/^.*bowtie2-align-s version //; s/ .*\$//')
    samtools: \$(echo \$(samtools --version 2>&1) | sed 's/^.*samtools //; s/Using.*\$//')
    pigz: \$( pigz --version 2>&1 | sed 's/pigz //g' )
END_VERSIONS
"""

NextFlow SAMtools Bowtie 2 From line 42 of align/main.nf

"""
mkdir bowtie2
bowtie2-build $args --threads $task.cpus $fasta bowtie2/${fasta.baseName}
cat <<-END_VERSIONS > versions.yml
"${task.process}":
    bowtie2: \$(echo \$(bowtie2 --version 2>&1) | sed 's/^.*bowtie2-align-s version //; s/ .*\$//')
END_VERSIONS
"""

NextFlow Bowtie 2 From line 22 of build/main.nf

"""
mkdir bowtie
bowtie-build --threads $task.cpus $fasta bowtie/${fasta.baseName}
cat <<-END_VERSIONS > versions.yml
"${task.process}":
    bowtie: \$(echo \$(bowtie --version 2>&1) | sed 's/^.*bowtie-align-s version //; s/ .*\$//')
END_VERSIONS
"""

NextFlow Bowtie From line 22 of build/main.nf

"""
mkdir bwa
bwa \\
    index \\
    $args \\
    -p bwa/${fasta.baseName} \\
    $fasta

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    bwa: \$(echo \$(bwa 2>&1) | sed 's/^.*Version: //; s/Contact:.*\$//')
END_VERSIONS
"""

NextFlow BWA From line 22 of index/main.nf

"""
mkdir bwa

touch bwa/genome.amb
touch bwa/genome.ann
touch bwa/genome.bwt
touch bwa/genome.pac
touch bwa/genome.sa

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    bwa: \$(echo \$(bwa 2>&1) | sed 's/^.*Version: //; s/Contact:.*\$//')
END_VERSIONS
"""

NextFlow BWA From line 37 of index/main.nf

"""
cat ${readList.join(' ')} > ${prefix}.merged.fastq.gz

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    cat: \$(echo \$(cat --version 2>&1) | sed 's/^.*coreutils) //; s/ .*\$//')
END_VERSIONS
"""

NextFlow From line 26 of fastq/main.nf

"""
cat ${read1.join(' ')} > ${prefix}_1.merged.fastq.gz
cat ${read2.join(' ')} > ${prefix}_2.merged.fastq.gz

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    cat: \$(echo \$(cat --version 2>&1) | sed 's/^.*coreutils) //; s/ .*\$//')
END_VERSIONS
"""

NextFlow From line 40 of fastq/main.nf

"""
touch ${prefix}.merged.fastq.gz

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    cat: \$(echo \$(cat --version 2>&1) | sed 's/^.*coreutils) //; s/ .*\$//')
END_VERSIONS
"""

NextFlow From line 57 of fastq/main.nf

"""
touch ${prefix}_1.merged.fastq.gz
touch ${prefix}_2.merged.fastq.gz

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    cat: \$(echo \$(cat --version 2>&1) | sed 's/^.*coreutils) //; s/ .*\$//')
END_VERSIONS
"""

NextFlow From line 68 of fastq/main.nf

"""
CIRCexplorer2 \\
    annotate \\
    -r $gene_annotation \\
    -g $fasta \\
    -b $junctions \\
    -o ${prefix}.txt \\
    $args

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    circexplorer2: \$(echo \$(CIRCexplorer2 --version 2>&1) )
END_VERSIONS
"""

NextFlow circexplorer2 From line 25 of annotate/main.nf

"""
touch ${prefix}.txt

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    circexplorer2: \$(echo \$(CIRCexplorer2 --version 2>&1) )
END_VERSIONS
"""

NextFlow From line 42 of annotate/main.nf

"""
CIRCexplorer2 \\
    parse \\
    $aligner \\
    $fusions \\
    -b ${prefix}.bed \\
    $args

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    circexplorer2: \$( echo \$(CIRCexplorer2 --version 2>&1) )
END_VERSIONS
"""

NextFlow circexplorer2 From line 25 of parse/main.nf

"""
touch ${prefix}.bed

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    circexplorer2: \$( echo \$(CIRCexplorer2 --version 2>&1) )
END_VERSIONS
"""

NextFlow From line 41 of parse/main.nf

"""
printf "%s %s\\n" $rename_to | while read old_name new_name; do
    [ -f "\${new_name}" ] || ln -s \$old_name \$new_name
done
fastqc $args --threads $task.cpus $renamed_files

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    fastqc: \$( fastqc --version | sed -e "s/FastQC v//g" )
END_VERSIONS
"""

NextFlow FastQC From line 28 of fastqc/main.nf

"""
touch ${prefix}.html
touch ${prefix}.zip

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    fastqc: \$( fastqc --version | sed -e "s/FastQC v//g" )
END_VERSIONS
"""

NextFlow FastQC From line 42 of fastqc/main.nf

"""
INDEX=`find -L ./ -name "*.1.ht2" | sed 's/\\.1.ht2\$//'`
hisat2 \\
    -x \$INDEX \\
    -U $reads \\
    $strandedness \\
    --known-splicesite-infile $splicesites \\
    --summary-file ${prefix}.hisat2.summary.log \\
    --threads $task.cpus \\
    $seq_center \\
    $unaligned \\
    $args \\
    | samtools view -bS -F 4 -F 256 - > ${prefix}.bam

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    hisat2: $VERSION
    samtools: \$(echo \$(samtools --version 2>&1) | sed 's/^.*samtools //; s/Using.*\$//')
END_VERSIONS
"""

NextFlow SAMtools HISAT2 From line 39 of align/main.nf

"""
INDEX=`find -L ./ -name "*.1.ht2" | sed 's/\\.1.ht2\$//'`
hisat2 \\
    -x \$INDEX \\
    -1 ${reads[0]} \\
    -2 ${reads[1]} \\
    $strandedness \\
    --known-splicesite-infile $splicesites \\
    --summary-file ${prefix}.hisat2.summary.log \\
    --threads $task.cpus \\
    $seq_center \\
    $unaligned \\
    --no-mixed \\
    --no-discordant \\
    $args \\
    | samtools view -bS -F 4 -F 8 -F 256 - > ${prefix}.bam

if [ -f ${prefix}.unmapped.fastq.1.gz ]; then
    mv ${prefix}.unmapped.fastq.1.gz ${prefix}.unmapped_1.fastq.gz
fi
if [ -f ${prefix}.unmapped.fastq.2.gz ]; then
    mv ${prefix}.unmapped.fastq.2.gz ${prefix}.unmapped_2.fastq.gz
fi

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    hisat2: $VERSION
    samtools: \$(echo \$(samtools --version 2>&1) | sed 's/^.*samtools //; s/Using.*\$//')
END_VERSIONS
"""

NextFlow SAMtools HISAT2 From line 61 of align/main.nf

"""
mkdir hisat2
$extract_exons
hisat2-build \\
    -p $task.cpus \\
    $ss \\
    $exon \\
    $args \\
    $fasta \\
    hisat2/${fasta.baseName}

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    hisat2: $VERSION
END_VERSIONS
"""

NextFlow HISAT2 From line 48 of build/main.nf

"""
miranda \\
    $mirbase \\
    $query \\
    $args \\
    -out ${prefix}.out

echo "miRNA\tTarget\tScore\tEnergy_KcalMol\tQuery_Start\tQuery_End\tSubject_Start\tSubject_End\tAln_len\tSubject_Identity\tQuery_Identity" > ${prefix}.txt
grep -A 1 "Scores for this hit:" ${prefix}.out | sort | grep ">"  | cut -c 2- | tr ' ' '\t' >> ${prefix}.txt

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    miranda: \$(echo \$(miranda -v | sed -n 4p | sed 's/^.*miranda v//; s/microRNA.*\$//' ))
END_VERSIONS
"""

NextFlow miranda From line 24 of miranda/main.nf

"""
touch ${prefix}.txt

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    miranda: \$(echo \$(miranda -v | sed -n 4p | sed 's/^.*miranda v//; s/microRNA.*\$//' ))
END_VERSIONS
"""

NextFlow From line 42 of miranda/main.nf

"""
multiqc \\
    --force \\
    $args \\
    $config \\
    $extra_config \\
    .

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    multiqc: \$( multiqc --version | sed -e "s/multiqc, version //g" )
END_VERSIONS
"""

NextFlow MultiQC From line 28 of multiqc/main.nf

"""
touch multiqc_data
touch multiqc_plots
touch multiqc_report.html

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    multiqc: \$( multiqc --version | sed -e "s/multiqc, version //g" )
END_VERSIONS
"""

NextFlow MultiQC From line 43 of multiqc/main.nf

"""
samtools \\
    index \\
    -@ ${task.cpus-1} \\
    $args \\
    $input

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    samtools: \$(echo \$(samtools --version 2>&1) | sed 's/^.*samtools //; s/Using.*\$//')
END_VERSIONS
"""

NextFlow SAMtools From line 24 of index/main.nf

"""
touch ${input}.bai
touch ${input}.crai
touch ${input}.csi

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    samtools: \$(echo \$(samtools --version 2>&1) | sed 's/^.*samtools //; s/Using.*\$//')
END_VERSIONS
"""

NextFlow From line 38 of index/main.nf

"""
samtools sort $args -@ $task.cpus -o ${prefix}.bam -T $prefix $bam
cat <<-END_VERSIONS > versions.yml
"${task.process}":
    samtools: \$(echo \$(samtools --version 2>&1) | sed 's/^.*samtools //; s/Using.*\$//')
END_VERSIONS
"""

NextFlow SAMtools From line 25 of sort/main.nf

"""
touch ${prefix}.bam

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    samtools: \$(echo \$(samtools --version 2>&1) | sed 's/^.*samtools //; s/Using.*\$//')
END_VERSIONS
"""

NextFlow From line 35 of sort/main.nf

"""
samtools \\
    view \\
    --threads ${task.cpus-1} \\
    ${reference} \\
    ${readnames} \\
    $args \\
    -o ${prefix}.${file_type} \\
    $input \\
    $args2

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    samtools: \$(echo \$(samtools --version 2>&1) | sed 's/^.*samtools //; s/Using.*\$//')
END_VERSIONS
"""

NextFlow SAMtools From line 38 of view/main.nf

"""
touch ${prefix}.bam
touch ${prefix}.cram

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    samtools: \$(echo \$(samtools --version 2>&1) | sed 's/^.*samtools //; s/Using.*\$//')
END_VERSIONS
"""

NextFlow From line 57 of view/main.nf

"""
mkdir -p $prefix

segemehl.x \\
    -t $task.cpus \\
    -d $fasta \\
    -i $index \\
    $reads \\
    $args \\
    -o ${prefix}/${prefix}.${suffix}

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    segemehl: \$(echo \$(segemehl.x 2>&1 | grep "ge5dee" | awk -F Z '{print substr(\$1, 2, 6)}' ))
END_VERSIONS
"""

NextFlow From line 27 of align/main.nf

"""
mkdir -p $prefix
touch ${prefix}/${prefix}.${suffix}

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    segemehl: \$(echo \$(segemehl.x 2>&1 | grep "ge5dee" | awk -F Z '{print substr(\$1, 2, 6)}' ))
END_VERSIONS
"""

NextFlow From line 47 of align/main.nf

"""
segemehl.x \\
    -t $task.cpus \\
    -d $fasta \\
    -x ${prefix}.idx \\
    $args

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    segemehl: \$(echo \$(segemehl.x 2>&1 | grep "ge5dee" | awk -F Z '{print substr(\$1, 2, 6)}' ))
END_VERSIONS
"""

NextFlow From line 23 of index/main.nf

"""
touch ${prefix}.idx

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    segemehl: \$(echo \$(segemehl.x 2>&1 | grep "ge5dee" | awk -F Z '{print substr(\$1, 2, 6)}' ))
END_VERSIONS
"""

NextFlow From line 38 of index/main.nf

"""
STAR \\
    --genomeDir $index \\
    --readFilesIn $reads  \\
    --runThreadN $task.cpus \\
    --outFileNamePrefix $prefix. \\
    $out_sam_type \\
    $ignore_gtf \\
    $seq_center \\
    $args

$mv_unsorted_bam

if [ -f ${prefix}.Unmapped.out.mate1 ]; then
    mv ${prefix}.Unmapped.out.mate1 ${prefix}.unmapped_1.fastq
    gzip ${prefix}.unmapped_1.fastq
fi
if [ -f ${prefix}.Unmapped.out.mate2 ]; then
    mv ${prefix}.Unmapped.out.mate2 ${prefix}.unmapped_2.fastq
    gzip ${prefix}.unmapped_2.fastq
fi

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    star: \$(STAR --version | sed -e "s/STAR_//g")
    samtools: \$(echo \$(samtools --version 2>&1) | sed 's/^.*samtools //; s/Using.*\$//')
    gawk: \$(echo \$(gawk --version 2>&1) | sed 's/^.*GNU Awk //; s/, .*\$//')
END_VERSIONS
"""

NextFlow STAR From line 44 of align/main.nf

"""
touch ${prefix}Xd.out.bam
touch ${prefix}.Log.final.out
touch ${prefix}.Log.out
touch ${prefix}.Log.progress.out
touch ${prefix}.sortedByCoord.out.bam
touch ${prefix}.toTranscriptome.out.bam
touch ${prefix}.Aligned.unsort.out.bam
touch ${prefix}.unmapped_1.fastq.gz
touch ${prefix}.unmapped_2.fastq.gz
touch ${prefix}.tab
touch ${prefix}.Chimeric.out.junction
touch ${prefix}.out.sam

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    star: \$(STAR --version | sed -e "s/STAR_//g")
    samtools: \$(echo \$(samtools --version 2>&1) | sed 's/^.*samtools //; s/Using.*\$//')
    gawk: \$(echo \$(gawk --version 2>&1) | sed 's/^.*GNU Awk //; s/, .*\$//')
END_VERSIONS
"""

NextFlow From line 76 of align/main.nf

"""
mkdir star
STAR \\
    --runMode genomeGenerate \\
    --genomeDir star/ \\
    --genomeFastaFiles $fasta \\
    --sjdbGTFfile $gtf \\
    --runThreadN $task.cpus \\
    $memory \\
    $args

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    star: \$(STAR --version | sed -e "s/STAR_//g")
    samtools: \$(echo \$(samtools --version 2>&1) | sed 's/^.*samtools //; s/Using.*\$//')
    gawk: \$(echo \$(gawk --version 2>&1) | sed 's/^.*GNU Awk //; s/, .*\$//')
END_VERSIONS
"""

NextFlow STAR From line 26 of genomegenerate/main.nf

"""
samtools faidx $fasta
NUM_BASES=`gawk '{sum = sum + \$2}END{if ((log(sum)/log(2))/2 - 1 > 14) {printf "%.0f", 14} else {printf "%.0f", (log(sum)/log(2))/2 - 1}}' ${fasta}.fai`

mkdir star
STAR \\
    --runMode genomeGenerate \\
    --genomeDir star/ \\
    --genomeFastaFiles $fasta \\
    --sjdbGTFfile $gtf \\
    --runThreadN $task.cpus \\
    --genomeSAindexNbases \$NUM_BASES \\
    $memory \\
    $args

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    star: \$(STAR --version | sed -e "s/STAR_//g")
    samtools: \$(echo \$(samtools --version 2>&1) | sed 's/^.*samtools //; s/Using.*\$//')
    gawk: \$(echo \$(gawk --version 2>&1) | sed 's/^.*GNU Awk //; s/, .*\$//')
END_VERSIONS
"""

NextFlow SAMtools STAR From line 45 of genomegenerate/main.nf
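
The `NUM_BASES` line in the snippet above scales STAR's `--genomeSAindexNbases` for small genomes as min(14, log2(genome length)/2 - 1), with the genome length taken as the sum of column 2 of the `.fai` index. The sketch below repeats that calculation with plain `awk` instead of `gawk` on toy index files; the function name `sa_index_nbases` and the sequence lengths are illustrative assumptions.

```shell
# Sum sequence lengths (column 2 of a .fai file) and print
# min(14, log2(total)/2 - 1), rounded to the nearest integer.
sa_index_nbases() {
    awk '{ sum += $2 }
         END {
             n = (log(sum) / log(2)) / 2 - 1
             if (n > 14) n = 14
             printf "%.0f\n", n
         }' "$1"
}

# Toy index describing a 1 Mb genome split over two sequences.
fai=$(mktemp)
printf 'chrA\t600000\t6\t60\t61\nchrB\t400000\t610012\t60\t61\n' > "$fai"
sa_index_nbases "$fai"
rm -f "$fai"
```

For the 1 Mb toy genome this prints 9; for a genome of roughly 3.1 Gb the uncapped value would exceed 14, so the result is capped at 14.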

"""
mkdir star
touch star/Genome
touch star/Log.out
touch star/SA
touch star/SAindex
touch star/chrLength.txt
touch star/chrName.txt
touch star/chrNameLength.txt
touch star/chrStart.txt
touch star/exonGeTrInfo.tab
touch star/exonInfo.tab
touch star/geneInfo.tab
touch star/genomeParameters.txt
touch star/sjdbInfo.txt
touch star/sjdbList.fromGTF.out.tab
touch star/sjdbList.out.tab
touch star/transcriptInfo.tab

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    star: \$(STAR --version | sed -e "s/STAR_//g")
    samtools: \$(echo \$(samtools --version 2>&1) | sed 's/^.*samtools //; s/Using.*\$//')
    gawk: \$(echo \$(gawk --version 2>&1) | sed 's/^.*GNU Awk //; s/, .*\$//')
END_VERSIONS
"""

NextFlow From line 70 of genomegenerate/main.nf

"""
stringtie \\
    $bam \\
    $strandedness \\
    $reference \\
    -o ${prefix}.transcripts.gtf \\
    -A ${prefix}.gene.abundance.txt \\
    $coverage \\
    $ballgown \\
    -p $task.cpus \\
    $args

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    stringtie: \$(stringtie --version 2>&1)
END_VERSIONS
"""

NextFlow StringTie From line 37 of stringtie/main.nf

"""
touch ${prefix}.transcripts.gtf
touch ${prefix}.gene.abundance.txt
touch ${prefix}.coverage.gtf
touch ${prefix}.ballgown

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    stringtie: \$(stringtie --version 2>&1)
END_VERSIONS
"""

NextFlow From line 57 of stringtie/main.nf

"""
[ ! -f  ${prefix}.fastq.gz ] && ln -s $reads ${prefix}.fastq.gz
trim_galore \\
    $args \\
    --cores $cores \\
    --gzip \\
    ${prefix}.fastq.gz

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    trimgalore: \$(echo \$(trim_galore --version 2>&1) | sed 's/^.*version //; s/Last.*\$//')
    cutadapt: \$(cutadapt --version)
END_VERSIONS
"""

NextFlow Trim_Galore From line 41 of trimgalore/main.nf

"""
[ ! -f  ${prefix}_1.fastq.gz ] && ln -s ${reads[0]} ${prefix}_1.fastq.gz
[ ! -f  ${prefix}_2.fastq.gz ] && ln -s ${reads[1]} ${prefix}_2.fastq.gz
trim_galore \\
    $args \\
    --cores $cores \\
    --paired \\
    --gzip \\
    ${prefix}_1.fastq.gz \\
    ${prefix}_2.fastq.gz

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    trimgalore: \$(echo \$(trim_galore --version 2>&1) | sed 's/^.*version //; s/Last.*\$//')
    cutadapt: \$(cutadapt --version)
END_VERSIONS
"""