Analysis of long non-coding RNAs from RNA-seq datasets

Version: dev

A Nextflow-based pipeline for comprehensive analyses of long non-coding RNAs from RNA-seq datasets

Introduction

Recently, long non-coding RNAs (lncRNAs) have attracted widespread attention for their critical roles in diverse biological processes and their important implications in a variety of human diseases and cancers. Identifying and profiling lncRNAs is a fundamental step toward understanding their functions and regulatory mechanisms. However, RNA-seq-based lncRNA discovery is currently limited by the complicated operation and implementation of the tools involved. We therefore present LncPipe, a one-stop pipeline that integrates multiple tools to characterize lncRNAs from raw transcriptome sequencing data. The pipeline is built on the popular workflow framework Nextflow and consists of four core procedures: read alignment, assembly, identification and quantification. It offers several distinctive features, such as a well-designed lncRNA annotation strategy, optimized computational efficiency, diversified classification and an interactive analysis report. LncPipe also gives users additional control: they can interrupt the pipeline, reset parameters from the command line, modify the main script directly and resume an analysis from a previous checkpoint.

Documentation

The nf-core/lncpipe pipeline comes with documentation about the pipeline, found in the docs/ directory:

- Installation
- Pipeline configuration
  - Local installation
  - Adding your own system
- Running the pipeline
- Output and how to interpret the results
- Run analysis for non-human species
- Troubleshooting

Schematic diagram

[Figure: schematic diagram of the LncPipe workflow]

Acknowledgment

Thanks to Shifu Chen, the author of AfterQC/fastp, for adding gzip output support to meet the requirements of LncPipe. Thanks also to Hongwan Zhang and Yan Wang from the SYSUCC Cancer Bioinformatics Platform for internal testing.

Many thanks as well to @apeltzer, @ewels and others from nf-core, who helped me polish the code and structure of lncpipe.

Citation

For details of LncPipe, please read the article below:

Qi Zhao, Yu Sun, Dawei Wang, Hongwan Zhang, Kai Yu, Jian Zheng, Zhixiang Zuo. LncPipe: A Nextflow-based pipeline for identification and analysis of long non-coding RNAs from RNA-Seq data. J Genet Genomics. 2018 Jul 20;45(7):399-401

Code Snippets

'''
set -o pipefail
touch filenames.txt

perl -lpe 's/ ([^"]\\S+) ;/ "$1" ;/g' !{gencode_annotation_gtf} > gencode_annotation_gtf_mod.gtf 
perl -lpe 's/ ([^"]\\S+) ;/ "$1" ;/g' !{lncipedia_gtf} > lncipedia_mod.gtf 

echo  gencode_annotation_gtf_mod.gtf   >>filenames.txt
echo lncipedia_mod.gtf   >>filenames.txt


stringtie --merge -o merged_lncRNA.gtf  filenames.txt
cat gencode_annotation_gtf_mod.gtf   |grep "protein_coding" > gencode_protein_coding.gtf
gffcompare -r gencode_protein_coding.gtf -p !{task.cpus} merged_lncRNA.gtf
awk '$3 =="u"||$3=="x"{print $5}' gffcmp.merged_lncRNA.gtf.tmap |sort|uniq|perl !{baseDir}/bin/extract_gtf_by_name.pl merged_lncRNA.gtf - > merged.filter.gtf
mv  merged.filter.gtf known.lncRNA.gtf

'''

NextFlow StringTie gffcompare From line 208 of dev/main.nf
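The Perl one-liner at the top of the snippet above wraps unquoted GTF attribute values in double quotes before the files are merged. As a rough illustration of what that substitution does, here is a minimal Python sketch using the same regular expression; the function name `quote_gtf_attributes` and the example line are assumptions made for illustration and are not part of the pipeline.

```python
import re

# Same pattern as the Perl one-liner: a space, a value that does not start
# with a double quote and is at least two characters long, then " ;".
_UNQUOTED_VALUE = re.compile(r' ([^"]\S+) ;')


def quote_gtf_attributes(line: str) -> str:
    """Wrap unquoted attribute values in double quotes (hypothetical helper)."""
    return _UNQUOTED_VALUE.sub(r' "\1" ;', line)


if __name__ == "__main__":
    # Illustrative attribute string, not taken from a real annotation file.
    example = 'gene_id G1 ; transcript_id "T1" ; level 2 ;'
    print(quote_gtf_attributes(example))
```

Because the pattern requires at least two characters, a single-character value such as `2` is left unquoted, and values that are already quoted are not touched.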

    '''
    set -o pipefail

    cuffmerge -o merged_lncRNA  !{lncRNA_gtflistfile}
    cat !{gencode_annotation_gtf} |grep "protein_coding" > gencode_protein_coding.gtf
    cuffcompare -o merged_lncRNA -r gencode_protein_coding.gtf -p !{task.cpus} merged_lncRNA/merged.gtf
    awk '$3 =="u"||$3=="x"{print $5}' merged_lncRNA/merged_lncRNA.merged.gtf.tmap  |sort|uniq|perl !{baseDir}/bin/extract_gtf_by_name.pl merged_lncRNA/merged.gtf - > merged.filter.gtf
    mv  merged.filter.gtf known.lncRNA.gtf

'''

NextFlow From line 228 of dev/main.nf

"""
    mkdir star_index
    STAR \
        --runMode genomeGenerate \
        --runThreadN ${star_threads} \
        --sjdbGTFfile $gencode_annotation_gtf \
        --sjdbOverhang 149 \
        --genomeDir star_index/ \
        --genomeFastaFiles $fasta
    """

NextFlow STAR From line 276 of dev/main.nf
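The index above is built with `--sjdbOverhang 149`, which corresponds to reads of 150 bp, since the value STAR recommends is the read length minus one. A small sketch of that arithmetic, assuming the read lengths are already known; the helper name `sjdb_overhang` is hypothetical.

```python
def sjdb_overhang(read_lengths):
    """Return the longest read length minus one (hypothetical helper)."""
    if not read_lengths:
        raise ValueError("at least one read length is required")
    return max(read_lengths) - 1


if __name__ == "__main__":
    # 150 bp reads give the value used in the snippet above.
    print(sjdb_overhang([150]))
```

For libraries with other read lengths, the index would presumably need to be rebuilt with the matching value.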

"""
    bowtie2-build !{fasta} genome_bt2
    """

NextFlow Bowtie 2 From line 303 of dev/main.nf

"""
    # For the human genome, building the index needs more than 160 GB of memory and takes a long time (over 6 hours), so we recommend downloading a pre-built index from the HISAT2 website.
    extract_splice_sites.py !{gencode_annotation_gtf} >genome_ht2.ss
    extract_exons.py !{gencode_annotation_gtf} > genome_ht2.exon
    hisat2-build -p !{hisat2_index_threads} --ss genome_ht2.ss --exon genome_ht2.exon !{fasta} genome_ht2
    """

NextFlow HISAT2 From line 325 of dev/main.nf

    '''
    fastqc -t !{task.cpus} !{fastq_file[0]} !{fastq_file[1]}
'''

NextFlow FastQC From line 365 of dev/main.nf

    '''
after.py -z -1 !{fastq_file[0]} -g ./
'''

NextFlow From line 391 of dev/main.nf

    '''
after.py -z -1 !{fastq_file[0]} -2 !{fastq_file[1]} -g ./
'''

NextFlow From line 395 of dev/main.nf

    '''
fastp -i !{fastq_file[0]} -o !{samplename}.qc.gz -h !{samplename}_fastp.html

'''

NextFlow fastp From line 424 of dev/main.nf

    '''
fastp -i !{fastq_file[0]}  -I !{fastq_file[1]} -o !{samplename}_1.qc.fq.gz  -O !{samplename}_2.qc.fq.gz -h !{samplename}_fastp.html
'''

NextFlow fastp From line 429 of dev/main.nf

"""
         STAR --runThreadN !{task.cpus} \
            --twopassMode Basic \
            --genomeDir !{star_index} \
            --readFilesIn !{pair} \
            --readFilesCommand zcat \
            --outSAMtype BAM SortedByCoordinate \
            --chimSegmentMin 20 \
            --outFilterIntronMotifs RemoveNoncanonical \
            --outFilterMultimapNmax 20 \
            --alignIntronMin 20 \
            --alignIntronMax 1000000 \
            --alignMatesGapMax 1000000 \
            --outFilterType BySJout \
            --alignSJoverhangMin 8 \
            --alignSJDBoverhangMin 1 \
            --outFileNamePrefix !{file_tag_new} 
    """

NextFlow STAR From line 470 of dev/main.nf

'''
            STAR --runThreadN !{task.cpus}  \
                 --twopassMode Basic --genomeDir !{star_index} \
                 --readFilesIn !{pair[0]} !{pair[1]} \
                 --readFilesCommand zcat \
                 --outSAMtype BAM SortedByCoordinate \
                 --chimSegmentMin 20 \
                 --outFilterIntronMotifs RemoveNoncanonical \
                 --outFilterMultimapNmax 20 \
                 --alignIntronMin 20 \
                 --alignIntronMax 1000000 \
                 --alignMatesGapMax 1000000 \
                 --outFilterType BySJout \
                 --alignSJoverhangMin 8 \
                 --alignSJDBoverhangMin 1 \
                 --outFileNamePrefix !{file_tag_new} 
    '''

NextFlow STAR From line 490 of dev/main.nf

'''
         tophat -p !{task.cpus} -G !{gtf} --no-novel-juncs -o !{samplename}_thout --library-type !{strand_str} !{index_base} !{pair} 

'''

NextFlow From line 541 of dev/main.nf

'''
     tophat -p !{task.cpus} -G !{gtf} --no-novel-juncs -o !{samplename}_thout --library-type !{strand_str} !{index_base} !{pair[0]} !{pair[1]} 
'''

NextFlow From line 547 of dev/main.nf

    '''
   mkdir tmp
   hisat2  -p !{task.cpus} --dta  -x  !{index_base}  -U !{pair}  -S !{file_tag_new}.sam 2>!{file_tag_new}.hisat2_summary.txt
  sambamba view -S -f bam -t !{task.cpus} !{file_tag_new}.sam -o temp.bam 
  sambamba sort -o !{file_tag_new}.sort.bam --tmpdir ./tmp -t !{task.cpus} temp.bam
  rm !{file_tag_new}.sam
  rm temp.bam

'''

NextFlow HISAT2 Sambamba From line 580 of dev/main.nf

    '''
    mkdir tmp
  hisat2  -p !{task.cpus} --dta  -x  !{index_base}  -1 !{pair[0]}  -2 !{pair[1]}  -S !{file_tag_new}.sam 2> !{file_tag_new}.hisat2_summary.txt
  sambamba view -S -f bam -t !{hisat2_threads} !{file_tag_new}.sam -o temp.bam
  sambamba sort -o !{file_tag_new}.sort.bam --tmpdir ./tmp -t !{task.cpus} temp.bam
  rm !{file_tag_new}.sam
'''

NextFlow HISAT2 Sambamba From line 591 of dev/main.nf

    '''
   mkdir tmp
   hisat2  -p !{task.cpus} --dta --rna-strandness !{params.hisat_strand} -x  !{index_base}  -U !{pair}  -S !{file_tag_new}.sam 2>!{file_tag_new}.hisat2_summary.txt
  sambamba view -S -f bam -t !{hisat2_threads} !{file_tag_new}.sam -o temp.bam 
  sambamba sort -o !{file_tag_new}.sort.bam --tmpdir ./tmp -t !{hisat2_threads} temp.bam
  rm !{file_tag_new}.sam
  rm temp.bam

'''

NextFlow HISAT2 Sambamba From line 602 of dev/main.nf

    '''
 mkdir tmp
  hisat2  -p !{task.cpus} --dta --rna-strandness !{params.hisat_strand} -x  !{index_base}  -1 !{pair[0]}  -2 !{pair[1]}  -S !{file_tag_new}.sam 2> !{file_tag_new}.hisat2_summary.txt
  sambamba view -S -f bam -t !{task.cpus} !{file_tag_new}.sam -o temp.bam
  sambamba sort -o !{file_tag_new}.sort.bam --tmpdir ./tmp -t !{task.cpus} temp.bam
  rm !{file_tag_new}.sam
'''

NextFlow HISAT2 Sambamba From line 613 of dev/main.nf

    '''
#run stringtie
stringtie -p !{task.cpus} -G !{gencode_annotation_gtf} -l stringtie_!{file_tag_new} -o stringtie_!{file_tag_new}_transcripts.gtf !{alignment_bam}
'''

NextFlow StringTie From line 647 of dev/main.nf

    '''
#run stringtie
stringtie -p !{task.cpus} -G !{gencode_annotation_gtf} --rf -l stringtie_!{file_tag_new} -o stringtie_!{file_tag_new}_transcripts.gtf !{alignment_bam}
'''

NextFlow StringTie From line 652 of dev/main.nf

'''
stringtie --merge -p !{task.cpus} -o merged.gtf !{gtf_filenames}


'''

NextFlow StringTie From line 683 of dev/main.nf

    '''
#run cufflinks

cufflinks -g !{gencode_annotation_gtf} \
          -b !{fasta} \
          --library-type !{strand_str}\
          --max-multiread-fraction 0.25 \
          --3-overhang-tolerance 2000 \
          -o Cufout_!{file_tag_new} \
          -p !{task.cpus} !{alignment_bam}

mv Cufout_!{file_tag_new}/transcripts.gtf Cufout_!{file_tag_new}_transcripts.gtf
'''

NextFlow Cufflinks From line 711 of dev/main.nf

    '''
#run cufflinks

cufflinks -g !{gencode_annotation_gtf} \
          -b !{fasta} \
          --library-type !{strand_str} \
          --max-multiread-fraction 0.25 \
          --3-overhang-tolerance 2000 \
          -o Cufout_!{file_tag_new} \
          -p !{task.cpus} !{alignment_bam}

mv Cufout_!{file_tag_new}/transcripts.gtf Cufout_!{file_tag_new}_transcripts.gtf
'''

NextFlow Cufflinks From line 726 of dev/main.nf

'''
mkdir CUFFMERGE
cuffmerge -o CUFFMERGE \
          -s !{fasta} \
          -p !{task.cpus} \
             !{gtf_filenames}

'''

NextFlow From line 771 of dev/main.nf

    '''
    fastqc -t !{task.cpus} !{fastq_file[0]} !{fastq_file[1]}
'''

NextFlow FastQC From line 817 of dev/main.nf

    '''
after.py -z -1 !{fastq_file[0]} -g ./
'''

NextFlow From line 845 of dev/main.nf

    '''
after.py -z -1 !{fastq_file[0]} -2 !{fastq_file[1]} -g ./
'''

NextFlow From line 849 of dev/main.nf

    '''
fastp -i !{fastq_file[0]} -o !{samplename}.qc.gz -h !{samplename}_fastp.html

'''

NextFlow fastp From line 878 of dev/main.nf

    '''
fastp -i !{fastq_file[0]}  -I !{fastq_file[1]} -o !{samplename}_1.qc.fq.gz  -O !{samplename}_2.qc.fq.gz -h !{samplename}_fastp.html
'''

NextFlow fastp From line 883 of dev/main.nf

'''
    #!/bin/sh
    gffcompare -r !{gencode_annotation_gtf} -p !{task.cpus} !{mergeGtfFile} -o merged_lncRNA
    '''

NextFlow gffcompare From line 914 of dev/main.nf

'''
    # filtering novel lncRNA based on cuffmerged transcripts
    awk '$3 =="x"||$3=="u"||$3=="i"{print $0}' !{comparedTmap} > novel.gtf.tmap
    #   excluding length smaller than 200 nt
    awk '$10 >200{print}' novel.gtf.tmap > novel.longRNA.gtf.tmap
    #   extract gtf
    awk '{print $5}' novel.longRNA.gtf.tmap |perl !{baseDir}/bin/extract_gtf_by_name.pl !{mergedGTF} - >novel.longRNA.gtf
    awk '{if($3=="exon"){print $0}}' novel.longRNA.gtf > novel.longRNA.format.gtf 
    perl !{baseDir}/bin/get_exoncount.pl novel.longRNA.format.gtf  > novel.longRNA.exoncount.txt
    # gtf2gff3
    #check whether required
    # get fasta from gtf
    gffread novel.longRNA.gtf -g !{fasta} -w novel.longRNA.fa -W
 '''

NextFlow gffread From line 938 of dev/main.nf
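The two `awk` filters in the snippet above keep rows whose third column is `x`, `u` or `i` and whose tenth column exceeds 200, then pass the fifth column on to the extraction script. The following Python sketch mirrors that selection. It assumes the gffcompare `.tmap` layout in which columns 3, 5 and 10 hold the class code, the query transcript ID and the transcript length; the function name and the sample rows are made up for illustration.

```python
# Class codes retained by the first awk filter in the snippet.
NOVEL_CLASS_CODES = {"x", "u", "i"}


def select_novel_ids(tmap_lines, min_len=200):
    """Return query transcript IDs that pass both filters (hypothetical helper)."""
    selected = []
    for line in tmap_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 10:
            continue  # skip malformed or empty rows
        class_code, qry_id, length = fields[2], fields[4], fields[9]
        # The header row is skipped because its third column is not a class code.
        if class_code in NOVEL_CLASS_CODES and length.isdigit() and int(length) > min_len:
            selected.append(qry_id)
    return selected


if __name__ == "__main__":
    # Made-up rows in the assumed .tmap column order.
    rows = [
        "ref_gene_id\tref_id\tclass_code\tqry_gene_id\tqry_id\tnum_exons\tFPKM\tTPM\tcov\tlen\tmajor_iso_id\tref_match_len",
        "-\t-\tu\tG1\tT1\t2\t0\t0\t0\t350\tT1\t-",
        "GENEA\tTXA\t=\tG2\tT2\t3\t0\t0\t0\t900\tT2\t900",
        "-\t-\tx\tG3\tT3\t1\t0\t0\t0\t150\tT3\t-",
    ]
    print(select_novel_ids(rows))
```

As in the snippet, the comparison is strict, so a transcript of exactly 200 nt is dropped.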

    '''
        PLEK.py -fasta !{novel_lncRNA_fasta} \
                                   -out novel.longRNA.PLEK.out \
                                   -thread !{task.cpus}
	    exit 0
        '''

NextFlow From line 968 of dev/main.nf

'''
cpat.py -g !{novel_lncRNA_fasta} \
                               -x !{baseDir}/bin/cpat_model/Human_Hexamer.tsv \
                               -d !{baseDir}/bin/cpat_model/Human_logitModel.RData \
                               -o novel.longRNA.CPAT.out
'''

NextFlow From line 983 of dev/main.nf

'''
cpat.py -g !{novel_lncRNA_fasta} \
                               -x !{baseDir}/bin/cpat_model/Mouse_Hexamer.tsv \
                               -d !{baseDir}/bin/cpat_model/Mouse_logitModel.RData \
                               -o novel.longRNA.CPAT.out
'''

NextFlow From line 990 of dev/main.nf

'''
cpat.py -g !{novel_lncRNA_fasta} \
                               -x !{baseDir}/bin/cpat_model/zebrafish_Hexamer.tsv \
                               -d !{baseDir}/bin/cpat_model/zebrafish_logitModel.RData \
                               -o novel.longRNA.CPAT.out
'''

NextFlow From line 998 of dev/main.nf

'''
cpat.py -g !{novel_lncRNA_fasta} \
                               -x !{baseDir}/bin/cpat_model/fly_Hexamer.tsv \
                               -d !{baseDir}/bin/cpat_model/fly_logitModel.RData \
                               -o novel.longRNA.CPAT.out
'''

NextFlow From line 1005 of dev/main.nf

'''
    #merged transcripts
    perl !{baseDir}/bin/integrate_novel_transcripts.pl > novel.longRNA.txt
    awk '$4 >1{print $1}' novel.longRNA.txt|perl !{baseDir}/bin/extract_gtf_by_name.pl !{cuffmergegtf} - > novel.longRNA.stringent.gtf
    # retain lncRNA only by coding ability
    awk '$4 >1&&$5=="lncRNA"{print $1}' novel.longRNA.txt|perl !{baseDir}/bin/extract_gtf_by_name.pl !{cuffmergegtf} - > novel.lncRNA.stringent.gtf
    awk '$4 >1&&$5=="TUCP"{print $1}' novel.longRNA.txt|perl !{baseDir}/bin/extract_gtf_by_name.pl !{cuffmergegtf} - > novel.TUCP.stringent.gtf
    '''

NextFlow From line 1034 of dev/main.nf

'''
gffcompare -G -o filter \
            -r !{knowlncRNAgtf} \
            -p !{task.cpus} !{novel_lncRNA_stringent_Gtf}
awk '$3 =="u"||$3=="x"{print $5}' filter.novel.lncRNA.stringent.gtf.tmap |sort|uniq| \
            perl !{baseDir}/bin/extract_gtf_by_name.pl !{novel_lncRNA_stringent_Gtf} - > novel.lncRNA.stringent.filter.gtf

#rename lncRNAs according to neighbouring protein coding genes
awk '$3 =="gene"{print }' !{gencode_protein_coding_gtf} | perl -F'\\t' -lane '$F[8]=~/gene_id "(.*?)";/ && print join qq{\\t},@F[0,3,4],$1,@F[5,6,1,2,7,8,9]' - | \
    sort-bed - > gencode.protein_coding.gene.bed
gtf2bed < novel.lncRNA.stringent.filter.gtf |sort-bed - > novel.lncRNA.stringent.filter.bed
gtf2bed < !{knowlncRNAgtf} |sort-bed - > known.lncRNA.bed

perl !{baseDir}/bin/rename_lncRNA_2.pl gencode_annotation_gtf_mod.gtf lncipedia_mod.gtf 
# mv lncRNA.final.v2.gtf all_lncRNA_for_classifier.gtf
grep -v NA-1-1 lncRNA.final.v2.gtf > all_lncRNA_for_classifier.gtf
perl !{baseDir}/bin/rename_proteincoding.pl !{gencode_protein_coding_gtf}> protein_coding.final.gtf
cat all_lncRNA_for_classifier.gtf protein_coding.final.gtf > final_all.gtf
gffread final_all.gtf -g !{fasta} -w final_all.fa -W
gffread all_lncRNA_for_classifier.gtf -g !{fasta} -w lncRNA.fa -W
gffread protein_coding.final.gtf -g !{fasta} -w protein_coding.fa -W
#classification 
perl !{baseDir}/bin/lincRNA_classification.pl all_lncRNA_for_classifier.gtf !{gencode_protein_coding_gtf} lncRNA_classification.txt 


'''

NextFlow gffread GFFutils gffcompare From line 1076 of dev/main.nf
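The `awk | perl` step in the snippet above reorders the columns of each protein-coding `gene` line so that chromosome, start, end and gene ID come first, ready for `sort-bed`. A minimal Python sketch of that reordering follows. The function name is hypothetical, the sample line is invented, and the sketch omits the trailing tenth field that the Perl one-liner requests (a standard GTF line has only nine columns). Like the snippet, it leaves the start coordinate unchanged.

```python
import re

GENE_ID = re.compile(r'gene_id "(.*?)";')


def gene_line_to_bed_fields(line):
    """Reorder one GTF gene line into BED-like column order (hypothetical helper).

    Returns None when the line has fewer than nine columns or no gene_id.
    """
    f = line.rstrip("\n").split("\t")
    if len(f) < 9:
        return None
    match = GENE_ID.search(f[8])
    if match is None:
        return None
    # chrom, start, end, gene_id, score, strand, source, feature, frame, attributes
    return [f[0], f[3], f[4], match.group(1), f[5], f[6], f[1], f[2], f[7], f[8]]


if __name__ == "__main__":
    # Invented example line in GTF column order.
    gtf_line = 'chr1\tHAVANA\tgene\t100\t200\t.\t+\t.\tgene_id "ENSG1"; gene_type "protein_coding";'
    print("\t".join(gene_line_to_bed_fields(gtf_line)))
```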

'''
gffcompare -G -o filter \
            -r !{knowlncRNAgtf} \
            -p !{task.cpus} !{novel_lncRNA_stringent_Gtf}
awk '$3 =="u"||$3=="x"{print $5}' filter.novel.lncRNA.stringent.gtf.tmap |sort|uniq| \
            perl !{baseDir}/bin/extract_gtf_by_name.pl !{novel_lncRNA_stringent_Gtf} - > novel.lncRNA.stringent.filter.gtf

#rename lncRNAs according to neighbouring protein coding genes
awk '$3 =="gene"{print }' !{gencode_protein_coding_gtf} | perl -F'\\t' -lane '$F[8]=~/gene_id "(.*?)";/ && print join qq{\\t},@F[0,3,4],$1,@F[5,6,1,2,7,8,9]' - | \
    sort-bed - > gencode.protein_coding.gene.bed
gtf2bed < novel.lncRNA.stringent.filter.gtf |sort-bed - > novel.lncRNA.stringent.filter.bed
gtf2bed < !{knowlncRNAgtf} |sort-bed - > known.lncRNA.bed
perl !{baseDir}/bin/rename_lncRNA_2.pl non_human_mod.gtf
# mv lncRNA.final.v2.gtf all_lncRNA_for_classifier.gtf
grep -v NA-1-1 lncRNA.final.v2.gtf > all_lncRNA_for_classifier.gtf
perl !{baseDir}/bin/rename_proteincoding.pl !{gencode_protein_coding_gtf}> protein_coding.final.gtf
cat all_lncRNA_for_classifier.gtf protein_coding.final.gtf > final_all.gtf
gffread final_all.gtf -g !{fasta} -w final_all.fa -W
gffread all_lncRNA_for_classifier.gtf -g !{fasta} -w lncRNA.fa -W
gffread protein_coding.final.gtf -g !{fasta} -w protein_coding.fa -W
#classification 
perl !{baseDir}/bin/lincRNA_classification.pl all_lncRNA_for_classifier.gtf !{gencode_protein_coding_gtf} lncRNA_classification.txt 


'''

NextFlow gffread GFFutils gffcompare From line 1103 of dev/main.nf

'''
cpat.py -g !{lncRNA_final_cpat_fasta} \
                               -x !{baseDir}/bin/cpat_model/Human_Hexamer.tsv \
                               -d !{baseDir}/bin/cpat_model/Human_logitModel.RData \
                               -o lncRNA.final.CPAT.out
'''

NextFlow From line 1144 of dev/main.nf

'''
cpat.py -g !{lncRNA_final_cpat_fasta} \
                               -x !{baseDir}/bin/cpat_model/Mouse_Hexamer.tsv \
                               -d !{baseDir}/bin/cpat_model/Mouse_logitModel.RData \
                               -o lncRNA.final.CPAT.out
'''

NextFlow From line 1151 of dev/main.nf

'''
cpat.py -g !{lncRNA_final_cpat_fasta} \
                               -x !{baseDir}/bin/cpat_model/zebrafish_Hexamer.tsv \
                               -d !{baseDir}/bin/cpat_model/zebrafish_logitModel.RData \
                               -o lncRNA.final.CPAT.out
'''

NextFlow From line 1159 of dev/main.nf

'''
cpat.py -g !{lncRNA_final_cpat_fasta} \
                               -x !{baseDir}/bin/cpat_model/fly_Hexamer.tsv \
                               -d !{baseDir}/bin/cpat_model/fly_logitModel.RData \
                               -o lncRNA.final.CPAT.out
'''

NextFlow From line 1166 of dev/main.nf

'''
    cpat.py -g !{final_coding_gene_for_CPAT} \
                                   -x !{baseDir}/bin/cpat_model/Human_Hexamer.tsv \
                                   -d !{baseDir}/bin/cpat_model/Human_logitModel.RData \
                                   -o protein_coding.final.CPAT.out
    '''

NextFlow From line 1181 of dev/main.nf

'''
    #!/usr/bin/perl -w
    #since CPAT arbitrarily transforms gene names into upper case, we apply 'uc' function to keep the genenames' consistency.  
    use strict;
    open OUT,">basic_charac.txt" or die;

    open FH,"all_lncRNA_for_classifier.gtf" or die;

    my %class;
    my %g2t;
    my %trans_len;
    my %exon_num;
    while(<FH>){
    chomp;
    my @field=split "\t";
    $_=~/gene_id "(.+?)"/;
    my $gid=$1;
    $_=~/transcript_id "(.+?)"/;
    my $tid=uc($1);
    $class{$tid}=$field[1];
    $g2t{$tid}=$gid;
    my $len=$field[4]-$field[3];
    $trans_len{$tid}=(exists $trans_len{$tid})?$trans_len{$tid}+$len:$len;
    $exon_num{$tid}=(exists $exon_num{$tid})?$exon_num{$tid}+1:1;
    }
    open FH,"protein_coding.final.gtf" or die;

    while(<FH>){
    chomp;
    my @field=split "\t";
    $_=~/gene_id "(.+?)"/;
    my $gid=uc($1);
    $_=~/transcript_id "(.+?)"/;
    my $tid=$1;
    $class{$tid}="protein_coding";
    $g2t{$tid}=$gid;
    my $len=$field[4]-$field[3];
    $trans_len{$tid}=(exists $trans_len{$tid})?$trans_len{$tid}+$len:$len;
    $exon_num{$tid}=(exists $exon_num{$tid})?$exon_num{$tid}+1:1;
    }

    my %lin_class;
    open IN,"lncRNA_classification.txt" or die;                 #change the file name
    while(<IN>){
    chomp;
    my @data = split /\\t/,$_;
    $lin_class{$data[0]} = $data[1];
    }
    open FH,"lncRNA.final.CPAT.out" or die;

    <FH>;

    while(<FH>){
        chomp;
        my @field=split "\t";
        my $tid=uc($field[0]);
        my $class;
        if (defined($lin_class{$tid})){
            $class = $lin_class{$tid};
        }else{
            $class = 'NA';
        }
        print OUT $g2t{$tid}."\t".$tid."\t".$class{$tid}."\t".$field[5]."\t".$trans_len{$tid}."\t".$exon_num{$tid}."\t".$class."\n";
    }

    open FH,"protein_coding.final.CPAT.out" or die;

    <FH>;

    while(<FH>){
        chomp;
        my @field=split "\t";
        my $tid=uc($field[0]);
        my $class;
        if (defined($lin_class{$tid})){
            $class = $lin_class{$tid};
        }else{
            $class = 'protein_coding';
        }
        print OUT $g2t{$tid}."\t".$tid."\t".$class{$tid}."\t".$field[5]."\t".$trans_len{$tid}."\t".$exon_num{$tid}."\t".$class."\n";
     }

'''

NextFlow From line 1201 of dev/main.nf
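The Perl script above accumulates, for every transcript, a total length and a line count from the GTF, and uppercases transcript IDs so that they match the names CPAT reports. The Python sketch below shows the same bookkeeping in a slightly different form, and the differences are deliberate: it counts only `exon` features and uses the inclusive length `end - start + 1`, whereas the script adds `end - start` for every line it reads. The function name and the sample lines are illustrative assumptions.

```python
import re
from collections import defaultdict

TRANSCRIPT_ID = re.compile(r'transcript_id "(.+?)"')


def summarize_transcripts(gtf_lines):
    """Return (length_by_transcript, exon_count_by_transcript) (hypothetical helper)."""
    length = defaultdict(int)
    exon_count = defaultdict(int)
    for line in gtf_lines:
        f = line.rstrip("\n").split("\t")
        if len(f) < 9 or f[2] != "exon":
            continue  # only exon features contribute in this sketch
        match = TRANSCRIPT_ID.search(f[8])
        if match is None:
            continue
        tid = match.group(1).upper()  # match CPAT's upper-cased names
        length[tid] += int(f[4]) - int(f[3]) + 1  # GTF coordinates are inclusive
        exon_count[tid] += 1
    return dict(length), dict(exon_count)


if __name__ == "__main__":
    # Invented two-exon transcript.
    lines = [
        'chr1\tlncRNA\texon\t100\t199\t.\t+\t.\tgene_id "g1"; transcript_id "t1";',
        'chr1\tlncRNA\texon\t300\t349\t.\t+\t.\tgene_id "g1"; transcript_id "t1";',
    ]
    print(summarize_transcripts(lines))
```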

'''
sambamba view !{bamfile} > !{samplename}.sam # resolved error caused by bam and htseq version conflicts 
htseq-count -t exon -i gene_id -s no -r pos -f sam !{samplename}.sam !{final_gtf} > !{samplename}.htseq.count 
rm !{samplename}.sam
'''

NextFlow Sambamba htseqcount HTSeq From line 1310 of dev/main.nf

'''
sambamba view !{bamfile} > !{samplename}.sam # resolved error caused by bam and htseq version conflicts 
htseq-count -t exon -i gene_id -r pos -f sam !{samplename}.sam !{final_gtf} > !{samplename}.htseq.count 
rm !{samplename}.sam
'''

NextFlow Sambamba htseqcount HTSeq From line 1316 of dev/main.nf

'''
#index kallisto reference 
kallisto index -i transcripts.idx !{transript_fasta}

'''

NextFlow kallisto From line 1336 of dev/main.nf

'''
#quantification by kallisto in single end mode
kallisto quant -i !{kallistoIndex} -o !{file_tag_new}_kallisto -t !{task.cpus} -b 100 --single -l 180 -s 20  !{pair} 
mv !{file_tag_new}_kallisto/abundance.tsv !{file_tag_new}_abundance.tsv
'''

NextFlow Quant kallisto From line 1361 of dev/main.nf

'''
#quantification by kallisto 
kallisto quant -i !{kallistoIndex} -o !{file_tag_new}_kallisto -t !{task.cpus} -b 100 !{pair[0]} !{pair[1]}
mv !{file_tag_new}_kallisto/abundance.tsv !{file_tag_new}_abundance.tsv
'''

NextFlow Quant kallisto From line 1370 of dev/main.nf

        '''
#index kallisto reference 
kallisto index -i transcripts.idx !{transript_fasta}

'''

NextFlow kallisto From line 1396 of dev/main.nf

        '''
#quantification by kallisto in single end mode
kallisto quant -i !{kallistoIndex} -o !{file_tag_new}_kallisto -t !{task.cpus} -b 100 --single -l 180 -s 20 !{pair} 
mv !{file_tag_new}_kallisto/abundance.tsv !{file_tag_new}_abundance.tsv

'''

NextFlow Quant kallisto From line 1421 of dev/main.nf

        '''
#quantification by kallisto 
kallisto quant -i !{kallistoIndex} -o !{file_tag_new}_kallisto -t !{task.cpus} -b 100 !{pair[0]} !{pair[1]}
mv !{file_tag_new}_kallisto/abundance.tsv !{file_tag_new}_abundance.tsv
'''

NextFlow Quant kallisto From line 1431 of dev/main.nf

'''
perl !{baseDir}/bin/get_map_table.pl  final_all.gtf  > map.file
R CMD BATCH !{baseDir}/bin/get_htseq_matrix.R
'''

NextFlow From line 1459 of dev/main.nf

'''
perl !{baseDir}/bin/get_map_table.pl  --gtf_file=final_all.gtf  > map.file
R CMD BATCH !{baseDir}/bin/get_kallisto_matrix.R
'''

NextFlow From line 1478 of dev/main.nf

    """
 Rscript -e "library(LncPipeReporter);run_reporter(input='.', output = 'reporter.html',output_dir='./LncPipeReports',de.method=\'${detools}\',theme = 'npg',cdf.percent = ${lncRep_cdf_percent},max.lncrna.len = ${lncRep_max_lnc_len},min.expressed.sample = ${lncRep_min_expressed_sample}, ask = FALSE)"
"""

NextFlow From line 1519 of dev/main.nf

"""
perl -F':|,' -lanE'BEGIN{say qq{SampleID\tcondition}} $del = shift @F; say qq{$_\t$del} for @F' ${design}  > design.matrix 
Rscript -e "library(LncPipeReporter);run_reporter(input='.', output = 'reporter.html',output_dir='./LncPipeReports',de.method=\'${detools}\',theme = 'npg',cdf.percent = ${lncRep_cdf_percent},max.lncrna.len = ${lncRep_max_lnc_len},min.expressed.sample = ${lncRep_min_expressed_sample}, ask = FALSE)"
"""

NextFlow From line 1540 of dev/main.nf
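The Perl one-liner in the snippet above converts the design file into a two-column `design.matrix` with the header `SampleID` and `condition`. Judging from its field separators, each input line appears to have the form `condition:sample1,sample2,...`; that format is inferred from the one-liner rather than documented here. A minimal Python sketch of the same transformation, with a hypothetical function name and invented sample names:

```python
import re


def design_to_matrix(design_lines):
    """Turn 'condition:s1,s2' lines into (SampleID, condition) rows (hypothetical helper)."""
    rows = [("SampleID", "condition")]
    for line in design_lines:
        fields = re.split(r"[:,]", line.strip())
        if fields[0] == "":
            continue  # skip blank lines
        condition, samples = fields[0], fields[1:]
        rows.extend((sample, condition) for sample in samples)
    return rows


if __name__ == "__main__":
    # Invented design lines in the assumed format.
    for sample_id, condition in design_to_matrix(["tumor:S1,S2", "normal:S3"]):
        print(f"{sample_id}\t{condition}")
```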

"""
 Rscript -e "library(LncPipeReporter);run_reporter(input='.', output = 'reporter.html',output_dir='./LncPipeReports',de.method=\'${detools}\',theme = 'npg',cdf.percent = ${lncRep_cdf_percent},max.lncrna.len = ${lncRep_max_lnc_len},min.expressed.sample = ${lncRep_min_expressed_sample}, ask = FALSE)"
"""

NextFlow From line 1565 of dev/main.nf

"""
 Rscript -e "library(LncPipeReporter);run_reporter(input='.', output = 'reporter.html',output_dir='./LncPipeReports',de.method=\'${detools}\',theme = 'npg',cdf.percent = ${lncRep_cdf_percent},max.lncrna.len = ${lncRep_max_lnc_len},min.expressed.sample = ${lncRep_min_expressed_sample}, ask = FALSE)"
"""