RNA sequencing analysis pipeline using STAR, RSEM, HISAT2 or Salmon with gene/isoform counts and extensive quality control.

Version: 3.12.0

Introduction

nf-core/rnaseq is a bioinformatics pipeline that can be used to analyse RNA sequencing data obtained from organisms with a reference genome and annotation. It takes a samplesheet and FASTQ files as input, performs quality control (QC), trimming and (pseudo-)alignment, and produces a gene expression matrix and extensive QC report.

Figure: nf-core/rnaseq metro map

  1. Merge re-sequenced FastQ files (cat)
  2. Sub-sample FastQ files and auto-infer strandedness (fq, Salmon)
  3. Read QC (FastQC)
  4. UMI extraction (UMI-tools)
  5. Adapter and quality trimming (Trim Galore!)
  6. Removal of genome contaminants (BBSplit)
  7. Removal of ribosomal RNA (SortMeRNA)
  8. Choice of multiple alignment and quantification routes:
    1. STAR -> Salmon
    2. STAR -> RSEM
    3. HISAT2 -> no quantification
  9. Sort and index alignments (SAMtools)
  10. UMI-based deduplication (UMI-tools)
  11. Duplicate read marking (Picard MarkDuplicates)
  12. Transcript assembly and quantification (StringTie)
  13. Create bigWig coverage files (BEDTools, bedGraphToBigWig)
  14. Extensive quality control:
    1. RSeQC
    2. Qualimap
    3. dupRadar
    4. Preseq
    5. DESeq2
  15. Pseudo-alignment and quantification (Salmon; optional)
  16. Present QC for raw reads, alignment, gene biotype, sample similarity, and strand-specificity checks (MultiQC, R)

Note: The SRA download functionality was removed from the pipeline (>=3.2) and ported to an independent workflow called nf-core/fetchngs. You can provide --nf_core_pipeline rnaseq when running nf-core/fetchngs to download publicly available samples and auto-create a samplesheet that this pipeline accepts directly as input.

Warning: Quantification isn't performed when using --aligner hisat2, because there is no appropriate option for calculating accurate expression estimates from HISAT2-derived genomic alignments. However, you can still use this route if you prefer HISAT2 for alignment, QC and other types of downstream analysis compatible with its output.

Usage

Note: If you are new to Nextflow and nf-core, please refer to this page on how to set up Nextflow. Make sure to test your setup with -profile test before running the workflow on actual data.

First, prepare a samplesheet with your input data that looks as follows:

samplesheet.csv:

sample,fastq_1,fastq_2,strandedness
CONTROL_REP1,AEG588A1_S1_L002_R1_001.fastq.gz,AEG588A1_S1_L002_R2_001.fastq.gz,auto
CONTROL_REP1,AEG588A1_S1_L003_R1_001.fastq.gz,AEG588A1_S1_L003_R2_001.fastq.gz,auto
CONTROL_REP1,AEG588A1_S1_L004_R1_001.fastq.gz,AEG588A1_S1_L004_R2_001.fastq.gz,auto

Each row represents a FastQ file (single-end) or a pair of FastQ files (paired-end). Rows with the same sample identifier are treated as technical replicates and merged automatically. The strandedness column refers to the library preparation and is inferred automatically if set to auto.
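The grouping rule can be sketched in Python as follows. This is an illustration, not the pipeline's own check_samplesheet.py. The helper name group_samplesheet is hypothetical, and an empty fastq_2 cell is assumed to mark a single-end row. The consistency checks are this sketch's own choice.

```python
import csv
import io


def group_samplesheet(text):
    """Group samplesheet rows by sample ID (hypothetical helper).

    Rows that share a sample ID are treated as technical replicates whose
    FastQ files would later be concatenated. An empty fastq_2 cell is taken
    to mean single-end data.
    """
    samples = {}  # insertion-ordered, so samples keep their first-seen order
    for row in csv.DictReader(io.StringIO(text)):
        name = row["sample"]
        fastq_2 = (row.get("fastq_2") or "").strip()
        entry = samples.setdefault(name, {
            "single_end": not fastq_2,
            "strandedness": row["strandedness"],
            "fastq_1": [],
            "fastq_2": [],
        })
        # In this sketch, all rows of one sample must agree on layout and strandedness.
        if entry["single_end"] != (not fastq_2):
            raise ValueError(f"{name}: mixes single-end and paired-end rows")
        if entry["strandedness"] != row["strandedness"]:
            raise ValueError(f"{name}: rows disagree on strandedness")
        entry["fastq_1"].append(row["fastq_1"])
        if fastq_2:
            entry["fastq_2"].append(fastq_2)
    return samples


if __name__ == "__main__":
    sheet = (
        "sample,fastq_1,fastq_2,strandedness\n"
        "CONTROL_REP1,AEG588A1_S1_L002_R1_001.fastq.gz,AEG588A1_S1_L002_R2_001.fastq.gz,auto\n"
        "CONTROL_REP1,AEG588A1_S1_L003_R1_001.fastq.gz,AEG588A1_S1_L003_R2_001.fastq.gz,auto\n"
    )
    for name, info in group_samplesheet(sheet).items():
        print(name, len(info["fastq_1"]), "runs; single_end =", info["single_end"])
```

For the three-row samplesheet above, this yields a single CONTROL_REP1 entry holding three pairs of FastQ files.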

Warning: Please provide pipeline parameters via the CLI or the Nextflow -params-file option. Custom config files, including those provided by the -c Nextflow option, can be used to provide any configuration except for parameters; see docs.

Now, you can run the pipeline using:

nextflow run nf-core/rnaseq \
 --input samplesheet.csv \
 --outdir <OUTDIR> \
 --genome GRCh37 \
 -profile <docker/singularity/.../institute>

For more details, please refer to the usage documentation and the parameter documentation.

Pipeline output

To see the results of a test run with a full-size dataset, refer to the results tab on the nf-core website pipeline page. For more details about the output files and reports, please refer to the output documentation.

Online videos

A short talk about the history, current status and functionality on offer in this pipeline was given by Harshil Patel (@drpatelh) on 8th February 2022 as part of the nf-core/bytesize series.

You can find numerous talks on the nf-core events page covering various topics, including writing pipelines/modules in Nextflow DSL2, using nf-core tooling and running nf-core pipelines, as well as more generic content like contributing to GitHub. Please check them out!

Credits

These scripts were originally written for use at the National Genomics Infrastructure, part of SciLifeLab in Stockholm, Sweden, by Phil Ewels (@ewels) and Rickard Hammarén (@Hammarn).

The pipeline was re-written in Nextflow DSL2 and is primarily maintained by Harshil Patel (@drpatelh) from Seqera Labs, Spain.

The pipeline workflow diagram was designed by Sarah Guinchard (@G-Sarah) and James Fellows Yates (@jfy133).

Many thanks to the others who have helped out along the way too.

Contributions and Support

If you would like to contribute to this pipeline, please see the contributing guidelines.

For further information or help, don't hesitate to get in touch on the Slack #rnaseq channel (you can join with this invite).

Citations

If you use nf-core/rnaseq for your analysis, please cite it using the following doi: 10.5281/zenodo.1400710

An extensive list of references for the tools used by the pipeline can be found in the CITATIONS.md file.

You can cite the nf-core publication as follows:

The nf-core framework for community-curated bioinformatics pipelines.

Philip Ewels, Alexander Peltzer, Sven Fillinger, Harshil Patel, Johannes Alneberg, Andreas Wilm, Maxime Ulysse Garcia, Paolo Di Tommaso & Sven Nahnsen.

Nat Biotechnol. 2020 Feb 13. doi: 10.1038/s41587-020-0439-x.

Code Snippets

"""
bedtools \\
    genomecov \\
    -ibam $bam \\
    -bg \\
    -strand + \\
    $args \\
    | bedtools sort > ${prefix_forward}.bedGraph

bedtools \\
    genomecov \\
    -ibam $bam \\
    -bg \\
    -strand - \\
    $args \\
    | bedtools sort > ${prefix_reverse}.bedGraph

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    bedtools: \$(bedtools --version | sed -e "s/bedtools v//g")
END_VERSIONS
"""
"""
fasta2gtf.py \\
    -o ${add_fasta.baseName}.gtf \\
    $biotype_name \\
    $add_fasta

cat $fasta $add_fasta > ${name}.fasta
cat $gtf ${add_fasta.baseName}.gtf > ${name}.gtf

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    python: \$(python --version | sed 's/Python //g')
END_VERSIONS
"""
"""
deseq2_qc.r \\
    --count_file $counts \\
    --outdir ./ \\
    --cores $task.cpus \\
    $args

if [ -f "R_sessionInfo.log" ]; then
    sed "s/deseq2_pca/${label_lower}_deseq2_pca/g" <$pca_header_multiqc >tmp.txt
    sed -i -e "s/DESeq2 PCA/${label_upper} DESeq2 PCA/g" tmp.txt
    cat tmp.txt *.pca.vals.txt > ${label_lower}.pca.vals_mqc.tsv

    sed "s/deseq2_clustering/${label_lower}_deseq2_clustering/g" <$clustering_header_multiqc >tmp.txt
    sed -i -e "s/DESeq2 sample/${label_upper} DESeq2 sample/g" tmp.txt
    cat tmp.txt *.sample.dists.txt > ${label_lower}.sample.dists_mqc.tsv
fi

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    r-base: \$(echo \$(R --version 2>&1) | sed 's/^.*R version //; s/ .*\$//')
    bioconductor-deseq2: \$(Rscript -e "library(DESeq2); cat(as.character(packageVersion('DESeq2')))")
END_VERSIONS
"""
"""
dupradar.r \\
    $bam \\
    $prefix \\
    $gtf \\
    $strandedness \\
    $paired_end \\
    $task.cpus

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    r-base: \$(echo \$(R --version 2>&1) | sed 's/^.*R version //; s/ .*\$//')
    bioconductor-dupradar: \$(Rscript -e "library(dupRadar); cat(as.character(packageVersion('dupRadar')))")
END_VERSIONS
"""
"""
gtf2bed \\
    $gtf \\
    > ${gtf.baseName}.bed

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    perl: \$(echo \$(perl --version 2>&1) | sed 's/.*v\\(.*\\)) built.*/\\1/')
END_VERSIONS
"""
"""
filter_gtf_for_genes_in_genome.py \\
    --gtf $gtf \\
    --fasta $fasta \\
    -o ${fasta.baseName}_genes.gtf

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    python: \$(python --version | sed 's/Python //g')
END_VERSIONS
"""
"""
cut -f 1,7 $count | tail -n +3 | cat $header - >> ${prefix}.biotype_counts_mqc.tsv

mqc_features_stat.py \\
    ${prefix}.biotype_counts_mqc.tsv \\
    -s $meta.id \\
    -f rRNA \\
    -o ${prefix}.biotype_counts_rrna_mqc.tsv

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    python: \$(python --version | sed 's/Python //g')
END_VERSIONS
"""
"""
multiqc \\
    -f \\
    $args \\
    $custom_config \\
    .

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    multiqc: \$( multiqc --version | sed -e "s/multiqc, version //g" )
END_VERSIONS
"""
"""
$command $fasta | cut -d "|" -f1 > ${outfile}.fixed.fa

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    sed: \$(echo \$(sed --version 2>&1) | sed 's/^.*GNU sed) //; s/ .*\$//')
END_VERSIONS
"""
"""
mkdir -p tmp/genes
cut -f 1,2 `ls ./genes/* | head -n 1` > gene_ids.txt
for fileid in `ls ./genes/*`; do
    samplename=`basename \$fileid | sed s/\\.genes.results\$//g`
    echo \$samplename > tmp/genes/\${samplename}.counts.txt
    cut -f 5 \${fileid} | tail -n+2 >> tmp/genes/\${samplename}.counts.txt
    echo \$samplename > tmp/genes/\${samplename}.tpm.txt
    cut -f 6 \${fileid} | tail -n+2 >> tmp/genes/\${samplename}.tpm.txt
done

mkdir -p tmp/isoforms
cut -f 1,2 `ls ./isoforms/* | head -n 1` > transcript_ids.txt
for fileid in `ls ./isoforms/*`; do
    samplename=`basename \$fileid | sed s/\\.isoforms.results\$//g`
    echo \$samplename > tmp/isoforms/\${samplename}.counts.txt
    cut -f 5 \${fileid} | tail -n+2 >> tmp/isoforms/\${samplename}.counts.txt
    echo \$samplename > tmp/isoforms/\${samplename}.tpm.txt
    cut -f 6 \${fileid} | tail -n+2 >> tmp/isoforms/\${samplename}.tpm.txt
done

paste gene_ids.txt tmp/genes/*.counts.txt > rsem.merged.gene_counts.tsv
paste gene_ids.txt tmp/genes/*.tpm.txt > rsem.merged.gene_tpm.tsv
paste transcript_ids.txt tmp/isoforms/*.counts.txt > rsem.merged.transcript_counts.tsv
paste transcript_ids.txt tmp/isoforms/*.tpm.txt > rsem.merged.transcript_tpm.tsv

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    sed: \$(echo \$(sed --version 2>&1) | sed 's/^.*GNU sed) //; s/ .*\$//')
END_VERSIONS
"""
"""
salmon_summarizedexperiment.r \\
    NULL \\
    $counts \\
    $tpm

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    r-base: \$(echo \$(R --version 2>&1) | sed 's/^.*R version //; s/ .*\$//')
    bioconductor-summarizedexperiment: \$(Rscript -e "library(SummarizedExperiment); cat(as.character(packageVersion('SummarizedExperiment')))")
END_VERSIONS
"""
"""
salmon_tx2gene.py \\
    --gtf $gtf \\
    --salmon salmon \\
    --id $params.gtf_group_features \\
    --extra $params.gtf_extra_attributes \\
    -o salmon_tx2gene.tsv

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    python: \$(python --version | sed 's/Python //g')
END_VERSIONS
"""
"""
salmon_tximport.r \\
    NULL \\
    salmon \\
    salmon.merged

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    r-base: \$(echo \$(R --version 2>&1) | sed 's/^.*R version //; s/ .*\$//')
    bioconductor-tximeta: \$(Rscript -e "library(tximeta); cat(as.character(packageVersion('tximeta')))")
END_VERSIONS
"""
"""
check_samplesheet.py \\
    $samplesheet \\
    samplesheet.valid.csv

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    python: \$(python --version | sed 's/Python //g')
END_VERSIONS
"""
"""
STAR \\
    --genomeDir $index \\
    --readFilesIn $reads  \\
    --runThreadN $task.cpus \\
    --outFileNamePrefix $prefix. \\
    $out_sam_type \\
    $ignore_gtf \\
    $seq_center \\
    $args

$mv_unsorted_bam

if [ -f ${prefix}.Unmapped.out.mate1 ]; then
    mv ${prefix}.Unmapped.out.mate1 ${prefix}.unmapped_1.fastq
    gzip ${prefix}.unmapped_1.fastq
fi
if [ -f ${prefix}.Unmapped.out.mate2 ]; then
    mv ${prefix}.Unmapped.out.mate2 ${prefix}.unmapped_2.fastq
    gzip ${prefix}.unmapped_2.fastq
fi

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    star: \$(STAR --version | sed -e "s/STAR_//g")
    samtools: \$(echo \$(samtools --version 2>&1) | sed 's/^.*samtools //; s/Using.*\$//')
    gawk: \$(echo \$(gawk --version 2>&1) | sed 's/^.*GNU Awk //; s/, .*\$//')
END_VERSIONS
"""
"""
mkdir star
STAR \\
    --runMode genomeGenerate \\
    --genomeDir star/ \\
    --genomeFastaFiles $fasta \\
    --sjdbGTFfile $gtf \\
    --runThreadN $task.cpus \\
    $memory \\
    $args

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    star: \$(STAR --version | sed -e "s/STAR_//g")
    samtools: \$(echo \$(samtools --version 2>&1) | sed 's/^.*samtools //; s/Using.*\$//')
    gawk: \$(echo \$(gawk --version 2>&1) | sed 's/^.*GNU Awk //; s/, .*\$//')
END_VERSIONS
"""
"""
samtools faidx $fasta
NUM_BASES=`gawk '{sum = sum + \$2}END{if ((log(sum)/log(2))/2 - 1 > 14) {printf "%.0f", 14} else {printf "%.0f", (log(sum)/log(2))/2 - 1}}' ${fasta}.fai`

mkdir star
STAR \\
    --runMode genomeGenerate \\
    --genomeDir star/ \\
    --genomeFastaFiles $fasta \\
    --sjdbGTFfile $gtf \\
    --runThreadN $task.cpus \\
    --genomeSAindexNbases \$NUM_BASES \\
    $memory \\
    $args

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    star: \$(STAR --version | sed -e "s/STAR_//g")
    samtools: \$(echo \$(samtools --version 2>&1) | sed 's/^.*samtools //; s/Using.*\$//')
    gawk: \$(echo \$(gawk --version 2>&1) | sed 's/^.*GNU Awk //; s/, .*\$//')
END_VERSIONS
"""
"""
prepare-for-rsem.py \\
    --stdin=$bam \\
    --stdout=${prefix}.bam \\
    --log=${prefix}.prepare_for_rsem.log \\
    $args

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    umitools: \$(umi_tools --version 2>&1 | sed 's/^.*UMI-tools version://; s/ *\$//')
END_VERSIONS
"""
"""
bbsplit.sh \\
    -Xmx${avail_mem}M \\
    ref_primary=$primary_ref \\
    ${other_refs.join(' ')} \\
    path=bbsplit \\
    threads=$task.cpus \\
    $args

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    bbmap: \$(bbversion.sh | grep -v "Duplicate cpuset")
END_VERSIONS
"""
NextFlow From line 44 of bbsplit/main.nf
"""
bbsplit.sh \\
    -Xmx${avail_mem}M \\
    $index_files \\
    threads=$task.cpus \\
    $fastq_in \\
    $fastq_out \\
    refstats=${prefix}.stats.txt \\
    $args

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    bbmap: \$(bbversion.sh | grep -v "Duplicate cpuset")
END_VERSIONS
"""
NextFlow From line 72 of bbsplit/main.nf
"""
cat ${readList.join(' ')} > ${prefix}.merged.fastq.gz

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    cat: \$(echo \$(cat --version 2>&1) | sed 's/^.*coreutils) //; s/ .*\$//')
END_VERSIONS
"""
NextFlow From line 26 of fastq/main.nf
"""
cat ${read1.join(' ')} > ${prefix}_1.merged.fastq.gz
cat ${read2.join(' ')} > ${prefix}_2.merged.fastq.gz

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    cat: \$(echo \$(cat --version 2>&1) | sed 's/^.*coreutils) //; s/ .*\$//')
END_VERSIONS
"""
NextFlow From line 40 of fastq/main.nf
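Merging compressed FastQ files with plain cat works because a gzip file may contain several concatenated members. A conforming decompressor (RFC 1952) returns their contents one after another. The short Python check below demonstrates that property on in-memory data; the read records are made up for illustration.

```python
import gzip


def cat_files(*blobs):
    """Byte-for-byte concatenation, which is all `cat a.fastq.gz b.fastq.gz` does."""
    return b"".join(blobs)


# Two tiny single-read "lanes", each compressed as its own gzip member.
lane1 = gzip.compress(b"@read1\nACGT\n+\nIIII\n")
lane2 = gzip.compress(b"@read2\nTTGA\n+\nIIII\n")
merged = cat_files(lane1, lane2)

# gzip.decompress() reads every member in turn, so both reads come back in order.
print(gzip.decompress(merged).decode(), end="")
```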
"""
touch ${prefix}.merged.fastq.gz

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    cat: \$(echo \$(cat --version 2>&1) | sed 's/^.*coreutils) //; s/ .*\$//')
END_VERSIONS
"""
NextFlow From line 57 of fastq/main.nf
"""
touch ${prefix}_1.merged.fastq.gz
touch ${prefix}_2.merged.fastq.gz

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    cat: \$(echo \$(cat --version 2>&1) | sed 's/^.*coreutils) //; s/ .*\$//')
END_VERSIONS
"""
NextFlow From line 68 of fastq/main.nf
"""
samtools faidx $fasta
cut -f 1,2 ${fasta}.fai > ${fasta}.sizes

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    getchromsizes: \$(echo \$(samtools --version 2>&1) | sed 's/^.*samtools //; s/Using.*\$//')
END_VERSIONS
"""
"""
touch ${fasta}.fai
touch ${fasta}.sizes

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    getchromsizes: \$(echo \$(samtools --version 2>&1) | sed 's/^.*samtools //; s/Using.*\$//')
END_VERSIONS
"""
"""
[ ! -f  ${prefix}.fastq.gz ] && ln -sf $reads ${prefix}.fastq.gz

fastp \\
    --stdout \\
    --in1 ${prefix}.fastq.gz \\
    --thread $task.cpus \\
    --json ${prefix}.fastp.json \\
    --html ${prefix}.fastp.html \\
    $adapter_list \\
    $fail_fastq \\
    $args \\
    2> ${prefix}.fastp.log \\
| gzip -c > ${prefix}.fastp.fastq.gz

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    fastp: \$(fastp --version 2>&1 | sed -e "s/fastp //g")
END_VERSIONS
"""
"""
[ ! -f  ${prefix}.fastq.gz ] && ln -sf $reads ${prefix}.fastq.gz

fastp \\
    --in1 ${prefix}.fastq.gz \\
    --out1  ${prefix}.fastp.fastq.gz \\
    --thread $task.cpus \\
    --json ${prefix}.fastp.json \\
    --html ${prefix}.fastp.html \\
    $adapter_list \\
    $fail_fastq \\
    $args \\
    2> ${prefix}.fastp.log

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    fastp: \$(fastp --version 2>&1 | sed -e "s/fastp //g")
END_VERSIONS
"""
"""
[ ! -f  ${prefix}_1.fastq.gz ] && ln -sf ${reads[0]} ${prefix}_1.fastq.gz
[ ! -f  ${prefix}_2.fastq.gz ] && ln -sf ${reads[1]} ${prefix}_2.fastq.gz
fastp \\
    --in1 ${prefix}_1.fastq.gz \\
    --in2 ${prefix}_2.fastq.gz \\
    --out1 ${prefix}_1.fastp.fastq.gz \\
    --out2 ${prefix}_2.fastp.fastq.gz \\
    --json ${prefix}.fastp.json \\
    --html ${prefix}.fastp.html \\
    $adapter_list \\
    $fail_fastq \\
    $merge_fastq \\
    --thread $task.cpus \\
    --detect_adapter_for_pe \\
    $args \\
    2> ${prefix}.fastp.log

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    fastp: \$(fastp --version 2>&1 | sed -e "s/fastp //g")
END_VERSIONS
"""
"""
printf "%s %s\\n" $rename_to | while read old_name new_name; do
    [ -f "\${new_name}" ] || ln -s \$old_name \$new_name
done
fastqc $args --threads $task.cpus $renamed_files

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    fastqc: \$( fastqc --version | sed -e "s/FastQC v//g" )
END_VERSIONS
"""
"""
touch ${prefix}.html
touch ${prefix}.zip

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    fastqc: \$( fastqc --version | sed -e "s/FastQC v//g" )
END_VERSIONS
"""
"""
fq subsample \\
    $args \\
    $fastq \\
    $fastq1_output \\
    $fastq2_output

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    fq: \$(echo \$(fq subsample --version | sed 's/fq-subsample //g'))
END_VERSIONS
"""
"""
gffread \\
    $gff \\
    $args \\
    -o ${prefix}.gtf
cat <<-END_VERSIONS > versions.yml
"${task.process}":
    gffread: \$(gffread --version 2>&1)
END_VERSIONS
"""
"""
gunzip \\
    -f \\
    $args \\
    $archive

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    gunzip: \$(echo \$(gunzip --version 2>&1) | sed 's/^.*(gzip) //; s/ Copyright.*\$//')
END_VERSIONS
"""
NextFlow From line 23 of gunzip/main.nf
"""
touch $gunzip
cat <<-END_VERSIONS > versions.yml
"${task.process}":
    gunzip: \$(echo \$(gunzip --version 2>&1) | sed 's/^.*(gzip) //; s/ Copyright.*\$//')
END_VERSIONS
"""
NextFlow From line 37 of gunzip/main.nf
"""
INDEX=`find -L ./ -name "*.1.ht2" | sed 's/\\.1.ht2\$//'`
hisat2 \\
    -x \$INDEX \\
    -U $reads \\
    $strandedness \\
    --known-splicesite-infile $splicesites \\
    --summary-file ${prefix}.hisat2.summary.log \\
    --threads $task.cpus \\
    $seq_center \\
    $unaligned \\
    $args \\
    | samtools view -bS -F 4 -F 256 - > ${prefix}.bam

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    hisat2: $VERSION
    samtools: \$(echo \$(samtools --version 2>&1) | sed 's/^.*samtools //; s/Using.*\$//')
END_VERSIONS
"""
"""
INDEX=`find -L ./ -name "*.1.ht2" | sed 's/\\.1.ht2\$//'`
hisat2 \\
    -x \$INDEX \\
    -1 ${reads[0]} \\
    -2 ${reads[1]} \\
    $strandedness \\
    --known-splicesite-infile $splicesites \\
    --summary-file ${prefix}.hisat2.summary.log \\
    --threads $task.cpus \\
    $seq_center \\
    $unaligned \\
    --no-mixed \\
    --no-discordant \\
    $args \\
    | samtools view -bS -F 4 -F 8 -F 256 - > ${prefix}.bam

if [ -f ${prefix}.unmapped.fastq.1.gz ]; then
    mv ${prefix}.unmapped.fastq.1.gz ${prefix}.unmapped_1.fastq.gz
fi
if [ -f ${prefix}.unmapped.fastq.2.gz ]; then
    mv ${prefix}.unmapped.fastq.2.gz ${prefix}.unmapped_2.fastq.gz
fi

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    hisat2: $VERSION
    samtools: \$(echo \$(samtools --version 2>&1) | sed 's/^.*samtools //; s/Using.*\$//')
END_VERSIONS
"""
"""
mkdir hisat2
$extract_exons
hisat2-build \\
    -p $task.cpus \\
    $ss \\
    $exon \\
    $args \\
    $fasta \\
    hisat2/${fasta.baseName}

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    hisat2: $VERSION
END_VERSIONS
"""
"""
hisat2_extract_splice_sites.py $gtf > ${gtf.baseName}.splice_sites.txt
cat <<-END_VERSIONS > versions.yml
"${task.process}":
    hisat2: $VERSION
END_VERSIONS
"""
"""
picard \\
    -Xmx${avail_mem}M \\
    MarkDuplicates \\
    $args \\
    --INPUT $bam \\
    --OUTPUT ${prefix}.bam \\
    --REFERENCE_SEQUENCE $fasta \\
    --METRICS_FILE ${prefix}.MarkDuplicates.metrics.txt

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    picard: \$(echo \$(picard MarkDuplicates --version 2>&1) | grep -o 'Version:.*' | cut -f2- -d:)
END_VERSIONS
"""
"""
touch ${prefix}.bam
touch ${prefix}.bam.bai
touch ${prefix}.MarkDuplicates.metrics.txt

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    picard: \$(echo \$(picard MarkDuplicates --version 2>&1) | grep -o 'Version:.*' | cut -f2- -d:)
END_VERSIONS
"""
"""
preseq \\
    lc_extrap \\
    $args \\
    $paired_end \\
    -output ${prefix}.lc_extrap.txt \\
    $bam
cp .command.err ${prefix}.command.log

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    preseq: \$(echo \$(preseq 2>&1) | sed 's/^.*Version: //; s/Usage:.*\$//')
END_VERSIONS
"""
"""
unset DISPLAY
mkdir -p tmp
export _JAVA_OPTIONS=-Djava.io.tmpdir=./tmp
qualimap \\
    --java-mem-size=$memory \\
    rnaseq \\
    $args \\
    -bam $bam \\
    -gtf $gtf \\
    -p $strandedness \\
    $paired_end \\
    -outdir $prefix

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    qualimap: \$(echo \$(qualimap 2>&1) | sed 's/^.*QualiMap v.//; s/Built.*\$//')
END_VERSIONS
"""
"""
mkdir ${prefix}

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    qualimap: \$(echo \$(qualimap 2>&1) | sed 's/^.*QualiMap v.//; s/Built.*\$//')
END_VERSIONS
"""
NextFlow From line 55 of rnaseq/main.nf
"""
INDEX=`find -L ./ -name "*.grp" | sed 's/\\.grp\$//'`
rsem-calculate-expression \\
    --num-threads $task.cpus \\
    --temporary-folder ./tmp/ \\
    $strandedness \\
    $paired_end \\
    $args \\
    $reads \\
    \$INDEX \\
    $prefix

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    rsem: \$(rsem-calculate-expression --version | sed -e "s/Current version: RSEM v//g")
    star: \$(STAR --version | sed -e "s/STAR_//g")
END_VERSIONS
"""
"""
STAR \\
    --runMode genomeGenerate \\
    --genomeDir rsem/ \\
    --genomeFastaFiles $fasta \\
    --sjdbGTFfile $gtf \\
    --runThreadN $task.cpus \\
    $memory \\
    $args2

rsem-prepare-reference \\
    --gtf $gtf \\
    --num-threads $task.cpus \\
    ${args_list.join(' ')} \\
    $fasta \\
    rsem/genome

cp rsem/genome.transcripts.fa .

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    rsem: \$(rsem-calculate-expression --version | sed -e "s/Current version: RSEM v//g")
    star: \$(STAR --version | sed -e "s/STAR_//g")
END_VERSIONS
"""
"""
rsem-prepare-reference \\
    --gtf $gtf \\
    --num-threads $task.cpus \\
    $args \\
    $fasta \\
    rsem/genome

cp rsem/genome.transcripts.fa .

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    rsem: \$(rsem-calculate-expression --version | sed -e "s/Current version: RSEM v//g")
    star: \$(STAR --version | sed -e "s/STAR_//g")
END_VERSIONS
"""
"""
bam_stat.py \\
    -i $bam \\
    $args \\
    > ${prefix}.bam_stat.txt

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    rseqc: \$(bam_stat.py --version | sed -e "s/bam_stat.py //g")
END_VERSIONS
"""
NextFlow From line 23 of bamstat/main.nf
"""
infer_experiment.py \\
    -i $bam \\
    -r $bed \\
    $args \\
    > ${prefix}.infer_experiment.txt

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    rseqc: \$(infer_experiment.py --version | sed -e "s/infer_experiment.py //g")
END_VERSIONS
"""
"""
inner_distance.py \\
    -i $bam \\
    -r $bed \\
    -o $prefix \\
    $args \\
    > stdout.txt
head -n 2 stdout.txt > ${prefix}.inner_distance_mean.txt

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    rseqc: \$(inner_distance.py --version | sed -e "s/inner_distance.py //g")
END_VERSIONS
"""
"""
cat <<-END_VERSIONS > versions.yml
"${task.process}":
    rseqc: \$(inner_distance.py --version | sed -e "s/inner_distance.py //g")
END_VERSIONS
"""
"""
junction_annotation.py \\
    -i $bam \\
    -r $bed \\
    -o $prefix \\
    $args \\
    2> ${prefix}.junction_annotation.log

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    rseqc: \$(junction_annotation.py --version | sed -e "s/junction_annotation.py //g")
END_VERSIONS
"""
"""
junction_saturation.py \\
    -i $bam \\
    -r $bed \\
    -o $prefix \\
    $args

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    rseqc: \$(junction_saturation.py --version | sed -e "s/junction_saturation.py //g")
END_VERSIONS
"""
"""
read_distribution.py \\
    -i $bam \\
    -r $bed \\
    > ${prefix}.read_distribution.txt

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    rseqc: \$(read_distribution.py --version | sed -e "s/read_distribution.py //g")
END_VERSIONS
"""
"""
read_duplication.py \\
    -i $bam \\
    -o $prefix \\
    $args

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    rseqc: \$(read_duplication.py --version | sed -e "s/read_duplication.py //g")
END_VERSIONS
"""
"""
tin.py \\
    -i $bam \\
    -r $bed \\
    $args

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    rseqc: \$(tin.py --version | sed -e "s/tin.py //g")
END_VERSIONS
"""
NextFlow From line 25 of tin/main.nf
"""
$get_decoy_ids
sed -i.bak -e 's/>//g' decoys.txt
cat $transcript_fasta $genome_fasta > $gentrome

salmon \\
    index \\
    --threads $task.cpus \\
    -t $gentrome \\
    -d decoys.txt \\
    $args \\
    -i salmon

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    salmon: \$(echo \$(salmon --version) | sed -e "s/salmon //g")
END_VERSIONS
"""
"""
salmon quant \\
    --geneMap $gtf \\
    --threads $task.cpus \\
    --libType=$strandedness \\
    $reference \\
    $input_reads \\
    $args \\
    -o $prefix

if [ -f $prefix/aux_info/meta_info.json ]; then
    cp $prefix/aux_info/meta_info.json "${prefix}_meta_info.json"
fi

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    salmon: \$(echo \$(salmon --version) | sed -e "s/salmon //g")
END_VERSIONS
"""
"""
samtools \\
    flagstat \\
    --threads ${task.cpus} \\
    $bam \\
    > ${prefix}.flagstat

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    samtools: \$(echo \$(samtools --version 2>&1) | sed 's/^.*samtools //; s/Using.*\$//')
END_VERSIONS
"""
"""
samtools \\
    idxstats \\
    --threads ${task.cpus-1} \\
    $bam \\
    > ${prefix}.idxstats

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    samtools: \$(echo \$(samtools --version 2>&1) | sed 's/^.*samtools //; s/Using.*\$//')
END_VERSIONS
"""
"""
samtools \\
    index \\
    -@ ${task.cpus-1} \\
    $args \\
    $input

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    samtools: \$(echo \$(samtools --version 2>&1) | sed 's/^.*samtools //; s/Using.*\$//')
END_VERSIONS
"""
"""
touch ${input}.bai
touch ${input}.crai
touch ${input}.csi

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    samtools: \$(echo \$(samtools --version 2>&1) | sed 's/^.*samtools //; s/Using.*\$//')
END_VERSIONS
"""
NextFlow From line 38 of index/main.nf
"""
samtools sort \\
    $args \\
    -@ $task.cpus \\
    -o ${prefix}.bam \\
    -T $prefix \\
    $bam

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    samtools: \$(echo \$(samtools --version 2>&1) | sed 's/^.*samtools //; s/Using.*\$//')
END_VERSIONS
"""
"""
touch ${prefix}.bam

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    samtools: \$(echo \$(samtools --version 2>&1) | sed 's/^.*samtools //; s/Using.*\$//')
END_VERSIONS
"""
NextFlow From line 41 of sort/main.nf
"""
samtools \\
    stats \\
    --threads ${task.cpus} \\
    ${reference} \\
    ${input} \\
    > ${prefix}.stats

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    samtools: \$(echo \$(samtools --version 2>&1) | sed 's/^.*samtools //; s/Using.*\$//')
END_VERSIONS
"""
"""
touch ${prefix}.stats

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    samtools: \$(echo \$(samtools --version 2>&1) | sed 's/^.*samtools //; s/Using.*\$//')
END_VERSIONS
"""
NextFlow From line 41 of stats/main.nf
"""
sortmerna \\
    ${'--ref '+fastas.join(' --ref ')} \\
    --reads $reads \\
    --threads $task.cpus \\
    --workdir . \\
    --aligned rRNA_reads \\
    --fastx \\
    --other non_rRNA_reads \\
    $args

mv non_rRNA_reads.f*q.gz ${prefix}.non_rRNA.fastq.gz
mv rRNA_reads.log ${prefix}.sortmerna.log

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    sortmerna: \$(echo \$(sortmerna --version 2>&1) | sed 's/^.*SortMeRNA version //; s/ Build Date.*\$//')
END_VERSIONS
"""
"""
sortmerna \\
    ${'--ref '+fastas.join(' --ref ')} \\
    --reads ${reads[0]} \\
    --reads ${reads[1]} \\
    --threads $task.cpus \\
    --workdir . \\
    --aligned rRNA_reads \\
    --fastx \\
    --other non_rRNA_reads \\
    --paired_in \\
    --out2 \\
    $args

mv non_rRNA_reads_fwd.f*q.gz ${prefix}_1.non_rRNA.fastq.gz
mv non_rRNA_reads_rev.f*q.gz ${prefix}_2.non_rRNA.fastq.gz
mv rRNA_reads.log ${prefix}.sortmerna.log

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    sortmerna: \$(echo \$(sortmerna --version 2>&1) | sed 's/^.*SortMeRNA version //; s/ Build Date.*\$//')
END_VERSIONS
"""
"""
STAR \\
    --genomeDir $index \\
    --readFilesIn ${reads1.join(",")} ${reads2.join(",")} \\
    --runThreadN $task.cpus \\
    --outFileNamePrefix $prefix. \\
    $out_sam_type \\
    $ignore_gtf \\
    $attrRG \\
    $args

$mv_unsorted_bam

if [ -f ${prefix}.Unmapped.out.mate1 ]; then
    mv ${prefix}.Unmapped.out.mate1 ${prefix}.unmapped_1.fastq
    gzip ${prefix}.unmapped_1.fastq
fi
if [ -f ${prefix}.Unmapped.out.mate2 ]; then
    mv ${prefix}.Unmapped.out.mate2 ${prefix}.unmapped_2.fastq
    gzip ${prefix}.unmapped_2.fastq
fi

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    star: \$(STAR --version | sed -e "s/STAR_//g")
    samtools: \$(echo \$(samtools --version 2>&1) | sed 's/^.*samtools //; s/Using.*\$//')
    gawk: \$(echo \$(gawk --version 2>&1) | sed 's/^.*GNU Awk //; s/, .*\$//')
END_VERSIONS
"""
"""
touch ${prefix}Xd.out.bam
touch ${prefix}.Log.final.out
touch ${prefix}.Log.out
touch ${prefix}.Log.progress.out
touch ${prefix}.sortedByCoord.out.bam
touch ${prefix}.toTranscriptome.out.bam
touch ${prefix}.Aligned.unsort.out.bam
touch ${prefix}.Aligned.sortedByCoord.out.bam
touch ${prefix}.unmapped_1.fastq.gz
touch ${prefix}.unmapped_2.fastq.gz
touch ${prefix}.tab
touch ${prefix}.SJ.out.tab
touch ${prefix}.ReadsPerGene.out.tab
touch ${prefix}.Chimeric.out.junction
touch ${prefix}.out.sam
touch ${prefix}.Signal.UniqueMultiple.str1.out.wig
touch ${prefix}.Signal.UniqueMultiple.str1.out.bg

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    star: \$(STAR --version | sed -e "s/STAR_//g")
    samtools: \$(echo \$(samtools --version 2>&1) | sed 's/^.*samtools //; s/Using.*\$//')
    gawk: \$(echo \$(gawk --version 2>&1) | sed 's/^.*GNU Awk //; s/, .*\$//')
END_VERSIONS
"""
NextFlow From line 83 of align/main.nf
"""
mkdir star
STAR \\
    --runMode genomeGenerate \\
    --genomeDir star/ \\
    --genomeFastaFiles $fasta \\
    --sjdbGTFfile $gtf \\
    --runThreadN $task.cpus \\
    $memory \\
    $args

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    star: \$(STAR --version | sed -e "s/STAR_//g")
    samtools: \$(echo \$(samtools --version 2>&1) | sed 's/^.*samtools //; s/Using.*\$//')
    gawk: \$(echo \$(gawk --version 2>&1) | sed 's/^.*GNU Awk //; s/, .*\$//')
END_VERSIONS
"""
"""
samtools faidx $fasta
NUM_BASES=`gawk '{sum = sum + \$2}END{if ((log(sum)/log(2))/2 - 1 > 14) {printf "%.0f", 14} else {printf "%.0f", (log(sum)/log(2))/2 - 1}}' ${fasta}.fai`

mkdir star
STAR \\
    --runMode genomeGenerate \\
    --genomeDir star/ \\
    --genomeFastaFiles $fasta \\
    --sjdbGTFfile $gtf \\
    --runThreadN $task.cpus \\
    --genomeSAindexNbases \$NUM_BASES \\
    $memory \\
    $args

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    star: \$(STAR --version | sed -e "s/STAR_//g")
    samtools: \$(echo \$(samtools --version 2>&1) | sed 's/^.*samtools //; s/Using.*\$//')
    gawk: \$(echo \$(gawk --version 2>&1) | sed 's/^.*GNU Awk //; s/, .*\$//')
END_VERSIONS
"""
"""
mkdir star
touch star/Genome
touch star/Log.out
touch star/SA
touch star/SAindex
touch star/chrLength.txt
touch star/chrName.txt
touch star/chrNameLength.txt
touch star/chrStart.txt
touch star/exonGeTrInfo.tab
touch star/exonInfo.tab
touch star/geneInfo.tab
touch star/genomeParameters.txt
touch star/sjdbInfo.txt
touch star/sjdbList.fromGTF.out.tab
touch star/sjdbList.out.tab
touch star/transcriptInfo.tab

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    star: \$(STAR --version | sed -e "s/STAR_//g")
    samtools: \$(echo \$(samtools --version 2>&1) | sed 's/^.*samtools //; s/Using.*\$//')
    gawk: \$(echo \$(gawk --version 2>&1) | sed 's/^.*GNU Awk //; s/, .*\$//')
END_VERSIONS
"""
"""
stringtie \\
    $bam \\
    $strandedness \\
    $reference \\
    -o ${prefix}.transcripts.gtf \\
    -A ${prefix}.gene.abundance.txt \\
    $coverage \\
    $ballgown \\
    -p $task.cpus \\
    $args

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    stringtie: \$(stringtie --version 2>&1)
END_VERSIONS
"""
"""
touch ${prefix}.transcripts.gtf
touch ${prefix}.gene.abundance.txt
touch ${prefix}.coverage.gtf
touch ${prefix}.ballgown

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    stringtie: \$(stringtie --version 2>&1)
END_VERSIONS
"""
"""
featureCounts \\
    $args \\
    $paired_end \\
    -T $task.cpus \\
    -a $annotation \\
    -s $strandedness \\
    -o ${prefix}.featureCounts.txt \\
    ${bams.join(' ')}

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    subread: \$( echo \$(featureCounts -v 2>&1) | sed -e "s/featureCounts v//g")
END_VERSIONS
"""
"""
[ ! -f  ${prefix}.fastq.gz ] && ln -s $reads ${prefix}.fastq.gz
trim_galore \\
    ${args_list.join(' ')} \\
    --cores $cores \\
    --gzip \\
    ${prefix}.fastq.gz

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    trimgalore: \$(echo \$(trim_galore --version 2>&1) | sed 's/^.*version //; s/Last.*\$//')
    cutadapt: \$(cutadapt --version)
END_VERSIONS
"""
"""
[ ! -f  ${prefix}_1.fastq.gz ] && ln -s ${reads[0]} ${prefix}_1.fastq.gz
[ ! -f  ${prefix}_2.fastq.gz ] && ln -s ${reads[1]} ${prefix}_2.fastq.gz
trim_galore \\
    $args \\
    --cores $cores \\
    --paired \\
    --gzip \\
    ${prefix}_1.fastq.gz \\
    ${prefix}_2.fastq.gz

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    trimgalore: \$(echo \$(trim_galore --version 2>&1) | sed 's/^.*version //; s/Last.*\$//')
    cutadapt: \$(cutadapt --version)
END_VERSIONS
"""
"""
bedClip \\
    $bedgraph \\
    $sizes \\
    ${prefix}.bedGraph

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    ucsc: $VERSION
END_VERSIONS
"""
"""
bedGraphToBigWig \\
    $bedgraph \\
    $sizes \\
    ${prefix}.bigWig

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    ucsc: $VERSION
END_VERSIONS
"""
"""
PYTHONHASHSEED=0 umi_tools \\
    dedup \\
    -I $bam \\
    -S ${prefix}.bam \\
    $stats \\
    $paired \\
    $args

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    umitools: \$(umi_tools --version 2>&1 | sed 's/^.*UMI-tools version://; s/ *\$//')
END_VERSIONS
"""
"""
umi_tools \\
    extract \\
    -I $reads \\
    -S ${prefix}.umi_extract.fastq.gz \\
    $args \\
    > ${prefix}.umi_extract.log

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    umitools: \$(umi_tools --version 2>&1 | sed 's/^.*UMI-tools version://; s/ *\$//')
END_VERSIONS
"""
"""
umi_tools \\
    extract \\
    -I ${reads[0]} \\
    --read2-in=${reads[1]} \\
    -S ${prefix}.umi_extract_1.fastq.gz \\
    --read2-out=${prefix}.umi_extract_2.fastq.gz \\
    $args \\
    > ${prefix}.umi_extract.log

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    umitools: \$(umi_tools --version 2>&1 | sed 's/^.*UMI-tools version://; s/ *\$//')
END_VERSIONS
"""
"""
mkdir $prefix

## Ensures --strip-components only applied when top level of tar contents is a directory
## If just files or multiple directories, place all in prefix
if [[ \$(tar -taf ${archive} | grep -o -P "^.*?\\/" | uniq | wc -l) -eq 1 ]]; then
    tar \\
        -C $prefix --strip-components 1 \\
        -xavf \\
        $args \\
        $archive \\
        $args2
else
    tar \\
        -C $prefix \\
        -xavf \\
        $args \\
        $archive \\
        $args2
fi

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    untar: \$(echo \$(tar --version 2>&1) | sed 's/^.*(GNU tar) //; s/ Copyright.*\$//')
END_VERSIONS
"""
NextFlow From line 25 of untar/main.nf
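The `untar` module decides whether to pass `--strip-components 1` by counting the distinct leading `dir/` prefixes in the archive listing. The Python sketch below mirrors that pipeline on names as printed by `tar -t`. `grep -o -P "^.*?/"` keeps the shortest leading `dir/` and prints nothing for names without a slash. `uniq` collapses only adjacent duplicates. `should_strip` and the example names are hypothetical.

```python
import itertools
import re
from typing import List, Sequence


def top_level_prefixes(names: Sequence[str]) -> List[str]:
    """Mirror `grep -o -P "^.*?/" | uniq` over a `tar -t` listing."""
    matches = []
    for name in names:
        m = re.match(r".*?/", name)  # shortest leading "dir/"; no output if absent
        if m:
            matches.append(m.group(0))
    # uniq: collapse only *adjacent* duplicate lines.
    return [key for key, _ in itertools.groupby(matches)]


def should_strip(names: Sequence[str]) -> bool:
    """True when the shell check `... | wc -l` would equal 1."""
    return len(top_level_prefixes(names)) == 1


print(should_strip(["index/", "index/genome.fa", "index/genome.fa.fai"]))
```

One edge case follows from the shell logic: a loose top-level file next to a single directory (`["a/x", "notes.txt", "a/y"]`) still counts as one prefix. The archive is then extracted with `--strip-components 1`, and as far as we can tell GNU tar skips `notes.txt` because no path component remains after stripping.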
"""
mkdir $prefix
touch ${prefix}/file.txt

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    untar: \$(echo \$(tar --version 2>&1) | sed 's/^.*(GNU tar) //; s/ Copyright.*\$//')
END_VERSIONS
"""
NextFlow From line 54 of untar/main.nf