circRNA quantification, differential expression analysis and miRNA target prediction of RNA-Seq data
Introduction
nf-core/circrna is a bioinformatics best-practice analysis pipeline for circRNA quantification, differential expression analysis and miRNA target prediction of RNA-Seq data.
The pipeline is built using Nextflow, a workflow tool to run tasks across multiple compute infrastructures in a very portable manner. It uses Docker/Singularity containers, making installation trivial and results highly reproducible. The Nextflow DSL2 implementation of this pipeline uses one container per process, which makes it much easier to maintain and update software dependencies. Where possible, these processes have been submitted to and installed from nf-core/modules in order to make them available to all nf-core pipelines, and to everyone within the Nextflow community!
On release, automated continuous integration tests run the pipeline on a full-sized dataset on the AWS cloud infrastructure. This ensures that the pipeline runs on AWS, has sensible resource allocation defaults set to run on real-world datasets, and permits the persistent storage of results to benchmark between pipeline releases and other analysis sources. The results obtained from the full-sized test can be viewed on the nf-core website.
Pipeline summary
- Raw read QC (FastQC)
- Adapter trimming (Trim Galore!)
- circRNA quantification
- circRNA annotation
- Export mature spliced sequences as a FASTA file
- Annotate parent genes and underlying transcripts
- circRNA count matrix
- miRNA target prediction
  - Filter results: miRNAs must be called by both tools (miRanda and TargetScan)
- Differential expression analysis (DESeq2)
- Circular-linear ratio tests (CircTest)
- MultiQC report (MultiQC)
Quick Start
- Install Nextflow (>=22.10.1).
- Install any of Docker, Singularity (you can follow this tutorial), Podman, Shifter or Charliecloud for full pipeline reproducibility (you can use Conda both to install Nextflow itself and also to manage software within pipelines. Please only use it within pipelines as a last resort; see docs).
- Download the pipeline and test it on a minimal dataset with a single command:
  nextflow run nf-core/circrna -profile test,YOURPROFILE --outdir <OUTDIR>
  Note that some form of configuration will be needed so that Nextflow knows how to fetch the required software. This is usually done in the form of a config profile (YOURPROFILE in the example command above). You can chain multiple config profiles in a comma-separated string.
  - The pipeline comes with config profiles called docker, singularity, podman, shifter, charliecloud and conda, which instruct the pipeline to use the named tool for software management. For example, -profile test,docker.
  - Please check nf-core/configs to see if a custom config file to run nf-core pipelines already exists for your institute. If so, you can simply use -profile <institute> in your command. This will enable either docker or singularity and set the appropriate execution settings for your local compute environment.
  - If you are using singularity, please use the nf-core download command to download images first, before running the pipeline. Setting the NXF_SINGULARITY_CACHEDIR or singularity.cacheDir Nextflow options enables you to store and re-use the images from a central location for future pipeline runs.
  - If you are using conda, it is highly recommended to use the NXF_CONDA_CACHEDIR or conda.cacheDir settings to store the environments in a central location for future pipeline runs.
- Start running your own analysis!
  nextflow run nf-core/circrna --input samplesheet.csv --outdir <OUTDIR> --genome GRCh37 -profile <docker/singularity/podman/shifter/charliecloud/conda/institute> --tool 'ciriquant' --module 'circrna_discovery,mirna_prediction,differential_expression' --bsj_reads 2
Documentation
The nf-core/circrna pipeline comes with documentation about the pipeline usage, parameters and output.
Credits
nf-core/circrna was originally written by Barry Digby.
We thank the following people for their extensive assistance in the development of this pipeline:
Contributions and Support
If you would like to contribute to this pipeline, please see the contributing guidelines.
For further information or help, don't hesitate to get in touch on the Slack #circrna channel (you can join with this invite).
Citations
An extensive list of references for the tools used by the pipeline can be found in the CITATIONS.md file.
You can cite the nf-core publication as follows:
The nf-core framework for community-curated bioinformatics pipelines.
Philip Ewels, Alexander Peltzer, Sven Fillinger, Harshil Patel, Johannes Alneberg, Andreas Wilm, Maxime Ulysse Garcia, Paolo Di Tommaso & Sven Nahnsen.
Nat Biotechnol. 2020 Feb 13. doi: 10.1038/s41587-020-0439-x.
Code Snippets
"""
grep -vf ${workflow.projectDir}/bin/unwanted_biotypes.txt $gtf > filt.gtf
mv $bed circs.bed
annotate_outputs.sh $exon_boundary &> ${prefix}.log
mv master_bed12.bed ${prefix}.bed.tmp

awk -v FS="\t" '{print \$11}' ${prefix}.bed.tmp > mature_len.tmp
awk -v FS="," '{for(i=t=0;i<NF;) t+=\$++i; \$0=t}1' mature_len.tmp > mature_length
paste ${prefix}.bed.tmp mature_length > ${prefix}.bed

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    bedtools: \$(bedtools --version | sed -e "s/bedtools v//g")
    ucsc: $VERSION
END_VERSIONS
"""
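The two awk calls above take column 11 of the BED12 file (blockSizes, a comma-separated list of exon lengths) and sum it, which gives each circRNA's mature spliced length. The following is a minimal standalone sketch of that summation. It assumes tab-separated BED12 input on stdin whose blockSizes column may end in a trailing comma. `mature_length` is a hypothetical helper name and is not part of the pipeline.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print each BED12 record followed by its mature length,
# i.e. the sum of the comma-separated exon sizes in column 11 (blockSizes).
mature_length() {
  awk -v FS="\t" -v OFS="\t" '{
    n = split($11, sizes, ",")   # a trailing comma yields an empty last element (adds 0)
    total = 0
    for (i = 1; i <= n; i++) total += sizes[i]
    print $0, total
  }'
}

# Toy record: a two-exon circRNA with exons of 100 and 150 nt.
printf 'chr1\t1000\t1500\tcirc1\t0\t+\t1000\t1500\t0\t2\t100,150,\t0,350,\n' | mature_length
```

The pipeline writes the per-record totals to a separate file and joins them back with `paste`. Appending the total in one awk pass produces the same 13-column result without temporary files.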
"""
# remove redundant biotypes from GTF.
grep -vf ${workflow.projectDir}/bin/unwanted_biotypes.txt $gtf > filt.gtf

# generate circrna BED file.
tail -n +2 $circrna_matrix | awk '{print \$1}' > IDs.txt
ID_to_BED.sh IDs.txt
cat *.bed > merged.txt && rm IDs.txt && rm *.bed && mv merged.txt circs.bed

# Re-use annotation script to identify the host gene.
annotate_outputs.sh $exon_boundary &> annotation.log
awk -v OFS="\t" '{print \$4, \$14}' master_bed12.bed > circrna_host-gene.txt

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    bedtools: \$(bedtools --version | sed -e "s/bedtools v//g")
    ucsc: $VERSION
END_VERSIONS
"""
"""
awk '{if(\$13 >= ${bsj_reads}) print \$0}' ${prefix}.txt | awk -v OFS="\t" '{print \$1,\$2,\$3,\$6,\$13}' > ${prefix}_${meta.tool}.bed

awk -v OFS="\t" '{print \$1, \$2, \$3, \$1":"\$2"-"\$3":"\$4, \$5, \$4}' ${prefix}_${meta.tool}.bed > ${prefix}_${meta.tool}_circs.bed
"""
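The second awk call above converts a five-column table (chrom, start, end, strand, count) into BED6. It uses `chrom:start-end:strand` as the feature name and the backsplice-junction (BSJ) count as the score. The same pattern recurs in several of the quantification snippets. The sketch below is a minimal version that combines the count threshold with that conversion. It assumes 5-column, whitespace-separated input. `bsj_to_bed6` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: drop rows whose BSJ count (column 5) is below min_reads,
# then emit BED6 with the name built as chrom:start-end:strand.
bsj_to_bed6() {
  local min_reads="$1"
  awk -v OFS="\t" -v min="$min_reads" '$5 >= min + 0 {
    print $1, $2, $3, $1 ":" $2 "-" $3 ":" $4, $5, $4
  }'
}

# The chr2 row has only 1 supporting read and is filtered out at a threshold of 2.
printf 'chr1\t100\t500\t+\t3\nchr2\t200\t900\t-\t1\n' | bsj_to_bed6 2
# chr1	100	500	chr1:100-500:+	3	+
```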
"""
gtfToGenePred \
    $args \
    $gtf \
    ${prefix}.genepred

awk -v OFS="\t" '{print \$12, \$1, \$2, \$3, \$4, \$5, \$6, \$7, \$8, \$9, \$10}' ${prefix}.genepred > ${prefix}.txt

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    ucsc: $VERSION
END_VERSIONS
"""
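The awk call above moves column 12 of the genePred table to the front and keeps columns 1 to 10. This sketch assumes that `$args` includes gtfToGenePred's `-genePredExt` option, which the snippet does not show. Under that assumption, column 12 is the gene name (`name2`), and the result has the 11-column refFlat-style layout: geneName, name, chrom, strand, txStart, txEnd, cdsStart, cdsEnd, exonCount, exonStarts, exonEnds. The record below is made up for illustration.

```shell
#!/usr/bin/env bash
# Move column 12 (gene name in genePredExt output) to the front, keep columns
# 1-10 and drop the remaining extended columns.
genepred_to_refflat() {
  awk -v OFS="\t" '{print $12, $1, $2, $3, $4, $5, $6, $7, $8, $9, $10}'
}

# Toy 15-column genePredExt record (illustrative values only).
printf 'tx1\tchr1\t+\t100\t900\t150\t850\t2\t100,600,\t300,900,\t0\tGENE1\tcmpl\tcmpl\t0,1,\n' | genepred_to_refflat
# GENE1	tx1	chr1	+	100	900	150	850	2	100,600,	300,900,
```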
"""
mkdir -p star_dir && mv *.tab *.junction *.sam star_dir
postProcessStarAlignment.pl --starDir star_dir/ --outDir ./

awk '{if(\$5 >= ${bsj_reads}) print \$0}' ${prefix}.filteredJunctions.bed | awk -v OFS="\t" -F"\t" '{print \$1,\$2,\$3,\$6,\$5}' > ${prefix}_circrna_finder.bed

awk -v OFS="\t" '{print \$1, \$2, \$3, \$1":"\$2"-"\$3":"\$4, \$5, \$4}' ${prefix}_circrna_finder.bed > ${prefix}_circrna_finder_circs.bed

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    circRNA_finder: $VERSION
END_VERSIONS
"""
"""
prepare_circ_test.R

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    r-base: \$(echo \$(R --version 2>&1) | sed 's/^.*R version //; s/ .*\$//')
END_VERSIONS
"""
"""
circ_test.R $circ_csv $linear_csv $phenotype

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    r-base: \$(echo \$(R --version 2>&1) | sed 's/^.*R version //; s/ .*\$//')
    aod: \$(Rscript -e "library(aod); cat(as.character(packageVersion('aod')))")
    ggplot2: \$(Rscript -e "library(ggplot2); cat(as.character(packageVersion('ggplot2')))")
    plyr: \$(Rscript -e "library(plyr); cat(as.character(packageVersion('plyr')))")
END_VERSIONS
"""
"""
CIRIquant \\
    -t ${task.cpus} \\
    -1 ${reads[0]} \\
    -2 ${reads[1]} \\
    --config $yml \\
    --no-gene \\
    -o ${prefix} \\
    -p ${prefix}

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    bwa: \$(echo \$(bwa 2>&1) | sed 's/^.*Version: //; s/Contact:.*\$//')
    ciriquant : \$(echo \$(CIRIquant --version 2>&1) | sed 's/CIRIquant //g' )
    samtools: \$(echo \$(samtools --version 2>&1) | sed 's/^.*samtools //; s/Using.*\$//')
    stringtie: \$(stringtie --version 2>&1)
    hisat2: $VERSION
END_VERSIONS
"""
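Each versions.yml entry parses a tool version out of free-text `--version` output. Wrapping the command in an unquoted `echo $( ... )` collapses multi-line output onto one line, so that a single sed expression can strip the text before and after the version number. The sketch below applies the samtools expression to a hard-coded two-line string. That string is an illustrative stand-in for real `samtools --version` output, so the example runs without samtools installed.

```shell
#!/usr/bin/env bash
# Illustrative stand-in for `samtools --version` output (not captured from a real run).
fake_version_output='samtools 1.16.1
Using htslib 1.16'

# Unquoted $(...) inside echo word-splits the text, joining all lines with single spaces.
flattened=$(echo $(printf '%s\n' "$fake_version_output"))

# Same substitutions as the pipeline: drop everything up to "samtools ",
# then everything from "Using" to the end of the line.
version=$(echo "$flattened" | sed 's/^.*samtools //; s/Using.*$//')
echo "[$version]"
```

The result keeps one trailing space ("1.16.1 "). In versions.yml this is harmless, because a YAML parser ignores trailing whitespace in an unquoted value.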
"""
grep -v "#" ${prefix}.gtf | awk '{print \$14}' | cut -d '.' -f1 > counts
grep -v "#" ${prefix}.gtf | awk -v OFS="\t" '{print \$1,\$4,\$5,\$7}' > ${prefix}.tmp
paste ${prefix}.tmp counts > ${prefix}_unfilt.bed

awk '{if(\$5 >= ${bsj_reads}) print \$0}' ${prefix}_unfilt.bed > ${prefix}_filt.bed
grep -v '^\$' ${prefix}_filt.bed > ${prefix}_ciriquant
awk -v OFS="\t" '{\$2-=1;print}' ${prefix}_ciriquant > ${prefix}_ciriquant.bed
rm ${prefix}.gtf

awk -v OFS="\t" '{print \$1, \$2, \$3, \$1":"\$2"-"\$3":"\$4, \$5, \$4}' ${prefix}_ciriquant.bed > ${prefix}_ciriquant_circs.bed
"""
"""
BWA=`which bwa`
HISAT2=`which hisat2`
STRINGTIE=`which stringtie`
SAMTOOLS=`which samtools`

touch travis.yml
printf "name: ciriquant\ntools:\n bwa: \$BWA\n hisat2: \$HISAT2\n stringtie: \$STRINGTIE\n samtools: \$SAMTOOLS\n\nreference:\n fasta: ${fasta_path}\n gtf: ${gtf_path}\n bwa_index: ${bwa_path}/${bwa_prefix}\n hisat_index: ${hisat2_path}/${hisat2_prefix}" >> travis.yml
"""
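The printf call above writes the YAML configuration that CIRIquant reads through `--config`. The sketch below builds the same structure with a heredoc, which is easier to read and edit. The keys and nesting are copied from the snippet. Indentation here is two spaces, whereas the snippet uses one; YAML only requires indentation to be consistent. Every path and the output file name `ciriquant_config.yml` are hypothetical placeholders.

```shell
#!/usr/bin/env bash
# Hypothetical paths; in the pipeline these come from `which` and from the
# staged reference files.
BWA=/usr/bin/bwa
HISAT2=/usr/bin/hisat2
STRINGTIE=/usr/bin/stringtie
SAMTOOLS=/usr/bin/samtools
fasta_path=/ref/genome.fa
gtf_path=/ref/genes.gtf
bwa_index=/ref/bwa/genome
hisat_index=/ref/hisat2/genome

# Same keys and nesting as the printf in the CIRIquant config process.
cat > ciriquant_config.yml <<EOF
name: ciriquant
tools:
  bwa: $BWA
  hisat2: $HISAT2
  stringtie: $STRINGTIE
  samtools: $SAMTOOLS

reference:
  fasta: $fasta_path
  gtf: $gtf_path
  bwa_index: $bwa_index
  hisat_index: $hisat_index
EOF
```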
"""
python ${workflow.projectDir}/bin/circRNA_counts_matrix.py > matrix.txt
## handle non-canon chromosomes here (https://stackoverflow.com/questions/71479919/joining-columns-based-on-number-of-fields)
n_samps=\$(ls *.bed | wc -l)
canon=\$(awk -v a="\$n_samps" 'BEGIN {print a + 4}')
awk -v n="\$canon" '{ for (i = 2; i <= NF - n + 1; ++i) { \$1 = \$1"-"\$i; \$i=""; } } 1' matrix.txt | awk -v OFS="\t" '\$1=\$1' > circRNA_matrix.txt
Rscript ${workflow.projectDir}/bin/reformat_count_matrix.R

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    r-base: \$(echo \$(R --version 2>&1) | sed 's/^.*R version //; s/ .*\$//')
    argparser: \$(Rscript -e "library(argparser); cat(as.character(packageVersion('argparser')))")
    dplyr: \$(Rscript -e "library(dplyr); cat(as.character(packageVersion('dplyr')))")
END_VERSIONS
"""
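The awk one-liner above repairs rows in which a chromosome name was split into several whitespace-separated fields. The `n_samps + 4` arithmetic implies that a well-formed row has four coordinate columns (chrom, start, end, strand) plus one count per sample. Any surplus leading fields are therefore glued back onto the first field with "-", and the second awk pass re-splits the row and joins it with tabs. The sketch below shows the same logic as a standalone function. It assumes space-separated input. `fix_chrom_names` is a hypothetical name.

```shell
#!/usr/bin/env bash
# Rejoin surplus leading fields so that every row has n_samples + 4 columns.
fix_chrom_names() {
  local n_samples="$1"
  local expected=$((n_samples + 4))
  awk -v n="$expected" '{
    # Fields 2 .. NF-n+1 belong to the chromosome name; glue them onto $1.
    for (i = 2; i <= NF - n + 1; ++i) { $1 = $1 "-" $i; $i = "" }
  } 1' | awk -v OFS="\t" '$1=$1'   # re-split on whitespace (drops emptied fields), join with tabs
}

# Two samples; the second row's chromosome name contains a space.
printf 'chr1 100 500 + 3 7\nchrUn KI270 10 90 - 2 0\n' | fix_chrom_names 2
```

The trailing `'$1=$1'` pattern also acts as a filter: it prints a row only if the first field is non-empty and non-zero. This is safe for chromosome names.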
"""
## make list of files for R to read
ls *.bed > samples.csv

## Add catch for empty bed file and delete
bash ${workflow.projectDir}/bin/check_empty.sh

## Use intersection of "n" (params.tool_filter) circRNAs called by tools
## remove duplicate IDs, keep highest count.
Rscript ${workflow.projectDir}/bin/consolidate_algorithms_intersection.R samples.csv $tool_filter $duplicates_fun

mv combined_counts.bed ${prefix}.bed

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    r-base: \$(echo \$(R --version 2>&1) | sed 's/^.*R version //; s/ .*\$//')
    argparser: \$(Rscript -e "library(argparser); cat(as.character(packageVersion('argparser')))")
    dplyr: \$(Rscript -e "library(dplyr); cat(as.character(packageVersion('dplyr')))")
END_VERSIONS
"""
"""
# Strip tool name from BED files (no consolidation prior to this step for 1 tool)
for b in *.bed; do
    basename=\${b%".bed"};
    sample_name=\${basename%"_${tool_name}"};
    mv \$b \${sample_name}.bed
done

python ${workflow.projectDir}/bin/circRNA_counts_matrix.py > matrix.txt
## handle non-canon chromosomes here (https://stackoverflow.com/questions/71479919/joining-columns-based-on-number-of-fields)
n_samps=\$(ls *.bed | wc -l)
canon=\$(awk -v a="\$n_samps" 'BEGIN {print a + 4}')
awk -v n="\$canon" '{ for (i = 2; i <= NF - n + 1; ++i) { \$1 = \$1"-"\$i; \$i=""; } } 1' matrix.txt | awk -v OFS="\t" '\$1=\$1' > circRNA_matrix.txt
Rscript ${workflow.projectDir}/bin/reformat_count_matrix.R

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    r-base: \$(echo \$(R --version 2>&1) | sed 's/^.*R version //; s/ .*\$//')
    argparser: \$(Rscript -e "library(argparser); cat(as.character(packageVersion('argparser')))")
    dplyr: \$(Rscript -e "library(dplyr); cat(as.character(packageVersion('dplyr')))")
END_VERSIONS
"""
"""
sed -i 's/^chr//g' $gtf

mkdir ${prefix} && mv ${prefix}.Chimeric.out.junction ${prefix} && printf "${prefix}/${prefix}.Chimeric.out.junction" > samplesheet

DCC @samplesheet -D -an $gtf -Pi -ss -F -M -Nr 1 1 -fg -A $fasta -N -T ${task.cpus}

awk '{print \$6}' CircCoordinates >> strand
paste CircRNACount strand | tail -n +2 | awk -v OFS="\t" '{print \$1,\$2,\$3,\$5,\$4}' >> ${prefix}.txt

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    dcc: \$(DCC --version)
END_VERSIONS
"""
"""
sed -i 's/^chr//g' $gtf

mkdir ${prefix} && mv ${prefix}.Chimeric.out.junction ${prefix} && printf "${prefix}/${prefix}.Chimeric.out.junction" > samplesheet
mkdir ${prefix}_mate1 && mv ${prefix}_mate1.Chimeric.out.junction ${prefix}_mate1 && printf "${prefix}_mate1/${prefix}_mate1.Chimeric.out.junction" > mate1file
mkdir ${prefix}_mate2 && mv ${prefix}_mate2.Chimeric.out.junction ${prefix}_mate2 && printf "${prefix}_mate2/${prefix}_mate2.Chimeric.out.junction" > mate2file

DCC @samplesheet -mt1 @mate1file -mt2 @mate2file -D -an $gtf -Pi -ss -F -M -Nr 1 1 -fg -A $fasta -N -T ${task.cpus}

awk '{print \$6}' CircCoordinates >> strand
paste CircRNACount strand | tail -n +2 | awk -v OFS="\t" '{print \$1,\$2,\$3,\$5,\$4}' >> ${prefix}.txt

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    dcc: \$(DCC --version)
END_VERSIONS
"""
"""
awk '{if(\$5 >= ${bsj_reads}) print \$0}' ${prefix}.txt > ${prefix}_dcc.filtered
awk -v OFS="\t" '{\$2-=1;print}' ${prefix}_dcc.filtered > ${prefix}_dcc.bed

awk -v OFS="\t" '{print \$1, \$2, \$3, \$1":"\$2"-"\$3":"\$4, \$5, \$4}' ${prefix}_dcc.bed > ${prefix}_dcc_circs.bed
"""
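The awk step that decrements column 2 converts a 1-based, end-inclusive start coordinate to the 0-based, half-open convention used by BED. The same adjustment appears in the CIRIquant snippet. The shift implies, although the snippet does not state it, that DCC reports 1-based starts. The sketch below applies the conversion to a made-up DCC-style row of chrom, start, end, strand and count.

```shell
#!/usr/bin/env bash
# A 1-based, end-inclusive interval [start, end] equals the 0-based, half-open
# BED interval [start - 1, end): only the start coordinate changes.
to_bed_coords() {
  awk -v OFS="\t" '{ $2 -= 1; print }'
}

printf 'chr1\t1001\t1500\t+\t4\n' | to_bed_coords
# chr1	1000	1500	+	4
```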
"""
## prepDE && circRNA counts headers are sorted such that uppercase precedes lowercase, i.e. Z before a
## reformat the phenotype file to match the order of the samples.
head -n 1 $phenotype > header
tail -n +2 $phenotype | LC_COLLATE=C sort > sorted_pheno
cat header sorted_pheno > tmp && rm phenotype.csv && mv tmp phenotype.csv

DEA.R $gene_matrix $phenotype $circrna_matrix $species ensembl_database_map.txt

mv boxplots/ circRNA/

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    r-base: \$(echo \$(R --version 2>&1) | sed 's/^.*R version //; s/ .*\$//')
    argparser: \$(Rscript -e "library(argparser); cat(as.character(packageVersion('argparser')))")
    biomart: \$(Rscript -e "library(biomaRt); cat(as.character(packageVersion('biomaRt')))")
    deseq2: \$(Rscript -e "library(DESeq2); cat(as.character(packageVersion('DESeq2')))")
    dplyr: \$(Rscript -e "library(dplyr); cat(as.character(packageVersion('dplyr')))")
    enhancedvolcano: \$(Rscript -e "library(EnhancedVolcano); cat(as.character(packageVersion('EnhancedVolcano')))")
    gplots: \$(Rscript -e "library(gplots); cat(as.character(packageVersion('gplots')))")
    ggplot2: \$(Rscript -e "library(ggplot2); cat(as.character(packageVersion('ggplot2')))")
    ggpubr: \$(Rscript -e "library(ggpubr); cat(as.character(packageVersion('ggpubr')))")
    ihw: \$(Rscript -e "library(IHW); cat(as.character(packageVersion('IHW')))")
    pvclust: \$(Rscript -e "library(pvclust); cat(as.character(packageVersion('pvclust')))")
    pcatools: \$(Rscript -e "library(PCAtools); cat(as.character(packageVersion('PCAtools')))")
    pheatmap: \$(Rscript -e "library(pheatmap); cat(as.character(packageVersion('pheatmap')))")
    rcolorbrewer: \$(Rscript -e "library(RColorBrewer); cat(as.character(packageVersion('RColorBrewer')))")
END_VERSIONS
"""
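The phenotype rows are re-sorted with byte-order collation so that they follow the sample order of the count matrices, in which uppercase names sort before lowercase ones. The sketch below shows that ordering on made-up sample names. It sets `LC_ALL=C` rather than `LC_COLLATE=C`, because an `LC_ALL` already exported in the environment takes precedence over `LC_COLLATE`.

```shell
#!/usr/bin/env bash
# In the C locale, sort compares raw bytes: "A"-"Z" (0x41-0x5A) precede "a"-"z" (0x61-0x7A).
printf 'ctrl_b\nTumour_a\nctrl_a\nNormal_c\n' | LC_ALL=C sort
# Normal_c
# Tumour_a
# ctrl_a
# ctrl_b
```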
"""
## FASTA sequences (bedtools does not like the extra annotation info - split will not work properly)
cut -d\$'\t' -f1-12 ${prefix}.bed > bed12.tmp
bedtools getfasta -fi $fasta -bed bed12.tmp -s -split -name > circ_seq.tmp

## clean fasta header
grep -A 1 '>' circ_seq.tmp | cut -d: -f1,2,3 > ${prefix}.fa && rm circ_seq.tmp

## add backsplice sequence for miRanda Targetscan, publish canonical FASTA to results.
rm $fasta
bash ${workflow.projectDir}/bin/backsplice_gen.sh ${prefix}.fa

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    bedtools: \$(bedtools --version | sed -e "s/bedtools v//g")
END_VERSIONS
"""
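miRNA binding sites can span the backsplice junction, which does not appear in the linear mature sequence that `bedtools getfasta -split` exports. The snippet handles this in `backsplice_gen.sh`, whose contents are not shown here. The sketch below is a hypothetical illustration of the general idea, not of that script. It appends the first k bases of each sequence to its 3' end so that k-mers spanning the junction become visible. The helper name, the value of k, and single-line FASTA input are all assumptions.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not the pipeline's backsplice_gen.sh): for single-line
# FASTA input, append the first k bases of each sequence to its end so that
# junction-spanning sequence is represented.
add_backsplice() {
  local k="$1"
  awk -v k="$k" '/^>/ { print; next } { print $0 substr($0, 1, k) }'
}

printf '>circ1\nACGTACGTTT\n' | add_backsplice 4
# >circ1
# ACGTACGTTTACGT
```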
"""
unmapped2anchors.py $bam | gzip > ${prefix}_anchors.qfa.gz

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    find_circ: $VERSION
END_VERSIONS
"""
"""
grep CIRCULAR $bed | \
    grep -v chrM | \
    awk '\$5>=${bsj_reads}' | \
    grep UNAMBIGUOUS_BP | grep ANCHOR_UNIQUE | \
    maxlength.py 100000 \
    > ${prefix}.txt

tail -n +2 ${prefix}.txt | awk -v OFS="\t" '{print \$1,\$2,\$3,\$6,\$5}' > ${prefix}_find_circ.bed

awk -v OFS="\t" '{print \$1, \$2, \$3, \$1":"\$2"-"\$3":"\$4, \$5, \$4}' ${prefix}_find_circ.bed > ${prefix}_find_circ_circs.bed

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    find_circ: $VERSION
END_VERSIONS
"""
"""
INDEX=`find -L ./ -name "*.rev.1.bt2" | sed "s/.rev.1.bt2//"`
[ -z "\$INDEX" ] && INDEX=`find -L ./ -name "*.rev.1.bt2l" | sed "s/.rev.1.bt2l//"`
[ -z "\$INDEX" ] && echo "Bowtie2 index files not found" 1>&2 && exit 1

bowtie2 \\
    --threads $task.cpus \\
    --reorder \\
    --mm \\
    -D 20 \\
    --score-min=C,-15,0 \\
    -q \\
    -x \$INDEX \\
    -U $anchors | \\
    find_circ.py --genome=$fasta --prefix=${prefix} --stats=${prefix}.sites.log --reads=${prefix}.sites.reads > ${prefix}.sites.bed

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    bowtie2: \$(echo \$(bowtie2 --version 2>&1) | sed 's/^.*bowtie2-align-s version //; s/ .*\$//')
    find_circ: $VERSION
END_VERSIONS
"""
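The first three lines of the bowtie2 snippet find the index prefix. They look for a `.rev.1.bt2` file (small index), fall back to `.rev.1.bt2l` (large index), and strip the suffix. The sketch below is a self-contained version of that lookup. It escapes the dots and anchors the suffix at the end of the name, as the later bowtie2 alignment snippet already does. It runs against an empty placeholder file rather than a real index. `find_bt2_prefix` is a hypothetical name.

```shell
#!/usr/bin/env bash
# Print the bowtie2 index prefix found under a directory, preferring the
# small-index suffix (.rev.1.bt2) over the large-index suffix (.rev.1.bt2l).
find_bt2_prefix() {
  local dir="$1" index
  index=$(find -L "$dir" -name "*.rev.1.bt2" | sed 's/\.rev\.1\.bt2$//')
  [ -z "$index" ] && index=$(find -L "$dir" -name "*.rev.1.bt2l" | sed 's/\.rev\.1\.bt2l$//')
  [ -z "$index" ] && { echo "Bowtie2 index files not found" >&2; return 1; }
  echo "$index"
}

# Placeholder file only; a real index consists of several .bt2 files.
workdir=$(mktemp -d)
mkdir -p "$workdir/bowtie2"
touch "$workdir/bowtie2/genome.rev.1.bt2"
find_bt2_prefix "$workdir"
```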
"""
$handleGzip_R1
mapsplice.py \\
    -c $chromosomes \\
    -x $gtf_prefix \\
    -1 ${read1} \\
    -p ${task.cpus} \\
    --bam \\
    --gene-gtf $gtf \\
    -o $prefix \\
    $args

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    mapsplice: $VERSION
END_VERSIONS
"""
"""
$handleGzip_R1
$handleGzip_R2
mapsplice.py \\
    -c $chromosomes \\
    -x $gtf_prefix \\
    -1 ${read1} \\
    -2 ${read2} \\
    -p ${task.cpus} \\
    --bam \\
    --gene-gtf $gtf \\
    -o $prefix \\
    $args

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    mapsplice: $VERSION
END_VERSIONS
"""
"""
## reformat and sort miRanda, TargetScan outputs, convert to BED for overlaps.
tail -n +2 $targetscan | sort -k1,1 -k4n | awk -v OFS="\t" '{print \$1, \$2, \$4, \$5, \$9}' | awk -v OFS="\t" '{print \$2, \$3, \$4, \$1, "0", \$5}' > targetscan.bed
tail -n +2 $miranda | sort -k2,2 -k7n | awk -v OFS="\t" '{print \$2, \$1, \$3, \$4, \$7, \$8}' | awk -v OFS="\t" '{print \$2, \$5, \$6, \$1, \$3, \$4}' | sed 's/^[^-]*-//g' > miranda.bed

## intersect, consolidate miRanda, TargetScan information about miRs.
## -wa to output miRanda hits - targetscan makes it difficult to resolve duplicate miRNAs at MRE sites.
bedtools intersect -a miranda.bed -b targetscan.bed -wa > ${prefix}.mirnas.tmp
bedtools intersect -a targetscan.bed -b miranda.bed | awk '{print \$6}' > mirna_type

## remove duplicate miRNA entries at MRE sites.
## strategy: sort by circs, sort by start position, sort by site type - the goal is to take the best site type (i.e. rank site type found at MRE site).
paste ${prefix}.mirnas.tmp mirna_type | sort -k3,3 -k2n -k7r | awk -v OFS="\t" '{print \$4,\$1,\$2,\$3,\$5,\$6,\$7}' | awk -F "\t" '{if (!seen[\$1,\$2,\$3,\$4,\$5,\$6]++)print}' | sort -k1,1 -k3n > ${prefix}.mirna_targets.tmp
echo -e "circRNA\tmiRNA\tStart\tEnd\tScore\tEnergy_KcalMol\tSite_type" | cat - ${prefix}.mirna_targets.tmp > ${prefix}.mirna_targets.txt

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    bedtools: \$(bedtools --version | sed -e "s/bedtools v//g")
END_VERSIONS
"""
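The deduplication step uses a standard awk idiom: `!seen[key]++` is true only the first time a key appears, so each key keeps only its first row. Because the rows are sorted beforehand, with a reverse sort on the site-type column, that first row carries the preferred site type. The snippet keys on the first six output columns, so rows that differ only in site type collapse into one. The sketch below uses a reduced three-column key (circRNA, miRNA, start) for brevity. The rows are made up. `dedupe_sites` is a hypothetical name.

```shell
#!/usr/bin/env bash
# Keep the first row for each (circRNA, miRNA, start) key; input is assumed to be
# pre-sorted so that the preferred site type appears first within each key.
dedupe_sites() {
  awk -F "\t" '!seen[$1, $2, $3]++'
}

printf 'circ1\tmiR-1\t10\t8mer\ncirc1\tmiR-1\t10\t7mer-m8\ncirc1\tmiR-2\t40\t7mer-A1\n' | dedupe_sites
# circ1	miR-1	10	8mer
# circ1	miR-2	40	7mer-A1
```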
"""
check_samplesheet.py \\
    $samplesheet \\
    samplesheet.valid.csv

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    python: \$(python --version | sed 's/Python //g')
END_VERSIONS
"""
"""
grep ';C;' ${prefix}.sngl.bed | awk -v OFS="\t" '{print \$1,\$2,\$3,\$6}' | sort | uniq -c | awk -v OFS="\t" '{print \$2,\$3,\$4,\$5,\$1}' > ${prefix}_collapsed.bed

awk -v OFS="\t" -v BSJ=${bsj_reads} '{if(\$5>=BSJ) print \$0}' ${prefix}_collapsed.bed > ${prefix}_segemehl.bed

awk -v OFS="\t" '{print \$1, \$2, \$3, \$1":"\$2"-"\$3":"\$4, \$5, \$4}' ${prefix}_segemehl.bed > ${prefix}_segemehl_circs.bed
"""
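The segemehl step counts the supporting reads for each junction. It prints the coordinates of every selected split read (records matching `;C;`), collapses identical rows with `sort | uniq -c`, and moves the count from the front into column 5. The sketch below performs the collapse on coordinates that have already been extracted. The rows are made up. `collapse_junctions` is a hypothetical name.

```shell
#!/usr/bin/env bash
# Collapse identical (chrom, start, end, strand) rows into one row plus a read count.
collapse_junctions() {
  sort | uniq -c | awk -v OFS="\t" '{print $2, $3, $4, $5, $1}'
}

printf 'chr1\t100\t500\t+\nchr1\t100\t500\t+\nchr2\t10\t90\t-\n' | collapse_junctions
# chr1	100	500	+	2
# chr2	10	90	-	1
```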
"""
cat *.tab | awk -v BSJ=${bsj_reads} '(\$7 >= BSJ && \$6==0)' | cut -f1-6 | sort | uniq > dataset.SJ.out.tab
"""
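This step keeps rows from STAR's `SJ.out.tab` files in which column 7 (uniquely mapping reads) meets the BSJ threshold and column 6 is 0, meaning the junction is not in the annotation. `cut -f1-6` then drops the read-count and overhang columns, and `sort | uniq` merges the rows shared between samples. The sketch below assumes the standard STAR column layout: chrom, intron start, intron end, strand, motif, annotated, unique reads, multi-mapping reads, maximum overhang. The rows are made up. `filter_sj` is a hypothetical name.

```shell
#!/usr/bin/env bash
# Keep novel junctions (column 6 == 0) with at least min_reads unique reads
# (column 7), retain the first six columns and remove duplicate rows.
filter_sj() {
  local min_reads="$1"
  awk -v BSJ="$min_reads" '($7 >= BSJ + 0 && $6 == 0)' | cut -f1-6 | sort | uniq
}

# Row 2 is annotated and row 3 has too few unique reads, so only row 1 survives.
printf 'chr1\t200\t300\t1\t1\t0\t5\t0\t40\nchr1\t400\t500\t1\t1\t1\t9\t0\t40\nchr2\t50\t80\t2\t2\t0\t1\t0\t30\n' | filter_sj 2
# chr1	200	300	1	1	0
```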
"""
for file in \$(ls *.gtf); do sample_id=\${file%".transcripts.gtf"}; touch samples.txt; printf "\$sample_id\t\$file\n" >> samples.txt ; done

prepDE.py -i samples.txt
"""
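The loop above writes the two-column, tab-separated sample list (sample ID, GTF path) that is passed to prepDE.py. It derives each ID by stripping the `.transcripts.gtf` suffix with `${file%suffix}` parameter expansion. The sketch below is an equivalent version that iterates over a glob instead of `$(ls ...)`, which avoids word splitting on file names. Unlike the snippet, which loops over all `*.gtf` files, it only matches `*.transcripts.gtf`. The file names are placeholders.

```shell
#!/usr/bin/env bash
# Build samples.txt (sample ID <TAB> GTF path) from *.transcripts.gtf files.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
touch sampleA.transcripts.gtf sampleB.transcripts.gtf   # empty placeholders

: > samples.txt
for file in *.transcripts.gtf; do
  sample_id=${file%".transcripts.gtf"}
  printf '%s\t%s\n' "$sample_id" "$file" >> samples.txt
done
cat samples.txt
# sampleA	sampleA.transcripts.gtf
# sampleB	sampleB.transcripts.gtf
```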
"""
bash ${workflow.projectDir}/bin/targetscan_format.sh $mature
"""
"""
##format for targetscan
cat $fasta | grep ">" | sed 's/>//g' > id
cat $fasta | grep -v ">" > seq
paste id seq | awk -v OFS="\t" '{print \$1, "0000", \$2}' > ${prefix}_ts.txt

# run targetscan
targetscan_70.pl mature.txt ${prefix}_ts.txt ${prefix}.txt

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    targetscan: $VERSION
END_VERSIONS
"""
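The formatting step converts a FASTA file into TargetScan's tab-separated UTR input of ID, species ID and sequence, using the placeholder species ID `0000` for every record. Because it pairs header lines with sequence lines through `paste`, it assumes that each sequence sits on a single line. The sketch below does the same conversion in a single awk pass under that same assumption, keeping the first word of each header as the ID. `fasta_to_targetscan` is a hypothetical name.

```shell
#!/usr/bin/env bash
# Convert single-line FASTA into "ID<TAB>0000<TAB>sequence" rows.
fasta_to_targetscan() {
  awk -v OFS="\t" '/^>/ { id = substr($1, 2); next } { print id, "0000", $0 }'
}

printf '>circ1\nACGUAGCU\n>circ2\nGGCAUU\n' | fasta_to_targetscan
# circ1	0000	ACGUAGCU
# circ2	0000	GGCAUU
```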
"""
INDEX=`find -L ./ -name "*.rev.1.bt2" | sed "s/\\.rev.1.bt2\$//"`
[ -z "\$INDEX" ] && INDEX=`find -L ./ -name "*.rev.1.bt2l" | sed "s/\\.rev.1.bt2l\$//"`
[ -z "\$INDEX" ] && echo "Bowtie2 index files not found" 1>&2 && exit 1

bowtie2 \\
    -x \$INDEX \\
    $reads_args \\
    --threads $task.cpus \\
    $unaligned \\
    $args \\
    2> ${prefix}.bowtie2.log \\
    | samtools $samtools_command $args2 --threads $task.cpus -o ${prefix}.bam -

if [ -f ${prefix}.unmapped.fastq.1.gz ]; then
    mv ${prefix}.unmapped.fastq.1.gz ${prefix}.unmapped_1.fastq.gz
fi

if [ -f ${prefix}.unmapped.fastq.2.gz ]; then
    mv ${prefix}.unmapped.fastq.2.gz ${prefix}.unmapped_2.fastq.gz
fi

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    bowtie2: \$(echo \$(bowtie2 --version 2>&1) | sed 's/^.*bowtie2-align-s version //; s/ .*\$//')
    samtools: \$(echo \$(samtools --version 2>&1) | sed 's/^.*samtools //; s/Using.*\$//')
    pigz: \$( pigz --version 2>&1 | sed 's/pigz //g' )
END_VERSIONS
"""
"""
mkdir bowtie2
bowtie2-build $args --threads $task.cpus $fasta bowtie2/${fasta.baseName}

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    bowtie2: \$(echo \$(bowtie2 --version 2>&1) | sed 's/^.*bowtie2-align-s version //; s/ .*\$//')
END_VERSIONS
"""
"""
mkdir bowtie
bowtie-build --threads $task.cpus $fasta bowtie/${fasta.baseName}

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    bowtie: \$(echo \$(bowtie --version 2>&1) | sed 's/^.*bowtie-align-s version //; s/ .*\$//')
END_VERSIONS
"""
"""
mkdir bwa
bwa \\
    index \\
    $args \\
    -p bwa/${fasta.baseName} \\
    $fasta

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    bwa: \$(echo \$(bwa 2>&1) | sed 's/^.*Version: //; s/Contact:.*\$//')
END_VERSIONS
"""
"""
mkdir bwa

touch bwa/genome.amb
touch bwa/genome.ann
touch bwa/genome.bwt
touch bwa/genome.pac
touch bwa/genome.sa

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    bwa: \$(echo \$(bwa 2>&1) | sed 's/^.*Version: //; s/Contact:.*\$//')
END_VERSIONS
"""
"""
cat ${readList.join(' ')} > ${prefix}.merged.fastq.gz

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    cat: \$(echo \$(cat --version 2>&1) | sed 's/^.*coreutils) //; s/ .*\$//')
END_VERSIONS
"""
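The merge step concatenates gzip-compressed FASTQ files with plain `cat` and does not decompress them. This works because a gzip file may contain several compressed members, and a decompressor outputs their contents one after another. The sketch below checks this with two tiny single-read files.

```shell
#!/usr/bin/env bash
workdir=$(mktemp -d)
printf '@read1\nACGT\n+\nIIII\n' | gzip > "$workdir/run1.fastq.gz"
printf '@read2\nTTGA\n+\nIIII\n' | gzip > "$workdir/run2.fastq.gz"

# Byte-level concatenation of two gzip members is itself a valid gzip stream.
cat "$workdir/run1.fastq.gz" "$workdir/run2.fastq.gz" > "$workdir/merged.fastq.gz"
gzip -dc "$workdir/merged.fastq.gz"
# @read1
# ACGT
# +
# IIII
# @read2
# TTGA
# +
# IIII
```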
"""
cat ${read1.join(' ')} > ${prefix}_1.merged.fastq.gz
cat ${read2.join(' ')} > ${prefix}_2.merged.fastq.gz

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    cat: \$(echo \$(cat --version 2>&1) | sed 's/^.*coreutils) //; s/ .*\$//')
END_VERSIONS
"""
"""
touch ${prefix}.merged.fastq.gz

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    cat: \$(echo \$(cat --version 2>&1) | sed 's/^.*coreutils) //; s/ .*\$//')
END_VERSIONS
"""
"""
touch ${prefix}_1.merged.fastq.gz
touch ${prefix}_2.merged.fastq.gz

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    cat: \$(echo \$(cat --version 2>&1) | sed 's/^.*coreutils) //; s/ .*\$//')
END_VERSIONS
"""
"""
CIRCexplorer2 \\
    annotate \\
    -r $gene_annotation \\
    -g $fasta \\
    -b $junctions \\
    -o ${prefix}.txt \\
    $args

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    circexplorer2: \$(echo \$(CIRCexplorer2 --version 2>&1) )
END_VERSIONS
"""
"""
touch ${prefix}.txt

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    circexplorer2: \$(echo \$(CIRCexplorer2 --version 2>&1) )
END_VERSIONS
"""
"""
CIRCexplorer2 \\
    parse \\
    $aligner \\
    $fusions \\
    -b ${prefix}.bed \\
    $args

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    circexplorer2: \$( echo \$(CIRCexplorer2 --version 2>&1) )
END_VERSIONS
"""
"""
touch ${prefix}.bed

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    circexplorer2: \$( echo \$(CIRCexplorer2 --version 2>&1) )
END_VERSIONS
"""
"""
printf "%s %s\\n" $rename_to | while read old_name new_name; do
    [ -f "\${new_name}" ] || ln -s \$old_name \$new_name
done

fastqc $args --threads $task.cpus $renamed_files

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    fastqc: \$( fastqc --version | sed -e "s/FastQC v//g" )
END_VERSIONS
"""
"""
touch ${prefix}.html
touch ${prefix}.zip

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    fastqc: \$( fastqc --version | sed -e "s/FastQC v//g" )
END_VERSIONS
"""
"""
INDEX=`find -L ./ -name "*.1.ht2" | sed 's/\\.1.ht2\$//'`
hisat2 \\
    -x \$INDEX \\
    -U $reads \\
    $strandedness \\
    --known-splicesite-infile $splicesites \\
    --summary-file ${prefix}.hisat2.summary.log \\
    --threads $task.cpus \\
    $seq_center \\
    $unaligned \\
    $args \\
    | samtools view -bS -F 4 -F 256 - > ${prefix}.bam

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    hisat2: $VERSION
    samtools: \$(echo \$(samtools --version 2>&1) | sed 's/^.*samtools //; s/Using.*\$//')
END_VERSIONS
"""
"""
INDEX=`find -L ./ -name "*.1.ht2" | sed 's/\\.1.ht2\$//'`
hisat2 \\
    -x \$INDEX \\
    -1 ${reads[0]} \\
    -2 ${reads[1]} \\
    $strandedness \\
    --known-splicesite-infile $splicesites \\
    --summary-file ${prefix}.hisat2.summary.log \\
    --threads $task.cpus \\
    $seq_center \\
    $unaligned \\
    --no-mixed \\
    --no-discordant \\
    $args \\
    | samtools view -bS -F 4 -F 8 -F 256 - > ${prefix}.bam

if [ -f ${prefix}.unmapped.fastq.1.gz ]; then
    mv ${prefix}.unmapped.fastq.1.gz ${prefix}.unmapped_1.fastq.gz
fi
if [ -f ${prefix}.unmapped.fastq.2.gz ]; then
    mv ${prefix}.unmapped.fastq.2.gz ${prefix}.unmapped_2.fastq.gz
fi

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    hisat2: $VERSION
    samtools: \$(echo \$(samtools --version 2>&1) | sed 's/^.*samtools //; s/Using.*\$//')
END_VERSIONS
"""
"""
mkdir hisat2
$extract_exons
hisat2-build \\
    -p $task.cpus \\
    $ss \\
    $exon \\
    $args \\
    $fasta \\
    hisat2/${fasta.baseName}

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    hisat2: $VERSION
END_VERSIONS
"""
"""
hisat2_extract_splice_sites.py $gtf > ${gtf.baseName}.splice_sites.txt

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    hisat2: $VERSION
END_VERSIONS
"""
"""
miranda \\
    $mirbase \\
    $query \\
    $args \\
    -out ${prefix}.out

echo "miRNA\tTarget\tScore\tEnergy_KcalMol\tQuery_Start\tQuery_End\tSubject_Start\tSubject_End\tAln_len\tSubject_Identity\tQuery_Identity" > ${prefix}.txt
grep -A 1 "Scores for this hit:" ${prefix}.out | sort | grep ">" | cut -c 2- | tr ' ' '\t' >> ${prefix}.txt

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    miranda: \$(echo \$(miranda -v | sed -n 4p | sed 's/^.*miranda v//; s/microRNA.*\$//' ))
END_VERSIONS
"""
"""
touch ${prefix}.txt

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    miranda: \$(echo \$(miranda -v | sed -n 4p | sed 's/^.*miranda v//; s/microRNA.*\$//' ))
END_VERSIONS
"""
"""
multiqc \\
    --force \\
    $args \\
    $config \\
    $extra_config \\
    .

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    multiqc: \$( multiqc --version | sed -e "s/multiqc, version //g" )
END_VERSIONS
"""
"""
touch multiqc_data
touch multiqc_plots
touch multiqc_report.html

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    multiqc: \$( multiqc --version | sed -e "s/multiqc, version //g" )
END_VERSIONS
"""
"""
samtools \\
    index \\
    -@ ${task.cpus-1} \\
    $args \\
    $input

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    samtools: \$(echo \$(samtools --version 2>&1) | sed 's/^.*samtools //; s/Using.*\$//')
END_VERSIONS
"""
"""
touch ${input}.bai
touch ${input}.crai
touch ${input}.csi

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    samtools: \$(echo \$(samtools --version 2>&1) | sed 's/^.*samtools //; s/Using.*\$//')
END_VERSIONS
"""
"""
samtools sort $args -@ $task.cpus -o ${prefix}.bam -T $prefix $bam

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    samtools: \$(echo \$(samtools --version 2>&1) | sed 's/^.*samtools //; s/Using.*\$//')
END_VERSIONS
"""
"""
touch ${prefix}.bam

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    samtools: \$(echo \$(samtools --version 2>&1) | sed 's/^.*samtools //; s/Using.*\$//')
END_VERSIONS
"""
"""
samtools \\
    view \\
    --threads ${task.cpus-1} \\
    ${reference} \\
    ${readnames} \\
    $args \\
    -o ${prefix}.${file_type} \\
    $input \\
    $args2

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    samtools: \$(echo \$(samtools --version 2>&1) | sed 's/^.*samtools //; s/Using.*\$//')
END_VERSIONS
"""
"""
touch ${prefix}.bam
touch ${prefix}.cram

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    samtools: \$(echo \$(samtools --version 2>&1) | sed 's/^.*samtools //; s/Using.*\$//')
END_VERSIONS
"""
"""
mkdir -p $prefix
segemehl.x \\
    -t $task.cpus \\
    -d $fasta \\
    -i $index \\
    $reads \\
    $args \\
    -o ${prefix}/${prefix}.${suffix}

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    segemehl: \$(echo \$(segemehl.x 2>&1 | grep "ge5dee" | awk -F Z '{print substr(\$1, 2, 6)}' ))
END_VERSIONS
"""
"""
mkdir -p $prefix
touch ${prefix}/${prefix}.${suffix}

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    segemehl: \$(echo \$(segemehl.x 2>&1 | grep "ge5dee" | awk -F Z '{print substr(\$1, 2, 6)}' ))
END_VERSIONS
"""
"""
segemehl.x \\
    -t $task.cpus \\
    -d $fasta \\
    -x ${prefix}.idx \\
    $args

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    segemehl: \$(echo \$(segemehl.x 2>&1 | grep "ge5dee" | awk -F Z '{print substr(\$1, 2, 6)}' ))
END_VERSIONS
"""
```groovy
"""
touch ${prefix}.idx

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    segemehl: \$(echo \$(segemehl.x 2>&1 | grep "ge5dee" | awk -F Z '{print substr(\$1, 2, 6)}' ))
END_VERSIONS
"""
```
```groovy
"""
STAR \\
    --genomeDir $index \\
    --readFilesIn $reads \\
    --runThreadN $task.cpus \\
    --outFileNamePrefix $prefix. \\
    $out_sam_type \\
    $ignore_gtf \\
    $seq_center \\
    $args

$mv_unsorted_bam

if [ -f ${prefix}.Unmapped.out.mate1 ]; then
    mv ${prefix}.Unmapped.out.mate1 ${prefix}.unmapped_1.fastq
    gzip ${prefix}.unmapped_1.fastq
fi
if [ -f ${prefix}.Unmapped.out.mate2 ]; then
    mv ${prefix}.Unmapped.out.mate2 ${prefix}.unmapped_2.fastq
    gzip ${prefix}.unmapped_2.fastq
fi

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    star: \$(STAR --version | sed -e "s/STAR_//g")
    samtools: \$(echo \$(samtools --version 2>&1) | sed 's/^.*samtools //; s/Using.*\$//')
    gawk: \$(echo \$(gawk --version 2>&1) | sed 's/^.*GNU Awk //; s/, .*\$//')
END_VERSIONS
"""
```
```groovy
"""
touch ${prefix}Xd.out.bam
touch ${prefix}.Log.final.out
touch ${prefix}.Log.out
touch ${prefix}.Log.progress.out
touch ${prefix}.sortedByCoord.out.bam
touch ${prefix}.toTranscriptome.out.bam
touch ${prefix}.Aligned.unsort.out.bam
touch ${prefix}.unmapped_1.fastq.gz
touch ${prefix}.unmapped_2.fastq.gz
touch ${prefix}.tab
touch ${prefix}.Chimeric.out.junction
touch ${prefix}.out.sam

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    star: \$(STAR --version | sed -e "s/STAR_//g")
    samtools: \$(echo \$(samtools --version 2>&1) | sed 's/^.*samtools //; s/Using.*\$//')
    gawk: \$(echo \$(gawk --version 2>&1) | sed 's/^.*GNU Awk //; s/, .*\$//')
END_VERSIONS
"""
```
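The STAR align module renames STAR's unmapped-read files to `*.unmapped_1.fastq` / `*.unmapped_2.fastq` and compresses them, using one `if` block per mate. The same step can be written as a loop. This sketch assumes STAR was asked to write unmapped reads (for example with `--outReadsUnmapped Fastx` passed through `$args`). The function name `rename_unmapped` is hypothetical.

```shell
# Rename STAR's unmapped-read files to FASTQ names and gzip them, as the align
# module does with two explicit if-blocks. ASSUMPTION: STAR wrote
# ${prefix}.Unmapped.out.mate1/2 (e.g. --outReadsUnmapped Fastx); missing files are skipped.
rename_unmapped() {
    prefix=$1
    for mate in 1 2; do
        src="${prefix}.Unmapped.out.mate${mate}"
        dest="${prefix}.unmapped_${mate}.fastq"
        if [ -f "$src" ]; then
            mv "$src" "$dest"
            gzip "$dest"   # replaces $dest with $dest.gz
        fi
    done
}
```

For single-end data, STAR writes only `mate1`, so the loop simply skips the second iteration.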
```groovy
"""
mkdir star
STAR \\
    --runMode genomeGenerate \\
    --genomeDir star/ \\
    --genomeFastaFiles $fasta \\
    --sjdbGTFfile $gtf \\
    --runThreadN $task.cpus \\
    $memory \\
    $args

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    star: \$(STAR --version | sed -e "s/STAR_//g")
    samtools: \$(echo \$(samtools --version 2>&1) | sed 's/^.*samtools //; s/Using.*\$//')
    gawk: \$(echo \$(gawk --version 2>&1) | sed 's/^.*GNU Awk //; s/, .*\$//')
END_VERSIONS
"""
```
```groovy
"""
samtools faidx $fasta
NUM_BASES=`gawk '{sum = sum + \$2}END{if ((log(sum)/log(2))/2 - 1 > 14) {printf "%.0f", 14} else {printf "%.0f", (log(sum)/log(2))/2 - 1}}' ${fasta}.fai`

mkdir star
STAR \\
    --runMode genomeGenerate \\
    --genomeDir star/ \\
    --genomeFastaFiles $fasta \\
    --sjdbGTFfile $gtf \\
    --runThreadN $task.cpus \\
    --genomeSAindexNbases \$NUM_BASES \\
    $memory \\
    $args

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    star: \$(STAR --version | sed -e "s/STAR_//g")
    samtools: \$(echo \$(samtools --version 2>&1) | sed 's/^.*samtools //; s/Using.*\$//')
    gawk: \$(echo \$(gawk --version 2>&1) | sed 's/^.*GNU Awk //; s/, .*\$//')
END_VERSIONS
"""
```
```groovy
"""
mkdir star
touch star/Genome
touch star/Log.out
touch star/SA
touch star/SAindex
touch star/chrLength.txt
touch star/chrName.txt
touch star/chrNameLength.txt
touch star/chrStart.txt
touch star/exonGeTrInfo.tab
touch star/exonInfo.tab
touch star/geneInfo.tab
touch star/genomeParameters.txt
touch star/sjdbInfo.txt
touch star/sjdbList.fromGTF.out.tab
touch star/sjdbList.out.tab
touch star/transcriptInfo.tab

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    star: \$(STAR --version | sed -e "s/STAR_//g")
    samtools: \$(echo \$(samtools --version 2>&1) | sed 's/^.*samtools //; s/Using.*\$//')
    gawk: \$(echo \$(gawk --version 2>&1) | sed 's/^.*GNU Awk //; s/, .*\$//')
END_VERSIONS
"""
```
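The `gawk` one-liner in the second genome-generation script sums the contig lengths in the `samtools faidx` index and sets `--genomeSAindexNbases` to min(14, log2(genome length)/2 - 1), the scaling the STAR manual recommends for small genomes. Below is the same rule as a small, readable POSIX shell function. The name `sa_index_nbases` is hypothetical. It uses plain `awk` rather than `gawk`, since `log()` is standard awk.

```shell
# Compute STAR's --genomeSAindexNbases from a samtools faidx index (.fai),
# following the module's rule: min(14, log2(total length)/2 - 1),
# rounded to the nearest integer by printf "%.0f".
sa_index_nbases() {
    fai=$1
    awk '{ sum += $2 }               # column 2 of a .fai file is the contig length
         END {
             n = (log(sum) / log(2)) / 2 - 1
             if (n > 14) n = 14      # cap at STAR default of 14
             printf "%.0f\n", n
         }' "$fai"
}
```

For a 1 Mb genome this yields 9, because log2(10^6)/2 - 1 is about 8.97. Any genome above roughly 2^30 bases, such as human, hits the cap of 14.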
```groovy
"""
stringtie \\
    $bam \\
    $strandedness \\
    $reference \\
    -o ${prefix}.transcripts.gtf \\
    -A ${prefix}.gene.abundance.txt \\
    $coverage \\
    $ballgown \\
    -p $task.cpus \\
    $args

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    stringtie: \$(stringtie --version 2>&1)
END_VERSIONS
"""
```
```groovy
"""
touch ${prefix}.transcripts.gtf
touch ${prefix}.gene.abundance.txt
touch ${prefix}.coverage.gtf
touch ${prefix}.ballgown

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    stringtie: \$(stringtie --version 2>&1)
END_VERSIONS
"""
```
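In the StringTie module, `$strandedness` and `$reference` are built outside the script block shown. StringTie's own flags are `--fr` (stranded library, fr-secondstrand), `--rf` (stranded library, fr-firststrand) and `-G` (reference annotation). The sketch below assumes a sample-level label of `forward`, `reverse` or anything else for unstranded data, and maps it onto those flags. The label values and both helper names are hypothetical. The command is printed rather than run.

```shell
# Hypothetical mapping from a sample's strandedness label to a StringTie flag.
stringtie_strand_flag() {
    case "$1" in
        forward) echo "--fr" ;;   # fr-secondstrand: read 1 matches the transcript strand
        reverse) echo "--rf" ;;   # fr-firststrand (e.g. dUTP protocols)
        *)       echo "" ;;       # unstranded: no strand flag
    esac
}

# Hypothetical dry run of a reduced StringTie command line.
# Arguments: BAM, output prefix, strandedness label, GTF (may be empty), CPUs.
stringtie_cmd() {
    st_bam=$1; st_prefix=$2; st_strandedness=$3; st_gtf=$4; st_cpus=$5
    st_cmd="stringtie $st_bam"
    st_strand=$(stringtie_strand_flag "$st_strandedness")
    if [ -n "$st_strand" ]; then st_cmd="$st_cmd $st_strand"; fi
    if [ -n "$st_gtf" ]; then st_cmd="$st_cmd -G $st_gtf"; fi
    st_cmd="$st_cmd -o ${st_prefix}.transcripts.gtf -A ${st_prefix}.gene.abundance.txt -p $st_cpus"
    echo "$st_cmd"
}

stringtie_cmd sample1.bam sample1 reverse genes.gtf 4
```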
```groovy
"""
[ ! -f ${prefix}.fastq.gz ] && ln -s $reads ${prefix}.fastq.gz
trim_galore \\
    $args \\
    --cores $cores \\
    --gzip \\
    ${prefix}.fastq.gz

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    trimgalore: \$(echo \$(trim_galore --version 2>&1) | sed 's/^.*version //; s/Last.*\$//')
    cutadapt: \$(cutadapt --version)
END_VERSIONS
"""
```
```groovy
"""
[ ! -f ${prefix}_1.fastq.gz ] && ln -s ${reads[0]} ${prefix}_1.fastq.gz
[ ! -f ${prefix}_2.fastq.gz ] && ln -s ${reads[1]} ${prefix}_2.fastq.gz
trim_galore \\
    $args \\
    --cores $cores \\
    --paired \\
    --gzip \\
    ${prefix}_1.fastq.gz \\
    ${prefix}_2.fastq.gz

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    trimgalore: \$(echo \$(trim_galore --version 2>&1) | sed 's/^.*version //; s/Last.*\$//')
    cutadapt: \$(cutadapt --version)
END_VERSIONS
"""
```
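Before calling `trim_galore`, both Trim Galore scripts symlink the input reads to names built from the sample prefix. Output names then follow the prefix rather than the original file names. The `[ ! -f ... ]` guard skips the link when a file with that name is already staged, for example when the input already has that name. Below is the paired-end staging step as a standalone POSIX function. The name `stage_paired_reads` is hypothetical, and it uses `if` blocks in place of `&&` so that a skipped link never leaves a non-zero status behind.

```shell
# Stage paired FASTQ files under predictable, prefix-based names, mirroring the
# `[ ! -f ... ] && ln -s ...` lines in the paired-end script. Existing names are kept.
stage_paired_reads() {
    prefix=$1
    r1=$2
    r2=$3
    if [ ! -f "${prefix}_1.fastq.gz" ]; then ln -s "$r1" "${prefix}_1.fastq.gz"; fi
    if [ ! -f "${prefix}_2.fastq.gz" ]; then ln -s "$r2" "${prefix}_2.fastq.gz"; fi
}
```

Because the guard tests for an existing regular file (following symlinks), a second call with different inputs leaves the first links untouched.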
Support
- Future updates