Amplicon sequencing analysis workflow using DADA2 and QIIME2
Introduction
nf-core/ampliseq is a bioinformatics analysis pipeline for amplicon sequencing. It supports denoising of any amplicon and, currently, taxonomic assignment of 16S, ITS, CO1 and 18S amplicons. Phylogenetic placement is also possible. Paired-end Illumina as well as single-end Illumina, PacBio and IonTorrent data are supported. The default is the analysis of 16S rRNA gene amplicons sequenced paired-end with Illumina.
A video about the relevance, usage and output of the pipeline (version 2.1.0; 26th Oct. 2021) can be found on YouTube and Bilibili; the slides are deposited at figshare.
The pipeline is built using Nextflow, a workflow tool to run tasks across multiple compute infrastructures in a very portable manner. It uses Docker/Singularity containers, making installation trivial and results highly reproducible. The Nextflow DSL2 implementation of this pipeline uses one container per process, which makes it much easier to maintain and update software dependencies.
On release, automated continuous integration tests run the pipeline on a full-sized dataset on the AWS cloud infrastructure. This ensures that the pipeline runs on AWS, has sensible resource allocation defaults set to run on real-world datasets, and permits the persistent storage of results to benchmark between pipeline releases and other analysis sources. The results obtained from the full-sized test can be viewed on the nf-core website .
Pipeline summary
By default, the pipeline currently performs the following:
- Sequencing quality control (FastQC)
- Trimming of reads (Cutadapt)
- Inference of Amplicon Sequence Variants (ASVs) (DADA2)
- Prediction of whether ASVs are ribosomal RNA sequences (Barrnap)
- Phylogenetic placement (EPA-NG)
- Exclusion of unwanted taxa, production of absolute and relative feature/taxa count tables and plots, alpha rarefaction curves, and alpha and beta diversity indices and plots thereof (QIIME2)
- Calling of differentially abundant taxa (ANCOM)
- Overall pipeline run summaries (MultiQC)
Usage
Note: If you are new to Nextflow and nf-core, please refer to this page on how to set up Nextflow. Make sure to test your setup with -profile test before running the workflow on actual data.
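For example, such a test run could look like this (a minimal sketch; swap docker for singularity or an institutional profile as appropriate for your setup):

nextflow run nf-core/ampliseq \
   -profile test,docker \
   --outdir <OUTDIR>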
First, you need to know whether the sequencing files at hand are expected to contain primer sequences (usually yes) and, if so, which primer sequences. In the example below, the paired-end sequencing data were produced with the 515f (GTGYCAGCMGCCGCGGTAA) and 806r (GGACTACNVGGGTWTCTAAT) primers of the V4 region of the 16S rRNA gene. Please note that those sequences should not contain any sequencing adapter sequences, only the sequence that matches the biological amplicon.
Next, the data needs to be organized in a folder, here data, or detailed in a samplesheet (see input documentation).
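For illustration, a minimal samplesheet might look like the sketch below (tab-separated; column names follow the input documentation, while the sample IDs and file paths are hypothetical):

sampleID	forwardReads	reverseReads	run
sample1	data/sample1_R1.fastq.gz	data/sample1_R2.fastq.gz	A
sample2	data/sample2_R1.fastq.gz	data/sample2_R2.fastq.gz	A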
Now, you can run the pipeline using:
nextflow run nf-core/ampliseq \
-profile <docker/singularity/.../institute> \
--input "data" \
--FW_primer "GTGYCAGCMGCCGCGGTAA" \
--RV_primer "GGACTACNVGGGTWTCTAAT" \
--outdir <OUTDIR>
Note: Adding metadata will considerably increase the output; see the metadata documentation.
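As a sketch, metadata is a tab-separated file whose first column, ID, matches the sample names; additional columns (the treatment column here is hypothetical) can then be used for grouping in diversity analyses and ANCOM (see the metadata documentation for the authoritative format):

ID	treatment
sample1	control
sample2	exposed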
Warning: Please provide pipeline parameters via the CLI or the Nextflow -params-file option. Custom config files, including those provided by the -c Nextflow option, can be used to provide any configuration except for parameters; see docs.
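For example, the parameters from the command above could be collected in a params.yml file (a sketch of the same parameters; the outdir value is a placeholder):

input: "data"
FW_primer: "GTGYCAGCMGCCGCGGTAA"
RV_primer: "GGACTACNVGGGTWTCTAAT"
outdir: "./results"

and passed with:

nextflow run nf-core/ampliseq -profile <docker/singularity/.../institute> -params-file params.yml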
For more details, please refer to the usage documentation and the parameter documentation .
Pipeline output
To see the results of a test run with a full-size dataset, refer to the results tab on the nf-core website pipeline page. For more details about the output files and reports, please refer to the output documentation.
Credits
nf-core/ampliseq was originally written by Daniel Straub (@d4straub) and Alexander Peltzer (@apeltzer) for use at the Quantitative Biology Center (QBiC) and Microbial Ecology, Center for Applied Geosciences, part of Eberhard Karls Universität Tübingen (Germany). Daniel Lundin (@erikrikarddaniel) (Linnaeus University, Sweden) joined before pipeline release 2.0.0 and helped to improve the pipeline considerably.
We thank the following people for their extensive assistance in the development of this pipeline (in alphabetical order):
Contributions and Support
If you would like to contribute to this pipeline, please see the contributing guidelines .
For further information or help, don't hesitate to get in touch on the Slack #ampliseq channel (you can join with this invite).
Citations
If you use nf-core/ampliseq for your analysis, please cite the ampliseq article as follows:
Interpretations of Environmental Microbial Community Studies Are Biased by the Selected 16S rRNA (Gene) Amplicon Sequencing Pipeline
Daniel Straub, Nia Blackwell, Adrian Langarica-Fuentes, Alexander Peltzer, Sven Nahnsen, Sara Kleindienst
Frontiers in Microbiology 2020, 11:2652. doi: 10.3389/fmicb.2020.550420.
You can cite the nf-core/ampliseq Zenodo record for a specific version using the following DOI: 10.5281/zenodo.1493841
An extensive list of references for the tools used by the pipeline can be found in the CITATIONS.md file.
You can cite the nf-core publication as follows:
The nf-core framework for community-curated bioinformatics pipelines.
Philip Ewels, Alexander Peltzer, Sven Fillinger, Harshil Patel, Johannes Alneberg, Andreas Wilm, Maxime Ulysse Garcia, Paolo Di Tommaso & Sven Nahnsen.
Nat Biotechnol. 2020 Feb 13. doi: 10.1038/s41587-020-0439-x.
Code Snippets
"""
add_sh_to_taxonomy.py ${sh_info.join(' ')} $asvtable $blastfile $outtable $args

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    python: \$(python --version 2>&1 | sed 's/Python //g')
    pandas: \$(python -c "import pkg_resources; print(pkg_resources.get_distribution('pandas').version)")
END_VERSIONS
"""
"""
IFS=',' read -r -a kingdom <<< \"$kingdom\"

for KINGDOM in "\${kingdom[@]}"
do
    barrnap \\
        --threads $task.cpus \\
        $args \\
        --kingdom \$KINGDOM \\
        --outseq Filtered.\${KINGDOM}.fasta \\
        < $fasta \\
        > rrna.\${KINGDOM}.gff

    #this fails when processing an empty file, so it requires a workaround!
    if [ -s Filtered.\${KINGDOM}.fasta ]; then
        grep -h '>' Filtered.\${KINGDOM}.fasta | sed 's/^>//' | sed 's/:\\+/\\t/g' | awk '{print \$2}' | sort -u >\${KINGDOM}.matches.txt
    else
        touch \${KINGDOM}.matches.txt
    fi
done

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    barrnap: \$(echo \$(barrnap --version 2>&1) | sed "s/^.*barrnap //g")
END_VERSIONS
"""
"""
summarize_barrnap.py $predictions

if [[ \$(wc -l < summary.tsv ) -le 1 ]]; then
    touch WARNING_no_rRNA_found_warning.txt
else
    touch no_warning.txt
fi

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    python: \$( python --version )
END_VERSIONS
"""
"""
combine_table.r ${table} ${seq} ${tax}
mv combined_ASV_table.tsv ${outfile}

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    R: \$(R --version 2>&1 | sed -n 1p | sed 's/R version //' | sed 's/ (.*//')
END_VERSIONS
"""
"""
#!/usr/bin/env Rscript
standard <- read.table(\"${files[0]}\", header = TRUE, sep = "\\t", stringsAsFactors = FALSE)
doubleprimer <- read.table(\"${files[1]}\", header = TRUE, sep = "\\t", stringsAsFactors = FALSE)
colnames(doubleprimer) <- c("sample", "cutadapt_doubleprimer_total_processed", "cutadapt_doubleprimer_reverse_complemented", "cutadapt_doubleprimer_passing_filters", "cutadapt_doubleprimer_passing_filters_percent")

#merge
df <- merge(standard, doubleprimer, by = "sample")

#filter columns
remove_columns <- c("cutadapt_doubleprimer_total_processed")
for(column in remove_columns) df[column]<-NULL

#write
write.table(df, file = \"cutadapt_summary.tsv\", quote=FALSE, col.names=TRUE, row.names=FALSE, sep="\\t")

writeLines(c("\\"${task.process}\\":", paste0("    R: ", paste0(R.Version()[c("major","minor")], collapse = ".")) ), "versions.yml")
"""
"""
cp $files cutadapt_summary.tsv
"""
"""
cutadapt_summary.py $mode *.cutadapt.log > ${name}_summary.tsv

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    python: \$( python --version )
END_VERSIONS
"""
"""
#!/usr/bin/env Rscript
suppressPackageStartupMessages(library(dada2))
set.seed($seed) # Initialize random number generator for reproducibility

#add "Species" if not already in taxlevels
taxlevels <- $taxlevels
if ( !"Species" %in% taxlevels ) { taxlevels <- c(taxlevels,"Species") }

taxtable <- readRDS(\"$taxtable\")

#remove Species annotation from assignTaxonomy
taxa_nospecies <- taxtable[,!colnames(taxtable) %in% 'Species']

tx <- addSpecies(taxa_nospecies, \"$database\", $args, verbose=TRUE)

# Create a table with specified column order
tmp <- data.frame(row.names(tx)) # To separate ASV_ID from sequence
expected_order <- c("ASV_ID",taxlevels,"confidence")
taxa <- as.data.frame( subset(tx, select = expected_order) )
taxa\$sequence <- tmp[,1]
row.names(taxa) <- row.names(tmp)

#rename Species annotation to Species_exact
colnames(taxa)[which(names(taxa) == "Species")] <- "Species_exact"

#add Species annotation from assignTaxonomy again, after "Genus" column
if ( "Species" %in% colnames(taxtable) ) {
    taxtable <- data.frame(taxtable)
    taxa_export <- data.frame(append(taxa, list(Species=taxtable\$Species), after=match("Genus", names(taxa))))
} else {
    taxa_export <- taxa
}
write.table(taxa_export, file = \"$outfile\", sep = "\\t", row.names = FALSE, col.names = TRUE, quote = FALSE, na = '')

write.table('addSpecies\t$args\ntaxlevels\t$taxlevels\nseed\t$seed', file = "addSpecies.args.txt", row.names = FALSE, col.names = FALSE, quote = FALSE, na = '')
writeLines(c("\\"${task.process}\\":", paste0("    R: ", paste0(R.Version()[c("major","minor")], collapse = ".")),paste0("    dada2: ", packageVersion("dada2")) ), "versions.yml")
"""
"""
#!/usr/bin/env Rscript
suppressPackageStartupMessages(library(dada2))

errF = readRDS("${errormodel[0]}")
errR = readRDS("${errormodel[1]}")

filtFs <- sort(list.files("./filtered/", pattern = "_1.filt.fastq.gz", full.names = TRUE))
filtRs <- sort(list.files("./filtered/", pattern = "_2.filt.fastq.gz", full.names = TRUE))

#denoising
sink(file = "${meta.run}.dada.log")
dadaFs <- dada(filtFs, err = errF, $args, multithread = $task.cpus)
saveRDS(dadaFs, "${meta.run}_1.dada.rds")
dadaRs <- dada(filtRs, err = errR, $args, multithread = $task.cpus)
saveRDS(dadaRs, "${meta.run}_2.dada.rds")
sink(file = NULL)

#make table
mergers <- mergePairs(dadaFs, filtFs, dadaRs, filtRs, $args2, verbose=TRUE)
saveRDS(mergers, "${meta.run}.mergers.rds")
seqtab <- makeSequenceTable(mergers)
saveRDS(seqtab, "${meta.run}.seqtab.rds")

write.table('dada\t$args', file = "dada.args.txt", row.names = FALSE, col.names = FALSE, quote = FALSE, na = '')
write.table('mergePairs\t$args2', file = "mergePairs.args.txt", row.names = FALSE, col.names = FALSE, quote = FALSE, na = '')
writeLines(c("\\"${task.process}\\":", paste0("    R: ", paste0(R.Version()[c("major","minor")], collapse = ".")),paste0("    dada2: ", packageVersion("dada2")) ), "versions.yml")
"""
"""
#!/usr/bin/env Rscript
suppressPackageStartupMessages(library(dada2))

errF = readRDS("${errormodel}")

filtFs <- sort(list.files("./filtered/", pattern = ".fastq.gz", full.names = TRUE))

#denoising
sink(file = "${meta.run}.dada.log")
dadaFs <- dada(filtFs, err = errF, $args, multithread = $task.cpus)
saveRDS(dadaFs, "${meta.run}.dada.rds")
sink(file = NULL)

#make table
seqtab <- makeSequenceTable(dadaFs)
saveRDS(seqtab, "${meta.run}.seqtab.rds")

#dummy file to fulfill output rules
saveRDS("dummy", "dummy_${meta.run}.mergers.rds")

write.table('dada\t$args', file = "dada.args.txt", row.names = FALSE, col.names = FALSE, quote = FALSE, na = '')
writeLines(c("\\"${task.process}\\":", paste0("    R: ", paste0(R.Version()[c("major","minor")], collapse = ".")),paste0("    dada2: ", packageVersion("dada2")) ), "versions.yml")
"""
"""
#!/usr/bin/env Rscript
suppressPackageStartupMessages(library(dada2))
set.seed($seed) # Initialize random number generator for reproducibility

fnFs <- sort(list.files(".", pattern = "_1.filt.fastq.gz", full.names = TRUE))
fnRs <- sort(list.files(".", pattern = "_2.filt.fastq.gz", full.names = TRUE))

sink(file = "${meta.run}.err.log")
errF <- learnErrors(fnFs, $args, multithread = $task.cpus, verbose = TRUE)
saveRDS(errF, "${meta.run}_1.err.rds")
errR <- learnErrors(fnRs, $args, multithread = $task.cpus, verbose = TRUE)
saveRDS(errR, "${meta.run}_2.err.rds")
sink(file = NULL)

pdf("${meta.run}_1.err.pdf")
plotErrors(errF, nominalQ = TRUE)
dev.off()
svg("${meta.run}_1.err.svg")
plotErrors(errF, nominalQ = TRUE)
dev.off()

pdf("${meta.run}_2.err.pdf")
plotErrors(errR, nominalQ = TRUE)
dev.off()
svg("${meta.run}_2.err.svg")
plotErrors(errR, nominalQ = TRUE)
dev.off()

sink(file = "${meta.run}_1.err.convergence.txt")
dada2:::checkConvergence(errF)
sink(file = NULL)

sink(file = "${meta.run}_2.err.convergence.txt")
dada2:::checkConvergence(errR)
sink(file = NULL)

write.table('learnErrors\t$args', file = "learnErrors.args.txt", row.names = FALSE, col.names = FALSE, quote = FALSE, na = '')
writeLines(c("\\"${task.process}\\":", paste0("    R: ", paste0(R.Version()[c("major","minor")], collapse = ".")),paste0("    dada2: ", packageVersion("dada2")) ), "versions.yml")
"""
"""
#!/usr/bin/env Rscript
suppressPackageStartupMessages(library(dada2))
set.seed($seed) # Initialize random number generator for reproducibility

fnFs <- sort(list.files(".", pattern = ".filt.fastq.gz", full.names = TRUE))

sink(file = "${meta.run}.err.log")
errF <- learnErrors(fnFs, $args, multithread = $task.cpus, verbose = TRUE)
saveRDS(errF, "${meta.run}.err.rds")
sink(file = NULL)

pdf("${meta.run}.err.pdf")
plotErrors(errF, nominalQ = TRUE)
dev.off()
svg("${meta.run}.err.svg")
plotErrors(errF, nominalQ = TRUE)
dev.off()

sink(file = "${meta.run}.err.convergence.txt")
dada2:::checkConvergence(errF)
sink(file = NULL)

write.table('learnErrors\t$args', file = "learnErrors.args.txt", row.names = FALSE, col.names = FALSE, quote = FALSE, na = '')
writeLines(c("\\"${task.process}\\":", paste0("    R: ", paste0(R.Version()[c("major","minor")], collapse = ".")),paste0("    dada2: ", packageVersion("dada2")) ), "versions.yml")
"""
"""
#!/usr/bin/env Rscript
suppressPackageStartupMessages(library(dada2))

out <- filterAndTrim($in_and_out,
    $trunc_args,
    $args,
    compress = TRUE,
    multithread = $task.cpus,
    verbose = TRUE)
out <- cbind(out, ID = row.names(out))

write.table( out, file = "${meta.id}.filter_stats.tsv", sep = "\\t", row.names = FALSE, quote = FALSE, na = '')
write.table(paste('filterAndTrim\t$trunc_args','$args',sep=","), file = "filterAndTrim.args.txt", row.names = FALSE, col.names = FALSE, quote = FALSE, na = '')
writeLines(c("\\"${task.process}\\":", paste0("    R: ", paste0(R.Version()[c("major","minor")], collapse = ".")),paste0("    dada2: ", packageVersion("dada2")) ), "versions.yml")
"""
"""
#!/usr/bin/env Rscript
suppressPackageStartupMessages(library(dada2))
suppressPackageStartupMessages(library(digest))

#combine stats files
for (data in sort(list.files(".", pattern = ".stats.tsv", full.names = TRUE))) {
    if (!exists("stats")){ stats <- read.csv(data, header=TRUE, sep="\\t") }
    if (exists("stats")){
        temp <-read.csv(data, header=TRUE, sep="\\t")
        stats <-unique(rbind(stats, temp))
        rm(temp)
    }
}
write.table( stats, file = "DADA2_stats.tsv", sep = "\\t", row.names = FALSE, col.names = TRUE, quote = FALSE, na = '')

#combine dada-class objects
files <- sort(list.files(".", pattern = ".ASVtable.rds", full.names = TRUE))
if ( length(files) == 1 ) {
    ASVtab = readRDS(files[1])
} else {
    ASVtab <- mergeSequenceTables(tables=files, repeats = "error", orderBy = "abundance", tryRC = FALSE)
}
saveRDS(ASVtab, "DADA2_table.rds")

df <- t(ASVtab)
colnames(df) <- gsub('_1.filt.fastq.gz', '', colnames(df))
colnames(df) <- gsub('.filt.fastq.gz', '', colnames(df))
df <- data.frame(sequence = rownames(df), df, check.names=FALSE)

# Create an md5 sum of the sequences as ASV_ID and rearrange columns
df\$ASV_ID <- sapply(df\$sequence, digest, algo='md5', serialize = FALSE)
df <- df[,c(ncol(df),3:ncol(df)-1,1)]

# file to publish
write.table(df, file = "DADA2_table.tsv", sep = "\\t", row.names = FALSE, quote = FALSE, na = '')

# Write fasta file with ASV sequences to file
write.table(data.frame(s = sprintf(">%s\n%s", df\$ASV_ID, df\$sequence)), 'ASV_seqs.fasta', col.names = FALSE, row.names = FALSE, quote = FALSE, na = '')

# Write ASV file with ASV abundances to file
df\$sequence <- NULL
write.table(df, file = "ASV_table.tsv", sep="\\t", row.names = FALSE, quote = FALSE, na = '')

writeLines(c("\\"${task.process}\\":", paste0("    R: ", paste0(R.Version()[c("major","minor")], collapse = ".")),paste0("    dada2: ", packageVersion("dada2")) ), "versions.yml")
"""
"""
#!/usr/bin/env Rscript
suppressPackageStartupMessages(library(dada2))
suppressPackageStartupMessages(library(ShortRead))

readfiles <- sort(list.files(".", pattern = ".fastq.gz", full.names = TRUE))

#make list of number of sequences
readfiles_length <- countLines(readfiles) / 4
sum_readfiles_length <- sum(readfiles_length)

#use only the first x files when read number gets above 2147483647, read numbers above that do not fit into an INT and crash the process!
if ( sum_readfiles_length > 2147483647 ) {
    max_files = length(which(cumsum(readfiles_length) <= 2147483647 ))
    write.table(max_files, file = paste0("WARNING Only ",max_files," of ",length(readfiles)," files and ",sum(readfiles_length[1:max_files])," of ",sum_readfiles_length," reads were used for ${meta} plotQualityProfile.txt"), row.names = FALSE, col.names = FALSE, quote = FALSE, na = '')
    readfiles <- readfiles[1:max_files]
} else {
    max_files <- length(readfiles)
    write.table(max_files, file = paste0(max_files," files were used for ${prefix} plotQualityProfile.txt"), row.names = FALSE, col.names = FALSE, quote = FALSE, na = '')
}

plot <- plotQualityProfile(readfiles$args)
data <- plot\$data

#aggregate data for each sequencing cycle
df <- data.frame(Cycle=character(), Count=character(), Median=character(), stringsAsFactors=FALSE)
cycles <- sort(unique(data\$Cycle))
for (cycle in cycles) {
    subdata <- data[data[, "Cycle"] == cycle, ]
    score <- list()
    #convert to list to calculate median
    for (j in 1:nrow(subdata)) {score <- unlist(c(score, rep(subdata\$Score[j], subdata\$Count[j])))}
    temp = data.frame(Cycle=cycle, Count=sum(subdata\$Count), Median=median(score), stringsAsFactors=FALSE)
    df <- rbind(df, temp)
}

#write output
write.table( t(df), file = paste0("${prefix}_qual_stats",".tsv"), sep = "\t", row.names = TRUE, col.names = FALSE, quote = FALSE)
pdf(paste0("${prefix}_qual_stats",".pdf"))
plot
dev.off()
svg(paste0("${prefix}_qual_stats",".svg"))
plot
dev.off()

write.table(paste0('plotQualityProfile\t$args\nmax_files\t',max_files), file = "${prefix}_plotQualityProfile.args.txt", row.names = FALSE, col.names = FALSE, quote = FALSE, na = '')
writeLines(c("\\"${task.process}\\":", paste0("    R: ", paste0(R.Version()[c("major","minor")], collapse = ".")),paste0("    dada2: ", packageVersion("dada2")),paste0("    ShortRead: ", packageVersion("ShortRead")) ), "versions.yml")
"""
"""
#!/usr/bin/env Rscript
suppressPackageStartupMessages(library(dada2))

seqtab = readRDS("${seqtab}")

#remove chimera
seqtab.nochim <- removeBimeraDenovo(seqtab, $args, multithread=$task.cpus, verbose=TRUE)
if ( ${no_samples} == 1 ) { rownames(seqtab.nochim) <- "${first_sample}" }
saveRDS(seqtab.nochim,"${meta.run}.ASVtable.rds")

write.table('removeBimeraDenovo\t$args', file = "removeBimeraDenovo.args.txt", row.names = FALSE, col.names = FALSE, quote = FALSE, na = '')
writeLines(c("\\"${task.process}\\":", paste0("    R: ", paste0(R.Version()[c("major","minor")], collapse = ".")),paste0("    dada2: ", packageVersion("dada2")) ), "versions.yml")
"""
"""
#!/usr/bin/env Rscript
suppressPackageStartupMessages(library(dada2))

#combine filter_and_trim files
for (data in list.files("./filter_and_trim_files", full.names=TRUE)){
    if (!exists("filter_and_trim")){ filter_and_trim <- read.csv(data, header=TRUE, sep="\\t") }
    if (exists("filter_and_trim")){
        tempory <-read.csv(data, header=TRUE, sep="\\t")
        filter_and_trim <-unique(rbind(filter_and_trim, tempory))
        rm(tempory)
    }
}
rownames(filter_and_trim) <- filter_and_trim\$ID
filter_and_trim["ID"] <- NULL
#write.table( filter_and_trim, file = "${meta.run}.filter_and_trim.tsv", sep = "\\t", row.names = TRUE, quote = FALSE, na = '')

#read data
dadaFs = readRDS("${denoised[0]}")
dadaRs = readRDS("${denoised[1]}")
mergers = readRDS("$mergers")
seqtab.nochim = readRDS("$seqtab_nochim")

#track reads through pipeline
getN <- function(x) sum(getUniques(x))
if ( nrow(filter_and_trim) == 1 ) {
    track <- cbind(filter_and_trim, getN(dadaFs), getN(dadaRs), getN(mergers), rowSums(seqtab.nochim))
} else {
    track <- cbind(filter_and_trim, sapply(dadaFs, getN), sapply(dadaRs, getN), sapply(mergers, getN), rowSums(seqtab.nochim))
}
colnames(track) <- c("DADA2_input", "filtered", "denoisedF", "denoisedR", "merged", "nonchim")
rownames(track) <- sub(pattern = "_1.fastq.gz\$", replacement = "", rownames(track)) #this is when cutadapt is skipped!
track <- cbind(sample = sub(pattern = "(.*?)\\\\..*\$", replacement = "\\\\1", rownames(track)), track)
write.table( track, file = "${meta.run}.stats.tsv", sep = "\\t", row.names = FALSE, quote = FALSE, na = '')

writeLines(c("\\"${task.process}\\":", paste0("    R: ", paste0(R.Version()[c("major","minor")], collapse = ".")),paste0("    dada2: ", packageVersion("dada2")) ), "versions.yml")
"""
"""
#!/usr/bin/env Rscript
suppressPackageStartupMessages(library(dada2))

#combine filter_and_trim files
for (data in list.files("./filter_and_trim_files", full.names=TRUE)){
    if (!exists("filter_and_trim")){ filter_and_trim <- read.csv(data, header=TRUE, sep="\\t") }
    if (exists("filter_and_trim")){
        tempory <-read.csv(data, header=TRUE, sep="\\t")
        filter_and_trim <-unique(rbind(filter_and_trim, tempory))
        rm(tempory)
    }
}
rownames(filter_and_trim) <- filter_and_trim\$ID
filter_and_trim["ID"] <- NULL
#write.table( filter_and_trim, file = "${meta.run}.filter_and_trim.tsv", sep = "\\t", row.names = TRUE, quote = FALSE, na = '')

#read data
dadaFs = readRDS("${denoised[0]}")
seqtab.nochim = readRDS("$seqtab_nochim")

#track reads through pipeline
getN <- function(x) sum(getUniques(x))
if ( nrow(filter_and_trim) == 1 ) {
    track <- cbind(filter_and_trim, getN(dadaFs), rowSums(seqtab.nochim))
} else {
    track <- cbind(filter_and_trim, sapply(dadaFs, getN), rowSums(seqtab.nochim))
}
colnames(track) <- c("DADA2_input", "filtered", "denoised", "nonchim")
track <- cbind(sample = sub(pattern = "(.*?)\\\\..*\$", replacement = "\\\\1", rownames(track)), track)
write.table( track, file = "${meta.run}.stats.tsv", sep = "\\t", row.names = FALSE, quote = FALSE, na = '')

writeLines(c("\\"${task.process}\\":", paste0("    R: ", paste0(R.Version()[c("major","minor")], collapse = ".")),paste0("    dada2: ", packageVersion("dada2")) ), "versions.yml")
"""
"""
#!/usr/bin/env Rscript
suppressPackageStartupMessages(library(dada2))
set.seed($seed) # Initialize random number generator for reproducibility

taxlevels <- $taxlevels

seq <- getSequences(\"$fasta\", collapse = TRUE, silence = FALSE)
taxa <- assignTaxonomy(seq, \"$database\", taxLevels = taxlevels, $args, multithread = $task.cpus, verbose=TRUE, outputBootstraps = TRUE)

# (1) Make a data frame, add ASV_ID from seq
tx <- data.frame(ASV_ID = names(seq), taxa, sequence = row.names(taxa\$tax), row.names = names(seq))

# (2) Set confidence to the bootstrap for the most specific taxon
# extract columns with taxonomic values
tax <- tx[ , grepl( "tax." , names( tx ) ) ]
# find first occurrence of NA
res <- max.col(is.na(tax), ties = "first")
# correct if no NA is present in column to NA
if(any(res == 1)) is.na(res) <- (res == 1) & !is.na(tax[[1]])
# find index of last entry before NA, which is the bootstrap value
res <- res-1
# if NA choose last entry
res[is.na(res)] <- ncol(tax)
# extract bootstrap values
boot <- tx[ , grepl( "boot." , names( tx ) ) ]
boot\$last_tax <- res
valid_boot <- apply(boot,1,function(x) x[x[length(x)]][1]/100 )
# replace missing bootstrap values (NA) with 0
valid_boot[is.na(valid_boot)] <- 0
# add bootstrap values to column confidence
tx\$confidence <- valid_boot

# (3) Reorder columns before writing to file
expected_order <- c("ASV_ID",paste0("tax.",taxlevels),"confidence","sequence")
expected_order <- intersect(expected_order,colnames(tx))
taxa_export <- subset(tx, select = expected_order)
colnames(taxa_export) <- sub("tax.", "", colnames(taxa_export))
rownames(taxa_export) <- names(seq)
write.table(taxa_export, file = \"$outfile\", sep = "\\t", row.names = FALSE, col.names = TRUE, quote = FALSE, na = '')

# Save a version with rownames for addSpecies
taxa_export <- cbind( ASV_ID = tx\$ASV_ID, taxa\$tax, confidence = tx\$confidence)
saveRDS(taxa_export, "ASV_tax.rds")

write.table('assignTaxonomy\t$args\ntaxlevels\t$taxlevels\nseed\t$seed', file = "assignTaxonomy.args.txt", row.names = FALSE, col.names = FALSE, quote = FALSE, na = '')
writeLines(c("\\"${task.process}\\":", paste0("    R: ", paste0(R.Version()[c("major","minor")], collapse = ".")),paste0("    dada2: ", packageVersion("dada2")) ), "versions.yml")
"""
"""
filt_codons.py -f ${fasta} -t ${asv} -p ASV_codon ${args}
filt_codon_stats.py ASV_codon_filtered.table.tsv

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    python: \$(python --version 2>&1 | sed 's/Python //g')
    pandas: \$(python -c "import pkg_resources; print(pkg_resources.get_distribution('pandas').version)")
END_VERSIONS
"""
"""
#!/usr/bin/env Rscript

#load packages
suppressPackageStartupMessages(library(Biostrings))

#read abundance file, first column is ASV_ID
$read_table
colnames(table)[1] <- "ASV_ID"

#read fasta file of ASV sequences
seq <- readDNAStringSet("$fasta")
seq <- data.frame(ID=names(seq), sequence=paste(seq))

#filter
filtered_seq <- seq[nchar(seq\$sequence) %in% $min_len_asv:$max_len_asv,]
list <- filtered_seq[, "ID", drop = FALSE]
filtered_table <- merge(table, list, by.x="ASV_ID", by.y="ID", all.x=FALSE, all.y=TRUE)

#report
distribution_before <- table(nchar(seq\$sequence))
distribution_before <- data.frame(Length=names(distribution_before),Counts=as.vector(distribution_before))
distribution_after <- table(nchar(filtered_seq\$sequence))
distribution_after <- data.frame(Length=names(distribution_after),Counts=as.vector(distribution_after))

#write
write.table(filtered_table, file = "ASV_table.len.tsv", row.names=FALSE, sep="\t", col.names = TRUE, quote = FALSE, na = '')
write.table(data.frame(s = sprintf(">%s\n%s", filtered_seq\$ID, filtered_seq\$sequence)), 'ASV_seqs.len.fasta', col.names = FALSE, row.names = FALSE, quote = FALSE, na = '')
write.table(distribution_before, file = "ASV_len_orig.tsv", row.names=FALSE, sep="\t", col.names = TRUE, quote = FALSE, na = '')
write.table(distribution_after, file = "ASV_len_filt.tsv", row.names=FALSE, sep="\t", col.names = TRUE, quote = FALSE, na = '')

#stats
stats <- as.data.frame( t( rbind( colSums(table[-1]), colSums(filtered_table[-1]) ) ) )
stats\$ID <- rownames(stats)
colnames(stats) <- c("lenfilter_input","lenfilter_output", "sample")
write.table(stats, file = "stats.len.tsv", row.names=FALSE, sep="\t")

writeLines(c("\\"${task.process}\\":", paste0("    R: ", paste0(R.Version()[c("major","minor")], collapse = ".")),paste0("    Biostrings: ", packageVersion("Biostrings")) ), "versions.yml")
"""
"""
#!/usr/bin/env Rscript

#load packages
suppressPackageStartupMessages(library(Biostrings))

kingdom <- as.list(strsplit("$kingdom", ",")[[1]])

df = read.table("$barrnap_summary", header = TRUE, sep = "\t", stringsAsFactors = FALSE)

# keep only ASV_ID & eval columns & sort
df <- subset(df, select = c(ASV_ID,mito_eval,euk_eval,arc_eval,bac_eval))

# choose kingdom (column) with lowest evalue
df[is.na(df)] <- 1
df\$result = colnames(df[,2:5])[apply(df[,2:5],1,which.min)]
df\$result = gsub("_eval", "", df\$result)

# filter ASVs
df_filtered = subset(df, df\$result %in% kingdom)
id_filtered = subset(df_filtered, select = c(ASV_ID))

#error if all ASVs are removed
if ( nrow(df_filtered) == 0 ) stop("Chosen kingdom(s) by --filter_ssu has no matches. Please choose a different kingdom (domain) or omit filtering.")

#read abundance file, first column is ASV_ID
table <- read.table(file = "$table", sep = '\t', comment.char = "", header=TRUE)
colnames(table)[1] <- "ASV_ID"

#read fasta file of ASV sequences
seq <- readDNAStringSet("$fasta")
seq <- data.frame(ID=names(seq), sequence=paste(seq))

#check if all ids match
if(!all(id_filtered\$ID %in% seq\$ID)) {stop(paste(paste(files,sep=","),"and","$fasta","dont share all IDs, exit."), call.=FALSE)}
if(!all(id_filtered\$ID %in% table\$ASV_ID)) {stop(paste(paste(files,sep=","),"and","$table","dont share all IDs, exit"), call.=FALSE)}

#merge
filtered_table <- merge(table, id_filtered, by.x="ASV_ID", by.y="ASV_ID", all.x=FALSE, all.y=TRUE)
filtered_seq <- merge(seq, id_filtered, by.x="ID", by.y="ASV_ID", all.x=FALSE, all.y=TRUE)

#write
write.table(filtered_table, file = "ASV_table.ssu.tsv", row.names=FALSE, sep="\t", col.names = TRUE, quote = FALSE, na = '')
write.table(data.frame(s = sprintf(">%s\n%s", filtered_seq\$ID, filtered_seq\$sequence)), 'ASV_seqs.ssu.fasta', col.names = FALSE, row.names = FALSE, quote = FALSE, na = '')

#stats
stats <- as.data.frame( t( rbind( colSums(table[-1]), colSums(filtered_table[-1]) ) ) )
stats\$ID <- rownames(stats)
colnames(stats) <- c("ssufilter_input","ssufilter_output", "sample")
write.table(stats, file = "stats.ssu.tsv", row.names=FALSE, sep="\t")

writeLines(c("\\"${task.process}\\":", paste0("    R: ", paste0(R.Version()[c("major","minor")], collapse = ".")),paste0("    Biostrings: ", packageVersion("Biostrings")) ), "versions.yml")
"""
"""
filter_stats.py $unfiltered $filtered

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    python: \$(python --version 2>&1 | sed 's/Python //g')
    pandas: \$(python -c "import pkg_resources; print(pkg_resources.get_distribution('pandas').version)")
END_VERSIONS
"""
"""
cat $fastain | sed '/^>/s/\t/ /g' > input.mod.fasta

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    sed: \$(sed --version 2>&1 | sed -n 1p | sed 's/sed (GNU sed) //')
END_VERSIONS
"""
"""
#!/usr/bin/env Rscript
suppressPackageStartupMessages(library(stringr)) # provides str_split() used below

tax = read.table("$tax", header = TRUE, sep = "\t", stringsAsFactors = FALSE, comment.char = '', quote = '')

df <- data.frame(
    name=character(),
    LWR=character(),
    fract=character(),
    aLWR=character(),
    afract=character(),
    taxopath=character(),
    stringsAsFactors=FALSE)
for (asvid in unique(tax\$name)) {
    # subset per ASV ID
    temp = subset(tax, tax\$name == asvid)
    maxLWR = max(temp\$LWR)
    temp = subset(temp, temp\$LWR == maxLWR)
    if ( nrow(temp) == 1 ) {
        # STEP 1: if there is just 1 maximum, choose this row
        print( paste ( asvid,"was added in STEP 1" ) )
        temp = temp[which.max(temp\$LWR),]
        df <- rbind(df, temp)
    } else {
        # STEP 2: be conservative, choose the taxonomy with less entries
        # count number of entries in "taxopath", simplified: number of semicolons
        temp\$taxlevels = nchar( temp\$taxopath ) - nchar(gsub(';', '', temp\$taxopath )) + 1
        minTaxlevels = min(temp\$taxlevels)
        temp = subset(temp, temp\$taxlevels == minTaxlevels)
        temp\$taxlevels = NULL
        if ( nrow(temp) == 1 ) {
            df <- rbind(df, temp)
            print( paste ( asvid,"was added in STEP 2" ) )
        } else {
            # STEP 3: if number of entries is same, remove last entry
            # then, check if reduced entries are identical > if yes, choose any row, if no, repeat
            # at that step the taxonomies have same length
            print( paste ( asvid,"enters STEP 3" ) )
            list_taxonpath <- str_split( temp\$taxopath, ";")
            df_taxonpath <- as.data.frame(do.call(rbind, list_taxonpath))
            for (i in ncol(df_taxonpath):0) {
                # choose first column and change taxon to reduced overlap
                if ( length(unique(df_taxonpath[,i])) == 1 ) {
                    if (i>1) {
                        temp\$taxopath <- apply(df_taxonpath[1,1:i], 1, paste, collapse=";")[[1]]
                    } else if (i==1) {
                        temp\$taxopath <- df_taxonpath[1,1]
                    }
                    df <- rbind(df, temp[1,])
                    print( paste ( asvid,"was added with",temp\$taxopath[[1]],"in STEP 3a" ) )
                    break
                } else if (i==0) {
                    # if all fails, i.e. there is no consensus on any taxonomic level, use NA
                    temp\$taxopath <- "NA"
                    df <- rbind(df, temp[1,])
                    print( paste ( asvid,"was added with",temp\$taxopath[[1]],"in STEP 3b" ) )
                }
            }
        }
    }
}

# output the cleaned, unified, unique taxonomic classification per ASV
write.table(df, file = "${meta.id}.taxonomy.per_query_unique.tsv", row.names = FALSE, col.names = TRUE, quote = FALSE, na = '', sep = "\\t")

# output ASV taxonomy table for QIIME2
df_clean <- subset(df, select = c("name","taxopath"))
colnames(df_clean) <- c("ASV_ID","taxonomy")
write.table(df_clean, file = "${meta.id}.taxonomy.tsv", row.names = FALSE, col.names = TRUE, quote = FALSE, na = '', sep = "\\t")

writeLines(c("\\"${task.process}\\":", paste0("    R: ", paste0(R.Version()[c("major","minor")], collapse = ".")),paste0("    dada2: ", packageVersion("dada2")) ), "versions.yml")
"""
"""
${params.dada_ref_databases[params.dada_ref_taxonomy]["fmtscript"]}

#Giving out information
echo -e "--dada_ref_taxonomy: ${params.dada_ref_taxonomy}\n" >ref_taxonomy.${suffix}.txt
echo -e "Title: ${params.dada_ref_databases[params.dada_ref_taxonomy]["title"]}\n" >>ref_taxonomy.${suffix}.txt
echo -e "Citation: ${params.dada_ref_databases[params.dada_ref_taxonomy]["citation"]}\n" >>ref_taxonomy.${suffix}.txt
echo "All entries: ${params.dada_ref_databases[params.dada_ref_taxonomy]}" >>ref_taxonomy.${suffix}.txt

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    bash: \$(bash --version | sed -n 1p | sed 's/GNU bash, version //g')
END_VERSIONS
"""
"""
${params.qiime_ref_databases[params.qiime_ref_taxonomy]["fmtscript"]}

#Giving out information
echo -e "--qiime_ref_taxonomy: ${params.qiime_ref_taxonomy}\n" >ref_taxonomy.txt
echo -e "Title: ${params.qiime_ref_databases[params.qiime_ref_taxonomy]["title"]}\n" >>ref_taxonomy.txt
echo -e "Citation: ${params.qiime_ref_databases[params.qiime_ref_taxonomy]["citation"]}\n" >>ref_taxonomy.txt
echo "All entries: ${params.qiime_ref_databases[params.qiime_ref_taxonomy]}" >>ref_taxonomy.txt

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    bash: \$(bash --version | sed -n 1p | sed 's/GNU bash, version //g')
END_VERSIONS
"""
"""
${params.sintax_ref_databases[params.sintax_ref_taxonomy]["fmtscript"]}

#Giving out information
echo -e "--sintax_ref_taxonomy: ${params.sintax_ref_taxonomy}\n" >ref_taxonomy_sintax.txt
echo -e "Title: ${params.sintax_ref_databases[params.sintax_ref_taxonomy]["title"]}\n" >>ref_taxonomy_sintax.txt
echo -e "Citation: ${params.sintax_ref_databases[params.sintax_ref_taxonomy]["citation"]}\n" >>ref_taxonomy_sintax.txt
echo "All entries: ${params.sintax_ref_databases[params.sintax_ref_taxonomy]}" >>ref_taxonomy_sintax.txt

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    bash: \$(bash --version | sed -n 1p | sed 's/GNU bash, version //g')
END_VERSIONS
"""
"""
add_full_sequence_to_taxfile.py $taxtable $fastafile $outfile

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    python: \$(python --version 2>&1 | sed 's/Python //g')
    pandas: \$(python -c "import pkg_resources; print(pkg_resources.get_distribution('pandas').version)")
END_VERSIONS
"""
"""
convert_sintax_output.py -i $taxtable -f $fastafile -o $outfile -t $taxlevels -d \"${params.sintax_ref_databases[params.sintax_ref_taxonomy]["dbversion"]}\"

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    python: \$(python --version 2>&1 | sed 's/Python //g')
END_VERSIONS
"""
"""
ITSx \\
    -i $fasta \\
    $args \\
    --cpu $task.cpus \\
    -o ASV_ITS_seqs

if [ ! -s $outfile ]; then
    echo "ERROR: No ITS regions found by ITSx. You might want to modify --cut_its and/or --its_partial" >&2
    exit 1
fi

echo -e "ITSx\t$args" > ITSx.args.txt

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    ITSx: \$( ITSx -h 2>&1 > /dev/null | tail -n 2 | head -n 1 | cut -f 2 -d ' ' )
END_VERSIONS
"""
"""
#!/usr/bin/env Rscript
x <- read.table(\"file1.tsv\", header = TRUE, sep = "\t", stringsAsFactors = FALSE)
y <- read.table(\"file2.tsv\", header = TRUE, sep = "\t", stringsAsFactors = FALSE)

#merge
df <- merge(x, y, by = "sample", all = TRUE)

#write
write.table(df, file = \"overall_summary.tsv\", quote=FALSE, col.names=TRUE, row.names=FALSE, sep="\t")

writeLines(c("\\"${task.process}\\":", paste0("    R: ", paste0(R.Version()[c("major","minor")], collapse = ".")) ), "versions.yml")
"""
"""
metadata_all.r ${metadata}

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    R: \$(R --version 2>&1 | sed -n 1p | sed 's/R version //' | sed 's/ (.*//')
END_VERSIONS
"""
"""
metadata_pairwise.r ${metadata}

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    R: \$(R --version 2>&1 | sed -n 1p | sed 's/R version //' | sed 's/ (.*//')
END_VERSIONS
"""
"""
novaseq_err_pe.r ${errormodel[0]} ${errormodel[1]} ${meta.run}

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    R: \$(R --version | sed -n 1p | sed 's/R version //g' | sed 's/\\s.*\$//')
END_VERSIONS
"""
"""
novaseq_err_se.r ${errormodel} ${meta.run}

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    R: \$(R --version | sed -n 1p | sed 's/R version //g' | sed 's/\\s.*\$//')
END_VERSIONS
"""
"""
#If input is QIIME2 file, then (1) the first line and (2) the first character (#) of the second line need to be removed
if [ "$source" == 'QIIME2' ]
then
    tail -n +2 "$abund" > "${abund}.tmp" && mv "${abund}.tmp" "$abund"
fi

picrust2_pipeline.py \\
    $args \\
    -s $seq \\
    -i $abund \\
    -o all_output \\
    -p $task.cpus \\
    --in_traits EC,KO \\
    --verbose

#Add descriptions to identifiers
add_descriptions.py -i all_output/EC_metagenome_out/pred_metagenome_unstrat.tsv.gz -m EC \
    -o EC_pred_metagenome_unstrat_descrip.tsv
add_descriptions.py -i all_output/KO_metagenome_out/pred_metagenome_unstrat.tsv.gz -m KO \
    -o KO_pred_metagenome_unstrat_descrip.tsv
add_descriptions.py -i all_output/pathways_out/path_abun_unstrat.tsv.gz -m METACYC \
    -o METACYC_path_abun_unstrat_descrip.tsv

echo "$message" > "${message}.txt"
echo -e "picrust\t$args" > "picrust.args.txt"

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    python: \$(python --version 2>&1 | sed 's/Python //g')
    picrust2: \$( picrust2_pipeline.py -v | sed -e "s/picrust2_pipeline.py //g" )
END_VERSIONS
"""
"""
export XDG_CONFIG_HOME="\${PWD}/HOME"

maxdepth=\$(count_table_minmax_reads.py $stats maximum 2>&1)

#check values
if [ \"\$maxdepth\" -gt \"75000\" ]; then maxdepth=\"75000\"; fi
if [ \"\$maxdepth\" -gt \"5000\" ]; then maxsteps=\"250\"; else maxsteps=\$((maxdepth/20)); fi

qiime diversity alpha-rarefaction \\
    --i-table ${table} \\
    --i-phylogeny ${tree} \\
    --p-max-depth \$maxdepth \\
    --m-metadata-file ${metadata} \\
    --p-steps \$maxsteps \\
    --p-iterations 10 \\
    --o-visualization alpha-rarefaction.qzv
qiime tools export --input-path alpha-rarefaction.qzv \\
    --output-path alpha-rarefaction

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    qiime2: \$( qiime --version | sed '1!d;s/.* //' )
END_VERSIONS
"""
"""
export XDG_CONFIG_HOME="\${PWD}/HOME"

qiime composition add-pseudocount \\
    --i-table ${table} \\
    --o-composition-table comp-${table}
qiime composition ancom \\
    --i-table comp-${table} \\
    --m-metadata-file ${metadata} \\
    --m-metadata-column ${table.baseName} \\
    --o-visualization comp-${table.baseName}.qzv
qiime tools export --input-path comp-${table.baseName}.qzv \\
    --output-path ancom/Category-${table.baseName}-ASV

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    qiime2: \$( qiime --version | sed '1!d;s/.* //' )
END_VERSIONS
"""
"""
export XDG_CONFIG_HOME="\${PWD}/HOME"
mkdir ancom

# Sum data at the specified level
qiime taxa collapse \\
    --i-table ${table} \\
    --i-taxonomy ${taxonomy} \\
    --p-level ${taxlevel} \\
    --o-collapsed-table lvl${taxlevel}-${table}

# Extract summarised table and output a file with the number of taxa
qiime tools export \\
    --input-path lvl${taxlevel}-${table} \\
    --output-path exported/
biom convert \\
    -i exported/feature-table.biom \\
    -o ${table.baseName}-level-${taxlevel}.feature-table.tsv \\
    --to-tsv

if [ \$(grep -v '^#' -c ${table.baseName}-level-${taxlevel}.feature-table.tsv) -lt 2 ]; then
    echo ${taxlevel} > ancom/\"WARNING Summing your data at taxonomic level ${taxlevel} produced less than two rows (taxa), ANCOM can't proceed -- did you specify a bad reference taxonomy?\".txt
else
    qiime composition add-pseudocount \\
        --i-table lvl${taxlevel}-${table} \\
        --o-composition-table comp-lvl${taxlevel}-${table}
    qiime composition ancom \\
        --i-table comp-lvl${taxlevel}-${table} \\
        --m-metadata-file ${metadata} \\
        --m-metadata-column ${table.baseName} \\
        --o-visualization comp-lvl${taxlevel}-${table.baseName}.qzv
    qiime tools export --input-path comp-lvl${taxlevel}-${table.baseName}.qzv \\
        --output-path ancom/Category-${table.baseName}-level-${taxlevel}
fi

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    qiime2: \$( qiime --version | sed '1!d;s/.* //' )
END_VERSIONS
"""
"""
export XDG_CONFIG_HOME="\${PWD}/HOME"

qiime taxa barplot \\
    --i-table ${table} \\
    --i-taxonomy ${taxonomy} \\
    ${metadata_cmd} \\
    --o-visualization taxa-bar-plots.qzv \\
    --verbose
qiime tools export \\
    --input-path taxa-bar-plots.qzv \\
    --output-path barplot${suffix}

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    qiime2: \$( qiime --version | sed '1!d;s/.* //' )
END_VERSIONS
"""
"""
export XDG_CONFIG_HOME="\${PWD}/HOME"

qiime feature-classifier classify-sklearn \\
    --i-classifier ${trained_classifier} \\
    --p-n-jobs ${task.cpus} \\
    --i-reads ${repseq} \\
    --o-classification taxonomy.qza \\
    --verbose
qiime metadata tabulate \\
    --m-input-file taxonomy.qza \\
    --o-visualization taxonomy.qzv \\
    --verbose

#produce "taxonomy/taxonomy.tsv"
qiime tools export \\
    --input-path taxonomy.qza \\
    --output-path taxonomy
qiime tools export \\
    --input-path taxonomy.qzv \\
    --output-path taxonomy
cp taxonomy/taxonomy.tsv .

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    qiime2: \$( qiime --version | sed '1!d;s/.* //' )
END_VERSIONS
"""
"""
export XDG_CONFIG_HOME="\${PWD}/HOME"

qiime diversity adonis \\
    --p-n-jobs $task.cpus \\
    --i-distance-matrix ${core} \\
    --m-metadata-file ${metadata} \\
    --o-visualization ${core.baseName}_adonis.qzv \\
    $args \\
    --p-formula "${formula}"
qiime tools export \\
    --input-path ${core.baseName}_adonis.qzv \\
    --output-path adonis/${core.baseName}-${formula}

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    qiime2: \$( qiime --version | sed '1!d;s/.* //' )
END_VERSIONS
"""
"""
export XDG_CONFIG_HOME="\${PWD}/HOME"

qiime diversity alpha-group-significance \\
    --i-alpha-diversity ${core} \\
    --m-metadata-file ${metadata} \\
    --o-visualization ${core.baseName}-vis.qzv
qiime tools export \\
    --input-path ${core.baseName}-vis.qzv \\
    --output-path "alpha_diversity/${core.baseName}"

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    qiime2: \$( qiime --version | sed '1!d;s/.* //' )
END_VERSIONS
"""
"""
export XDG_CONFIG_HOME="\${PWD}/HOME"

qiime diversity beta-group-significance \\
    --i-distance-matrix ${core} \\
    --m-metadata-file ${metadata} \\
    --m-metadata-column \"${category}\" \\
    --o-visualization ${core.baseName}-${category}.qzv \\
    --p-pairwise
qiime tools export \\
    --input-path ${core.baseName}-${category}.qzv \\
    --output-path beta_diversity/${core.baseName}-${category}

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    qiime2: \$( qiime --version | sed '1!d;s/.* //' )
END_VERSIONS
"""
"""
export XDG_CONFIG_HOME="\${PWD}/HOME"
mkdir beta_diversity

qiime emperor plot \\
    --i-pcoa ${core} \\
    --m-metadata-file ${metadata} \\
    --o-visualization ${core.baseName}-vis.qzv
qiime tools export \\
    --input-path ${core.baseName}-vis.qzv \\
    --output-path beta_diversity/${core.baseName}-PCoA

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    qiime2: \$( qiime --version | sed '1!d;s/.* //' )
END_VERSIONS
"""
"""
export XDG_CONFIG_HOME="\${PWD}/HOME"

mindepth=\$(count_table_minmax_reads.py $stats minimum 2>&1)
if [ \"\$mindepth\" -lt \"$mindepth\" ]; then mindepth=$mindepth; fi

# report the rarefaction depth and return warning, if needed
if [ \"\$mindepth\" -lt \"1000\" ]; then
    echo \$mindepth >\"WARNING The sampling depth of \$mindepth seems too small for rarefaction.txt\"
elif [ \"\$mindepth\" -lt \"5000\" ]; then
    echo \$mindepth >\"WARNING The sampling depth of \$mindepth is very small for rarefaction.txt\"
elif [ \"\$mindepth\" -lt \"10000\" ]; then
    echo \$mindepth >\"WARNING The sampling depth of \$mindepth is quite small for rarefaction.txt\"
else
    echo \$mindepth >\"Use the sampling depth of \$mindepth for rarefaction.txt\"
fi

qiime diversity core-metrics-phylogenetic \\
    --m-metadata-file ${metadata} \\
    --i-phylogeny ${tree} \\
    --i-table ${table} \\
    --p-sampling-depth \$mindepth \\
    --output-dir diversity_core \\
    --p-n-jobs-or-threads ${task.cpus} \\
    --verbose

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    qiime2: \$( qiime --version | sed '1!d;s/.* //' )
END_VERSIONS
"""
"""
export XDG_CONFIG_HOME="\${PWD}/HOME"

#produce raw count table in biom format "table/feature-table.biom"
qiime tools export \\
    --input-path ${table} \\
    --output-path table
cp table/feature-table.biom .

#produce raw count table "table/feature-table.tsv"
biom convert \\
    -i table/feature-table.biom \\
    -o feature-table.tsv \\
    --to-tsv

#produce representative sequence fasta file "sequences.fasta"
qiime feature-table tabulate-seqs \\
    --i-data ${repseq} \\
    --o-visualization rep-seqs.qzv
qiime tools export \\
    --input-path rep-seqs.qzv \\
    --output-path representative_sequences
cp representative_sequences/sequences.fasta rep-seq.fasta
cp representative_sequences/*.tsv .

##on several taxa level
array=(\$(seq ${tax_agglom_min} 1 ${tax_agglom_max}))
for i in \${array[@]}
do
    #collapse taxa
    qiime taxa collapse \\
        --i-table ${table} \\
        --i-taxonomy ${taxonomy} \\
        --p-level \$i \\
        --o-collapsed-table table-\$i.qza
    #export to biom
    qiime tools export \\
        --input-path table-\$i.qza \\
        --output-path table-\$i
    #convert to tab separated text file
    biom convert \\
        -i table-\$i/feature-table.biom \\
        -o abs-abund-table-\$i.tsv --to-tsv
done

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    qiime2: \$( qiime --version | sed '1!d;s/.* //' )
END_VERSIONS
"""
"""
export XDG_CONFIG_HOME="\${PWD}/HOME"

#convert to relative abundances
qiime feature-table relative-frequency \\
    --i-table ${table} \\
    --o-relative-frequency-table relative-table-ASV.qza

#export to biom
qiime tools export \\
    --input-path relative-table-ASV.qza \\
    --output-path relative-table-ASV

#convert to tab separated text file "rel-table-ASV.tsv"
biom convert \\
    -i relative-table-ASV/feature-table.biom \\
    -o rel-table-ASV.tsv --to-tsv

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    qiime2: \$( qiime --version | sed '1!d;s/.* //' )
END_VERSIONS
"""
"""
export XDG_CONFIG_HOME="\${PWD}/HOME"

##on several taxa level
array=(\$(seq ${tax_agglom_min} 1 ${tax_agglom_max}))

for i in \${array[@]}
do
    #collapse taxa
    qiime taxa collapse \\
        --i-table ${table} \\
        --i-taxonomy ${taxonomy} \\
        --p-level \$i \\
        --o-collapsed-table table-\$i.qza
    #convert to relative abundances
    qiime feature-table relative-frequency \\
        --i-table table-\$i.qza \\
        --o-relative-frequency-table relative-table-\$i.qza
    #export to biom
    qiime tools export \\
        --input-path relative-table-\$i.qza \\
        --output-path relative-table-\$i
    #convert to tab separated text file
    biom convert \\
        -i relative-table-\$i/feature-table.biom \\
        -o rel-table-\$i.tsv --to-tsv
done

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    qiime2: \$( qiime --version | sed '1!d;s/.* //' )
END_VERSIONS
"""
"""
export XDG_CONFIG_HOME="\${PWD}/HOME"

### Import
qiime tools import \\
    --type \'FeatureData[Sequence]\' \\
    --input-path ${database[0]} \\
    --output-path ref-seq.qza
qiime tools import \\
    --type \'FeatureData[Taxonomy]\' \\
    --input-format HeaderlessTSVTaxonomyFormat \\
    --input-path ${database[1]} \\
    --output-path ref-taxonomy.qza

#Extract sequences based on primers
qiime feature-classifier extract-reads \\
    --i-sequences ref-seq.qza \\
    --p-f-primer ${meta.FW_primer} \\
    --p-r-primer ${meta.RV_primer} \\
    --o-reads ${meta.FW_primer}-${meta.RV_primer}-ref-seq.qza \\
    --quiet

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    qiime2: \$( qiime --version | sed '1!d;s/.* //' )
END_VERSIONS
"""
"""
export XDG_CONFIG_HOME="\${PWD}/HOME"

qiime feature-table filter-samples \\
    --i-table "${table}" \\
    --m-metadata-file "${metadata}" \\
    --p-where \"${category}<>\'\'\" \\
    --o-filtered-table "filtered_${category}.qza"

qiime feature-table group \\
    --i-table "filtered_${category}.qza" \\
    --p-axis 'sample' \\
    --m-metadata-file "${metadata}" \\
    --m-metadata-column "${category}" \\
    --p-mode 'sum' \\
    --o-grouped-table "${category}"

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    qiime2: \$( qiime --version | sed '1!d;s/.* //' )
END_VERSIONS
"""
"""
export XDG_CONFIG_HOME="\${PWD}/HOME"

qiime feature-table filter-samples \\
    --i-table ${table} \\
    --m-metadata-file ${metadata} \\
    $args \\
    --o-filtered-table ${prefix}.qza

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    qiime2: \$( qiime --version | sed '1!d;s/.* //' )
END_VERSIONS
"""
"""
export XDG_CONFIG_HOME="\${PWD}/HOME"

if ! [ \"${exclude_taxa}\" = \"none\" ]; then
    #filter sequences
    qiime taxa filter-seqs \\
        --i-sequences ${repseq} \\
        --i-taxonomy ${taxonomy} \\
        --p-exclude ${exclude_taxa} --p-mode contains \\
        --o-filtered-sequences tax_filtered-sequences.qza
    #filter abundance table
    qiime taxa filter-table \\
        --i-table ${table} \\
        --i-taxonomy ${taxonomy} \\
        --p-exclude ${exclude_taxa} \\
        --p-mode contains \\
        --o-filtered-table tax_filtered-table.qza
    filtered_table="tax_filtered-table.qza"
    filtered_sequences="tax_filtered-sequences.qza"
else
    filtered_table=${table}
    filtered_sequences=${repseq}
fi

qiime feature-table filter-features \\
    --i-table \$filtered_table \\
    --p-min-frequency ${min_frequency} \\
    --p-min-samples ${min_samples} \\
    --o-filtered-table filtered-table.qza

qiime feature-table filter-seqs \\
    --i-data \$filtered_sequences \\
    --i-table filtered-table.qza \\
    --o-filtered-data filtered-sequences.qza

#produce raw count table in biom format "table/feature-table.biom"
qiime tools export \\
    --input-path filtered-table.qza \\
    --output-path table

#produce raw count table
biom convert \\
    -i table/feature-table.biom \\
    -o table/feature-table.tsv \\
    --to-tsv
cp table/feature-table.tsv filtered-table.tsv

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    qiime2: \$( qiime --version | sed '1!d;s/.* //' )
END_VERSIONS
"""
"""
# remove first line if needed
sed '/^# Constructed from biom file/d' "$asv" > biom-table.txt

# load into QIIME2
biom convert -i biom-table.txt -o table.biom --table-type="OTU table" --to-hdf5
qiime tools import \\
    --input-path table.biom \\
    --type 'FeatureTable[Frequency]' \\
    --input-format BIOMV210Format \\
    --output-path table.qza

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    qiime2: \$( qiime --version | sed '1!d;s/.* //' )
END_VERSIONS
"""
"""
qiime tools import \\
    --input-path "$seq" \\
    --type 'FeatureData[Sequence]' \\
    --output-path rep-seqs.qza

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    qiime2: \$( qiime --version | sed '1!d;s/.* //' )
END_VERSIONS
"""
"""
parse_dada2_taxonomy.r $tax

qiime tools import \\
    --type 'FeatureData[Taxonomy]' \\
    --input-format HeaderlessTSVTaxonomyFormat \\
    --input-path tax.tsv \\
    --output-path taxonomy.qza

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    qiime2: \$( qiime --version | sed '1!d;s/.* //' )
END_VERSIONS
"""
"""
qiime tools import \\
    --type 'Phylogeny[Rooted]' \\
    --input-path $tree \\
    --output-path tree.qza

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    qiime2: \$( qiime --version | sed '1!d;s/.* //' )
END_VERSIONS
"""
"""
export XDG_CONFIG_HOME="\${PWD}/HOME"

#Train classifier
qiime feature-classifier fit-classifier-naive-bayes \\
    --i-reference-reads ${meta.FW_primer}-${meta.RV_primer}-ref-seq.qza \\
    --i-reference-taxonomy ref-taxonomy.qza \\
    --o-classifier ${meta.FW_primer}-${meta.RV_primer}-classifier.qza \\
    --quiet

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    qiime2: \$( qiime --version | sed '1!d;s/.* //' )
END_VERSIONS
"""
"""
export XDG_CONFIG_HOME="\${PWD}/HOME"

qiime alignment mafft \\
    --i-sequences ${repseq} \\
    --o-alignment aligned-rep-seqs.qza \\
    --p-n-threads ${task.cpus}
qiime alignment mask \\
    --i-alignment aligned-rep-seqs.qza \\
    --o-masked-alignment masked-aligned-rep-seqs.qza
qiime phylogeny fasttree \\
    --i-alignment masked-aligned-rep-seqs.qza \\
    --p-n-threads ${task.cpus} \\
    --o-tree unrooted-tree.qza
qiime phylogeny midpoint-root \\
    --i-tree unrooted-tree.qza \\
    --o-rooted-tree rooted-tree.qza
qiime tools export \\
    --input-path rooted-tree.qza \\
    --output-path phylogenetic_tree
cp phylogenetic_tree/tree.nwk .

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    qiime2: \$( qiime --version | sed '1!d;s/.* //' )
END_VERSIONS
"""
"""
if [ ! -f ${meta.id}.fastq.gz ]; then
    $args $reads ${meta.id}.fastq.gz
else
    touch ${meta.id}.fastq.gz
fi

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    sed: \$(sed --version 2>&1 | sed -n 1p | sed 's/sed (GNU sed) //')
END_VERSIONS
"""
"""
[ -f "${meta.id}_1.fastq.gz" ] || $args "${reads[0]}" "${meta.id}_1.fastq.gz"
[ -f "${meta.id}_2.fastq.gz" ] || $args "${reads[1]}" "${meta.id}_2.fastq.gz"

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    sed: \$(sed --version 2>&1 | sed -n 1p | sed 's/sed (GNU sed) //')
END_VERSIONS
"""
"""
sbdiexport.R $args $asvtable $taxonomytable $metadata

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    R: \$(R --version 2>&1 | sed -n 1p | sed 's/R version //' | sed 's/ (.*//')
END_VERSIONS
"""
"""
if [[ $workflow.manifest.version == *dev ]]; then
    ampliseq_version="v$workflow.manifest.version, revision: ${workflow.scriptId.substring(0,10)}"
else
    ampliseq_version="v$workflow.manifest.version"
fi

sbdiexportreannotate.R \"$dbversion\" $taxonomytable $taxonomymethod \"\$ampliseq_version\" $predictions

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    R: \$(R --version 2>&1 | sed -n 1p | sed 's/R version //' | sed 's/ (.*//')
END_VERSIONS
"""
"""
trunclen.py $qual_stats $args

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    python: \$(python --version 2>&1 | sed 's/Python //g')
    pandas: \$(python -c "import pkg_resources; print(pkg_resources.get_distribution('pandas').version)")
END_VERSIONS
"""
"""
cutadapt \\
    --cores $task.cpus \\
    $args \\
    $trimmed \\
    $reads \\
    > ${prefix}.cutadapt.log

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    cutadapt: \$(cutadapt --version)
END_VERSIONS
"""
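Here $args is assembled by the pipeline from --FW_primer, --RV_primer and related options. As a rough guide only, an expanded paired-end call with the primers from the usage example might look like this (sample file names are hypothetical):

cutadapt \
    --cores 4 \
    -g GTGYCAGCMGCCGCGGTAA \
    -G GGACTACNVGGGTWTCTAAT \
    --discard-untrimmed \
    -o sample1_1.trimmed.fastq.gz \
    -p sample1_2.trimmed.fastq.gz \
    sample1_1.fastq.gz sample1_2.fastq.gz \
    > sample1.cutadapt.log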
"""
touch ${prefix}.cutadapt.log
touch ${trimmed}

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    cutadapt: \$(cutadapt --version)
END_VERSIONS
"""
"""
epa-ng \\
    $args \\
    --threads $task.cpus \\
    $queryarg \\
    $refalnarg \\
    $reftreearg \\
    $bfastarg \\
    $binaryarg

if [ -e epa_result.jplace ]; then
    gzip epa_result.jplace
    cp epa_result.jplace.gz ${prefix}.epa_result.jplace.gz
fi
[ -e epa_info.log ] && cp epa_info.log ${prefix}.epa_info.log

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    epang: \$(echo \$(epa-ng --version 2>&1) | sed 's/^EPA-ng v//')
END_VERSIONS
"""
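The query, reference alignment and reference tree arguments are filled in by the pipeline. A bare-bones direct EPA-ng invocation, with hypothetical file names, would be:

epa-ng \
    --threads 4 \
    --tree reference.newick \
    --ref-msa reference_alignment.fasta \
    --query query_alignment.fasta

which writes epa_result.jplace and epa_info.log to the working directory, matching the files collected above.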
"""
epa-ng \\
    $args \\
    --split $refaln $fullaln
gzip -c query.fasta > ${prefix}.query.fasta.gz; rm query.fasta
gzip -c reference.fasta > ${prefix}.reference.fasta.gz; rm reference.fasta

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    epang: \$(echo \$(epa-ng --version 2>&1) | sed 's/^EPA-ng v//')
END_VERSIONS
"""
"""
printf "%s %s\\n" $rename_to | while read old_name new_name; do
    [ -f "\${new_name}" ] || ln -s \$old_name \$new_name
done

fastqc $args --threads $task.cpus $renamed_files

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    fastqc: \$( fastqc --version | sed -e "s/FastQC v//g" )
END_VERSIONS
"""
"""
touch ${prefix}.html
touch ${prefix}.zip

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    fastqc: \$( fastqc --version | sed -e "s/FastQC v//g" )
END_VERSIONS
"""
"""
gappa \\
    examine assign \\
    $args \\
    --threads $task.cpus \\
    --jplace-path $jplace \\
    --taxon-file $taxonomy \\
    --file-prefix ${prefix}.

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    gappa: \$(echo \$(gappa --version 2>&1 | sed 's/v//' ))
END_VERSIONS
"""
"""
gappa \\
    examine \\
    graft \\
    $args \\
    --threads $task.cpus \\
    --file-prefix ${prefix}. \\
    --jplace-path $jplace

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    gappa: \$(echo \$(gappa --version 2>&1 | sed 's/v//' ))
END_VERSIONS
"""
"""
gappa \\
    examine \\
    heat-tree \\
    --threads $task.cpus \\
    --file-prefix ${prefix}. \\
    --jplace-path $jplace \\
    $args \\
    --log-file ${prefix}.log

grep '^ *At' ${prefix}.log > ${prefix}.colours.txt

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    gappa: \$(echo \$(gappa --version 2>&1 | sed 's/v//' ))
END_VERSIONS
"""
"""
esl-alimask \\
    $args \\
    $fmask_rfarg \\
    $fmask_allarg \\
    $gmask_rfarg \\
    $gmask_allarg \\
    $pmask_rfarg \\
    $pmask_allarg \\
    -o ${prefix}.masked.sthlm \\
    $unmaskedaln \\
    $maskfile

gzip ${prefix}.*mask*

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    hmmer/easel: \$(esl-reformat -h | grep -o '^# Easel [0-9.]*' | sed 's/^# Easel *//')
END_VERSIONS
"""
"""
esl-reformat \\
    $args \\
    $seqfile \\
    $postproc \\
    | gzip -c > ${prefix}.${suffix}.gz

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    hmmer/easel: \$(esl-reformat -h | grep -o '^# Easel [0-9.]*' | sed 's/^# Easel *//')
END_VERSIONS
"""
"""
hmmalign \\
    $args \\
    $hmm \\
    $fasta | gzip -c > ${prefix}.sthlm.gz

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    hmmer: \$(hmmalign -h | grep -o '^# HMMER [0-9.]*' | sed 's/^# HMMER *//')
END_VERSIONS
"""
"""
hmmbuild \\
    $args \\
    --cpu $task.cpus \\
    -n ${prefix} \\
    -o ${prefix}.hmmbuild.txt \\
    ${mxfileopt} \\
    ${prefix}.hmm \\
    $alignment

gzip ${prefix}.hmm

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    hmmer: \$(echo \$(hmmbuild -h | grep HMMER | sed 's/# HMMER //' | sed 's/ .*//' 2>&1))
END_VERSIONS
"""
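hmmbuild and hmmalign work as a pair: a profile HMM is built from a trusted reference alignment, and query sequences are then aligned against that profile. A minimal sketch with hypothetical file names (not the pipeline's exact arguments):

# build a profile HMM from a reference alignment (Stockholm or aligned FASTA)
hmmbuild ref.hmm ref_alignment.sto

# align ASV sequences to the profile, trimming them to the profile's columns
hmmalign --trim ref.hmm asv_seqs.fasta > asv_aligned.sto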
"""
mafft \\
    --thread ${task.cpus} \\
    ${args} \\
    ${add} \\
    ${fasta} \\
    > ${prefix}.fas

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    mafft: \$(mafft --version 2>&1 | sed 's/^v//' | sed 's/ (.*)//')
END_VERSIONS
"""
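When ${add} is set, MAFFT inserts new sequences into an existing alignment rather than aligning everything from scratch. An illustrative standalone form (file names hypothetical):

mafft --thread 4 --add new_seqs.fasta --keeplength ref_alignment.fasta > combined.fas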
"""
multiqc \\
    --force \\
    $args \\
    $config \\
    $extra_config \\
    .

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    multiqc: \$( multiqc --version | sed -e "s/multiqc, version //g" )
END_VERSIONS
"""
"""
touch multiqc_data
touch multiqc_plots
touch multiqc_report.html

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    multiqc: \$( multiqc --version | sed -e "s/multiqc, version //g" )
END_VERSIONS
"""
"""
vsearch \\
    --sintax $queryfasta \\
    --db $db \\
    --threads $task.cpus \\
    $args \\
    --tabbedout ${prefix}.tsv

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    vsearch: \$(vsearch --version 2>&1 | head -n 1 | sed 's/vsearch //g' | sed 's/,.*//g' | sed 's/^v//' | sed 's/_.*//')
END_VERSIONS
"""
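A direct SINTAX run against a reference database, with a typical bootstrap cutoff, might look like the following; file names and the 0.8 cutoff are illustrative, as the pipeline passes its own values via $args:

vsearch \
    --sintax query.fasta \
    --db ref_db.fasta \
    --sintax_cutoff 0.8 \
    --threads 4 \
    --tabbedout taxonomy.sintax.tsv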
"""
vsearch \\
    --usearch_global $queryfasta \\
    --db $db \\
    --id $idcutoff \\
    --threads $task.cpus \\
    $args \\
    ${columns} \\
    ${outfmt} ${prefix}.${out_ext}

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    vsearch: \$(vsearch --version 2>&1 | head -n 1 | sed 's/vsearch //g' | sed 's/,.*//g' | sed 's/^v//' | sed 's/_.*//')
END_VERSIONS
"""
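Here ${columns} and ${outfmt} select the output layout. As an assumed example using vsearch's user-defined tab-separated output (not the pipeline's exact settings):

vsearch \
    --usearch_global query.fasta \
    --db ref_db.fasta \
    --id 0.97 \
    --threads 4 \
    --userfields query+target+id \
    --userout hits.tsv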