Amplicon sequencing analysis workflow using DADA2 and QIIME2

Version: 2.6.1

Introduction

nf-core/ampliseq is a bioinformatics analysis pipeline for amplicon sequencing. It supports denoising of any amplicon and currently offers taxonomic assignment of 16S, ITS, CO1 and 18S amplicons; phylogenetic placement is also possible. Paired-end Illumina as well as single-end Illumina, PacBio and IonTorrent data are supported. By default, the pipeline analyses 16S rRNA gene amplicons sequenced paired-end on Illumina instruments.

A video about the relevance, usage and output of the pipeline (version 2.1.0; 26th Oct. 2021) can be found on YouTube and bilibili ; the slides are deposited at figshare .

nf-core/ampliseq workflow overview

The pipeline is built using Nextflow , a workflow tool to run tasks across multiple compute infrastructures in a very portable manner. It uses Docker/Singularity containers, making installation trivial and results highly reproducible. The Nextflow DSL2 implementation of this pipeline uses one container per process, which makes it much easier to maintain and update software dependencies.

On release, automated continuous integration tests run the pipeline on a full-sized dataset on the AWS cloud infrastructure. This ensures that the pipeline runs on AWS, has sensible resource allocation defaults set to run on real-world datasets, and permits the persistent storage of results to benchmark between pipeline releases and other analysis sources. The results obtained from the full-sized test can be viewed on the nf-core website .

Pipeline summary

By default, the pipeline currently performs the following:

  • Sequencing quality control ( FastQC )

  • Trimming of reads ( Cutadapt )

  • Infer Amplicon Sequence Variants (ASVs) ( DADA2 )

  • Predict whether ASVs are ribosomal RNA sequences ( Barrnap )

  • Phylogenetic placement ( EPA-NG )

  • Taxonomic classification using DADA2, SINTAX or QIIME2

  • Exclude unwanted taxa; produce absolute and relative feature/taxa count tables and plots; plot alpha rarefaction curves; compute alpha and beta diversity indices and plots thereof ( QIIME2 )

  • Call differentially abundant taxa ( ANCOM )

  • Overall pipeline run summaries ( MultiQC )

Usage

Note If you are new to Nextflow and nf-core, please refer to this page on how to set up Nextflow. Make sure to test your setup with -profile test before running the workflow on actual data.
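
For example, a minimal test run could look like the following sketch (assuming Docker is available; replace <OUTDIR> with a writable path):

nextflow run nf-core/ampliseq -profile test,docker --outdir <OUTDIR>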

First, you need to know whether the sequencing files at hand are expected to contain primer sequences (usually yes) and, if so, which primer sequences. In the example below, the paired-end sequencing data was produced with the 515f (GTGYCAGCMGCCGCGGTAA) and 806r (GGACTACNVGGGTWTCTAAT) primers targeting the V4 region of the 16S rRNA gene. Please note that those sequences should not contain any sequencing adapter sequences, only the sequence that matches the biological amplicon.

Next, the data needs to be organized in a folder, here named data , or detailed in a samplesheet (see input documentation ).
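
For illustration, a samplesheet is a tab-separated table; the sketch below assumes the column layout described in the input documentation (sampleID, forwardReads, reverseReads, run) and uses placeholder file paths:

sampleID	forwardReads	reverseReads	run
sample1	data/sample1_R1.fastq.gz	data/sample1_R2.fastq.gz	A
sample2	data/sample2_R1.fastq.gz	data/sample2_R2.fastq.gz	A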

Now, you can run the pipeline using:

nextflow run nf-core/ampliseq \
 -profile <docker/singularity/.../institute> \
 --input "data" \
 --FW_primer "GTGYCAGCMGCCGCGGTAA" \
 --RV_primer "GGACTACNVGGGTWTCTAAT" \
 --outdir <OUTDIR>

Note Adding metadata will considerably increase the output, see metadata documentation .
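
As a sketch, metadata is likewise a tab-separated table whose first column is named ID and matches the sample IDs of the samplesheet; any further columns can be used for grouping in diversity and ANCOM analyses (the condition column here is a made-up example):

ID	condition
sample1	control
sample2	treated

It is passed with --metadata metadata.tsv; see the metadata documentation for the exact requirements.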

Warning: Please provide pipeline parameters via the CLI or the Nextflow -params-file option. Custom config files, including those provided by the -c Nextflow option, can be used to provide any configuration except for parameters; see docs .
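
For example, the parameters above could be collected once in a YAML file and passed with -params-file instead of individual command-line flags (a sketch; parameter names as in the parameter documentation):

# params.yml
input: "data"
FW_primer: "GTGYCAGCMGCCGCGGTAA"
RV_primer: "GGACTACNVGGGTWTCTAAT"
outdir: "results"

nextflow run nf-core/ampliseq -profile docker -params-file params.yml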

For more details, please refer to the usage documentation and the parameter documentation .

Pipeline output

To see the results of a test run with a full-size dataset, refer to the results tab on the nf-core website pipeline page. For more details about the output files and reports, please refer to the output documentation .

Credits

nf-core/ampliseq was originally written by Daniel Straub ( @d4straub ) and Alexander Peltzer ( @apeltzer ) for use at the Quantitative Biology Center (QBiC) and Microbial Ecology, Center for Applied Geosciences , part of Eberhard Karls Universität Tübingen (Germany). Daniel Lundin @erikrikarddaniel ( Linnaeus University, Sweden ) joined before pipeline release 2.0.0 and helped to improve the pipeline considerably.

We thank everyone who provided extensive assistance in the development of this pipeline.

Contributions and Support

If you would like to contribute to this pipeline, please see the contributing guidelines .

For further information or help, don't hesitate to get in touch on the Slack #ampliseq channel (you can join with this invite ).

Citations

If you use nf-core/ampliseq for your analysis, please cite the ampliseq article as follows:

Interpretations of Environmental Microbial Community Studies Are Biased by the Selected 16S rRNA (Gene) Amplicon Sequencing Pipeline

Daniel Straub, Nia Blackwell, Adrian Langarica-Fuentes, Alexander Peltzer, Sven Nahnsen, Sara Kleindienst

Frontiers in Microbiology 2020, 11:2652 doi: 10.3389/fmicb.2020.550420 .

You can cite the nf-core/ampliseq Zenodo record for a specific version using the following DOI: 10.5281/zenodo.1493841

An extensive list of references for the tools used by the pipeline can be found in the CITATIONS.md file.

You can cite the nf-core publication as follows:

The nf-core framework for community-curated bioinformatics pipelines.

Philip Ewels, Alexander Peltzer, Sven Fillinger, Harshil Patel, Johannes Alneberg, Andreas Wilm, Maxime Ulysse Garcia, Paolo Di Tommaso & Sven Nahnsen.

Nat Biotechnol. 2020 Feb 13. doi: 10.1038/s41587-020-0439-x .

Code Snippets

"""
add_sh_to_taxonomy.py ${sh_info.join(' ')} $asvtable $blastfile $outtable $args

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    python: \$(python --version 2>&1 | sed 's/Python //g')
    pandas: \$(python -c "import pkg_resources; print(pkg_resources.get_distribution('pandas').version)")
END_VERSIONS
"""
"""
IFS=',' read -r -a kingdom <<< \"$kingdom\"

for KINGDOM in "\${kingdom[@]}"
do
    barrnap \\
        --threads $task.cpus \\
        $args \\
        --kingdom \$KINGDOM \\
        --outseq Filtered.\${KINGDOM}.fasta \\
        < $fasta \\
        > rrna.\${KINGDOM}.gff

    #this fails when processing an empty file, so it requires a workaround!
    if [ -s Filtered.\${KINGDOM}.fasta ]; then
        grep -h '>' Filtered.\${KINGDOM}.fasta | sed 's/^>//' | sed 's/:\\+/\\t/g' | awk '{print \$2}' | sort -u >\${KINGDOM}.matches.txt
    else
        touch \${KINGDOM}.matches.txt
    fi
done

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    barrnap: \$(echo \$(barrnap --version 2>&1) | sed "s/^.*barrnap //g")
END_VERSIONS
"""
"""
summarize_barrnap.py $predictions

if [[ \$(wc -l < summary.tsv ) -le 1 ]]; then
    touch WARNING_no_rRNA_found_warning.txt
else
    touch no_warning.txt
fi

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    python: \$( python --version )
END_VERSIONS
"""
"""
combine_table.r ${table} ${seq} ${tax}
mv combined_ASV_table.tsv ${outfile}

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    R: \$(R --version 2>&1 | sed -n 1p | sed 's/R version //' | sed 's/ (.*//')
END_VERSIONS
"""
"""
#!/usr/bin/env Rscript
standard <- read.table(\"${files[0]}\", header = TRUE, sep = "\\t", stringsAsFactors = FALSE)
doubleprimer <- read.table(\"${files[1]}\", header = TRUE, sep = "\\t", stringsAsFactors = FALSE)
colnames(doubleprimer) <- c("sample", "cutadapt_doubleprimer_total_processed", "cutadapt_doubleprimer_reverse_complemented", "cutadapt_doubleprimer_passing_filters", "cutadapt_doubleprimer_passing_filters_percent")

#merge
df <- merge(standard, doubleprimer, by = "sample")

#filter columns
remove_columns <- c("cutadapt_doubleprimer_total_processed")
for(column in remove_columns) df[column]<-NULL

#write
write.table(df, file = \"cutadapt_summary.tsv\", quote=FALSE, col.names=TRUE, row.names=FALSE, sep="\\t")

writeLines(c("\\"${task.process}\\":", paste0("    R: ", paste0(R.Version()[c("major","minor")], collapse = ".")) ), "versions.yml")
"""
"""
cp $files cutadapt_summary.tsv
"""
"""
cutadapt_summary.py $mode *.cutadapt.log > ${name}_summary.tsv

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    python: \$( python --version )
END_VERSIONS
"""
"""
#!/usr/bin/env Rscript
suppressPackageStartupMessages(library(dada2))
set.seed($seed) # Initialize random number generator for reproducibility

#add "Species" if not already in taxlevels
taxlevels <- $taxlevels
if ( !"Species" %in% taxlevels ) { taxlevels <- c(taxlevels,"Species") }

taxtable <- readRDS(\"$taxtable\")

#remove Species annotation from assignTaxonomy
taxa_nospecies <- taxtable[,!colnames(taxtable) %in% 'Species']

tx <- addSpecies(taxa_nospecies, \"$database\", $args, verbose=TRUE)

# Create a table with specified column order
tmp <- data.frame(row.names(tx)) # To separate ASV_ID from sequence
expected_order <- c("ASV_ID",taxlevels,"confidence")
taxa <- as.data.frame( subset(tx, select = expected_order) )
taxa\$sequence <- tmp[,1]
row.names(taxa) <- row.names(tmp)

#rename Species annotation to Species_exact
colnames(taxa)[which(names(taxa) == "Species")] <- "Species_exact"

#add Species annotation from assignTaxonomy again, after "Genus" column
if ( "Species" %in% colnames(taxtable) ) {
    taxtable <- data.frame(taxtable)
    taxa_export <- data.frame(append(taxa, list(Species=taxtable\$Species), after=match("Genus", names(taxa))))
} else {
    taxa_export <- taxa
}

write.table(taxa_export, file = \"$outfile\", sep = "\\t", row.names = FALSE, col.names = TRUE, quote = FALSE, na = '')

write.table('addSpecies\t$args\ntaxlevels\t$taxlevels\nseed\t$seed', file = "addSpecies.args.txt", row.names = FALSE, col.names = FALSE, quote = FALSE, na = '')
writeLines(c("\\"${task.process}\\":", paste0("    R: ", paste0(R.Version()[c("major","minor")], collapse = ".")),paste0("    dada2: ", packageVersion("dada2")) ), "versions.yml")
"""
"""
#!/usr/bin/env Rscript
suppressPackageStartupMessages(library(dada2))

errF = readRDS("${errormodel[0]}")
errR = readRDS("${errormodel[1]}")

filtFs <- sort(list.files("./filtered/", pattern = "_1.filt.fastq.gz", full.names = TRUE))
filtRs <- sort(list.files("./filtered/", pattern = "_2.filt.fastq.gz", full.names = TRUE))

#denoising
sink(file = "${meta.run}.dada.log")
dadaFs <- dada(filtFs, err = errF, $args, multithread = $task.cpus)
saveRDS(dadaFs, "${meta.run}_1.dada.rds")
dadaRs <- dada(filtRs, err = errR, $args, multithread = $task.cpus)
saveRDS(dadaRs, "${meta.run}_2.dada.rds")
sink(file = NULL)

#make table
mergers <- mergePairs(dadaFs, filtFs, dadaRs, filtRs, $args2, verbose=TRUE)
saveRDS(mergers, "${meta.run}.mergers.rds")
seqtab <- makeSequenceTable(mergers)
saveRDS(seqtab, "${meta.run}.seqtab.rds")

write.table('dada\t$args', file = "dada.args.txt", row.names = FALSE, col.names = FALSE, quote = FALSE, na = '')
write.table('mergePairs\t$args2', file = "mergePairs.args.txt", row.names = FALSE, col.names = FALSE, quote = FALSE, na = '')
writeLines(c("\\"${task.process}\\":", paste0("    R: ", paste0(R.Version()[c("major","minor")], collapse = ".")),paste0("    dada2: ", packageVersion("dada2")) ), "versions.yml")
"""
"""
#!/usr/bin/env Rscript
suppressPackageStartupMessages(library(dada2))

errF = readRDS("${errormodel}")

filtFs <- sort(list.files("./filtered/", pattern = ".fastq.gz", full.names = TRUE))

#denoising
sink(file = "${meta.run}.dada.log")
dadaFs <- dada(filtFs, err = errF, $args, multithread = $task.cpus)
saveRDS(dadaFs, "${meta.run}.dada.rds")
sink(file = NULL)

#make table
seqtab <- makeSequenceTable(dadaFs)
saveRDS(seqtab, "${meta.run}.seqtab.rds")

#dummy file to fulfill output rules
saveRDS("dummy", "dummy_${meta.run}.mergers.rds")

write.table('dada\t$args', file = "dada.args.txt", row.names = FALSE, col.names = FALSE, quote = FALSE, na = '')
writeLines(c("\\"${task.process}\\":", paste0("    R: ", paste0(R.Version()[c("major","minor")], collapse = ".")),paste0("    dada2: ", packageVersion("dada2")) ), "versions.yml")
"""
"""
#!/usr/bin/env Rscript
suppressPackageStartupMessages(library(dada2))
set.seed($seed) # Initialize random number generator for reproducibility

fnFs <- sort(list.files(".", pattern = "_1.filt.fastq.gz", full.names = TRUE))
fnRs <- sort(list.files(".", pattern = "_2.filt.fastq.gz", full.names = TRUE))

sink(file = "${meta.run}.err.log")
errF <- learnErrors(fnFs, $args, multithread = $task.cpus, verbose = TRUE)
saveRDS(errF, "${meta.run}_1.err.rds")
errR <- learnErrors(fnRs, $args, multithread = $task.cpus, verbose = TRUE)
saveRDS(errR, "${meta.run}_2.err.rds")
sink(file = NULL)

pdf("${meta.run}_1.err.pdf")
plotErrors(errF, nominalQ = TRUE)
dev.off()
svg("${meta.run}_1.err.svg")
plotErrors(errF, nominalQ = TRUE)
dev.off()

pdf("${meta.run}_2.err.pdf")
plotErrors(errR, nominalQ = TRUE)
dev.off()
svg("${meta.run}_2.err.svg")
plotErrors(errR, nominalQ = TRUE)
dev.off()

sink(file = "${meta.run}_1.err.convergence.txt")
dada2:::checkConvergence(errF)
sink(file = NULL)

sink(file = "${meta.run}_2.err.convergence.txt")
dada2:::checkConvergence(errR)
sink(file = NULL)

write.table('learnErrors\t$args', file = "learnErrors.args.txt", row.names = FALSE, col.names = FALSE, quote = FALSE, na = '')
writeLines(c("\\"${task.process}\\":", paste0("    R: ", paste0(R.Version()[c("major","minor")], collapse = ".")),paste0("    dada2: ", packageVersion("dada2")) ), "versions.yml")
"""
"""
#!/usr/bin/env Rscript
suppressPackageStartupMessages(library(dada2))
set.seed($seed) # Initialize random number generator for reproducibility

fnFs <- sort(list.files(".", pattern = ".filt.fastq.gz", full.names = TRUE))

sink(file = "${meta.run}.err.log")
errF <- learnErrors(fnFs, $args, multithread = $task.cpus, verbose = TRUE)
saveRDS(errF, "${meta.run}.err.rds")
sink(file = NULL)

pdf("${meta.run}.err.pdf")
plotErrors(errF, nominalQ = TRUE)
dev.off()
svg("${meta.run}.err.svg")
plotErrors(errF, nominalQ = TRUE)
dev.off()

sink(file = "${meta.run}.err.convergence.txt")
dada2:::checkConvergence(errF)
sink(file = NULL)

write.table('learnErrors\t$args', file = "learnErrors.args.txt", row.names = FALSE, col.names = FALSE, quote = FALSE, na = '')
writeLines(c("\\"${task.process}\\":", paste0("    R: ", paste0(R.Version()[c("major","minor")], collapse = ".")),paste0("    dada2: ", packageVersion("dada2")) ), "versions.yml")
"""
"""
#!/usr/bin/env Rscript
suppressPackageStartupMessages(library(dada2))

out <- filterAndTrim($in_and_out,
    $trunc_args,
    $args,
    compress = TRUE,
    multithread = $task.cpus,
    verbose = TRUE)
out <- cbind(out, ID = row.names(out))

write.table( out, file = "${meta.id}.filter_stats.tsv", sep = "\\t", row.names = FALSE, quote = FALSE, na = '')
write.table(paste('filterAndTrim\t$trunc_args','$args',sep=","), file = "filterAndTrim.args.txt", row.names = FALSE, col.names = FALSE, quote = FALSE, na = '')
writeLines(c("\\"${task.process}\\":", paste0("    R: ", paste0(R.Version()[c("major","minor")], collapse = ".")),paste0("    dada2: ", packageVersion("dada2")) ), "versions.yml")
"""
"""
#!/usr/bin/env Rscript
suppressPackageStartupMessages(library(dada2))
suppressPackageStartupMessages(library(digest))

#combine stats files
for (data in sort(list.files(".", pattern = ".stats.tsv", full.names = TRUE))) {
    if (!exists("stats")){ stats <- read.csv(data, header=TRUE, sep="\\t") }
    if (exists("stats")){
        temp <-read.csv(data, header=TRUE, sep="\\t")
        stats <-unique(rbind(stats, temp))
        rm(temp)
    }
}
write.table( stats, file = "DADA2_stats.tsv", sep = "\\t", row.names = FALSE, col.names = TRUE, quote = FALSE, na = '')

#combine dada-class objects
files <- sort(list.files(".", pattern = ".ASVtable.rds", full.names = TRUE))
if ( length(files) == 1 ) {
    ASVtab = readRDS(files[1])
} else {
    ASVtab <- mergeSequenceTables(tables=files, repeats = "error", orderBy = "abundance", tryRC = FALSE)
}
saveRDS(ASVtab, "DADA2_table.rds")

df <- t(ASVtab)
colnames(df) <- gsub('_1.filt.fastq.gz', '', colnames(df))
colnames(df) <- gsub('.filt.fastq.gz', '', colnames(df))
df <- data.frame(sequence = rownames(df), df, check.names=FALSE)
# Create an md5 sum of the sequences as ASV_ID and rearrange columns
df\$ASV_ID <- sapply(df\$sequence, digest, algo='md5', serialize = FALSE)
df <- df[,c(ncol(df),3:ncol(df)-1,1)]

# file to publish
write.table(df, file = "DADA2_table.tsv", sep = "\\t", row.names = FALSE, quote = FALSE, na = '')

# Write fasta file with ASV sequences to file
write.table(data.frame(s = sprintf(">%s\n%s", df\$ASV_ID, df\$sequence)), 'ASV_seqs.fasta', col.names = FALSE, row.names = FALSE, quote = FALSE, na = '')

# Write ASV file with ASV abundances to file
df\$sequence <- NULL
write.table(df, file = "ASV_table.tsv", sep="\\t", row.names = FALSE, quote = FALSE, na = '')

writeLines(c("\\"${task.process}\\":", paste0("    R: ", paste0(R.Version()[c("major","minor")], collapse = ".")),paste0("    dada2: ", packageVersion("dada2")) ), "versions.yml")
"""
"""
#!/usr/bin/env Rscript

suppressPackageStartupMessages(library(dada2))
suppressPackageStartupMessages(library(ShortRead))

readfiles <- sort(list.files(".", pattern = ".fastq.gz", full.names = TRUE))

#make list of number of sequences
readfiles_length <- countLines(readfiles) / 4
sum_readfiles_length <- sum(readfiles_length)

#use only the first x files when read number gets above 2147483647, read numbers above that do not fit into an INT and crash the process!
if ( sum_readfiles_length > 2147483647 ) {
    max_files = length(which(cumsum(readfiles_length) <= 2147483647 ))
    write.table(max_files, file = paste0("WARNING Only ",max_files," of ",length(readfiles)," files and ",sum(readfiles_length[1:max_files])," of ",sum_readfiles_length," reads were used for ${meta} plotQualityProfile.txt"), row.names = FALSE, col.names = FALSE, quote = FALSE, na = '')
    readfiles <- readfiles[1:max_files]
} else {
    max_files <- length(readfiles)
    write.table(max_files, file = paste0(max_files," files were used for ${prefix} plotQualityProfile.txt"), row.names = FALSE, col.names = FALSE, quote = FALSE, na = '')
}

plot <- plotQualityProfile(readfiles$args)
data <- plot\$data

#aggregate data for each sequencing cycle
df <- data.frame(Cycle=character(), Count=character(), Median=character(), stringsAsFactors=FALSE)
cycles <- sort(unique(data\$Cycle))
for (cycle in cycles) {
    subdata <- data[data[, "Cycle"] == cycle, ]
    score <- list()
    #convert to list to calculate median
    for (j in 1:nrow(subdata)) {score <- unlist(c(score, rep(subdata\$Score[j], subdata\$Count[j])))}
    temp = data.frame(Cycle=cycle, Count=sum(subdata\$Count), Median=median(score), stringsAsFactors=FALSE)
    df <- rbind(df, temp)
}

#write output
write.table( t(df), file = paste0("${prefix}_qual_stats",".tsv"), sep = "\t", row.names = TRUE, col.names = FALSE, quote = FALSE)
pdf(paste0("${prefix}_qual_stats",".pdf"))
plot
dev.off()
svg(paste0("${prefix}_qual_stats",".svg"))
plot
dev.off()

write.table(paste0('plotQualityProfile\t$args\nmax_files\t',max_files), file = "${prefix}_plotQualityProfile.args.txt", row.names = FALSE, col.names = FALSE, quote = FALSE, na = '')
writeLines(c("\\"${task.process}\\":", paste0("    R: ", paste0(R.Version()[c("major","minor")], collapse = ".")),paste0("    dada2: ", packageVersion("dada2")),paste0("    ShortRead: ", packageVersion("ShortRead")) ), "versions.yml")
"""
"""
#!/usr/bin/env Rscript
suppressPackageStartupMessages(library(dada2))

seqtab = readRDS("${seqtab}")

#remove chimera
seqtab.nochim <- removeBimeraDenovo(seqtab, $args, multithread=$task.cpus, verbose=TRUE)
if ( ${no_samples} == 1 ) { rownames(seqtab.nochim) <- "${first_sample}" }
saveRDS(seqtab.nochim,"${meta.run}.ASVtable.rds")

write.table('removeBimeraDenovo\t$args', file = "removeBimeraDenovo.args.txt", row.names = FALSE, col.names = FALSE, quote = FALSE, na = '')
writeLines(c("\\"${task.process}\\":", paste0("    R: ", paste0(R.Version()[c("major","minor")], collapse = ".")),paste0("    dada2: ", packageVersion("dada2")) ), "versions.yml")
"""
"""
#!/usr/bin/env Rscript
suppressPackageStartupMessages(library(dada2))

#combine filter_and_trim files
for (data in list.files("./filter_and_trim_files", full.names=TRUE)){
    if (!exists("filter_and_trim")){ filter_and_trim <- read.csv(data, header=TRUE, sep="\\t") }
    if (exists("filter_and_trim")){
        tempory <-read.csv(data, header=TRUE, sep="\\t")
        filter_and_trim <-unique(rbind(filter_and_trim, tempory))
        rm(tempory)
    }
}
rownames(filter_and_trim) <- filter_and_trim\$ID
filter_and_trim["ID"] <- NULL
#write.table( filter_and_trim, file = "${meta.run}.filter_and_trim.tsv", sep = "\\t", row.names = TRUE, quote = FALSE, na = '')

#read data
dadaFs = readRDS("${denoised[0]}")
dadaRs = readRDS("${denoised[1]}")
mergers = readRDS("$mergers")
seqtab.nochim = readRDS("$seqtab_nochim")

#track reads through pipeline
getN <- function(x) sum(getUniques(x))
if ( nrow(filter_and_trim) == 1 ) {
    track <- cbind(filter_and_trim, getN(dadaFs), getN(dadaRs), getN(mergers), rowSums(seqtab.nochim))
} else {
    track <- cbind(filter_and_trim, sapply(dadaFs, getN), sapply(dadaRs, getN), sapply(mergers, getN), rowSums(seqtab.nochim))
}
colnames(track) <- c("DADA2_input", "filtered", "denoisedF", "denoisedR", "merged", "nonchim")
rownames(track) <- sub(pattern = "_1.fastq.gz\$", replacement = "", rownames(track)) #this is when cutadapt is skipped!
track <- cbind(sample = sub(pattern = "(.*?)\\\\..*\$", replacement = "\\\\1", rownames(track)), track)
write.table( track, file = "${meta.run}.stats.tsv", sep = "\\t", row.names = FALSE, quote = FALSE, na = '')

writeLines(c("\\"${task.process}\\":", paste0("    R: ", paste0(R.Version()[c("major","minor")], collapse = ".")),paste0("    dada2: ", packageVersion("dada2")) ), "versions.yml")
"""
"""
#!/usr/bin/env Rscript
suppressPackageStartupMessages(library(dada2))

#combine filter_and_trim files
for (data in list.files("./filter_and_trim_files", full.names=TRUE)){
    if (!exists("filter_and_trim")){ filter_and_trim <- read.csv(data, header=TRUE, sep="\\t") }
    if (exists("filter_and_trim")){
        tempory <-read.csv(data, header=TRUE, sep="\\t")
        filter_and_trim <-unique(rbind(filter_and_trim, tempory))
        rm(tempory)
    }
}
rownames(filter_and_trim) <- filter_and_trim\$ID
filter_and_trim["ID"] <- NULL
#write.table( filter_and_trim, file = "${meta.run}.filter_and_trim.tsv", sep = "\\t", row.names = TRUE, quote = FALSE, na = '')

#read data
dadaFs = readRDS("${denoised[0]}")
seqtab.nochim = readRDS("$seqtab_nochim")

#track reads through pipeline
getN <- function(x) sum(getUniques(x))
if ( nrow(filter_and_trim) == 1 ) {
    track <- cbind(filter_and_trim, getN(dadaFs), rowSums(seqtab.nochim))
} else {
    track <- cbind(filter_and_trim, sapply(dadaFs, getN), rowSums(seqtab.nochim))
}
colnames(track) <- c("DADA2_input", "filtered", "denoised", "nonchim")
track <- cbind(sample = sub(pattern = "(.*?)\\\\..*\$", replacement = "\\\\1", rownames(track)), track)
write.table( track, file = "${meta.run}.stats.tsv", sep = "\\t", row.names = FALSE, quote = FALSE, na = '')

writeLines(c("\\"${task.process}\\":", paste0("    R: ", paste0(R.Version()[c("major","minor")], collapse = ".")),paste0("    dada2: ", packageVersion("dada2")) ), "versions.yml")
"""
"""
#!/usr/bin/env Rscript
suppressPackageStartupMessages(library(dada2))
set.seed($seed) # Initialize random number generator for reproducibility

taxlevels <- $taxlevels

seq <- getSequences(\"$fasta\", collapse = TRUE, silence = FALSE)
taxa <- assignTaxonomy(seq, \"$database\", taxLevels = taxlevels, $args, multithread = $task.cpus, verbose=TRUE, outputBootstraps = TRUE)

# (1) Make a data frame, add ASV_ID from seq
tx <- data.frame(ASV_ID = names(seq), taxa, sequence = row.names(taxa\$tax), row.names = names(seq))

# (2) Set confidence to the bootstrap for the most specific taxon
# extract columns with taxonomic values
tax <- tx[ , grepl( "tax." , names( tx ) ) ]
# find first occurrence of NA
res <- max.col(is.na(tax), ties = "first")
# correct if no NA is present in column to NA
if(any(res == 1)) is.na(res) <- (res == 1) & !is.na(tax[[1]])
# find index of last entry before NA, which is the bootstrap value
res <- res-1
# if NA choose last entry
res[is.na(res)] <- ncol(tax)
# extract bootstrap values
boot <- tx[ , grepl( "boot." , names( tx ) ) ]
boot\$last_tax <- res
valid_boot <- apply(boot,1,function(x) x[x[length(x)]][1]/100 )
# replace missing bootstrap values (NA) with 0
valid_boot[is.na(valid_boot)] <- 0
# add bootstrap values to column confidence
tx\$confidence <- valid_boot

# (3) Reorder columns before writing to file
expected_order <- c("ASV_ID",paste0("tax.",taxlevels),"confidence","sequence")
expected_order <- intersect(expected_order,colnames(tx))
taxa_export <- subset(tx, select = expected_order)
colnames(taxa_export) <- sub("tax.", "", colnames(taxa_export))
rownames(taxa_export) <- names(seq)

write.table(taxa_export, file = \"$outfile\", sep = "\\t", row.names = FALSE, col.names = TRUE, quote = FALSE, na = '')

# Save a version with rownames for addSpecies
taxa_export <- cbind( ASV_ID = tx\$ASV_ID, taxa\$tax, confidence = tx\$confidence)
saveRDS(taxa_export, "ASV_tax.rds")

write.table('assignTaxonomy\t$args\ntaxlevels\t$taxlevels\nseed\t$seed', file = "assignTaxonomy.args.txt", row.names = FALSE, col.names = FALSE, quote = FALSE, na = '')
writeLines(c("\\"${task.process}\\":", paste0("    R: ", paste0(R.Version()[c("major","minor")], collapse = ".")),paste0("    dada2: ", packageVersion("dada2")) ), "versions.yml")
"""
"""
filt_codons.py -f ${fasta} -t ${asv} -p ASV_codon ${args}
filt_codon_stats.py ASV_codon_filtered.table.tsv

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    python: \$(python --version 2>&1 | sed 's/Python //g')
    pandas: \$(python -c "import pkg_resources; print(pkg_resources.get_distribution('pandas').version)")
END_VERSIONS
"""
"""
#!/usr/bin/env Rscript

#load packages
suppressPackageStartupMessages(library(Biostrings))

#read abundance file, first column is ASV_ID
$read_table
colnames(table)[1] <- "ASV_ID"

#read fasta file of ASV sequences
seq <- readDNAStringSet("$fasta")
seq <- data.frame(ID=names(seq), sequence=paste(seq))

#filter
filtered_seq <- seq[nchar(seq\$sequence) %in% $min_len_asv:$max_len_asv,]
list <- filtered_seq[, "ID", drop = FALSE]
filtered_table <- merge(table, list, by.x="ASV_ID", by.y="ID", all.x=FALSE, all.y=TRUE)

#report
distribution_before <- table(nchar(seq\$sequence))
distribution_before <- data.frame(Length=names(distribution_before),Counts=as.vector(distribution_before))
distribution_after <- table(nchar(filtered_seq\$sequence))
distribution_after <- data.frame(Length=names(distribution_after),Counts=as.vector(distribution_after))

#write
write.table(filtered_table, file = "ASV_table.len.tsv", row.names=FALSE, sep="\t", col.names = TRUE, quote = FALSE, na = '')
write.table(data.frame(s = sprintf(">%s\n%s", filtered_seq\$ID, filtered_seq\$sequence)), 'ASV_seqs.len.fasta', col.names = FALSE, row.names = FALSE, quote = FALSE, na = '')
write.table(distribution_before, file = "ASV_len_orig.tsv", row.names=FALSE, sep="\t", col.names = TRUE, quote = FALSE, na = '')
write.table(distribution_after, file = "ASV_len_filt.tsv", row.names=FALSE, sep="\t", col.names = TRUE, quote = FALSE, na = '')

#stats
stats <- as.data.frame( t( rbind( colSums(table[-1]), colSums(filtered_table[-1]) ) ) )
stats\$ID <- rownames(stats)
colnames(stats) <- c("lenfilter_input","lenfilter_output", "sample")
write.table(stats, file = "stats.len.tsv", row.names=FALSE, sep="\t")

writeLines(c("\\"${task.process}\\":", paste0("    R: ", paste0(R.Version()[c("major","minor")], collapse = ".")),paste0("    Biostrings: ", packageVersion("Biostrings")) ), "versions.yml")
"""
"""
#!/usr/bin/env Rscript

#load packages
suppressPackageStartupMessages(library(Biostrings))

kingdom <- as.list(strsplit("$kingdom", ",")[[1]])

df = read.table("$barrnap_summary", header = TRUE, sep = "\t", stringsAsFactors = FALSE)
# keep only ASV_ID & eval columns & sort
df <- subset(df, select = c(ASV_ID,mito_eval,euk_eval,arc_eval,bac_eval))

# choose kingdom (column) with lowest evalue
df[is.na(df)] <- 1
df\$result = colnames(df[,2:5])[apply(df[,2:5],1,which.min)]
df\$result = gsub("_eval", "", df\$result)

# filter ASVs
df_filtered = subset(df, df\$result %in% kingdom)
id_filtered = subset(df_filtered, select = c(ASV_ID))

#error if all ASVs are removed
if ( nrow(df_filtered) == 0 ) stop("Chosen kingdom(s) by --filter_ssu has no matches. Please choose a different kingdom (domain) or omit filtering.")

#read abundance file, first column is ASV_ID
table <- read.table(file = "$table", sep = '\t', comment.char = "", header=TRUE)
colnames(table)[1] <- "ASV_ID"

#read fasta file of ASV sequences
seq <- readDNAStringSet("$fasta")
seq <- data.frame(ID=names(seq), sequence=paste(seq))

#check if all ids match
if(!all(id_filtered\$ID %in% seq\$ID))  {stop(paste(paste(files,sep=","),"and","$fasta","dont share all IDs, exit."), call.=FALSE)}
if(!all(id_filtered\$ID %in% table\$ASV_ID))  {stop(paste(paste(files,sep=","),"and","$table","dont share all IDs, exit"), call.=FALSE)}

#merge
filtered_table <- merge(table, id_filtered, by.x="ASV_ID", by.y="ASV_ID", all.x=FALSE, all.y=TRUE)
filtered_seq <- merge(seq, id_filtered, by.x="ID", by.y="ASV_ID", all.x=FALSE, all.y=TRUE)

#write
write.table(filtered_table, file = "ASV_table.ssu.tsv", row.names=FALSE, sep="\t", col.names = TRUE, quote = FALSE, na = '')
write.table(data.frame(s = sprintf(">%s\n%s", filtered_seq\$ID, filtered_seq\$sequence)), 'ASV_seqs.ssu.fasta', col.names = FALSE, row.names = FALSE, quote = FALSE, na = '')

#stats
stats <- as.data.frame( t( rbind( colSums(table[-1]), colSums(filtered_table[-1]) ) ) )
stats\$ID <- rownames(stats)
colnames(stats) <- c("ssufilter_input","ssufilter_output", "sample")
write.table(stats, file = "stats.ssu.tsv", row.names=FALSE, sep="\t")

writeLines(c("\\"${task.process}\\":", paste0("    R: ", paste0(R.Version()[c("major","minor")], collapse = ".")),paste0("    Biostrings: ", packageVersion("Biostrings")) ), "versions.yml")
"""
"""
filter_stats.py $unfiltered $filtered

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    python: \$(python --version 2>&1 | sed 's/Python //g')
    pandas: \$(python -c "import pkg_resources; print(pkg_resources.get_distribution('pandas').version)")
END_VERSIONS
"""
"""
cat $fastain | sed '/^>/s/\t/ /g' > input.mod.fasta

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    sed: \$(sed --version 2>&1 | sed -n 1p | sed 's/sed (GNU sed) //')
END_VERSIONS
"""
"""
#!/usr/bin/env Rscript
suppressPackageStartupMessages(library(stringr)) # provides str_split(), used further below

tax = read.table("$tax", header = TRUE, sep = "\t", stringsAsFactors = FALSE, comment.char = '', quote = '')

df <- data.frame(
    name=character(),
    LWR=character(),
    fract=character(),
    aLWR=character(),
    afract=character(),
    taxopath=character(),
    stringsAsFactors=FALSE)

for (asvid in unique(tax\$name)) {
    # subset per ASV ID
    temp = subset(tax, tax\$name == asvid)
    maxLWR = max(temp\$LWR)
    temp = subset(temp, temp\$LWR == maxLWR)

    if ( nrow(temp) == 1 ) {
        # STEP 1: if there is just 1 maximum, choose this row
        print( paste ( asvid,"was added in STEP 1" ) )
        temp = temp[which.max(temp\$LWR),]
        df <- rbind(df, temp)
    } else {
        # STEP2: be conservative, choose the taxonomy with less entries
        # count number of entries in "taxopath", simplified: number of semicolons
        temp\$taxlevels = nchar( temp\$taxopath ) - nchar(gsub(';', '', temp\$taxopath )) + 1
        minTaxlevels = min(temp\$taxlevels)
        temp = subset(temp, temp\$taxlevels == minTaxlevels)
        temp\$taxlevels = NULL
        if ( nrow(temp) == 1 ) {
            df <- rbind(df, temp)
            print( paste ( asvid,"was added in STEP 2" ) )
        } else {
            # STEP 3: if number of entries is same, remove last entry
            # then, check if reduced entries are identical > if yes, choose any row, if no, repeat
            # at that step the taxonomies have same length
            print( paste ( asvid,"enters STEP 3" ) )
            list_taxonpath <- str_split( temp\$taxopath, ";")
            df_taxonpath <- as.data.frame(do.call(rbind, list_taxonpath))
            for (i in ncol(df_taxonpath):0) {
                # choose first column and change taxon to reduced overlap
                if ( length(unique(df_taxonpath[,i])) == 1 ) {
                    if (i>1) {
                        temp\$taxopath <- apply(df_taxonpath[1,1:i], 1, paste, collapse=";")[[1]]
                    } else if (i==1) {
                        temp\$taxopath <- df_taxonpath[1,1]
                    }
                    df <- rbind(df, temp[1,])
                    print( paste ( asvid,"was added with",temp\$taxopath[[1]],"in STEP 3a" ) )
                    break
                } else if (i==0) {
                    # if all fails, i.e. there is no consensus on any taxonomic level, use NA
                    temp\$taxopath <- "NA"
                    df <- rbind(df, temp[1,])
                    print( paste ( asvid,"was added with",temp\$taxopath[[1]],"in STEP 3b" ) )
                }
            }
        }
    }
}

# output the cleaned, unified, unique taxonomic classification per ASV
write.table(df, file = "${meta.id}.taxonomy.per_query_unique.tsv", row.names = FALSE, col.names = TRUE, quote = FALSE, na = '', sep = "\\t")

# output ASV taxonomy table for QIIME2
df_clean <- subset(df, select = c("name","taxopath"))
colnames(df_clean) <- c("ASV_ID","taxonomy")
write.table(df_clean, file = "${meta.id}.taxonomy.tsv", row.names = FALSE, col.names = TRUE, quote = FALSE, na = '', sep = "\\t")

writeLines(c("\\"${task.process}\\":", paste0("    R: ", paste0(R.Version()[c("major","minor")], collapse = ".")),paste0("    dada2: ", packageVersion("dada2")) ), "versions.yml")
"""
"""
${params.dada_ref_databases[params.dada_ref_taxonomy]["fmtscript"]}

#Giving out information
echo -e "--dada_ref_taxonomy: ${params.dada_ref_taxonomy}\n" >ref_taxonomy.${suffix}.txt
echo -e "Title: ${params.dada_ref_databases[params.dada_ref_taxonomy]["title"]}\n" >>ref_taxonomy.${suffix}.txt
echo -e "Citation: ${params.dada_ref_databases[params.dada_ref_taxonomy]["citation"]}\n" >>ref_taxonomy.${suffix}.txt
echo "All entries: ${params.dada_ref_databases[params.dada_ref_taxonomy]}" >>ref_taxonomy.${suffix}.txt

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    bash: \$(bash --version | sed -n 1p | sed 's/GNU bash, version //g')
END_VERSIONS
"""
"""
${params.qiime_ref_databases[params.qiime_ref_taxonomy]["fmtscript"]}

#Giving out information
echo -e "--qiime_ref_taxonomy: ${params.qiime_ref_taxonomy}\n" >ref_taxonomy.txt
echo -e "Title: ${params.qiime_ref_databases[params.qiime_ref_taxonomy]["title"]}\n" >>ref_taxonomy.txt
echo -e "Citation: ${params.qiime_ref_databases[params.qiime_ref_taxonomy]["citation"]}\n" >>ref_taxonomy.txt
echo "All entries: ${params.qiime_ref_databases[params.qiime_ref_taxonomy]}" >>ref_taxonomy.txt

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    bash: \$(bash --version | sed -n 1p | sed 's/GNU bash, version //g')
END_VERSIONS
"""
"""
${params.sintax_ref_databases[params.sintax_ref_taxonomy]["fmtscript"]}

#Giving out information
echo -e "--sintax_ref_taxonomy: ${params.sintax_ref_taxonomy}\n" >ref_taxonomy_sintax.txt
echo -e "Title: ${params.sintax_ref_databases[params.sintax_ref_taxonomy]["title"]}\n" >>ref_taxonomy_sintax.txt
echo -e "Citation: ${params.sintax_ref_databases[params.sintax_ref_taxonomy]["citation"]}\n" >>ref_taxonomy_sintax.txt
echo "All entries: ${params.sintax_ref_databases[params.sintax_ref_taxonomy]}" >>ref_taxonomy_sintax.txt

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    bash: \$(bash --version | sed -n 1p | sed 's/GNU bash, version //g')
END_VERSIONS
"""
"""
add_full_sequence_to_taxfile.py $taxtable $fastafile $outfile

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    python: \$(python --version 2>&1 | sed 's/Python //g')
    pandas: \$(python -c "import pkg_resources; print(pkg_resources.get_distribution('pandas').version)")
END_VERSIONS
"""
"""
convert_sintax_output.py -i $taxtable -f $fastafile -o $outfile -t $taxlevels -d \"${params.sintax_ref_databases[params.sintax_ref_taxonomy]["dbversion"]}\"

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    python: \$(python --version 2>&1 | sed 's/Python //g')
END_VERSIONS
"""
"""
ITSx \\
    -i $fasta \\
    $args \\
    --cpu $task.cpus \\
    -o ASV_ITS_seqs

if [ ! -s $outfile ]; then
    echo "ERROR: No ITS regions found by ITSx. You might want to modify --cut_its and/or --its_partial" >&2
    exit 1
fi

echo -e "ITSx\t$args" > ITSx.args.txt
cat <<-END_VERSIONS > versions.yml
"${task.process}":
    ITSx: \$( ITSx -h 2>&1 > /dev/null | tail -n 2 | head -n 1 | cut -f 2 -d ' ' )
END_VERSIONS
"""
"""
#!/usr/bin/env Rscript
x <- read.table(\"file1.tsv\", header = TRUE, sep = "\t", stringsAsFactors = FALSE)
y <- read.table(\"file2.tsv\", header = TRUE, sep = "\t", stringsAsFactors = FALSE)

#merge
df <- merge(x, y, by = "sample", all = TRUE)

#write
write.table(df, file = \"overall_summary.tsv\", quote=FALSE, col.names=TRUE, row.names=FALSE, sep="\t")

writeLines(c("\\"${task.process}\\":", paste0("    R: ", paste0(R.Version()[c("major","minor")], collapse = ".")) ), "versions.yml")
"""
"""
metadata_all.r ${metadata}

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    R: \$(R --version 2>&1 | sed -n 1p | sed 's/R version //' | sed 's/ (.*//')
END_VERSIONS
"""
"""
metadata_pairwise.r ${metadata}

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    R: \$(R --version 2>&1 | sed -n 1p | sed 's/R version //' | sed 's/ (.*//')
END_VERSIONS
"""
"""
novaseq_err_pe.r ${errormodel[0]} ${errormodel[1]} ${meta.run}

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    R: \$(R --version | sed -n 1p | sed 's/R version //g' | sed 's/\\s.*\$//')
END_VERSIONS
"""
"""
novaseq_err_se.r ${errormodel} ${meta.run}

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    R: \$(R --version | sed -n 1p | sed 's/R version //g' | sed 's/\\s.*\$//')
END_VERSIONS
"""
"""
#If the input is a QIIME2 file, then (1) the first line and (2) the leading character (#) of the second line need to be removed
if [ "$source" == 'QIIME2' ]
then
    tail -n +2 "$abund" > "${abund}.tmp" && mv "${abund}.tmp" "$abund"
fi

picrust2_pipeline.py \\
    $args \\
    -s $seq \\
    -i $abund \\
    -o all_output \\
    -p $task.cpus \\
    --in_traits EC,KO \\
    --verbose

#Add descriptions to identifiers
add_descriptions.py -i all_output/EC_metagenome_out/pred_metagenome_unstrat.tsv.gz -m EC \
                -o EC_pred_metagenome_unstrat_descrip.tsv
add_descriptions.py -i all_output/KO_metagenome_out/pred_metagenome_unstrat.tsv.gz -m KO \
                -o KO_pred_metagenome_unstrat_descrip.tsv
add_descriptions.py -i all_output/pathways_out/path_abun_unstrat.tsv.gz -m METACYC \
                -o METACYC_path_abun_unstrat_descrip.tsv

echo "$message" > "${message}.txt"
echo -e "picrust\t$args" > "picrust.args.txt"

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    python: \$(python --version 2>&1 | sed 's/Python //g')
    picrust2: \$( picrust2_pipeline.py -v | sed -e "s/picrust2_pipeline.py //g" )
END_VERSIONS
"""
"""
export XDG_CONFIG_HOME="\${PWD}/HOME"

maxdepth=\$(count_table_minmax_reads.py $stats maximum 2>&1)

#check values
if [ \"\$maxdepth\" -gt \"75000\" ]; then maxdepth=\"75000\"; fi
if [ \"\$maxdepth\" -gt \"5000\" ]; then maxsteps=\"250\"; else maxsteps=\$((maxdepth/20)); fi
qiime diversity alpha-rarefaction  \\
    --i-table ${table}  \\
    --i-phylogeny ${tree}  \\
    --p-max-depth \$maxdepth  \\
    --m-metadata-file ${metadata}  \\
    --p-steps \$maxsteps  \\
    --p-iterations 10  \\
    --o-visualization alpha-rarefaction.qzv
qiime tools export --input-path alpha-rarefaction.qzv  \\
    --output-path alpha-rarefaction

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    qiime2: \$( qiime --version | sed '1!d;s/.* //' )
END_VERSIONS
"""
"""
export XDG_CONFIG_HOME="\${PWD}/HOME"

qiime composition add-pseudocount \\
    --i-table ${table} \\
    --o-composition-table comp-${table}
qiime composition ancom \\
    --i-table comp-${table} \\
    --m-metadata-file ${metadata} \\
    --m-metadata-column ${table.baseName} \\
    --o-visualization comp-${table.baseName}.qzv
qiime tools export --input-path comp-${table.baseName}.qzv \\
    --output-path ancom/Category-${table.baseName}-ASV

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    qiime2: \$( qiime --version | sed '1!d;s/.* //' )
END_VERSIONS
"""
"""
export XDG_CONFIG_HOME="\${PWD}/HOME"
mkdir ancom

# Sum data at the specified level
qiime taxa collapse \\
    --i-table ${table} \\
    --i-taxonomy ${taxonomy} \\
    --p-level ${taxlevel} \\
    --o-collapsed-table lvl${taxlevel}-${table}

# Extract summarised table and output a file with the number of taxa
qiime tools export \\
    --input-path lvl${taxlevel}-${table} \\
    --output-path exported/
biom convert \\
    -i exported/feature-table.biom \\
    -o ${table.baseName}-level-${taxlevel}.feature-table.tsv \\
    --to-tsv

if [ \$(grep -v '^#' -c ${table.baseName}-level-${taxlevel}.feature-table.tsv) -lt 2 ]; then
    echo ${taxlevel} > ancom/\"WARNING Summing your data at taxonomic level ${taxlevel} produced less than two rows (taxa), ANCOM can't proceed -- did you specify a bad reference taxonomy?\".txt
else
    qiime composition add-pseudocount \\
            --i-table lvl${taxlevel}-${table} \\
            --o-composition-table comp-lvl${taxlevel}-${table}
    qiime composition ancom \\
            --i-table comp-lvl${taxlevel}-${table} \\
            --m-metadata-file ${metadata} \\
            --m-metadata-column ${table.baseName} \\
            --o-visualization comp-lvl${taxlevel}-${table.baseName}.qzv
    qiime tools export --input-path comp-lvl${taxlevel}-${table.baseName}.qzv \\
            --output-path ancom/Category-${table.baseName}-level-${taxlevel}
fi

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    qiime2: \$( qiime --version | sed '1!d;s/.* //' )
END_VERSIONS
"""
"""
export XDG_CONFIG_HOME="\${PWD}/HOME"

qiime taxa barplot  \\
    --i-table ${table}  \\
    --i-taxonomy ${taxonomy}  \\
    ${metadata_cmd}  \\
    --o-visualization taxa-bar-plots.qzv  \\
    --verbose
qiime tools export \\
    --input-path taxa-bar-plots.qzv  \\
    --output-path barplot${suffix}

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    qiime2: \$( qiime --version | sed '1!d;s/.* //' )
END_VERSIONS
"""
"""
export XDG_CONFIG_HOME="\${PWD}/HOME"

qiime feature-classifier classify-sklearn  \\
    --i-classifier ${trained_classifier}  \\
    --p-n-jobs ${task.cpus}  \\
    --i-reads ${repseq}  \\
    --o-classification taxonomy.qza  \\
    --verbose
qiime metadata tabulate  \\
    --m-input-file taxonomy.qza  \\
    --o-visualization taxonomy.qzv  \\
    --verbose
#produce "taxonomy/taxonomy.tsv"
qiime tools export \\
    --input-path taxonomy.qza  \\
    --output-path taxonomy
qiime tools export \\
    --input-path taxonomy.qzv  \\
    --output-path taxonomy
cp taxonomy/taxonomy.tsv .

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    qiime2: \$( qiime --version | sed '1!d;s/.* //' )
END_VERSIONS
"""
"""
export XDG_CONFIG_HOME="\${PWD}/HOME"

qiime diversity adonis \\
    --p-n-jobs $task.cpus \\
    --i-distance-matrix ${core} \\
    --m-metadata-file ${metadata} \\
    --o-visualization ${core.baseName}_adonis.qzv \\
    $args \\
    --p-formula "${formula}"
qiime tools export \\
    --input-path ${core.baseName}_adonis.qzv \\
    --output-path adonis/${core.baseName}-${formula}

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    qiime2: \$( qiime --version | sed '1!d;s/.* //' )
END_VERSIONS
"""
"""
export XDG_CONFIG_HOME="\${PWD}/HOME"

qiime diversity alpha-group-significance \\
    --i-alpha-diversity ${core} \\
    --m-metadata-file ${metadata} \\
    --o-visualization ${core.baseName}-vis.qzv
qiime tools export \\
    --input-path ${core.baseName}-vis.qzv \\
    --output-path "alpha_diversity/${core.baseName}"

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    qiime2: \$( qiime --version | sed '1!d;s/.* //' )
END_VERSIONS
"""
"""
export XDG_CONFIG_HOME="\${PWD}/HOME"

qiime diversity beta-group-significance \\
    --i-distance-matrix ${core} \\
    --m-metadata-file ${metadata} \\
    --m-metadata-column \"${category}\" \\
    --o-visualization ${core.baseName}-${category}.qzv \\
    --p-pairwise
qiime tools export \\
    --input-path ${core.baseName}-${category}.qzv \\
    --output-path beta_diversity/${core.baseName}-${category}

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    qiime2: \$( qiime --version | sed '1!d;s/.* //' )
END_VERSIONS
"""
"""
export XDG_CONFIG_HOME="\${PWD}/HOME"
mkdir beta_diversity

qiime emperor plot \\
    --i-pcoa ${core} \\
    --m-metadata-file ${metadata} \\
    --o-visualization ${core.baseName}-vis.qzv
qiime tools export \\
    --input-path ${core.baseName}-vis.qzv \
    --output-path beta_diversity/${core.baseName}-PCoA

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    qiime2: \$( qiime --version | sed '1!d;s/.* //' )
END_VERSIONS
"""
"""
export XDG_CONFIG_HOME="\${PWD}/HOME"

mindepth=\$(count_table_minmax_reads.py $stats minimum 2>&1)
if [ \"\$mindepth\" -lt \"$mindepth\" ]; then mindepth=$mindepth; fi

# report the rarefaction depth and return warning, if needed
if [ \"\$mindepth\" -lt \"1000\" ]; then
    echo \$mindepth >\"WARNING The sampling depth of \$mindepth seems too small for rarefaction.txt\"
elif [ \"\$mindepth\" -lt \"5000\" ]; then
    echo \$mindepth >\"WARNING The sampling depth of \$mindepth is very small for rarefaction.txt\"
elif [ \"\$mindepth\" -lt \"10000\" ]; then
    echo \$mindepth >\"WARNING The sampling depth of \$mindepth is quite small for rarefaction.txt\"
else
    echo \$mindepth >\"Use the sampling depth of \$mindepth for rarefaction.txt\"
fi

qiime diversity core-metrics-phylogenetic \\
    --m-metadata-file ${metadata} \\
    --i-phylogeny ${tree} \\
    --i-table ${table} \\
    --p-sampling-depth \$mindepth \\
    --output-dir diversity_core \\
    --p-n-jobs-or-threads ${task.cpus} \\
    --verbose

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    qiime2: \$( qiime --version | sed '1!d;s/.* //' )
END_VERSIONS
"""
"""
export XDG_CONFIG_HOME="\${PWD}/HOME"

#produce raw count table in biom format "table/feature-table.biom"
qiime tools export \\
    --input-path ${table} \\
    --output-path table
cp table/feature-table.biom .

#produce raw count table "table/feature-table.tsv"
biom convert \\
    -i table/feature-table.biom \\
    -o feature-table.tsv \\
    --to-tsv

#produce representative sequence fasta file "sequences.fasta"
qiime feature-table tabulate-seqs \\
    --i-data ${repseq} \\
    --o-visualization rep-seqs.qzv
qiime tools export \\
    --input-path rep-seqs.qzv \\
    --output-path representative_sequences
cp representative_sequences/sequences.fasta rep-seq.fasta
cp representative_sequences/*.tsv .

##on several taxa level
array=(\$(seq ${tax_agglom_min} 1 ${tax_agglom_max}))
for i in \${array[@]}
do
    #collapse taxa
    qiime taxa collapse \\
        --i-table ${table} \\
        --i-taxonomy ${taxonomy} \\
        --p-level \$i \\
        --o-collapsed-table table-\$i.qza
    #export to biom
    qiime tools export \\
        --input-path table-\$i.qza \\
        --output-path table-\$i
    #convert to tab separated text file
    biom convert \\
        -i table-\$i/feature-table.biom \\
        -o abs-abund-table-\$i.tsv --to-tsv
done

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    qiime2: \$( qiime --version | sed '1!d;s/.* //' )
END_VERSIONS
"""
"""
export XDG_CONFIG_HOME="\${PWD}/HOME"

#convert to relative abundances
qiime feature-table relative-frequency \\
    --i-table ${table} \\
    --o-relative-frequency-table relative-table-ASV.qza

#export to biom
qiime tools export \\
    --input-path relative-table-ASV.qza \\
    --output-path relative-table-ASV

#convert to tab separated text file "rel-table-ASV.tsv"
biom convert \\
    -i relative-table-ASV/feature-table.biom \\
    -o rel-table-ASV.tsv --to-tsv

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    qiime2: \$( qiime --version | sed '1!d;s/.* //' )
END_VERSIONS
"""
"""
export XDG_CONFIG_HOME="\${PWD}/HOME"

##on several taxa level
array=(\$(seq ${tax_agglom_min} 1 ${tax_agglom_max}))

for i in \${array[@]}
do
    #collapse taxa
    qiime taxa collapse \\
        --i-table ${table} \\
        --i-taxonomy ${taxonomy} \\
        --p-level \$i \\
        --o-collapsed-table table-\$i.qza
    #convert to relative abundances
    qiime feature-table relative-frequency \\
        --i-table table-\$i.qza \\
        --o-relative-frequency-table relative-table-\$i.qza
    #export to biom
    qiime tools export \\
        --input-path relative-table-\$i.qza \\
        --output-path relative-table-\$i
    #convert to tab separated text file
    biom convert \\
        -i relative-table-\$i/feature-table.biom \\
        -o rel-table-\$i.tsv --to-tsv
done

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    qiime2: \$( qiime --version | sed '1!d;s/.* //' )
END_VERSIONS
"""
"""
export XDG_CONFIG_HOME="\${PWD}/HOME"

### Import
qiime tools import \\
    --type \'FeatureData[Sequence]\' \\
    --input-path ${database[0]} \\
    --output-path ref-seq.qza
qiime tools import \\
    --type \'FeatureData[Taxonomy]\' \\
    --input-format HeaderlessTSVTaxonomyFormat \\
    --input-path ${database[1]} \\
    --output-path ref-taxonomy.qza
#Extract sequences based on primers
qiime feature-classifier extract-reads \\
    --i-sequences ref-seq.qza \\
    --p-f-primer ${meta.FW_primer} \\
    --p-r-primer ${meta.RV_primer} \\
    --o-reads ${meta.FW_primer}-${meta.RV_primer}-ref-seq.qza \\
    --quiet

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    qiime2: \$( qiime --version | sed '1!d;s/.* //' )
END_VERSIONS
"""
"""
export XDG_CONFIG_HOME="\${PWD}/HOME"

qiime feature-table filter-samples \\
    --i-table "${table}" \\
    --m-metadata-file "${metadata}" \\
    --p-where \"${category}<>\'\'\" \\
    --o-filtered-table "filtered_${category}.qza"

qiime feature-table group \\
    --i-table "filtered_${category}.qza" \\
    --p-axis 'sample' \\
    --m-metadata-file "${metadata}" \\
    --m-metadata-column "${category}" \\
    --p-mode 'sum' \\
    --o-grouped-table "${category}"

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    qiime2: \$( qiime --version | sed '1!d;s/.* //' )
END_VERSIONS
"""
"""
export XDG_CONFIG_HOME="\${PWD}/HOME"

qiime feature-table filter-samples \\
    --i-table ${table} \\
    --m-metadata-file ${metadata} \\
    $args \\
    --o-filtered-table ${prefix}.qza

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    qiime2: \$( qiime --version | sed '1!d;s/.* //' )
END_VERSIONS
"""
"""
export XDG_CONFIG_HOME="\${PWD}/HOME"

if ! [ \"${exclude_taxa}\" = \"none\" ]; then
    #filter sequences
    qiime taxa filter-seqs \\
        --i-sequences ${repseq} \\
        --i-taxonomy ${taxonomy} \\
        --p-exclude ${exclude_taxa} --p-mode contains \\
        --o-filtered-sequences tax_filtered-sequences.qza
    #filter abundance table
    qiime taxa filter-table \\
        --i-table ${table} \\
        --i-taxonomy ${taxonomy} \
        --p-exclude ${exclude_taxa} \\
        --p-mode contains \\
        --o-filtered-table tax_filtered-table.qza
    filtered_table="tax_filtered-table.qza"
    filtered_sequences="tax_filtered-sequences.qza"
else
    filtered_table=${table}
    filtered_sequences=${repseq}
fi
qiime feature-table filter-features \\
    --i-table \$filtered_table \\
    --p-min-frequency ${min_frequency} \\
    --p-min-samples ${min_samples} \\
    --o-filtered-table filtered-table.qza

qiime feature-table filter-seqs \\
    --i-data \$filtered_sequences \\
    --i-table filtered-table.qza \\
    --o-filtered-data filtered-sequences.qza

#produce raw count table in biom format "table/feature-table.biom"
qiime tools export \\
    --input-path filtered-table.qza  \\
    --output-path table
#produce raw count table
biom convert \\
    -i table/feature-table.biom \\
    -o table/feature-table.tsv  \\
    --to-tsv
cp table/feature-table.tsv filtered-table.tsv

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    qiime2: \$( qiime --version | sed '1!d;s/.* //' )
END_VERSIONS
"""
"""
# remove first line if needed
sed '/^# Constructed from biom file/d' "$asv" > biom-table.txt

# load into QIIME2
biom convert -i biom-table.txt -o table.biom --table-type="OTU table" --to-hdf5
qiime tools import \\
    --input-path table.biom \\
    --type 'FeatureTable[Frequency]' \\
    --input-format BIOMV210Format \\
    --output-path table.qza

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    qiime2: \$( qiime --version | sed '1!d;s/.* //' )
END_VERSIONS
"""
"""
qiime tools import \\
    --input-path "$seq" \\
    --type 'FeatureData[Sequence]' \\
    --output-path rep-seqs.qza

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    qiime2: \$( qiime --version | sed '1!d;s/.* //' )
END_VERSIONS
"""
"""
parse_dada2_taxonomy.r $tax

qiime tools import \\
    --type 'FeatureData[Taxonomy]' \\
    --input-format HeaderlessTSVTaxonomyFormat \\
    --input-path tax.tsv \\
    --output-path taxonomy.qza

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    qiime2: \$( qiime --version | sed '1!d;s/.* //' )
END_VERSIONS
"""
"""
qiime tools import \\
    --type 'Phylogeny[Rooted]' \\
    --input-path $tree \\
    --output-path tree.qza

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    qiime2: \$( qiime --version | sed '1!d;s/.* //' )
END_VERSIONS
"""
"""
export XDG_CONFIG_HOME="\${PWD}/HOME"

#Train classifier
qiime feature-classifier fit-classifier-naive-bayes \\
    --i-reference-reads ${meta.FW_primer}-${meta.RV_primer}-ref-seq.qza \\
    --i-reference-taxonomy ref-taxonomy.qza \\
    --o-classifier ${meta.FW_primer}-${meta.RV_primer}-classifier.qza \\
    --quiet

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    qiime2: \$( qiime --version | sed '1!d;s/.* //' )
END_VERSIONS
"""
"""
export XDG_CONFIG_HOME="\${PWD}/HOME"

qiime alignment mafft \\
    --i-sequences ${repseq} \\
    --o-alignment aligned-rep-seqs.qza \\
    --p-n-threads ${task.cpus}
qiime alignment mask \\
    --i-alignment aligned-rep-seqs.qza \\
    --o-masked-alignment masked-aligned-rep-seqs.qza
qiime phylogeny fasttree \\
    --i-alignment masked-aligned-rep-seqs.qza \\
    --p-n-threads ${task.cpus} \\
    --o-tree unrooted-tree.qza
qiime phylogeny midpoint-root \\
    --i-tree unrooted-tree.qza \\
    --o-rooted-tree rooted-tree.qza
qiime tools export \\
    --input-path rooted-tree.qza  \\
    --output-path phylogenetic_tree
cp phylogenetic_tree/tree.nwk .

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    qiime2: \$( qiime --version | sed '1!d;s/.* //' )
END_VERSIONS
"""
"""
if [ ! -f ${meta.id}.fastq.gz ]; then
    $args $reads ${meta.id}.fastq.gz
else
    touch ${meta.id}.fastq.gz
fi

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    sed: \$(sed --version 2>&1 | sed -n 1p | sed 's/sed (GNU sed) //')
END_VERSIONS
"""
"""
[ -f "${meta.id}_1.fastq.gz" ] || $args "${reads[0]}" "${meta.id}_1.fastq.gz"
[ -f "${meta.id}_2.fastq.gz" ] || $args "${reads[1]}" "${meta.id}_2.fastq.gz"

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    sed: \$(sed --version 2>&1 | sed -n 1p | sed 's/sed (GNU sed) //')
END_VERSIONS
"""
"""
sbdiexport.R $args $asvtable $taxonomytable $metadata

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    R: \$(R --version 2>&1 | sed -n 1p | sed 's/R version //' | sed 's/ (.*//')
END_VERSIONS
"""
"""
if [[ $workflow.manifest.version == *dev ]]; then
    ampliseq_version="v$workflow.manifest.version, revision: ${workflow.scriptId.substring(0,10)}"
else
    ampliseq_version="v$workflow.manifest.version"
fi

sbdiexportreannotate.R \"$dbversion\" $taxonomytable $taxonomymethod \"\$ampliseq_version\" $predictions

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    R: \$(R --version 2>&1 | sed -n 1p | sed 's/R version //' | sed 's/ (.*//')
END_VERSIONS
"""
"""
trunclen.py $qual_stats $args

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    python: \$(python --version 2>&1 | sed 's/Python //g')
    pandas: \$(python -c "import pkg_resources; print(pkg_resources.get_distribution('pandas').version)")
END_VERSIONS
"""
"""
cutadapt \\
    --cores $task.cpus \\
    $args \\
    $trimmed \\
    $reads \\
    > ${prefix}.cutadapt.log

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    cutadapt: \$(cutadapt --version)
END_VERSIONS
"""
"""
touch ${prefix}.cutadapt.log
touch ${trimmed}

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    cutadapt: \$(cutadapt --version)
END_VERSIONS
"""
"""
epa-ng \\
    $args \\
    --threads $task.cpus \\
    $queryarg \\
    $refalnarg \\
    $reftreearg \\
    $bfastarg \\
    $binaryarg

if [ -e epa_result.jplace ]; then
    gzip epa_result.jplace
    cp epa_result.jplace.gz ${prefix}.epa_result.jplace.gz
fi
[ -e epa_info.log ]      && cp epa_info.log ${prefix}.epa_info.log

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    epang: \$(echo \$(epa-ng --version 2>&1) | sed 's/^EPA-ng v//')
END_VERSIONS
"""
"""
epa-ng \\
    $args \\
    --split $refaln $fullaln

gzip -c query.fasta > ${prefix}.query.fasta.gz; rm query.fasta
gzip -c reference.fasta > ${prefix}.reference.fasta.gz; rm reference.fasta

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    epang: \$(echo \$(epa-ng --version 2>&1) | sed 's/^EPA-ng v//')
END_VERSIONS
"""
"""
printf "%s %s\\n" $rename_to | while read old_name new_name; do
    [ -f "\${new_name}" ] || ln -s \$old_name \$new_name
done
fastqc $args --threads $task.cpus $renamed_files

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    fastqc: \$( fastqc --version | sed -e "s/FastQC v//g" )
END_VERSIONS
"""
"""
touch ${prefix}.html
touch ${prefix}.zip

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    fastqc: \$( fastqc --version | sed -e "s/FastQC v//g" )
END_VERSIONS
"""
"""
gappa \\
    examine assign \\
    $args \\
    --threads $task.cpus \\
    --jplace-path $jplace \\
    --taxon-file $taxonomy \\
    --file-prefix ${prefix}.

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    gappa: \$(echo \$(gappa --version 2>&1 | sed 's/v//' ))
END_VERSIONS
"""
"""
gappa \\
    examine \\
    graft \\
    $args \\
    --threads $task.cpus \\
    --file-prefix ${prefix}. \\
    --jplace-path $jplace

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    gappa: \$(echo \$(gappa --version 2>&1 | sed 's/v//' ))
END_VERSIONS
"""
"""
gappa \\
    examine \\
    heat-tree \\
    --threads $task.cpus \\
    --file-prefix ${prefix}. \\
    --jplace-path $jplace \\
    $args \\
    --log-file ${prefix}.log

grep '^ *At' ${prefix}.log > ${prefix}.colours.txt

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    gappa: \$(echo \$(gappa --version 2>&1 | sed 's/v//' ))
END_VERSIONS
"""
"""
esl-alimask \\
    $args \\
    $fmask_rfarg \\
    $fmask_allarg \\
    $gmask_rfarg \\
    $gmask_allarg \\
    $pmask_rfarg \\
    $pmask_allarg \\
    -o ${prefix}.masked.sthlm \\
    $unmaskedaln \\
    $maskfile

gzip ${prefix}.*mask*

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    hmmer/easel: \$(esl-reformat -h | grep -o '^# Easel [0-9.]*' | sed 's/^# Easel *//')
END_VERSIONS
"""
"""
esl-reformat \\
    $args \\
    $seqfile \\
    $postproc \\
    | gzip -c > ${prefix}.${suffix}.gz

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    hmmer/easel: \$(esl-reformat -h | grep -o '^# Easel [0-9.]*' | sed 's/^# Easel *//')
END_VERSIONS
"""
"""
hmmalign \\
    $args \\
    $hmm \\
    $fasta | gzip -c > ${prefix}.sthlm.gz

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    hmmer: \$(hmmalign -h | grep -o '^# HMMER [0-9.]*' | sed 's/^# HMMER *//')
END_VERSIONS
"""
"""
hmmbuild \\
    $args \\
    --cpu $task.cpus \\
    -n ${prefix}  \\
    -o ${prefix}.hmmbuild.txt \\
    ${mxfileopt} \\
    ${prefix}.hmm \\
    $alignment

gzip ${prefix}.hmm

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    hmmer: \$(echo \$(hmmbuild -h | grep HMMER | sed 's/# HMMER //' | sed 's/ .*//' 2>&1))
END_VERSIONS
"""
"""
mafft \\
    --thread ${task.cpus} \\
    ${args} \\
    ${add} \\
    ${fasta} \\
    > ${prefix}.fas

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    mafft: \$(mafft --version 2>&1 | sed 's/^v//' | sed 's/ (.*)//')
END_VERSIONS
"""
"""
multiqc \\
    --force \\
    $args \\
    $config \\
    $extra_config \\
    .

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    multiqc: \$( multiqc --version | sed -e "s/multiqc, version //g" )
END_VERSIONS
"""
"""
touch multiqc_data
touch multiqc_plots
touch multiqc_report.html

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    multiqc: \$( multiqc --version | sed -e "s/multiqc, version //g" )
END_VERSIONS
"""
"""
vsearch \\
    --sintax $queryfasta \\
    --db $db \\
    --threads $task.cpus \\
    $args \\
    --tabbedout ${prefix}.tsv

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    vsearch: \$(vsearch --version 2>&1 | head -n 1 | sed 's/vsearch //g' | sed 's/,.*//g' | sed 's/^v//' | sed 's/_.*//')
END_VERSIONS
"""
"""
vsearch \\
    --usearch_global $queryfasta \\
    --db $db \\
    --id $idcutoff \\
    --threads $task.cpus \\
    $args \\
    ${columns} \\
    ${outfmt} ${prefix}.${out_ext}

cat <<-END_VERSIONS > versions.yml
"${task.process}":
    vsearch: \$(vsearch --version 2>&1 | head -n 1 | sed 's/vsearch //g' | sed 's/,.*//g' | sed 's/^v//' | sed 's/_.*//')
END_VERSIONS
"""