A showcase and characterisation of pipeline frameworks (aka workflow managers)
Many pipeline frameworks exist, each with its own (and sometimes exotic) syntax. Because of this, it can be confusing to choose the appropriate framework for a particular use case. The choice is further complicated by inconsistent terminology between frameworks. For example, rules (make and snakemake), processes (nextflow), tasks (WDL), jobs (CWL) and stages (martian) all refer to the exact same concept in different pipeline specifications.
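To make the mapping concrete, here is a minimal snakemake rule for a hypothetical line-sorting step (not one of the tasks in this repository); make would also call this a rule, while nextflow, WDL, CWL and martian would call the equivalent unit a process, task, job or stage:

```
# A single step: one input, one output, one command.
rule sort_lines:
    input: "data/words.txt"
    output: "output/words_sorted.txt"
    shell: "sort {input} > {output}"
```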
Here we try to create:

- Example workflows for each framework, covering some common workflow tasks. They showcase how each framework is used, and where the similarities and differences lie. They are also useful for learning a particular framework by example. This is inspired by Rosetta code.
- A working document on the similarities and differences between pipeline frameworks. This document introduces a consistent terminology for the features a pipeline framework can have, while listing alternative terms along the way. The discussion is meant to be neutral and objective, in the sense that it simply lists possible features of pipeline frameworks, along with some advantages and disadvantages of each feature.
- A comparison of the features of each framework, structured in the same way as the working document.
Structure
- tasks contains the task description and examples for each framework.
- comparison contains the working document and a comparison of frameworks.
- containers contains the Dockerfiles for each framework. These are used for executing the example workflows.
Tasks & examples
| Task | Frameworks |
|---|---|
| One task | make, snakemake, nextflow, luigi, airflow, toil, cromwell, drake |
| One task cached | make, snakemake, nextflow, luigi, cromwell |
| Chain | snakemake, nextflow, cromwell |
| Merge | snakemake, nextflow |
| Run in docker | nextflow, cromwell |
| Split merge | snakemake, nextflow |
| Module as is | snakemake |
| Alternative paths | nextflow |
Running the examples
- Install conda and docker.
- Clone this repo:

  ```
  git clone git@github.com:komparo/rosetta-pipeline.git && cd rosetta-pipeline
  ```

- Install the conda environment:

  ```
  conda env create -f assets/env/environment.yml
  ```

- Activate the environment:

  ```
  conda activate rosettapipeline
  ```

- Run snakemake:

  ```
  snakemake
  ```

  A first build can take a while because all docker containers have to be built. To run only one framework, use `snakemake --config framework_id=nextflow`.
Contributing
We welcome contributions of any kind. See assets/contributing.md.
A contribution implies that you agree with the Code of conduct.
Further reading
Opinions
Lists and rankings
Reviews
Code Snippets
```r
library(tidyverse)

tasks <- map(snakemake@input$tasks, function(file) {
  task <- yaml::read_yaml(file)
  task$id <- dirname(file) %>% gsub(".*/([^/]*)", "\\1", .)
  task$framework_ids <- snakemake@input$examples %>%
    keep(grepl, pattern = paste0("/", task$id, "/")) %>%
    dirname() %>%
    gsub(".*/.*/([^/]*)", "\\1", .)
  task
})

jsonlite::write_json(tasks, snakemake@output[[1]])
```
```r
library(tidyverse)

# "output/tasks.json"
tasks <- jsonlite::read_json(snakemake@input$tasks, simplifyVector = TRUE, simplifyMatrix = FALSE, simplifyDataFrame = FALSE)

link_to_jobs <- function(concept) {
  tasks %>%
    keep(~concept %in% .$concepts) %>%
    map_chr(~paste0("[", .$name, "](/tasks/", .$id, ")")) %>%
    paste0(collapse = ", ")
}
```
```r
library(tidyverse)

# "output/tasks.json"
tasks <- jsonlite::read_json(snakemake@input$tasks, simplifyVector = TRUE, simplifyMatrix = FALSE, simplifyDataFrame = FALSE)
```
```r
link <- function(label, url) {paste0("[", label, "](", url, ")")}

tasks <- tasks[order(map_int(tasks, ~length(.$framework_ids)), decreasing = TRUE)]

map_df(tasks, function(task) {
  tibble(
    Task = link(task$name, paste0("tasks/", task$id)),
    Frameworks = map(task$framework_ids, ~link(., paste0("tasks/", task$id, "/", .))) %>% paste0(collapse = ", ")
  )
}) %>% knitr::kable(escape = FALSE)
```
```r
knitr::opts_chunk$set(echo = FALSE)
```

```r
task <- yaml::read_yaml(snakemake@input$task)
```

```r
cat(task$description)
```
```
shell:
    """
    docker build -t rosettapipeline/{wildcards.framework_id} - < {input.dockerfile}
    docker inspect rosettapipeline/{wildcards.framework_id} > {output.digest}
    """
```
```
shell:
    """
    docker build -t rosettapipeline/{wildcards.task_id} - < {input.dockerfile}
    docker inspect rosettapipeline/{wildcards.task_id} > {output.digest}
    """
```
```
shell:
    """
    rm -r output/tasks/{wildcards.task_id}/{wildcards.framework_id}
    cp -r tasks/{wildcards.task_id}/{wildcards.framework_id} output/tasks/{wildcards.task_id}/
    if [ -d "tasks/{wildcards.task_id}/data/" ]; then
        cp -r tasks/{wildcards.task_id}/data/* output/tasks/{wildcards.task_id}/{wildcards.framework_id}
    fi

    # run docker
    # the working directory is mounted into the exact same location inside the container,
    # so that docker-in-docker can mount subdirectories (necessary for e.g. nextflow)
    # we also set the group id to the "docker" group id, so that the user within the container can run docker
    GROUPID=`cut -d: -f3 < <(getent group docker)`
    docker run \
        --mount type=bind,source=$(pwd),target=$(pwd) \
        --rm \
        -w $(pwd)/output/tasks/{wildcards.task_id}/{wildcards.framework_id} \
        -v /var/run/docker.sock:/var/run/docker.sock \
        -u $(id -u):$GROUPID \
        rosettapipeline/{wildcards.framework_id} \
        bash run.sh \
        2>&1 | tee {log}

    echo "true" > {output}
    """
```
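The `GROUPID` line deserves a note: `getent group <name>` prints a record of the form `name:password:GID:member-list`, and `cut -d: -f3` extracts the numeric group id. A standalone sketch of the same extraction, using the `root` group (present on typical Linux systems) instead of `docker` so it runs outside this build:

```shell
# `getent group <name>` prints "name:x:GID:members";
# the GID is the third colon-separated field, which cut extracts.
getent group root | cut -d: -f3
```

The rule passes this id to `docker run -u`, so that the process inside the container belongs to the host's `docker` group and can talk to the mounted docker socket.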
```
script: "scripts/aggregate_tasks.R"
```

```
script: "scripts/templates/README.Rmd"
```

```
script: "scripts/templates/task.Rmd"
```

```
script: "scripts/templates/comparison.Rmd"
```