A showcase and characterisation of pipeline frameworks (aka workflow managers)

Many pipeline frameworks exist, each with its own (and sometimes exotic) syntax. This makes it confusing to choose the appropriate framework for a particular use case, and inconsistent terminology between frameworks complicates matters further. For example, rules (make and snakemake), processes (nextflow), tasks (WDL), jobs (CWL) and stages (martian) all refer to the same concept in different pipeline specifications.

Here we try to create:

Structure

  • tasks contains the task descriptions and the example implementations for each framework.

  • comparison contains the working document and a comparison of the frameworks.

  • containers contains the Dockerfiles for each framework. These are used to execute the example workflows.

Tasks & examples

Task                Frameworks
One task            make, snakemake, nextflow, luigi, airflow, toil, cromwell, drake
One task cached     make, snakemake, nextflow, luigi, cromwell
Chain               snakemake, nextflow, cromwell
Merge               snakemake, nextflow
Run in docker       nextflow, cromwell
Split merge         snakemake, nextflow
Module as is        snakemake
Alternative paths   nextflow

Running the examples

  1. Install conda and docker

  2. Clone this repo: git clone git@github.com:komparo/rosetta-pipeline.git && cd rosetta-pipeline

  3. Install the conda environment: conda env create -f assets/env/environment.yml

  4. Activate the environment: conda activate rosettapipeline

  5. Run the workflow: snakemake. The first build can take a while because all Docker containers have to be built. To run only one framework, pass it as a config value, e.g. snakemake --config framework_id=nextflow.
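Passing --config framework_id=nextflow stores that value in Snakemake's config dictionary, which the Snakefile can then use to narrow down its targets. The relevant part of the Snakefile is not shown on this page. The following Python sketch therefore only illustrates one way such a filter could work. The function select_targets and the task ids are hypothetical.

```python
# Hypothetical sketch: restrict (task_id, framework_id) targets using a config
# value such as the one set by `snakemake --config framework_id=nextflow`.
from typing import Dict, List, Tuple

Pair = Tuple[str, str]  # (task_id, framework_id)


def select_targets(pairs: List[Pair], config: Dict[str, str]) -> List[Pair]:
    """Return all pairs, or only those for config["framework_id"] when it is set."""
    wanted = config.get("framework_id")
    if wanted is None:
        return list(pairs)
    return [(task, fw) for task, fw in pairs if fw == wanted]


# Hypothetical task ids; the real ids come from the tasks directory.
PAIRS: List[Pair] = [
    ("one-task", "make"),
    ("one-task", "nextflow"),
    ("chain", "snakemake"),
    ("chain", "nextflow"),
]

print(select_targets(PAIRS, {"framework_id": "nextflow"}))
# [('one-task', 'nextflow'), ('chain', 'nextflow')]
```

Without the config key, every pair is kept, which matches running plain snakemake for all frameworks.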

Contributing

We welcome contributions of any kind. See assets/contributing.md.

A contribution implies that you agree with the Code of conduct.

Further reading

Opinions

Lists and rankings

Reviews

Code Snippets

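library(tidyverse)

# Build one record per task from its YAML description
tasks <- map(snakemake@input$tasks, function(file) {
  task <- yaml::read_yaml(file)
  # the task id is the name of the directory that contains the task description
  task$id <- dirname(file) %>% gsub(".*/([^/]*)", "\\1", .)

  # frameworks implementing this task: example files below /<task id>/,
  # reduced to the name of the directory that contains them
  task$framework_ids <- snakemake@input$examples %>%
    keep(grepl, pattern = paste0("/", task$id, "/")) %>%
    dirname() %>%
    gsub(".*/.*/([^/]*)", "\\1", .)

  task
})

# write all task records to a single JSON file
jsonlite::write_json(tasks, snakemake@output[[1]])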
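The path handling in this script assumes two layouts. Task descriptions live at tasks/<task id>/<description file>, and examples live at tasks/<task id>/<framework id>/<file>.

A rough Python equivalent of the id extraction is sketched below under that assumption. The file names task.yml and run.sh are illustrative, and run.sh is the script the docker rule executes. Reading the YAML itself is left out to keep the sketch dependency-free. It also uses a plain substring match instead of R's regular expression.

```python
# Sketch of the id extraction in aggregate_tasks.R, assuming the layout
#   tasks/<task_id>/<task description>  and  tasks/<task_id>/<framework_id>/<file>
from pathlib import PurePosixPath
from typing import List


def task_id_from_description(path: str) -> str:
    # name of the directory that contains the task description
    return PurePosixPath(path).parent.name


def framework_ids(task_id: str, example_paths: List[str]) -> List[str]:
    # keep examples whose path contains /<task_id>/ and return the name of
    # the directory holding each example file (one entry per file, as in R)
    return [
        PurePosixPath(p).parent.name
        for p in example_paths
        if f"/{task_id}/" in p
    ]


examples = [
    "tasks/chain/snakemake/run.sh",
    "tasks/chain/nextflow/run.sh",
    "tasks/merge/nextflow/run.sh",
]
print(task_id_from_description("tasks/chain/task.yml"))  # chain
print(framework_ids("chain", examples))  # ['snakemake', 'nextflow']
```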
library(tidyverse)

# read the aggregated task records ("output/tasks.json")
tasks <- jsonlite::read_json(snakemake@input$tasks, simplifyVector = TRUE, simplifyMatrix = FALSE, simplifyDataFrame = FALSE)

# markdown links to all tasks that illustrate a given concept, separated by commas
link_to_jobs <- function(concept) {
    tasks %>%
        keep(~concept %in% .$concepts) %>%
        map_chr(~paste0("[", .$name, "](/tasks/", .$id, ")")) %>%
        paste0(collapse = ", ")
}
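link_to_jobs keeps only the tasks whose concepts field contains the given concept, then joins markdown links to them. Below is the same filter in Python with made-up task records; the concept names are hypothetical. Unlike the R version, which reads the global tasks object, this sketch takes the task list as an argument.

```python
# Python sketch of link_to_jobs: markdown links to all tasks that list a concept.
from typing import Dict, List


def link_to_jobs(concept: str, tasks: List[Dict]) -> str:
    return ", ".join(
        f"[{t['name']}](/tasks/{t['id']})"
        for t in tasks
        if concept in t.get("concepts", [])
    )


# hypothetical task records, shaped like the entries of output/tasks.json
tasks = [
    {"id": "chain", "name": "Chain", "concepts": ["dependency"]},
    {"id": "merge", "name": "Merge", "concepts": ["dependency", "merging"]},
    {"id": "one-task", "name": "One task", "concepts": []},
]
print(link_to_jobs("dependency", tasks))
# [Chain](/tasks/chain), [Merge](/tasks/merge)
```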
# format a markdown link
link <- function(label, url) {paste0("[", label, "](", url, ")")}
# list the tasks implemented by the most frameworks first
tasks <- tasks[order(map_int(tasks, ~length(.$framework_ids)), decreasing = TRUE)]
# one table row per task, linking to the task page and to each framework's example
map_df(tasks, function(task) {
  tibble(
    Task = link(task$name, paste0("tasks/", task$id)),
    Frameworks = map_chr(task$framework_ids, ~link(., paste0("tasks/", task$id, "/", .))) %>%
      paste0(collapse = ", ")
  )
}) %>% knitr::kable(escape = FALSE)
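This chunk appears to generate the Task/Frameworks table from the README. Tasks are ordered by how many frameworks implement them, and each row links to the task and to every framework's example.

The Python sketch below reproduces that logic as a markdown pipe table. The input records are hypothetical. knitr::kable pads its columns, so its output is not byte-identical to this.

```python
# Sketch of the README table: tasks sorted by framework count, rendered as a
# markdown pipe table (unpadded, unlike knitr::kable).
from typing import Dict, List


def link(label: str, url: str) -> str:
    return f"[{label}]({url})"


def task_table(tasks: List[Dict]) -> str:
    # most frameworks first; sorted() is stable, so ties keep their input order
    ordered = sorted(tasks, key=lambda t: len(t["framework_ids"]), reverse=True)
    rows = ["| Task | Frameworks |", "|:--|:--|"]
    for t in ordered:
        frameworks = ", ".join(
            link(fw, f"tasks/{t['id']}/{fw}") for fw in t["framework_ids"]
        )
        rows.append(f"| {link(t['name'], 'tasks/' + t['id'])} | {frameworks} |")
    return "\n".join(rows)


tasks = [
    {"id": "merge", "name": "Merge", "framework_ids": ["snakemake", "nextflow"]},
    {"id": "chain", "name": "Chain", "framework_ids": ["snakemake", "nextflow", "cromwell"]},
]
print(task_table(tasks))
```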
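# hide the R code in the rendered page
knitr::opts_chunk$set(echo = FALSE)
# read the description of a single task
task <- yaml::read_yaml(snakemake@input$task)
# print the task description into the page
cat(task$description)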
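shell:
    """
        # build the framework's image from its Dockerfile, passed on stdin (no build context)
        docker build -t rosettapipeline/{wildcards.framework_id} - < {input.dockerfile}
        # save the image metadata as the rule's output
        docker inspect rosettapipeline/{wildcards.framework_id} > {output.digest}
    """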
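shell:
    """
        # build the task's image from its Dockerfile, passed on stdin (no build context)
        docker build -t rosettapipeline/{wildcards.task_id} - < {input.dockerfile}
        # save the image metadata as the rule's output
        docker inspect rosettapipeline/{wildcards.task_id} > {output.digest}
    """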
shell:
    """
        # start from a fresh copy of the example and, if present, the task's input data
        rm -rf output/tasks/{wildcards.task_id}/{wildcards.framework_id}
        mkdir -p output/tasks/{wildcards.task_id}
        cp -r tasks/{wildcards.task_id}/{wildcards.framework_id} output/tasks/{wildcards.task_id}/
        if [ -d "tasks/{wildcards.task_id}/data/" ]; then
            cp -r tasks/{wildcards.task_id}/data/* output/tasks/{wildcards.task_id}/{wildcards.framework_id}
        fi

        # run docker
        # the working directory is mounted at the exact same location inside the container, so that containers started from within it (docker-in-docker, necessary for e.g. nextflow) can mount its subdirectories
        # we also set the group id to that of the "docker" group, so that the user inside the container can run docker

        GROUPID=$(getent group docker | cut -d: -f3)
        docker run \
            --mount type=bind,source=$(pwd),target=$(pwd) \
            --rm \
            -w $(pwd)/output/tasks/{wildcards.task_id}/{wildcards.framework_id} \
            -v /var/run/docker.sock:/var/run/docker.sock \
            -u $(id -u):$GROUPID \
            rosettapipeline/{wildcards.framework_id} \
            bash run.sh \
            2>&1 | tee {log}

        echo "true" > {output}
    """
script: "scripts/aggregate_tasks.R"
Snakemake, from line 109 of master/Snakefile
script: "scripts/templates/README.Rmd"
Snakemake, from line 116 of master/Snakefile
script: "scripts/templates/task.Rmd"
Snakemake, from line 124 of master/Snakefile
script: "scripts/templates/comparison.Rmd"
Snakemake, from line 131 of master/Snakefile

Created: 1yr ago
Updated: 1yr ago
Maintainers: public
URL: https://github.com/komparo/rosetta-pipeline
Name: rosetta-pipeline
Version: 1
Copyright: Public Domain
License: GNU General Public License v3.0