HPC Molecular Dynamics Simulation Workflow: CWL Version of md_list.py


This repository is part of a series of repositories mirroring the workflow and launchers in https://github.com/bioexcel/biobb_hpc_workflows .

The workflow is briefly described below, followed by installation instructions and guidance on running it with the two workflow engines it has been tested on (CWLtool and TOIL).

md_list / md_launch

The md_list workflow performs a molecular dynamics simulation on a given structure listed in the YAML properties file.

The md_launch workflow will run the md_list workflow multiple times (using scatter), passing it structures from a list defined in the YAML properties file.

Getting Started

Requirements

If you are working on your own machine, you will need Git and Docker; installation instructions are given on their websites. You can then use either CWLtool (the reference implementation of CWL) or Toil.

If you are working on an HPC system, you will need Singularity and Toil rather than Docker and CWLtool. Git and Singularity should already be installed; installing Toil (if it is not already available) is covered below.

Version requirements:

  • The workflow engine should support CWL standard 1.2 or more recent (versions tested: 1.2.0-dev5 in toil; 1.2 in CWLtool)

Setup

These workflows make use of the BioBB libraries, which are installed using git submodules. This requires that you clone this repository rather than downloading a zip archive (a zip archive does not include the Git metadata needed to fetch the submodules):

git clone --recurse-submodules https://github.com/douglowe/biobb_hpc_cwl_md_list.git
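
If the repository has already been cloned without --recurse-submodules, the submodules can usually be fetched afterwards instead of cloning again. A minimal sketch, assuming it is run from the top level of the clone; the function name init_submodules is ours and is not part of the repository:

```shell
# Fetch any submodules that are missing from an existing clone.
# Assumes the current directory is the top level of the clone.
init_submodules() {
    if [ ! -f .gitmodules ]; then
        echo "no .gitmodules here: run this from the top level of the clone" >&2
        return 1
    fi
    # Initialise and check out every submodule, including nested ones.
    git submodule update --init --recursive
}

# Usage (from inside biobb_hpc_cwl_md_list/):
#   init_submodules
```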

CWLtool

This can be installed via conda with the command:

conda env create -f install/env_cwlrunner.yml

To install a JavaScript interpreter (if you do not already have one on your system) use:

conda env create -f install/env_cwlrunner_nodejs.cwl

TOIL

This can be installed using conda with the command:

conda env create -f install/env_toil.yml

To install a JavaScript interpreter (if you do not already have one on your system) use:

conda env create -f install/env_toil_nodejs.cwl
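
After creating and activating an environment, it can help to confirm that the expected executables are on the PATH before running anything. The environment names are defined inside the environment files, so conda env list will show what is available to activate. The helper below is a hypothetical convenience function, not something shipped with this repository:

```shell
# Report which of the named executables are on PATH.
# Returns 0 only if every named tool is found.
check_tools() {
    missing=0
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "found:   $tool"
        else
            echo "missing: $tool"
            missing=1
        fi
    done
    return "$missing"
}

# Usage, after activating the relevant conda environment:
#   check_tools cwltool node            # CWLtool setup
#   check_tools toil-cwl-runner node    # TOIL setup
```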

Running the Workflows

This workflow requires:

  1. A PDB file describing the molecule of interest (see the example example_input_files/lysozyme.pdb).

  2. A configuration file (see the example md_list_input_descriptions.yml).

CWL

To run the workflow use:

cwl-runner md_launch.cwl md_list_input_descriptions.yml
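
A short pre-flight check can catch a missing input before the engine starts. This is only a sketch: the function name preflight is hypothetical, and the default file names are the examples named above (the PDB path that is actually used is whatever the configuration file lists).

```shell
# Check that the required inputs exist before launching the workflow.
# Arguments: [config file] [PDB file]; defaults are the example files.
preflight() {
    config="${1:-md_list_input_descriptions.yml}"
    pdb="${2:-example_input_files/lysozyme.pdb}"
    for f in "$config" "$pdb"; do
        if [ ! -f "$f" ]; then
            echo "missing input: $f" >&2
            return 1
        fi
    done
    echo "inputs present; ready to run:"
    echo "  cwl-runner md_launch.cwl $config"
}

# Usage (from the top level of the repository):
#   preflight && cwl-runner md_launch.cwl md_list_input_descriptions.yml
```

CWLtool also has a --validate option, which checks the CWL documents without running them.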

TOIL

TOIL (at the time of writing, version 5.2.0) does not yet fully support the CWL v1.2.0 standard, so you will need to edit md_list.cwl to use: cwlVersion: v1.2.0-dev5.
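
The version line can be changed by hand in any editor. For scripted setups, one possible approach is sketched below; set_cwl_version is a hypothetical helper, and the demonstration runs on a throwaway stand-in file rather than the real md_list.cwl.

```shell
# Rewrite the cwlVersion line of a CWL file, keeping a .bak copy of the original.
set_cwl_version() {
    file="$1"
    version="$2"
    cp "$file" "$file.bak"
    # Replace the whole cwlVersion line; all other lines pass through unchanged.
    sed "s/^cwlVersion:.*/cwlVersion: $version/" "$file.bak" > "$file"
}

# Demonstration on a stand-in file (its contents are made up for this example).
demo_dir="$(mktemp -d)"
printf 'cwlVersion: v1.2\nclass: Workflow\n' > "$demo_dir/md_list.cwl"
set_cwl_version "$demo_dir/md_list.cwl" v1.2.0-dev5
cat "$demo_dir/md_list.cwl"
```

On the real file the call would be set_cwl_version md_list.cwl v1.2.0-dev5, with the original kept as md_list.cwl.bak.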

To use the Toil engine, several environment variables need to be set. These are described in more detail in the TOIL documentation; below we highlight only the variables we found useful to set.

On all HPC systems it is wise to check the temporary directory variable (TMPDIR): for TOIL this needs to be on a disk accessible by all compute nodes that will be used.

For Singularity, set the variables CWL_SINGULARITY_CACHE and SINGULARITY_CACHEDIR (again on a disk accessible by all compute nodes).
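
As an illustration, these three variables could be set as follows. SHARED_SCRATCH is a hypothetical variable introduced for this sketch, and pointing both Singularity variables at the same directory is our choice rather than a requirement stated here.

```shell
# SHARED_SCRATCH is hypothetical: point it at a directory on a filesystem that
# every compute node can read and write. The fallback under the current
# directory is only there so that the sketch runs as-is.
SHARED_SCRATCH="${SHARED_SCRATCH:-$PWD/toil_scratch}"

# Temporary working space for the workflow jobs.
export TMPDIR="$SHARED_SCRATCH/tmp"

# Keep both Singularity cache variables on the same shared location.
export CWL_SINGULARITY_CACHE="$SHARED_SCRATCH/singularity_cache"
export SINGULARITY_CACHEDIR="$CWL_SINGULARITY_CACHE"

mkdir -p "$TMPDIR" "$CWL_SINGULARITY_CACHE"
echo "TMPDIR=$TMPDIR"
echo "SINGULARITY_CACHEDIR=$SINGULARITY_CACHEDIR"
```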

GridEngine (SGE)

For SGE set:

  1. TOIL_GRIDENGINE_PE (the Grid Engine parallel environment to use)

  2. TOIL_GRIDENGINE_ARGS

To execute the workflow use:

toil-cwl-runner --enable-dev --batchSystem grid_engine --singularity --defaultCores 1 md_launch.cwl md_list_input_descriptions.yml

This example sets the number of cores to 1; we recommend testing your setup as a serial job before trying to use parallel compute nodes. When moving to a parallel job, change the value passed to --defaultCores.
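
Put together, an SGE launch script might look like the sketch below. The two variable values are hypothetical site-specific examples, and CORES is a helper variable introduced here so that the serial test and the later parallel run share one script. The command is printed rather than executed, so the sketch is safe to run anywhere.

```shell
# Hypothetical site-specific values: replace with those used on your cluster.
TOIL_GRIDENGINE_PE="smp"
TOIL_GRIDENGINE_ARGS="-q short.q"
export TOIL_GRIDENGINE_PE TOIL_GRIDENGINE_ARGS

# Start with a single core; raise this once the serial test has worked.
CORES="${CORES:-1}"

CMD="toil-cwl-runner --enable-dev --batchSystem grid_engine --singularity --defaultCores $CORES md_launch.cwl md_list_input_descriptions.yml"
echo "$CMD"
```

Running the script with CORES=4 in the environment prints the same command with --defaultCores 4.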

SLURM

For Slurm job managers set:

  1. TOIL_SLURM_ARGS: this carries all the required Slurm job flags, e.g. "--nodes=1 --ntasks-per-node=64 --time=0:10:0 --partition=standard --qos=standard --account=[XXX] --export=ALL"

To execute the workflow use:

toil-cwl-runner --enable-dev --batchSystem slurm --singularity md_launch.cwl md_list_input_descriptions.yml
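
The Slurm settings can be assembled in a small script so that the pieces that change between runs are easy to find. In this sketch PROJECT_ACCOUNT, NTASKS and WALLTIME are hypothetical helper variables; their defaults reproduce the example flags above, and the [XXX] placeholder is left in place until you supply your own account code. The launch command is printed rather than executed.

```shell
# Helper variables (hypothetical); defaults mirror the example flags above.
PROJECT_ACCOUNT="${PROJECT_ACCOUNT:-[XXX]}"
NTASKS="${NTASKS:-64}"
WALLTIME="${WALLTIME:-0:10:0}"

# All Slurm job flags are passed to the engine through this one variable.
export TOIL_SLURM_ARGS="--nodes=1 --ntasks-per-node=$NTASKS --time=$WALLTIME --partition=standard --qos=standard --account=$PROJECT_ACCOUNT --export=ALL"

echo "TOIL_SLURM_ARGS=$TOIL_SLURM_ARGS"
echo "toil-cwl-runner --enable-dev --batchSystem slurm --singularity md_launch.cwl md_list_input_descriptions.yml"
```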

Copyright & Licensing

This software has been developed in the MMB group at the BSC & IRB, and in the eScience Lab and Research IT groups at the University of Manchester, for the European BioExcel project, funded by the European Commission (EU H2020 823830, EU H2020 675728).

Licensed under the Apache License 2.0; see the file LICENSE for details.

Code Snippets

baseCommand: editconf
hints:
  DockerRequirement:
    dockerPull: quay.io/biocontainers/biobb_md:0.1.5--py_0
inputs:
  input_gro_path:
    type: File
    format: edam:format_GROMACS_GRO
    inputBinding:
      position: 1
      prefix: --input_gro_path

  output_gro_path:
    type: string
    inputBinding:
      position: 2
      prefix: --output_gro_path
    default: "structure_box.gro"

  config:
    type: string?
    inputBinding:
      position: 3
      prefix: --config

baseCommand: genion
hints:
  DockerRequirement:
    dockerPull: quay.io/biocontainers/biobb_md:0.1.5--py_0
inputs:
  input_tpr_path:
    type: File
    # TODO: Not yet in EDAM
    #format: edam:format_GROMACS_TPR
    inputBinding:
      position: 1
      prefix: --input_tpr_path

  output_gro_path:
    type: string
    inputBinding:
      position: 2
      prefix: --output_gro_path
    default: "structure_ions.gro"

  input_top_zip_path:
    type: File
    format: edam:format_2333
    inputBinding:
      position: 3
      prefix: --input_top_zip_path

  output_top_zip_path:
    type: string
    inputBinding:
      position: 4
      prefix: --output_top_zip_path
    default: "topology_ions_top.zip"

  config:
    type: string?
    inputBinding:
      position: 5
      prefix: --config

baseCommand: grompp
hints:
  DockerRequirement:
    dockerPull: quay.io/biocontainers/biobb_md:0.1.5--py_0


inputs:
  input_gro_path:
    label: Path to GRO file
    doc: |
      Path to the input GROMACS structure GRO file.
      Type: str
      File type: input
      Accepted formats: gro
      Example file:
        https://github.com/bioexcel/biobb_md/raw/master/biobb_md/test/data/gromacs/grompp.gro
    type: File
    format: edam:format_GROMACS_GRO
    inputBinding:
      position: 1
      prefix: --input_gro_path


  input_top_zip_path:
    label: Path to TOP and ITP files
    doc: |
      Path to the input GROMACS topology TOP and ITP files in zip format.
      Type: str
      File type: input
      Accepted formats: zip
      Example file:
        https://github.com/bioexcel/biobb_md/raw/master/biobb_md/test/data/gromacs/grompp.zip
    type: File
    format: edam:format_2333
    inputBinding:
      position: 2
      prefix: --input_top_zip_path


  output_tpr_path:
    label: Path to TPR file; Optional
    doc: |
      Path to the output portable binary run file TPR.
      Type: str
      File type: output
      Accepted formats: tpr
      Example file:
        https://github.com/bioexcel/biobb_md/raw/master/biobb_md/test/reference/gromacs/ref_grompp.tpr
    type: string
    inputBinding:
      position: 3
      prefix: --output_tpr_path
    default: "system.tpr"


  input_cpt_path:
    label: Path to the input GROMACS checkpoint file CPT.
    doc: |
      Path to the input GROMACS checkpoint file CPT. Optional parameter.
      Type: str
      File type: input
      Accepted formats: cpt
    type: File?
    format: edam:format_2333
    inputBinding:
      prefix: --input_cpt_path


  config:
    label: Advanced configuration options for GROMACS
    doc: |
      Advanced configuration options for GROMACS. This should be passed as a
      string containing a dict. The possible options to include here are listed
      under 'properties' in the gromacs documentation:
        https://biobb-md.readthedocs.io/en/latest/gromacs.html#module-gromacs.grompp

    type: string?
    inputBinding:
      prefix: --config

baseCommand: make_ndx
hints:
  DockerRequirement:
    dockerPull: quay.io/biocontainers/biobb_md:0.1.5--py_0
inputs:
  input_structure_path:
    type: File
    format: edam:format_GROMACS_GRO
    inputBinding:
      position: 1
      prefix: --input_structure_path

  output_ndx_path:
    type: string
    inputBinding:
      position: 2
      prefix: --output_ndx_path
    default: "custom_index.ndx"

  input_ndx_path:
    type: File?
    format: edam:format_2330
    inputBinding:
      prefix: --input_ndx_path

  config:
    type: string?
    inputBinding:
      prefix: --config

baseCommand: mdrun
hints:
  DockerRequirement:
    dockerPull: quay.io/biocontainers/biobb_md:0.1.5--py_0
inputs:
  input_tpr_path:
    type: File
    format: edam:format_2333
    inputBinding:
      position: 1
      prefix: --input_tpr_path

  output_trr_path:
    type: string
    inputBinding:
      position: 2
      prefix: --output_trr_path
    default: "trajectory.trr"

  output_gro_path:
    type: string
    inputBinding:
      position: 3
      prefix: --output_gro_path
    default: "trajectory.gro"

  output_edr_path:
    type: string
    inputBinding:
      position: 4
      prefix: --output_edr_path
    default: "trajectory.edr"

  output_log_path:
    type: string
    inputBinding:
      position: 5
      prefix: --output_log_path
    default: "trajectory.log"

  output_xtc_path:
    type: string?
    inputBinding:
      prefix: --output_xtc_path
    default: "trajectory.xtc"

  output_cpt_path:
    type: string?
    inputBinding:
      prefix: --output_cpt_path

  config:
    type: string?
    inputBinding:
      prefix: --config
The snippet above is from gromacs/mdrun.cwl, starting at line 4.

baseCommand: pdb2gmx
hints:
  DockerRequirement:
    dockerPull: quay.io/biocontainers/biobb_md:0.1.5--py_0
inputs:
  input_pdb_path:
    type: File
    format: edam:format_1476
    inputBinding:
      position: 1
      prefix: --input_pdb_path

  output_gro_path:
    type: string
    inputBinding:
      position: 2
      prefix: --output_gro_path
    default: "structure.gro"

  output_top_zip_path:
    type: string
    inputBinding:
      position: 3
      prefix: --output_top_zip_path
    default: "topology.zip"

  config:
    type: string?
    inputBinding:
      position: 4
      prefix: --config
The snippet above is from gromacs/pdb2gmx.cwl, starting at line 4.
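
To make the inputBinding entries concrete, the sketch below prints roughly the command line that a CWL engine would build from the pdb2gmx description when only the required input is supplied. The input file is the example PDB named earlier; in a real run the engine stages the file and substitutes its own path, and nothing is executed here.

```shell
# Inputs, named after the CWL input ids; the two output names are the
# defaults given in the tool description.
input_pdb_path="example_input_files/lysozyme.pdb"
output_gro_path="structure.gro"
output_top_zip_path="topology.zip"

# Arguments are emitted in order of their inputBinding position (1, 2, 3),
# each preceded by its prefix. The optional config input is not supplied,
# so no --config argument appears.
CMD="pdb2gmx --input_pdb_path $input_pdb_path --output_gro_path $output_gro_path --output_top_zip_path $output_top_zip_path"
echo "$CMD"
```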

baseCommand: solvate
hints:
  DockerRequirement:
    dockerPull: quay.io/biocontainers/biobb_md:0.1.5--py_0
inputs:
  input_solute_gro_path:
    type: File
    format: edam:format_GROMACS_GRO
    inputBinding:
      position: 1
      prefix: --input_solute_gro_path

  output_gro_path:
    type: string
    inputBinding:
      position: 2
      prefix: --output_gro_path
    default: "structure_solvated.gro"

  input_top_zip_path:
    type: File
    format: edam:format_2333
    inputBinding:
      position: 3
      prefix: --input_top_zip_path

  output_top_zip_path:
    type: string
    inputBinding:
      position: 4
      prefix: --output_top_zip_path
    default: "topology_solvated.zip"

  config:
    type: string?
    inputBinding:
      position: 5
      prefix: --config
