Jupyter Notebook Protein conformational ensembles generation

public public 1yr ago Version: Version 1 0 bookmarks

Protein Conformational ensembles generation

Workflow included in the ELIXIR 3D-Bioinfo Implementation Study:

Building on PDBe-KB to chart and characterize the conformation landscape of native proteins

This tutorial aims to illustrate the process of generating protein conformational ensembles from** 3D structures **and analysing its molecular flexibility , step by step, using the BioExcel Building Blocks library (biobb) .

Conformational landscape of native proteins

Proteins are dynamic systems that adopt multiple conformational states , a property essential for many biological processes (e.g. binding other proteins, nucleic acids, small molecule ligands, or switching between functionaly active and inactive states). Characterizing the different conformational states of proteins and the transitions between them is therefore critical for gaining insight into their biological function and can help explain the effects of genetic variants in health and disease and the action of drugs.

Structural biology has become increasingly efficient in sampling the different conformational states of proteins. The PDB has currently archived more than 170,000 individual structures , but over two thirds of these structures represent multiple conformations of the same or related protein, observed in different crystal forms, when interacting with other proteins or other macromolecules, or upon binding small molecule ligands. Charting this conformational diversity across the PDB can therefore be employed to build a useful approximation of the conformational landscape of native proteins.

A number of resources and tools describing and characterizing various often complementary aspects of protein conformational diversity in known structures have been developed, notably by groups in Europe. These tools include algorithms with varying degree of sophistication, for aligning the 3D structures of individual protein chains or domains, of protein assemblies, and evaluating their degree of structural similarity . Using such tools one can align structures pairwise , compute the corresponding similarity matrix , and identify ensembles of structures/conformations with a defined similarity level that tend to recur in different PDB entries, an operation typically performed using clustering methods. Such workflows are at the basis of resources such as CATH, Contemplate, or PDBflex that offer access to conformational ensembles comprised of similar conformations clustered according to various criteria. Other types of tools focus on differences between protein conformations , identifying regions of proteins that undergo large collective displacements in different PDB entries, those that act as hinges or linkers , or regions that are inherently flexible .

To build a meaningful approximation of the conformational landscape of native proteins, the conformational ensembles (and the differences between them), identified on the basis of structural similarity/dissimilarity measures alone, need to be biophysically characterized . This may be approached at two different levels .

  • At the biological level , it is important to link observed conformational ensembles , to their functional roles by evaluating the correspondence with protein family classifications based on sequence information and functional annotations in public databases e.g. Uniprot, PDKe-Knowledge Base (KB). These links should provide valuable mechanistic insights into how the conformational and dynamic properties of proteins are exploited by evolution to regulate their biological function .

  • At the physical level one needs to introduce energetic consideration to evaluate the likelihood that the identified conformational ensembles represent conformational states that the protein (or domain under study) samples in isolation. Such evaluation is notoriously challenging and can only be roughly approximated by using computational methods to evaluate the extent to which the observed conformational ensembles can be reproduced by algorithms that simulate the dynamic behavior of protein systems. These algorithms include the computationally expensive classical molecular dynamics (MD) simulations to sample local thermal fluctuations but also faster more approximate methods such as Elastic Network Models and Normal Node Analysis (NMA) to model low energy collective motions . Alternatively, enhanced sampling molecular dynamics can be used to model complex types of conformational changes but at a very high computational cost.

The ELIXIR 3D-Bioinfo Implementation Study Building on PDBe-KB to chart and characterize the conformation landscape of native proteins focuses on:

  1. Mapping the conformational diversity of proteins and their homologs across the PDB.
  2. Characterize the different flexibility properties of protein regions, and link this information to sequence and functional annotation.
  3. Benchmark computational methods that can predict a biophysical description of protein motions.

This notebook is part of the third objective, where a list of computational resources that are able to predict protein flexibility and conformational ensembles have been collected, evaluated, and integrated in reproducible and interoperable workflows using the BioExcel Building Blocks library . Note that the list is not meant to be exhaustive, it is built following the expertise of the implementation study partners.

Code Snippets

 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
import os
import nglview
import simpletraj
import plotly
import plotly.graph_objs as go
import numpy as np
import pandas as pd
import ipywidgets
import json
import zipfile
from IPython.display import display, Markdown

pdbCode = "1ake"
num_frames = 300
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
# Downloading desired PDB file 
# Import module
from biobb_io.api.pdb import pdb

# Create properties dict and inputs/outputs
downloaded_pdb = pdbCode+'.pdb'

prop = {
    'pdb_code': pdbCode,
    'api_id' : 'mmb'
}

#Create and launch bb
pdb(output_pdb_path=downloaded_pdb,
    properties=prop)
37
38
39
40
41
42
43
44
45
46
47
from biobb_structure_utils.utils.extract_model import extract_model

pdb_model = pdbCode+'_model.pdb'

prop = {
    'models': [ 1 ]
}

extract_model(input_structure_path=downloaded_pdb,
              output_structure_path=pdb_model,
              properties=prop)
51
52
53
54
55
56
57
58
59
60
61
from biobb_structure_utils.utils.extract_chain import extract_chain

monomer = pdbCode+'_monomer.pdb'

prop = {
    'chains': [ 'A' ]
}

extract_chain(input_structure_path=pdb_model,
            output_structure_path=monomer,
            properties=prop)
65
66
67
68
69
# Show protein
view = nglview.show_structure_file(monomer)
view.add_representation(repr_type='ball+stick', selection='all')
view._remote_call('setSize', target='Widget', args=['','600px'])
view
73
74
75
76
77
78
79
80
81
82
83
84
85
from biobb_analysis.ambertools.cpptraj_mask import cpptraj_mask

prot_backbone = pdbCode + "_backbone.pdb"

prop = {
    'mask': 'backbone',
    'format': 'pdb'
}

cpptraj_mask(input_top_path=monomer,
            input_traj_path=monomer,
            output_cpptraj_path=prot_backbone,
            properties=prop)
89
90
91
92
93
# Show protein
view = nglview.show_structure_file(prot_backbone)
view.add_representation(repr_type='ball+stick', selection='all')
view._remote_call('setSize', target='Widget', args=['','600px'])
view
 97
 98
 99
100
101
102
103
104
105
106
107
108
109
from biobb_analysis.ambertools.cpptraj_mask import cpptraj_mask

prot_ca = pdbCode + "_ca.pdb"

prop = {
    'mask': 'c-alpha',
    'format': 'pdb'
}

cpptraj_mask(input_top_path=monomer,
            input_traj_path=monomer,
            output_cpptraj_path=prot_ca,
            properties=prop)
113
114
115
116
117
118
# Show protein
view = nglview.show_structure_file(prot_ca)
view.add_representation(repr_type='ball+stick', selection='all')
view.center()
view._remote_call('setSize', target='Widget', args=['','600px'])
view
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
from biobb_flexdyn.flexdyn.concoord_dist import concoord_dist

concoord_dist_pdb = pdbCode + "_dist.pdb"
concoord_dist_gro = pdbCode + "_dist.gro"
concoord_dist_dat = pdbCode + "_dist.dat"

concoord_lib = os.environ['CONDA_PREFIX']+"/share/concoord/lib"

prop = {
    'retain_hydrogens' : False,
    'cutoff' : 4.0,
    'env_vars_dict' : {
        'CONCOORD_OVERWRITE' : '1',
        'CONCOORDLIB' : concoord_lib
    }
}

concoord_dist(  input_structure_path=monomer,
                output_pdb_path=concoord_dist_pdb,
                output_gro_path=concoord_dist_gro,
                output_dat_path=concoord_dist_dat,
                properties=prop)
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
from biobb_flexdyn.flexdyn.concoord_disco import concoord_disco

concoord_disco_pdb = pdbCode + "_disco_traj.pdb"
concoord_disco_rmsd = pdbCode + "_disco_rmsd.dat"
concoord_disco_bfactor = pdbCode + "_disco_bfactor.pdb"

concoord_lib = os.environ['CONDA_PREFIX']+"/share/concoord/lib"

prop = {
    'vdw' : 4,
    'num_structs' : num_frames,
    'env_vars_dict' : {
        'CONCOORD_OVERWRITE' : '1',
        'CONCOORDLIB' : concoord_lib
    }
}

concoord_disco(     input_pdb_path=concoord_dist_pdb,
                    input_dat_path=concoord_dist_dat,
                    output_traj_path=concoord_disco_pdb,
                    output_rmsd_path=concoord_disco_rmsd,
                    output_bfactor_path=concoord_disco_bfactor,
                    properties=prop)
173
174
175
176
177
178
179
180
181
182
183
# Show protein (if num_frames <= 100)

if (num_frames <= 100):
    view = nglview.show_structure_file(concoord_disco_pdb, default_representation=False)
    view.add_representation(repr_type='line', selection='all', color='modelindex')
    view.center()
    view._remote_call('setSize', target='Widget', args=['','600px'])
    view
else:
    #print("Visualizing a multi-model PDB with > 100 frames is highly dangerous. Please use the trajectory visualization below.")
    display(Markdown('<div class="alert alert-info">Visualizing a multi-model PDB with > 100 frames is highly dangerous. Please use the trajectory visualization below.</div>'))
187
188
189
190
191
192
view = nglview.show_structure_file(concoord_disco_pdb, default_representation=False)
view.clear_representations()
view.add_representation(repr_type='backbone', selection='all', color='modelindex')
view.center()
view._remote_call('setSize', target='Widget', args=['','600px'])
view
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
from biobb_analysis.ambertools.cpptraj_rms import cpptraj_rms

concoord_rmsd = pdbCode + "_concoord_rmsd.dat"

prop = {
    'start': 1,
    'end': -1,
    'steps': 1,
    'mask': 'c-alpha',
    'reference': 'experimental'
}
cpptraj_rms(input_top_path=concoord_dist_pdb,
            input_traj_path=concoord_disco_pdb,
            output_cpptraj_path=concoord_rmsd,
            input_exp_path= monomer,
            properties=prop)
215
216
217
218
219
220
221
222
223
224
225
226
227
228
df = pd.read_csv(concoord_rmsd, header = 0, delimiter='\s+')

plotly.offline.init_notebook_mode(connected=True)

fig = {
    "data": [go.Histogram(x=df['RMSD_00004'], xbins=dict(
    size=0.04), autobinx=False)],
    "layout": go.Layout(title="RMSd variance",
                        xaxis=dict(title = "RMSd (Angstroms)"),
                        yaxis=dict(title = "Population")
                       )
}

plotly.offline.iplot(fig)
232
233
234
235
236
237
238
239
240
241
242
243
244
from biobb_analysis.ambertools.cpptraj_convert import cpptraj_convert

concoord_trr = pdbCode + "_disco_traj.trr"

prop = {
    'mask' : 'c-alpha',
    'format': 'trr'
}

cpptraj_convert(input_top_path=concoord_dist_pdb,
                input_traj_path=concoord_disco_pdb,
                output_cpptraj_path=concoord_trr,
                properties=prop)
248
249
250
251
252
253
# Show trajectory
view = nglview.show_simpletraj(nglview.SimpletrajTrajectory(concoord_trr, prot_ca), gui=True)
view.center()
view.add_representation(repr_type='ball+stick', selection='all')
view._remote_call('setSize', target='Widget', args=['','600px'])
view
257
258
259
260
261
262
263
264
265
266
267
268
269
from biobb_flexdyn.flexdyn.prody_anm import prody_anm

prody_ensemble = pdbCode + "_prody_anm_traj.pdb"

prop = {
    'selection' : 'backbone',
    'num_structs' : num_frames,
    'rmsd' : 2.0
}

prody_anm(  input_pdb_path=monomer,
            output_pdb_path=prody_ensemble,
            properties=prop)
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
from biobb_analysis.ambertools.cpptraj_rms import cpptraj_rms

prody_rmsd = pdbCode + "_prody_rmsd.dat"

prop = {
    'start': 1,
    'end': -1,
    'steps': 1,
    'mask': 'c-alpha',
    'reference': 'experimental'
}
cpptraj_rms(input_top_path=prody_ensemble,
            input_traj_path=prody_ensemble,
            output_cpptraj_path=prody_rmsd,
            input_exp_path= monomer,
            properties=prop)
292
293
294
295
296
297
298
299
300
301
302
303
304
305
df = pd.read_csv(prody_rmsd, header = 0, delimiter='\s+')

plotly.offline.init_notebook_mode(connected=True)

fig = {
    "data": [go.Histogram(x=df['RMSD_00004'], xbins=dict(
    size=0.04), autobinx=False)],
    "layout": go.Layout(title="RMSd variance",
                        xaxis=dict(title = "RMSd (Angstroms)"),
                        yaxis=dict(title = "Population")
                       )
}

plotly.offline.iplot(fig)
309
310
311
312
313
314
315
316
317
318
319
320
321
from biobb_analysis.ambertools.cpptraj_convert import cpptraj_convert

prody_trr = pdbCode + "_prody_anm_traj.trr"

prop = {
    'mask' : 'c-alpha',
    'format': 'trr'
}

cpptraj_convert(input_top_path=prot_backbone,
                input_traj_path=prody_ensemble,
                output_cpptraj_path=prody_trr,
                properties=prop)
325
326
327
328
329
330
# Show trajectory
view = nglview.show_simpletraj(nglview.SimpletrajTrajectory(prody_trr, prot_ca), gui=True)
view.center()
view.add_representation(repr_type='ball+stick', selection='all')
view._remote_call('setSize', target='Widget', args=['','600px'])
view
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
# Running Brownian Dynamics (BD)
# Import module
from biobb_flexserv.flexserv.bd_run import bd_run

# Create properties dict and inputs/outputs

bd_log = pdbCode + '_flexserv_bd_ensemble.log'
bd_crd = pdbCode + '_flexserv_bd_ensemble.mdcrd'

wfreq = 100
time = num_frames * wfreq

prop = {
    'time': time,
    'wfreq': wfreq
}

bd_run( 
     input_pdb_path=prot_ca,
     output_crd_path=bd_crd,
     output_log_path=bd_log,
     properties=prop
)
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
from biobb_analysis.ambertools.cpptraj_rms import cpptraj_rms

flexserv_bd_rmsd = pdbCode + "_flexserv_bd_rmsd.dat"

prop = {
    'start': 1,
    'end': -1,
    'steps': 1,
    'mask': 'c-alpha',
    'reference': 'experimental'
}
cpptraj_rms(input_top_path=prot_ca,
            input_traj_path=bd_crd,
            output_cpptraj_path=flexserv_bd_rmsd,
            input_exp_path=monomer,
            properties=prop)
379
380
381
382
383
384
385
386
387
388
389
390
391
392
df = pd.read_csv(flexserv_bd_rmsd, header = 0, delimiter='\s+')

plotly.offline.init_notebook_mode(connected=True)

fig = {
    "data": [go.Histogram(x=df['RMSD_00004'], xbins=dict(
    size=0.04), autobinx=False)],
    "layout": go.Layout(title="RMSd variance",
                        xaxis=dict(title = "RMSd (Angstroms)"),
                        yaxis=dict(title = "Population")
                       )
}

plotly.offline.iplot(fig)
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
from biobb_analysis.ambertools.cpptraj_rms import cpptraj_rms

flexserv_bd_rmsd = pdbCode + "_flexserv_bd_rmsd.dat" 
flexserv_bd_traj_fitted = pdbCode + "_flexserv_bd_traj_fitted.trr"

prop = {
    'start': 1,
    'end': -1,
    'steps': 1,
    'mask': 'c-alpha',
    'reference': 'experimental'
}
cpptraj_rms(input_top_path=prot_ca,
            input_traj_path=bd_crd,
            output_cpptraj_path=flexserv_bd_rmsd,
            output_traj_path=flexserv_bd_traj_fitted,
            input_exp_path= monomer,
            properties=prop)
417
418
419
420
421
422
# Show trajectory
view = nglview.show_simpletraj(nglview.SimpletrajTrajectory(flexserv_bd_traj_fitted, prot_ca), gui=True)
view.add_representation(repr_type='ball+stick', selection='all')
view.center()
view._remote_call('setSize', target='Widget', args=['','600px'])
view
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
# Running Discrete Molecular Dynamics (DMD)
# Import module
from biobb_flexserv.flexserv.dmd_run import dmd_run

# Create properties dict and inputs/outputs

dmd_log = pdbCode + '_flexserv_dmd_ensemble.log'
dmd_crd = pdbCode + '_flexserv_dmd_ensemble.mdcrd'

prop = {
    'frames': num_frames
}
dmd_run( 
     input_pdb_path=prot_ca,
     output_crd_path=dmd_crd,
     output_log_path=dmd_log,
     properties=prop
)
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
from biobb_analysis.ambertools.cpptraj_rms import cpptraj_rms

flexserv_dmd_rmsd = pdbCode + "_flexserv_dmd_rmsd.dat"

prop = {
    'start': 1,
    'end': -1,
    'steps': 1,
    'mask': 'c-alpha',
    'reference': 'experimental'
}

cpptraj_rms(input_top_path=prot_ca,
            input_traj_path=dmd_crd,
            output_cpptraj_path=flexserv_dmd_rmsd,
            input_exp_path= monomer,
            properties=prop)
467
468
469
470
471
472
473
474
475
476
477
478
479
480
df = pd.read_csv(flexserv_dmd_rmsd, header = 0, delimiter='\s+')

plotly.offline.init_notebook_mode(connected=True)

fig = {
    "data": [go.Histogram(x=df['RMSD_00004'], xbins=dict(
    size=0.04), autobinx=False)],
    "layout": go.Layout(title="RMSd variance",
                        xaxis=dict(title = "RMSd (Angstroms)"),
                        yaxis=dict(title = "Population")
                       )
}

plotly.offline.iplot(fig)
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
from biobb_analysis.ambertools.cpptraj_rms import cpptraj_rms

flexserv_dmd_rmsd = pdbCode + "_flexserv_dmd_rmsd.dat"
flexserv_dmd_traj_fitted = pdbCode + "_flexserv_dmd_traj_fitted.trr"

prop = {
    'start': 1,
    'end': -1,
    'steps': 1,
    'mask': 'c-alpha',
    'reference': 'experimental'
}

cpptraj_rms(input_top_path=prot_ca,
            input_traj_path=dmd_crd,
            output_cpptraj_path=flexserv_dmd_rmsd,
            output_traj_path=flexserv_dmd_traj_fitted,
            input_exp_path= monomer,
            properties=prop)
506
507
508
509
510
511
# Show trajectory
view = nglview.show_simpletraj(nglview.SimpletrajTrajectory(flexserv_dmd_traj_fitted, prot_ca), gui=True)
view.add_representation(repr_type='ball+stick', selection='all')
view.center()
view._remote_call('setSize', target='Widget', args=['','600px'])
view
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
# Running Normal Mode Analysis (NMA)
# Import module
from biobb_flexserv.flexserv.nma_run import nma_run

# Create properties dict and inputs/outputs

nma_log = pdbCode + '_flexserv_nma_ensemble.log'
nma_crd = pdbCode + '_flexserv_nma_ensemble.mdcrd'

prop = {
    'frames' : num_frames
}

nma_run( 
     input_pdb_path=prot_ca,
     output_crd_path=nma_crd,
     output_log_path=nma_log,
     properties=prop
)
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
from biobb_analysis.ambertools.cpptraj_rms import cpptraj_rms

flexserv_nma_rmsd = pdbCode + "_flexserv_nma_rmsd.dat"

prop = {
    'start': 1,
    'end': -1,
    'steps': 1,
    'mask': 'c-alpha',
    'reference': 'experimental'
}

cpptraj_rms(input_top_path=prot_ca,
            input_traj_path=nma_crd,
            output_cpptraj_path=flexserv_nma_rmsd,
            input_exp_path= monomer,
            properties=prop)
557
558
559
560
561
562
563
564
565
566
567
568
569
570
df = pd.read_csv(flexserv_nma_rmsd, header = 0, delimiter='\s+')

plotly.offline.init_notebook_mode(connected=True)

fig = {
    "data": [go.Histogram(x=df['RMSD_00004'], xbins=dict(
    size=0.04), autobinx=False)],
    "layout": go.Layout(title="RMSd variance",
                        xaxis=dict(title = "RMSd (Angstroms)"),
                        yaxis=dict(title = "Population")
                       )
}

plotly.offline.iplot(fig)
574
575
576
577
578
579
580
581
582
583
584
585
586
from biobb_analysis.ambertools.cpptraj_convert import cpptraj_convert

nma_trr = pdbCode + '_flexserv_nma_ensemble.trr'

prop = {
    'mask' : 'c-alpha',
    'format': 'trr'
}

cpptraj_convert(input_top_path=prot_ca,
                input_traj_path=nma_crd,
                output_cpptraj_path=nma_trr,
                properties=prop)
590
591
592
593
594
595
# Show trajectory
view = nglview.show_simpletraj(nglview.SimpletrajTrajectory(nma_trr, prot_ca), gui=True)
view.add_representation(repr_type='ball+stick', selection='all')
view.center()
view._remote_call('setSize', target='Widget', args=['','600px'])
view
599
600
601
602
603
604
605
606
607
608
609
610
from biobb_flexdyn.flexdyn.nolb_nma import nolb_nma

nolb_pdb = pdbCode + '_nolb_ensemble.pdb'

prop = {
    'num_structs' : num_frames,
    'rmsd' : 4
}

nolb_nma(   input_pdb_path=prot_ca,
        output_pdb_path=nolb_pdb,
        properties=prop)
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
from biobb_analysis.ambertools.cpptraj_rms import cpptraj_rms

nolb_rmsd = pdbCode + "_nolb_rmsd.dat"

prop = {
    'start': 1,
    'end': -1,
    'steps': 1,
    'mask': 'c-alpha',
    'reference': 'experimental'
}

cpptraj_rms(input_top_path=prot_ca,
            input_traj_path=nolb_pdb,
            output_cpptraj_path=nolb_rmsd,
            input_exp_path= monomer,
            properties=prop)
634
635
636
637
638
639
640
641
642
643
644
645
646
647
df = pd.read_csv(nolb_rmsd, header = 0, delimiter='\s+')

plotly.offline.init_notebook_mode(connected=True)

fig = {
    "data": [go.Histogram(x=df['RMSD_00004'], xbins=dict(
    size=0.04), autobinx=False)],
    "layout": go.Layout(title="RMSd variance",
                        xaxis=dict(title = "RMSd (Angstroms)"),
                        yaxis=dict(title = "Population")
                       )
}

plotly.offline.iplot(fig)
651
652
653
654
655
656
657
658
659
660
661
662
663
from biobb_analysis.ambertools.cpptraj_convert import cpptraj_convert

nolb_trr = pdbCode + '_nolb_ensemble.trr'

prop = {
    'mask' : 'c-alpha',
    'format': 'trr'
}

cpptraj_convert(input_top_path=prot_ca,
                input_traj_path=nolb_pdb,
                output_cpptraj_path=nolb_trr,
                properties=prop)
667
668
669
670
671
672
673
# Show trajectory
view = nglview.show_simpletraj(nglview.SimpletrajTrajectory(nolb_trr, prot_ca), gui=True)
view.clear_representations()
view.add_representation(repr_type='ball+stick', selection='all')
view.center()
view._remote_call('setSize', target='Widget', args=['','600px'])
view
677
678
679
680
681
682
683
684
685
686
687
from biobb_flexdyn.flexdyn.imod_imode import imod_imode

imode_evecs = pdbCode + '_imode_evecs.dat'

prop = {
    'cg' : 2
}

imod_imode(  input_pdb_path=monomer,
        output_dat_path=imode_evecs,
        properties=prop)
691
692
693
694
695
696
697
698
699
700
701
702
703
704
from biobb_flexdyn.flexdyn.imod_imc import imod_imc

imc_pdb = pdbCode + '_imc.pdb'

prop = {
    'num_structs': num_frames,
    'num_modes': 10,
    'amplitude': 6.0
}

imod_imc(   input_pdb_path=monomer,
            input_dat_path=imode_evecs,
            output_traj_path=imc_pdb,
            properties=prop)
708
709
710
711
712
713
714
715
716
717
718
719
720
721
722
723
724
from biobb_analysis.ambertools.cpptraj_rms import cpptraj_rms

imods_rmsd = pdbCode + "_imods_rmsd.dat"

prop = {
    'start': 1,
    'end': -1,
    'steps': 1,
    'mask': 'c-alpha',
    'reference': 'experimental'
}

cpptraj_rms(input_top_path=imc_pdb,
            input_traj_path=imc_pdb,
            output_cpptraj_path=imods_rmsd,
            input_exp_path= monomer,
            properties=prop)
728
729
730
731
732
733
734
735
736
737
738
739
740
741
df = pd.read_csv(imods_rmsd, header = 0, delimiter='\s+')

plotly.offline.init_notebook_mode(connected=True)

fig = {
    "data": [go.Histogram(x=df['RMSD_00004'], xbins=dict(
    size=0.04), autobinx=False)],
    "layout": go.Layout(title="RMSd variance",
                        xaxis=dict(title = "RMSd (Angstroms)"),
                        yaxis=dict(title = "Population")
                       )
}

plotly.offline.iplot(fig)
745
746
747
748
749
750
751
752
753
754
755
756
757
from biobb_analysis.ambertools.cpptraj_convert import cpptraj_convert

imods_trr = pdbCode + '_imods_ensemble.trr'

prop = {
    'mask' : 'c-alpha',
    'format': 'trr'
}

cpptraj_convert(input_top_path=imc_pdb,
                input_traj_path=imc_pdb,
                output_cpptraj_path=imods_trr,
                properties=prop)
761
762
763
764
765
766
767
# Show trajectory
view = nglview.show_simpletraj(nglview.SimpletrajTrajectory(imods_trr, prot_ca), gui=True)
view.clear_representations()
view.add_representation(repr_type='ball+stick', selection='all')
view.center()
view._remote_call('setSize', target='Widget', args=['','600px'])
view
771
772
773
774
775
776
777
778
779
traj_zip = pdbCode + "_concat_traj.zip"

with zipfile.ZipFile(traj_zip, 'w') as myzip:
    myzip.write(concoord_trr)
    myzip.write(prody_trr)
    myzip.write(imods_trr)
    #myzip.write(flexserv_bd_traj_fitted)
    myzip.write(flexserv_dmd_traj_fitted)    
    myzip.write(nma_trr)
783
784
785
786
787
788
from biobb_gromacs.gromacs.trjcat import trjcat

concat_trr = pdbCode + "_concat_traj.trr"

trjcat(input_trj_zip_path=traj_zip,
       output_trj_path=concat_trr)
792
793
794
795
796
797
798
799
800
from biobb_gromacs.gromacs.make_ndx import make_ndx

gmx_index_file = pdbCode + "_gmx_ndx.ndx"

prop = { 'selection': 3 }

make_ndx(input_structure_path=prot_ca,
         output_ndx_path=gmx_index_file,
         properties=prop)
804
805
806
807
808
809
810
811
812
813
814
815
816
817
818
819
820
from biobb_analysis.gromacs.gmx_cluster import gmx_cluster

cluster_concat_pdb = pdbCode + "_concat_cluster.pdb"

prop = {
    'fit_selection': 'System',
    'output_selection': 'System',
    'method': 'linkage',
    'cutoff': 0.12 # (0.12 nm = 1.2 Angstroms) 
    #'cutoff': 0.15 # (0.15 nm = 1.5 Angstroms) 
}

gmx_cluster(input_structure_path=prot_ca,
            input_traj_path=concat_trr,
            input_index_path=gmx_index_file,
            output_pdb_path=cluster_concat_pdb,
            properties=prop)
824
825
826
827
828
# Show protein
view = nglview.show_structure_file(cluster_concat_pdb, default_representation=False)
view.add_representation(repr_type='tube', selection='all', color='modelindex')
view._remote_call('setSize', target='Widget', args=['','600px'])
view
832
833
834
835
836
837
838
839
840
841
842
843
844
845
846
847
848
849
from biobb_analysis.ambertools.cpptraj_rms import cpptraj_rms

meta_traj_rmsd = pdbCode + "_meta_traj_rmsd.dat" 
meta_traj_fitted = pdbCode + "_meta_traj_fitted.crd"

prop = {
    'start': 1,
    'end': -1,
    'steps': 1,
    'mask': 'c-alpha',
    'reference': 'experimental'
}
cpptraj_rms(input_top_path=prot_ca,
            input_traj_path=cluster_concat_pdb,
            output_cpptraj_path=meta_traj_rmsd,
            output_traj_path=meta_traj_fitted,
            input_exp_path= prot_ca,
            properties=prop)
853
854
855
856
857
858
859
860
861
862
863
864
865
866
867
868
869
870
871
872
873
874
875
876
877
878
879
from biobb_flexserv.pcasuite.pcz_zip import pcz_zip

concat_pcz = pdbCode + '_concat_ensemble.pcz'
concat_pcz_gaussian = pdbCode + '_concat_ensemble_gaussian.pcz'

# Classical RMSd fitting
prop = {
    'variance': 90,
    'neigenv' : 10
}

pcz_zip( input_pdb_path=prot_ca,
        input_crd_path=meta_traj_fitted,
        output_pcz_path=concat_pcz,
        properties=prop)

# Gaussian (weighted) RMSd fitting
prop = {
    'variance': 90,
    'neigenv' : 10,
    'gauss_rmsd' : True
}

pcz_zip( input_pdb_path=prot_ca,
        input_crd_path=meta_traj_fitted,
        output_pcz_path=concat_pcz_gaussian,
        properties=prop)
883
884
885
886
887
888
889
890
from biobb_flexserv.pcasuite.pcz_info import pcz_info

pcz_report = pdbCode + "_pcz_report.json"

pcz_info( 
    input_pcz_path=concat_pcz,
    output_json_path=pcz_report
)
894
895
896
with open(pcz_report, 'r') as f:
  pcz_info = json.load(f)
print(json.dumps(pcz_info, indent=2))
900
901
902
903
904
905
906
907
908
909
910
911
912
913
914
# Plotting Variance Profile
y = np.array(pcz_info['Eigen_Values'])
x = list(range(1,len(y)+1))

plotly.offline.init_notebook_mode(connected=True)

fig = {
    "data": [go.Scatter(x=x, y=y)],
    "layout": go.Layout(title="Variance Profile",
                        xaxis=dict(title = "Principal Component"),
                        yaxis=dict(title = "Variance")
                       )
}

plotly.offline.iplot(fig)
918
919
920
921
922
923
924
925
926
927
928
929
930
931
932
# Plotting Dimensionality/quality profile
y = np.array(pcz_info['Eigen_Values_dimensionality_vs_total'])
x = list(range(1,len(y)+1))

plotly.offline.init_notebook_mode(connected=True)

fig = {
    "data": [go.Scatter(x=x, y=y)],
    "layout": go.Layout(title="Dimensionality/Quality profile",
                        xaxis=dict(title = "Principal Component"),
                        yaxis=dict(title = "Accumulated Quality (%)")
                       )
}

plotly.offline.iplot(fig)
936
937
938
939
940
941
942
943
944
945
946
947
from biobb_flexserv.pcasuite.pcz_evecs import pcz_evecs

pcz_evecs_report = pdbCode + "_pcz_evecs.json"

prop = {
    'eigenvector': 1
}

pcz_evecs( 
        input_pcz_path=concat_pcz,
        output_json_path=pcz_evecs_report,
        properties=prop)
951
952
953
with open(pcz_evecs_report, 'r') as f:
  pcz_evecs_report_json = json.load(f)
print(json.dumps(pcz_evecs_report_json, indent=2))
957
958
959
960
961
962
963
964
965
966
967
968
969
970
971
# Plotting Eigen Value Residue Components
y = np.array(pcz_evecs_report_json['projs'])
x = list(range(1,len(y)+1))

plotly.offline.init_notebook_mode(connected=True)

fig = {
    "data": [go.Bar(x=x, y=y)],
    "layout": go.Layout(title="Eigen Value Residue Components",
                        xaxis=dict(title = "Residue Number"),
                        yaxis=dict(title = "\u00C5")
                       )
}

plotly.offline.iplot(fig)
975
976
977
978
979
980
981
982
983
984
985
from biobb_flexserv.pcasuite.pcz_animate import pcz_animate

proj1 = pdbCode + "_pcz_proj1.crd"

prop = {
    'eigenvector': 1  # Try changing the eigenvector number!
}

pcz_animate( input_pcz_path=concat_pcz,
        output_crd_path=proj1,
        properties=prop)
 989
 990
 991
 992
 993
 994
 995
 996
 997
 998
 999
1000
from biobb_analysis.ambertools.cpptraj_convert import cpptraj_convert

proj1_dcd = pdbCode + '_pcz_proj1.dcd'

prop = {
    'format': 'dcd'
}

cpptraj_convert(input_top_path=prot_ca,
                input_traj_path=proj1,
                output_cpptraj_path=proj1_dcd,
                properties=prop)
1004
1005
1006
1007
1008
1009
1010
# Show trajectory
view = nglview.show_simpletraj(nglview.SimpletrajTrajectory(proj1_dcd, prot_ca), gui=True)
#view.add_representation(repr_type='spacefill', radius=0.7, selection='all')
view.add_representation(repr_type='surface', selection='all')
view.center()
view._remote_call('setSize', target='Widget', args=['','600px'])
view
1014
1015
1016
1017
1018
1019
1020
1021
1022
1023
1024
1025
1026
1027
1028
1029
from biobb_flexserv.pcasuite.pcz_bfactor import pcz_bfactor

bfactor_all_dat = pdbCode + "_bfactor_all.dat"
bfactor_all_pdb = pdbCode + "_bfactor_all.pdb"

prop = {
    'eigenvector': 0,
    'pdb': True
}

pcz_bfactor( 
    input_pcz_path=concat_pcz,
    output_dat_path=bfactor_all_dat,
    output_pdb_path=bfactor_all_pdb,
    properties=prop
)
1033
1034
1035
1036
1037
1038
1039
1040
1041
1042
1043
1044
1045
1046
1047
# Plotting the B-factors x Residue x PCA mode
y = np.loadtxt(bfactor_all_dat)
x = list(range(1,len(y)+1))

plotly.offline.init_notebook_mode(connected=True)

fig = {
    "data": [go.Scatter(x=x, y=y)],
    "layout": go.Layout(title="Bfactor x Residue x PCA Modes (All)",
                        xaxis=dict(title = "Residue Number"),
                        yaxis=dict(title = "Bfactor (" + '\u00C5' +'\u00B2' + ")")
                       )
}

plotly.offline.iplot(fig)
1051
1052
1053
1054
1055
1056
1057
1058
1059
1060
1061
1062
1063
1064
1065
1066
1067
1068
1069
1070
1071
1072
1073
1074
1075
# Show trajectory
view = nglview.show_simpletraj(nglview.SimpletrajTrajectory(proj1_dcd, bfactor_all_pdb))
view.add_representation(repr_type='spacefill', selection='all', colorScheme='bfactor')
view.add_representation(repr_type='tube', radius='0.4', selection='all', color='white')
view._remote_call('setSize', target='Widget', args=['','600px'])

stop = False
def loop(view):
    import time
    def do():
        while True and not stop:
            if view.frame == view.max_frame:
                direction = -1
            if view.frame == 0:
                direction = 1
            view.frame = view.frame + direction
            time.sleep(0.2)
    view._run_on_another_thread(do)

view.on_displayed(loop)

view._iplayer.children[0].disabled = True
view._iplayer.children[1].disabled = True

view
1079
1080
1081
1082
1083
1084
1085
1086
1087
1088
1089
1090
1091
1092
1093
1094
1095
1096
1097
1098
1099
1100
1101
1102
1103
1104
1105
1106
1107
1108
1109
1110
1111
1112
1113
1114
1115
1116
1117
1118
1119
1120
from biobb_flexserv.pcasuite.pcz_hinges import pcz_hinges

hinges_bfactor_report = pdbCode + "_hinges_bfactor_report.json"
hinges_dyndom_report = pdbCode + "_hinges_dyndom_report.json"
hinges_fcte_report = pdbCode + "_hinges_fcte_report.json"

bfactor_method = "Bfactor_slope"
dyndom_method = "Dynamic_domain"
fcte_method = "Force_constant"

bfactor_prop = {
    'eigenvector': 0, # 0 = All modes
    'method': bfactor_method
}

dyndom_prop = {
    'eigenvector': 0, # 0 = All modes
    'method': dyndom_method
}

fcte_prop = {
    'eigenvector': 0, # 0 = All modes
    'method': fcte_method
}

pcz_hinges( 
        input_pcz_path=concat_pcz_gaussian,
        output_json_path=hinges_bfactor_report,
        properties=bfactor_prop
)

pcz_hinges( 
        input_pcz_path=concat_pcz_gaussian,
        output_json_path=hinges_dyndom_report,
        properties=dyndom_prop
)

pcz_hinges( 
        input_pcz_path=concat_pcz_gaussian,
        output_json_path=hinges_fcte_report,
        properties=fcte_prop
)
1124
1125
1126
1127
1128
1129
1130
1131
1132
1133
1134
with open(hinges_bfactor_report, 'r') as f:
  hinges_bfactor = json.load(f)
print(json.dumps(hinges_bfactor, indent=2))

with open(hinges_dyndom_report, 'r') as f:
  hinges_dyndom = json.load(f)
print(json.dumps(hinges_dyndom, indent=2))

with open(hinges_fcte_report, 'r') as f:
  hinges_fcte = json.load(f)
print(json.dumps(hinges_fcte, indent=2))
1138
1139
1140
1141
1142
1143
1144
1145
1146
1147
1148
1149
1150
1151
1152
1153
1154
# Show trajectory
view1 = nglview.show_simpletraj(nglview.SimpletrajTrajectory(proj1_dcd, bfactor_all_pdb), gui=True)
view1.add_representation(repr_type='surface', selection=hinges_dyndom["clusters"][0]["residues"], color='red')
view1.add_representation(repr_type='surface', selection=hinges_dyndom["clusters"][1]["residues"], color='green')
#view1.add_representation(repr_type='surface', selection=hinges_dyndom["clusters"][2]["residues"], color='yellow')
#view1.add_representation(repr_type='surface', selection=hinges_dyndom["hinge_residues"], color='red')
view1._remote_call('setSize', target='Widget', args=['350px','350px'])
view1
view2 = nglview.show_simpletraj(nglview.SimpletrajTrajectory(proj1_dcd, bfactor_all_pdb), gui=True)
view2.add_representation(repr_type='surface', selection=hinges_bfactor["hinge_residues"], color='red')
view2._remote_call('setSize', target='Widget', args=['350px','350px'])
view2
view3 = nglview.show_simpletraj(nglview.SimpletrajTrajectory(proj1_dcd, bfactor_all_pdb), gui=True)
view3.add_representation(repr_type='surface', selection=str(hinges_fcte["hinge_residues"]), color='red')    
view3._remote_call('setSize', target='Widget', args=['350px','350px'])
view3
ipywidgets.HBox([view1, view2, view3])
1158
1159
1160
1161
1162
1163
1164
1165
1166
1167
1168
1169
1170
from biobb_flexserv.pcasuite.pcz_stiffness import pcz_stiffness

stiffness_report = pdbCode + "_pcz_stiffness.json"

prop = {
    'eigenvector': 0 # 0 = All modes
}

pcz_stiffness( 
        input_pcz_path=concat_pcz,
        output_json_path=stiffness_report,
        properties=prop
)
1174
1175
1176
with open(stiffness_report, 'r') as f:
  pcz_stiffness_report = json.load(f)
print(json.dumps(pcz_stiffness_report, indent=2))
1180
1181
1182
1183
1184
1185
1186
1187
1188
1189
1190
1191
1192
1193
1194
y = np.array(pcz_stiffness_report['stiffness'])
x = list(range(1,len(y)))

plotly.offline.init_notebook_mode(connected=True)

fig = {
    "data": [go.Heatmap(x=x, y=x, z=y, type = 'heatmap', colorscale = 'reds')],
    "layout": go.Layout(title="Apparent Stiffness",
                        xaxis=dict(title = "Residue Number"),
                        yaxis=dict(title = "Residue Number"),
                        width=800, height=800
                       )
}

plotly.offline.iplot(fig)
1198
1199
1200
1201
1202
1203
1204
1205
1206
1207
1208
1209
1210
1211
1212
y = np.array(pcz_stiffness_report['stiffness_log'])
x = list(range(1,len(y)))

plotly.offline.init_notebook_mode(connected=True)

fig = {
    "data": [go.Heatmap(x=x, y=x, z=y, type = 'heatmap', colorscale = 'reds')],
    "layout": go.Layout(title="Apparent Stiffness (Logarithmic Scale)",
                        xaxis=dict(title = "Residue Number"),
                        yaxis=dict(title = "Residue Number"),
                        width=800, height=800
                       )
}

plotly.offline.iplot(fig)
1216
1217
1218
1219
1220
1221
1222
1223
1224
1225
1226
1227
1228
from biobb_flexserv.pcasuite.pcz_collectivity import pcz_collectivity

pcz_collectivity_report = pdbCode + "_pcz_collectivity.json"

prop = {
    'eigenvector':0 # 0 = All modes
}

pcz_collectivity( 
    input_pcz_path=concat_pcz,
    output_json_path=pcz_collectivity_report,
    properties=prop
)
1232
1233
1234
with open(pcz_collectivity_report, 'r') as f:
  pcz_collectivity_report_json = json.load(f)
print(json.dumps(pcz_collectivity_report_json, indent=2))
1238
1239
1240
1241
1242
1243
1244
1245
1246
1247
1248
1249
1250
1251
1252
1253
1254
1255
z = np.array(pcz_collectivity_report_json['collectivity'])
x = list(range(1,len(z)+1))
x = ["PC" + str(pc) for pc in x]

y = [""]

plotly.offline.init_notebook_mode(connected=True)

fig = {
    "data": [go.Heatmap(x=x, y=y, z=[z], type = 'heatmap', colorscale = 'reds')],
    "layout": go.Layout(title="Collectivity Index",
                        xaxis=dict(title = "Principal Component"),
                        yaxis=dict(title = "Collectivity"),
                        width=1000, height=300
                       )
}

plotly.offline.iplot(fig)
ShowHide 66 more snippets with no or duplicated tags.

Login to post a comment if you would like to share your experience with this workflow.

Do you know this workflow well? If so, you can request seller status , and start supporting this workflow.

Free

Related Workflows

cellranger-snakemake-gke
snakemake workflow to run cellranger on a given bucket using gke.
A Snakemake workflow for running cellranger on a given bucket using Google Kubernetes Engine. The usage of this workflow ...