Use Public Resources to answer a biological question

public public 1yr ago Version: Version 1 0 bookmarks

Learning Objectives

  • How to access genomic resource via its Python API
  • How to access image resource via its Python API
  • Relate image data to genomic data

Diabetes related genes expressed in pancreas

This notebook looks at the question Which diabetes related genes are expressed in the pancreas? Tissue and disease can be modified.

Steps:

  • Query humanmine.org , an integrated database of Homo sapiens genomic data using the intermine API to find the genes.
  • Using the list of found genes, search in the Image Data Resource (IDR) for images linked to the genes, tissue and disease.
  • Analyse the images found.

Launch

This notebook uses the environment.yml file.

See Setup .

Code Snippets

2
3
4
5
6
# Package required to interact with HumanMine
%pip install git+https://github.com/jburel/intermine-ws-python.git@python_3_10

# Package required to interact with IDR or OMERO
%pip install omero-py
10
11
12
13
14
15
# libraries to interact with intermine
from intermine.webservice import Service

# libraries to interact with IDR
import requests
import json
19
service = Service("https://www.humanmine.org/humanmine/service")
23
query = service.new_query("Gene")
27
28
29
30
31
query.add_view(
    "primaryIdentifier", "symbol", "proteinAtlasExpression.cellType",
    "proteinAtlasExpression.level", "proteinAtlasExpression.reliability",
    "proteinAtlasExpression.tissue.name"
)
35
36
TISSUE = "Pancreas"
DISEASE = "diabetes"
40
41
42
43
query.add_constraint("proteinAtlasExpression.tissue.name", "=", TISSUE)
query.add_constraint("proteinAtlasExpression.level", "ONE OF", ["Medium", "High"])
query.add_constraint("organism.name", "=", "Homo sapiens")
query.add_constraint("diseases.name", "CONTAINS", DISEASE)
47
48
49
50
upin_tissue = set()
for row in query.rows():
    upin_tissue.add(row["symbol"])
genes = sorted(upin_tissue, reverse=True)
54
55
56
57
for i, a in enumerate(genes):
    print(a, end=' ')
    if i % 8 == 7: 
        print("")
61
62
63
64
65
66
67
68
69
INDEX_PAGE = "https://idr.openmicroscopy.org/webclient/?experimenter=-1"

# create http session
with requests.Session() as session:
    request = requests.Request('GET', INDEX_PAGE)
    prepped = session.prepare_request(request)
    response = session.send(prepped)
    if response.status_code != 200:
        response.raise_for_status()
73
74
75
SEARCH_URL = "https://idr.openmicroscopy.org/searchengine/api/v1/resources/{type}/search/"
KEY_VALUE_SEARCH = SEARCH_URL + "?key={key}&value={value}"
KEY = "Gene Symbol"
79
80
81
82
83
84
85
86
87
88
89
%%time
import collections
from collections import defaultdict

results = {}
for gene in genes:
    qs1 = {'type': 'image', 'key': KEY, 'value': gene}
    url = KEY_VALUE_SEARCH.format(**qs1)
    json = session.get(url).json()
    images = json['results']['results']
    results[gene] = images
93
94
95
96
# Annotation key in IDR to find and filter by.
EXPRESSION_KEY = "Expression Pattern Description"
EXPRESSION = "Islets"
STAGE = "Developmental Stage"
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
development_stage = {}
for k in results:
    images = results[k]
    result_images = defaultdict(list)
    for image in images:
        values = image["key_values"]
        stage = ""
        for v in values:
            name = v["name"]
            value = v['value']
            if name == STAGE:
                stage = value
            if name == EXPRESSION_KEY and EXPRESSION in value:
                result_images[stage].append(image["id"])
    development_stage[k] = result_images.items()
118
print(development_stage)
122
123
124
125
# URLs to retrieve the thumbnails and link to the images in IDR
BASE_URL = "https://idr.openmicroscopy.org/webclient"
IMAGE_DATA_URL = BASE_URL + "/render_thumbnail/{id}"
LINK_URL = BASE_URL + "/?show=image-{id}"
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
# Display the images
from ipywidgets import AppLayout, widgets

table_widget = widgets.HTML("")

html = "<table>"
for gene in development_stage:
    images = development_stage[gene]
    if len(images) > 0:
        html += '<tr><td><h2>Gene: '+gene+'</h2></td></tr><tr>'
        for k, v in images:
            html += '<tr><td><h4>Developmental stage: '+k+'</h4></td></tr><tr>'
            for i in v:
                qs = {'id': i}
                url = IMAGE_DATA_URL.format(**qs)
                url_link = LINK_URL.format(**qs)
                html += '<td><a href="'+url_link+'" target="_blank"><img src="'+url+'"/></a></td>'
            html += "</tr>"
        html += "</tr>"
html += "</table>"

table_widget.value = html
AppLayout(header=None,
          left_sidebar=None,
          center=table_widget,
          right_sidebar=None,
          footer=None)
159
160
161
PART_KEY = "Organism Part"
PATHOLOGY_KEY = "Pathology"
PATHOLOGY_NORMAL_VALUE = "Normal"
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
pathology_images = {}
for k in results:
    images = results[k]
    result_images = defaultdict(list)
    for image in images:
        values = image["key_values"]
        part = None
        for v in values:
            name = v["name"]
            value = v['value']
            if name is not None and PART_KEY in name and (TISSUE or EXPRESSION in value):
                part = value
        for v in values:
            name = v["name"]
            value = v['value']
            if part is not None and name == PATHOLOGY_KEY:
                if PATHOLOGY_NORMAL_VALUE in value:
                    result_images[PATHOLOGY_NORMAL_VALUE].append(image["id"])
                else:
                    result_images[value].append(image["id"])
    pathology_images[k] = result_images.items()
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
import pandas as pd
import matplotlib.pyplot as plt
disease_map = {}
gene = "PDX1"
images = pathology_images[gene]
if len(images) == 0:
    print("No images found")
else:
    for k, v in images:
        if k != PATHOLOGY_NORMAL_VALUE:
            disease_map[k] = len(v)

    disease_ordered = collections.OrderedDict(sorted(disease_map.items()))
    df = pd.DataFrame({'disease':disease_ordered.items(),
                       'number of images':disease_ordered.values()})
    df.plot(kind='barh', x='disease', y='number of images', figsize=(10,10))
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
from ipywidgets import GridspecLayout, widgets

increase = 8
max_value = increase
min_value = 0

disease = ""

def display_images(images, min, max):
    html = "<table>"
    html += '<tr>'
    if min < 0:
        min = 0
    if max >= len(images):
        max = len(images)


    for i in images[min:max]:
        qs = {'id': i}
        url = IMAGE_DATA_URL.format(**qs)
        url_link = LINK_URL.format(**qs)
        html += '<td><a href="'+url_link+'" target="_blank"><img src="'+url+'"/></a>&nbsp;</td>'
    html += "</tr>"
    html += "</table>"
    html_widget.value = html

    # Set the number of images found
    count_widget.value = "<b>Number of images found: " + str(len(images)) + "</b>"

def on_selection_change(change):
    global disease
    if change['name'] == 'value':
        values = get_images(change['new']) 
        if values is None:
            return
        disease = change['new']
        min_value = 0
        max_value = increase 
        display_images(values, min_value, max_value)

def get_images(disease):
    for k, v in images:
        if k == disease:
            return v
    return None

def on_click_next(b):
    global min_value
    global max_value
    max_value = max_value + increase
    min_value = min_value + increase
    values = get_images(disease)
    button_previous.disabled = False
    if values is None:
        return
    if max_value > len(values):
        button_next.disabled = True

    display_images(values, min_value, max_value)

def on_click_previous(b):
    global min_value
    global max_value
    max_value = max_value - increase
    min_value = min_value - increase
    button_next.disabled = False
    if min_value <= 0:  # reset 
        min_value = 0
        max_value = increase
        button_previous.disabled = True
    values = get_images(disease)
    if values is not None:
        display_images(values, min_value, max_value)

def dropdown_widget(disease_list,
                    dropdown_widget_name,
                    displaywidget=False):

    selection = widgets.Dropdown(
        options=disease_list,
        value=disease_list[0],
        description=dropdown_widget_name,
        disabled=False,
    )
    selection.observe(on_selection_change)
    display_images(get_images(selection.value), min_value, max_value)
    return selection

disease_list = list(disease_ordered.keys())
disease = disease_list[0]
gene_widget = widgets.HTML("")
count_widget = widgets.HTML("")
html_widget = widgets.HTML("")
disease_box = dropdown_widget(
    disease_list,
    'Disease: ', True
)

button_next = widgets.Button(description="Next>>")
button_next.on_click(on_click_next)

button_previous = widgets.Button(description="<<Previous", disabled=True)
button_previous.on_click(on_click_previous)

gene_widget.value = "Gene: <b>" + gene + "</b>"

grid = GridspecLayout(3, 3)
grid[0, 0] = gene_widget
grid[0, 1] = disease_box
grid[0, 2] = count_widget
grid[2, 0] = button_previous
grid[1, :] = html_widget
grid[2, 2] = button_next
grid
325
326
327
BASE_SEARCH_URL = "https://idr.openmicroscopy.org/searchengine/api/v1/"
IMAGE_SEARCH = "/resources/image/searchannotation/"
IMAGE_SEARCH_PAGE = "/resources/image/searchannotation_page/"
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
'''
**Query 1**
Organism Part           Small intestine OR Duodenum
Pathology               Adenocarcinoma (all) ==> contains (adenocarcinoma)
Gene Symbol             PDX1

'''
query_1 = {
    "query_details": {
        "and_filters": [
            {
                "name": "Gene Symbol",
                "value": "PDX1",
                "operator": "equals",
                "resource": "image"
            },
            {
                "name": "Pathology",
                "value": "adenocarcinoma",
                "operator": "contains",
                "resource": "image"
            }
        ],
        "or_filters": [[
            {
                "name": "Organism Part",
                "value": "Duodenum",
                "operator": "equals",
                "resource": "image"
            },
            {
                "name": "Organism Part",
                "value": "Small intestine",
                "operator": "equals",
                "resource": "image"
            }

        ]

        ],
        "case_sensitive": False
    }
}
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
'''
**Query 2**
Organism Part           Small intestine OR Duodenum
Pathology               normal nos ==> normal tissue, nos
Gene Symbol             PDX1
'''
query_2 = {
    "query_details": {
        "and_filters": [
            {
                "name": "Gene Symbol",
                "value": "PDX1",
                "operator": "equals",
                "resource": "image"
            },
            {
                "name": "Pathology",
                "value": "normal tissue, nos",
                "operator": "equals",
                "resource": "image"
            }
        ],
        "or_filters": [[
            {
                "name": "Organism Part",
                "value": "Duodenum",
                "operator": "equals",
                "resource": "image"
            },
            {
                "name": "Organism Part",
                "value": "Small intestine",
                "operator": "equals",
                "resource": "image"
            }

        ]

        ],
        "case_sensitive": False
    }
}
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
import json
def query_the_search_ending(query):
    received_results_data = []
    query_data = {"query": query}
    resp = requests.post(
        url="%s%s" % (BASE_SEARCH_URL, IMAGE_SEARCH), data=json.dumps(query_data)
    )
    res = resp.text
    try:
        returned_results = json.loads(res)
    except Exception:
        return []
    if not returned_results.get("results") or len(returned_results["results"]) == 0:
        print("Your query returns no results")
        return []
    total_results = returned_results["results"]["size"]
    print("Total no of result records %s" % total_results)
    for res in returned_results["results"]["results"]:
        received_results_data.append(res)

    received_results = len(returned_results["results"]["results"])
    bookmark = returned_results["results"]["bookmark"]
    total_pages = returned_results["results"]["total_pages"]
    page = 1
    while received_results < total_results:
        page += 1
        query_data = {
            "query": {"query_details": returned_results["query_details"]},
            "bookmark": bookmark,
        }
        query_data_json = json.dumps(query_data)
        resp = requests.post(
            url="%s%s" % (BASE_URL, IMAGE_SEARCH_PAGE), data=query_data_json
        )
        res = resp.text
        try:
            returned_results = json.loads(res)
        except Exception as e:
            return
        received_results = received_results + len(returned_results["results"]["results"])
        for res in returned_results["results"]["results"]:
            received_results_data.append(res)
        bookmark = returned_results["results"]["bookmark"]
    return received_results_data
469
results_1 = query_the_search_ending(query_1)
473
results_2 = query_the_search_ending(query_2)
477
478
479
480
481
482
483
484
485
486
487
488
489
490
html = "<table>"
for r in results_2:
    id = r["id"]
    qs = {'id': id}
    url = IMAGE_DATA_URL.format(**qs)
    url_link = LINK_URL.format(**qs)
    html += '<tr><td><b>Image ID: '+str(id)+'</b></td></tr><td><a href="'+url_link+'" target="_blank"><img src="'+url+'"/></a>&nbsp;</td>'
html += "</table>"
table_widget = widgets.HTML("")
table_widget.value = html
AppLayout(header=None, left_sidebar=None,
              center=table_widget,
              right_sidebar=None,
              footer=None)
494
image_id = 4387380
498
499
500
501
502
503
from omero.gateway import BlitzGateway
HOST = 'ws://idr.openmicroscopy.org/omero-ws'
conn = BlitzGateway('public', 'public',
                    host=HOST, secure=True)
print(conn.connect())
conn.c.enableKeepAlive(60)
507
508
image = conn.getObject("Image", image_id)
print(image.getName())
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
import numpy
def load_numpy_array(image):
    pixels = image.getPrimaryPixels()
    size_c = image.getSizeC()
    size_x = image.getSizeX()
    size_y = image.getSizeY()
    z, t = 0, 0  # first plane of the image

    c_list = []
    for c in range(size_c):  # all channels
        c_list.append((z, c, t))

    values = []
    # Load all the planes as YX numpy array
    planes = pixels.getPlanes(c_list)
    print("Downloading image %s" % image.getName())
    all_planes = numpy.dstack(list(planes))
    return all_planes
533
data = load_numpy_array(image)
537
538
539
540
541
542
543
544
def disconnect(conn):
    """
    Disconnect from an OMERO server
    :param conn: The BlitzGateway
    """
    conn.close()

disconnect(conn)
548
549
550
from skimage.color import rgb2hed
# Convert the image to HED using the pre-built skimage method
ihc_hed = rgb2hed(data)
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
# Create an RGB image for each of the stains
from skimage.color import hed2rgb
null = numpy.zeros_like(ihc_hed[:, :, 0],)
ihc_h = hed2rgb(numpy.stack((ihc_hed[:, :, 0], null, null), axis=-1))
ihc_e = hed2rgb(numpy.stack((null, ihc_hed[:, :, 1], null), axis=-1))
ihc_d = hed2rgb(numpy.stack((null, null, ihc_hed[:, :, 2]), axis=-1))

# Display
fig, axes = plt.subplots(2, 2, figsize=(7, 6), sharex=True, sharey=True)
ax = axes.ravel()

ax[0].imshow(data)
ax[0].set_title("Original image")

ax[1].imshow(ihc_h)
ax[1].set_title("Hematoxylin")

ax[2].imshow(ihc_e)
ax[2].set_title("Eosin")  # Note that there is no Eosin stain in this image

ax[3].imshow(ihc_d)
ax[3].set_title("DAB")

for a in ax.ravel():
    a.axis('off')

fig.tight_layout()
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
from skimage.exposure import rescale_intensity

# Rescale hematoxylin and DAB channels and give them a fluorescence look

h = rescale_intensity(ihc_hed[:, :, 0], out_range=(0, 1),
                      in_range=(0, numpy.percentile(ihc_hed[:, :, 0], 99)))
d = rescale_intensity(ihc_hed[:, :, 2], out_range=(0, 1),
                      in_range=(0, numpy.percentile(ihc_hed[:, :, 2], 99)))

# Cast the two channels into an RGB image, as the blue and green channels
# respectively
zdh = numpy.dstack((null, d, h))

fig = plt.figure()
axis = plt.subplot(1, 1, 1, sharex=ax[0], sharey=ax[0])
axis.imshow(zdh)
axis.set_title('Stain-separated image (rescaled)')
axis.axis('off')
plt.show()
606
607
608
609
610
def image_show(image, nrows=1, ncols=1, cmap='gray', **kwargs):
    fig, ax = plt.subplots(nrows=nrows, ncols=ncols, figsize=(16, 16))
    ax.imshow(image, cmap='gray')
    ax.axis('off')
    return fig, ax
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
649
650
651
652
653
654
655
656
657
658
659
660
661
662
663
664
665
from skimage import data
from skimage import filters
from skimage.color import rgb2gray
import matplotlib.pyplot as plt

# Setting plot size to 15, 15
plt.figure(figsize=(15, 15))

gray_ihc_d = rgb2gray(ihc_d)

# Computing Otsu's thresholding value
threshold = filters.threshold_otsu(gray_ihc_d)

# Computing binarized values using the obtained
# threshold
binarized_ihc_d = (gray_ihc_d > threshold)*1
plt.subplot(2,2,1)
plt.title("Threshold: >"+str(threshold))

# Displaying the binarized image
plt.imshow(binarized_ihc_d, cmap = "gray")

# Computing Ni black's local pixel
# threshold values for every pixel
threshold = filters.threshold_niblack(gray_ihc_d)

# Computing binarized values using the obtained
# threshold
binarized_ihc_d = (gray_ihc_d > threshold)*1
plt.subplot(2,2,2)
plt.title("Niblack Thresholding")

# Displaying the binarized image
plt.imshow(binarized_ihc_d, cmap = "gray")

# Computing Sauvola's local pixel threshold
# values for every pixel - Not Binarized
threshold = filters.threshold_sauvola(gray_ihc_d)
plt.subplot(2,2,3)
plt.title("Sauvola Thresholding")

# Displaying the local threshold values
plt.imshow(threshold, cmap = "gray")

# Computing Sauvola's local pixel
# threshold values for every pixel - Binarized
binarized_ihc_d = (gray_ihc_d > threshold)*1
plt.subplot(2,2,4)
plt.title("Sauvola Thresholding - Converting to 0's and 1's")

# Displaying the binarized image
plt.imshow(binarized_ihc_d, cmap = "gray")
ShowHide 33 more snippets with no or duplicated tags.

Login to post a comment if you would like to share your experience with this workflow.

Do you know this workflow well? If so, you can request seller status , and start supporting this workflow.

Free

Created: 1yr ago
Updated: 1yr ago
Maitainers: public
URL: https://github.com/ome/EMBL-EBI-imaging-course-05-2023/blob/main/Day_4/PublicResources.ipynb
Name: use-public-resources-to-answer-a-biological-questi
Version: Version 1
Badge:
workflow icon

Insert copied code into your website to add a link to this workflow.

Downloaded: 0
Copyright: Public Domain
License: BSD 2-Clause "Simplified" License
  • Future updates

Related Workflows

cellranger-snakemake-gke
snakemake workflow to run cellranger on a given bucket using gke.
A Snakemake workflow for running cellranger on a given bucket using Google Kubernetes Engine. The usage of this workflow ...