OMERO.tables

The OMERO.tables API unifies the storage of columnar data from various sources, such as automated analysis results or script-based processing, and makes them available within OMERO.

Large and small volumes of tabular data can be stored via named columns, and retrieved in bulk or via paging. A limited query language provides basic filtering and selecting.

Since 5.6, the client library omero-py has been available on PyPI and Conda. We recommend installing the library in a Python virtual environment. In the same environment, install PyTables by running pip install tables. Note that if you are installing on Ubuntu 16.04 or Debian 9, you will have to cap the PyTables version at 3.4.4.

The interface

The slice definition file for the OMERO.tables API primarily defines two service interfaces and a type hierarchy.

class omero.grid.Table

The central service for dealing with tabular data, described below.

class omero.grid.Tables

An internal service used for managing table services; it can be ignored for almost all purposes.

class omero.grid.Column

The base class for column types which permit returning arrays of columnar values (Ice doesn’t provide an Any type, so it is necessary to group values of the same type). All columns in a table must have the same number of rows.

Note

Attribute names (including column names) beginning with __ (double underscore) are reserved for internal use. This restriction was introduced in OMERO 5.1; tables created by older versions should continue to work.
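The constraints above (unique column names, no names starting with a double underscore, and an equal number of rows in every column) can be checked on the client before any data is sent. The following is a minimal sketch, not part of the OMERO.tables API: check_columns is a hypothetical helper, and it assumes only that each column object exposes name and values attributes.

```python
def check_columns(columns):
    """Raise ValueError if the columns break the documented constraints.

    Hypothetical client-side helper; it only reads the ``name`` and
    ``values`` attributes of each column object.
    """
    names = [col.name for col in columns]

    # Each column in a table must have a unique name.
    if len(set(names)) != len(names):
        raise ValueError("column names must be unique")

    # Names beginning with a double underscore are reserved.
    reserved = [name for name in names if name.startswith("__")]
    if reserved:
        raise ValueError("reserved column names: %s" % ", ".join(reserved))

    # All columns in a table must have the same number of rows.
    lengths = {len(col.values) for col in columns}
    if len(lengths) > 1:
        raise ValueError("columns have differing row counts: %s" % sorted(lengths))
```

Calling check_columns() immediately before addData() turns a mistake in locally built columns into an early, readable error.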

Single value columns

These columns store a single value in each row.

class omero.grid.FileColumn(name, description[, values])
class omero.grid.ImageColumn(name, description[, values])
class omero.grid.RoiColumn(name, description[, values])
class omero.grid.WellColumn(name, description[, values])
class omero.grid.PlateColumn(name, description[, values])

Id-based (long) columns which reference omero.model.File, Image, Roi, Well and Plate instances respectively.

class omero.grid.BoolColumn(name, description[, values])

A value column with bool (non-null) values.

class omero.grid.LongColumn(name, description[, values])

A value column with long (non-null, 64-bit) values.

class omero.grid.DoubleColumn(name, description[, values])

A value column with double (non-null, 64-bit) values.

Parameters
  • name (string) – The name of the column; each column in a table must have a unique name.

  • description (string) – The column description, may be empty.

  • values ([]) – A list of values (one value per row) used to initialize a column (optional).

values

A class member holding the list of values stored in the column.

class omero.grid.StringColumn(name, description, size[, values])

A value column which holds strings.

Parameters
  • name (string) – The column name.

  • description (string) – The column description.

  • size (long) – The maximum string length that can be stored in this column, >= 1

  • values (string[]) – A list of strings (optional).

Array value columns

These columns store an array in each row.

class omero.grid.FloatArrayColumn(name, description, size[, values])

A value column with fixed-width arrays of float (32 bit) values.

class omero.grid.DoubleArrayColumn(name, description, size[, values])

A value column with fixed-width arrays of double (64 bit) values.

class omero.grid.LongArrayColumn(name, description, size[, values])

A value column with fixed-width arrays of long (64 bit) values.

Parameters
  • name (string) – The column name.

  • description (string) – The column description.

  • size (long) – The width of the array, >= 1

  • values ([][]) – A list of arrays, each of length size (optional).

Warning

The OMERO.tables service currently does limited validation of string and array lengths. When adding or modifying data it is essential that the size parameter of a column matches that of the underlying table.
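Because the service does limited validation, a client can compare sizes itself before adding or modifying data. The sketch below is an illustration under assumptions rather than an OMERO.tables facility: check_sizes is a hypothetical helper, headers stands for the columns returned by getHeaders(), and it assumes that string and array columns (and their headers) expose a size attribute while other column types do not.

```python
def check_sizes(columns, headers):
    """Compare sized columns against the table headers before sending data.

    Hypothetical helper. ``columns`` are the columns about to be sent and
    ``headers`` the matching columns describing the existing table, in the
    same order.
    """
    for col, head in zip(columns, headers):
        head_size = getattr(head, "size", None)
        if head_size is None:
            continue  # assumed: single-value, non-string columns carry no size

        if getattr(col, "size", None) != head_size:
            raise ValueError("size mismatch for column %r" % col.name)

        for row, value in enumerate(col.values):
            if isinstance(value, str):
                # String columns: size is a maximum length. This counts
                # characters; the stored byte length of non-ASCII text
                # may be larger.
                if len(value) > head_size:
                    raise ValueError("string too long in %r, row %d" % (col.name, row))
            elif len(value) != head_size:
                # Array columns: every row must have exactly ``size`` items.
                raise ValueError("wrong array width in %r, row %d" % (col.name, row))
```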

Warning

Array value columns should be considered experimental for now.

Main methods

class omero.grid.Data

Holds the data retrieved from a table, also used to update a table.

lastModification

The timestamp of the last update to the table.

rowNumbers

The row indices of the values retrieved from the table.

columns

A list of columns.

class omero.grid.Table

The main interface to the Tables service.

getHeaders()
Returns

A list of empty columns describing the table. Fill in the values of these columns to add a new row to the table.

getNumberOfRows()
Returns

The number of rows in the table.

readCoordinates(rowNumbers)

Read a set of entire rows in the table.

Parameters

rowNumbers (long[]) – A list of row indices to be retrieved from the table.

Returns

The requested rows as a Data object.

read(colNumbers, start, stop)

Read a subset of columns and consecutive rows from a table.

Parameters
  • colNumbers (long[]) – A list of column indices to be retrieved from the table (may be non-consecutive).

  • start (long) – The index of the first row to retrieve.

  • stop (long) – The index of the last+1 row to retrieve (uses similar semantics to range()).

Returns

The requested columns and rows as a Data object.

Note

start=0, stop=0 currently returns the first row instead of empty as would be expected using the normal Python range semantics. This may change in future.
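Retrieval via paging can be built from getNumberOfRows() and read(). The generator below is a minimal sketch: iter_pages and page_size are hypothetical names, and table is assumed to be an already opened omero.grid.Table proxy (or any object with the same two methods). Because the loop never runs for an empty table, read() is never called with start=0, stop=0.

```python
def iter_pages(table, col_numbers, page_size=1000):
    """Yield successive Data objects covering the whole table.

    Hypothetical helper built only on getNumberOfRows() and read().
    """
    total = table.getNumberOfRows()
    # range() produces no values when total is 0, so an empty table
    # results in no read() calls at all.
    for start in range(0, total, page_size):
        stop = min(start + page_size, total)  # stop is exclusive, as in range()
        yield table.read(col_numbers, start, stop)
```

A caller would typically write for data in iter_pages(table, [0, 1]): ... and process data.columns one page at a time instead of holding the full table in memory.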

slice(colNumbers, rowNumbers)

Read a subset of columns and rows (may be non-consecutive) from a table.

Parameters
  • colNumbers (long[]) – A list of column indices to be retrieved. The results will be returned in the same order as these indices.

  • rowNumbers (long[]) – A list of row indices to be retrieved. The results will be returned in the same order as these indices.

Returns

The requested columns and rows as a Data object.
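The Data object returned by read(), readCoordinates() and slice() is columnar, whereas client code often wants one record per row. As a rough sketch, the hypothetical helper data_to_rows below transposes it, relying only on the rowNumbers and columns members described above and on each column's values list.

```python
def data_to_rows(data):
    """Transpose a Data-like object into (rowNumber, value, ...) tuples.

    Hypothetical helper; ``data`` needs ``rowNumbers`` and ``columns``,
    and each column needs a ``values`` list.
    """
    value_lists = [col.values for col in data.columns]
    # zip(*value_lists) walks all columns in step, one row at a time.
    return [
        (row_number,) + values
        for row_number, values in zip(data.rowNumbers, zip(*value_lists))
    ]
```

Each tuple keeps its row index first, so the original position stays attached to the values even when the rows were requested in a non-consecutive order.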

getWhereList(condition, variables, start, stop, step)

Run a query on a table, see Query language.

Parameters
  • condition (string) – The query string

  • variables – A mapping of strings and variable values to be substituted into condition. This can often be left empty.

  • start (long) – The index of the first row to consider.

  • stop (long) – The index of the last+1 row to consider.

  • step (long) – The stepping interval between the start and stop rows to consider, using the same semantics as range(). Set to 0 to disable stepping.

Returns

A list of row indices matching the condition, which can be passed as the first parameter of readCoordinates() or as the second parameter of slice().

Note

variables seems to add unnecessary complexity, should it be removed?
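A common pattern is to run a query over the whole table and then fetch the matching rows. The sketch below chains getWhereList() and readCoordinates(); find_rows is a hypothetical helper, table is assumed to be an opened table proxy, and returning None when nothing matches is a choice made here to avoid calling readCoordinates() with an empty list, whose behaviour is not described above.

```python
def find_rows(table, condition, variables=None):
    """Return the rows matching ``condition`` as a Data object, or None.

    Hypothetical helper combining getWhereList() and readCoordinates().
    """
    total = table.getNumberOfRows()
    if total == 0:
        return None

    # start=0 and stop=total consider every row; step=0 disables stepping.
    row_numbers = table.getWhereList(condition, variables or {}, 0, total, 0)
    if not row_numbers:
        return None  # nothing matched, so skip the second call

    return table.readCoordinates(row_numbers)
```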

initialize(columns)

Initialize a new table. Any column values are ignored; use addData() to add these values.

Parameters

columns (Column[]) – A list of columns whose names and types are used to set up the table.

addData(columns)

Append one or more full rows to the table.

Parameters

columns (Column[]) – A list of columns, such as those returned by getHeaders(), whose values are the rows to be added to the table.
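Appending rows therefore means turning row-oriented records into per-column value lists. The following sketch shows one way to do that with getHeaders() and addData(); append_rows is a hypothetical helper, and it assumes that the columns returned by getHeaders() have an assignable values attribute and appear in the table's column order.

```python
def append_rows(table, rows):
    """Append row tuples to a table via getHeaders() and addData().

    Hypothetical helper. ``rows`` is a list of sequences, each with one
    item per table column, in column order. Returns the number of rows sent.
    """
    columns = table.getHeaders()

    for row in rows:
        if len(row) != len(columns):
            raise ValueError("each row needs %d values" % len(columns))

    # Fill every empty header column with that column's slice of each row.
    for index, col in enumerate(columns):
        col.values = [row[index] for row in rows]

    table.addData(columns)
    return len(rows)
```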

update(data)

Modify one or more columns and/or rows in a table.

Parameters

data (Data) – A Data object previously obtained using read() or readCoordinates() with column values to be updated.
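The read, modify, update cycle can be sketched as follows. transform_column is a hypothetical helper, table is assumed to be an opened table proxy, and the code relies only on read(), update() and the columns member of Data described above.

```python
def transform_column(table, col_number, func, start, stop):
    """Apply ``func`` to one column over rows start..stop-1 and write it back.

    Hypothetical helper: reads a Data object, edits the values in place,
    then passes the same object to update().
    """
    data = table.read([col_number], start, stop)

    # Only one column was requested, so it is the first entry of data.columns.
    column = data.columns[0]
    column.values = [func(value) for value in column.values]

    # The Data object still carries its row numbers, which identify
    # the rows being modified.
    table.update(data)
    return data
```

For instance, transform_column(table, 2, lambda v: v * 2, 0, 100) would double the values of the third column in the first 100 rows.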

setMetadata(key, value)

Store additional properties associated with a Table.

Parameters
  • key (string) – A key name.

  • value (string/int/float/long) – The value of the property.

setAllMetadata(keyvalues)

Store multiple additional properties associated with a Table. See setMetadata().

Parameters

keyvalues (dict) – A dictionary of key-value pairs.

getMetadata(key)

Get the value of a property.

Parameters

key (string) – The property name.

Returns

A property.

getAllMetadata()

Get all additional properties. See getMetadata().

Returns

All key-value properties.

You may find the Python and Java annotated code samples helpful, in addition to the examples and documentation on the API. These are only an introduction to using OMERO.tables and do not show its full potential; see Going forward for some inspiration.

The implementation

Currently, each table is backed by a single HDF table. Since PyTables (and HDF in the general case) does not support concurrent access, OMERO.tables provides a global locking mechanism which permits multiple views of the same data. Each OMERO.tables file (registered as an OriginalFile in the database) is composed of a single HDF table with any number of certain limited column types.

Query language

The query language mentioned above is currently the PyTables Condition syntax. Columns are referenced by name. The following operators are supported:

  • Logical operators: &, |, ~

  • Comparison operators: <, <=, ==, !=, >=, >

  • Unary arithmetic operators: -

  • Binary arithmetic operators: +, -, *, /, **, %

and the following functions:

  • where(bool, number1, number2): number — number1 if the bool condition is true, number2 otherwise.

  • {sin,cos,tan}(float|complex): float|complex — trigonometric sine, cosine or tangent.

  • {arcsin,arccos,arctan}(float|complex): float|complex — trigonometric inverse sine, cosine or tangent.

  • arctan2(float1, float2): float — trigonometric inverse tangent of float1/float2.

  • {sinh,cosh,tanh}(float|complex): float|complex — hyperbolic sine, cosine or tangent.

  • {arcsinh,arccosh,arctanh}(float|complex): float|complex — hyperbolic inverse sine, cosine or tangent.

  • {log,log10,log1p}(float|complex): float|complex — natural, base-10 and log(1+x) logarithms.

  • {exp,expm1}(float|complex): float|complex — exponential and exponential minus one.

  • sqrt(float|complex): float|complex — square root.

  • {real,imag}(complex): float — real or imaginary part of complex.

  • complex(float, float): complex — complex from real and imaginary parts.

For example, if id is the name of a LongColumn, the call

table.getWhereList(condition='(id>x)', variables={'x':omero.rtypes.rint(5)},
    start=2, stop=10, step=3)

will extract a subset of rows (2, 5, 8) as indicated by start, stop and step, substitute 5 in place of x in the condition, and evaluate condition so as to return the indices of rows where column id is greater than 5.
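The row selection described above can be modelled in plain Python. This is only a model of the documented start, stop and step semantics, not the server's implementation: where_list is a hypothetical function, the predicate stands in for the condition string, and the id column is invented so that row n holds the value n.

```python
def where_list(values, predicate, start, stop, step):
    """Model of getWhereList() over a single in-memory column.

    Hypothetical illustration: rows are chosen as by range(start, stop, step),
    then only rows whose value satisfies ``predicate`` are kept.
    """
    step = step or 1                 # step=0 disables stepping
    stop = min(stop, len(values))    # never look past the last row
    return [row for row in range(start, stop, step) if predicate(values[row])]


ids = list(range(12))  # invented id column: row n holds the value n

# Rows 2, 5 and 8 are considered; only row 8 has id > 5.
print(where_list(ids, lambda v: v > 5, 2, 10, 3))  # [8]
```

Note that start, stop and step restrict which rows are considered at all, so rows 6, 7 and 9 are not returned here even though their id values are greater than 5.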

Going forward

The Tables API itself provides little more than a remotely accessible store; think of it as a server for Excel-like spreadsheets. We are currently looking into the facilities that can be built on top of it, and are very open to suggestions. For example, the IRoi interface has been extended to filter ROIs by a given measurement. This allows seeing only those results from a particular analysis run. The following example shows how to set up such a measurement and retrieve its results:

iroi.py

For an example of production code that parses out such measurements, see populate_roi.py.

The IRoi interface has been integrated into OMERO.insight, allowing for the visualization and export of OMERO.tables:

Choice between multiple measurements


We are also looking into a NoSQL-style storage mechanism for OMERO, either as an alternative back-end to OMERO.tables or as an additional key-value type store. Any suggestions or ideas would be very welcome.

See also

PyTables

Software on which OMERO.tables is built.

Condition Syntax

The PyTables condition syntax.

slice definition file

The API definition for OMERO.tables.

The Tables test suite

The test suite for OMERO.tables.