OMERO.tables
The OMERO.tables API unifies the storage of columnar data from various sources,
such as automated analysis results or script-based processing, and makes them
available within OMERO.
Large and small volumes of tabular data can be stored via named columns, and
retrieved in bulk or via paging. A limited query language provides basic
filtering and selecting.
For installation instructions, see OMERO.tables
The interface
The slice definition file
for the OMERO.tables API primarily defines two service interfaces and a type
hierarchy.
-
class omero.grid.Table
The central service for dealing with tabular data, described
below.
-
class omero.grid.Tables
An internal service used for managing table services; it can be ignored
for almost all purposes.
-
class omero.grid.Column
The base class for column types which permit returning arrays of
columnar values (Ice doesn’t provide an Any type, so it is
necessary to group values of the same type). All columns in a table
must have the same number of rows.
Note
Attribute names (including column names) beginning with __
(double underscore) are reserved for internal use. This restriction was
introduced in OMERO 5.1. Tables created by older versions should continue
to work.
Single value columns
These columns store a single value in each row.
-
class omero.grid.FileColumn(name, description[, values])
-
class omero.grid.ImageColumn(name, description[, values])
-
class omero.grid.RoiColumn(name, description[, values])
-
class omero.grid.WellColumn(name, description[, values])
-
class omero.grid.PlateColumn(name, description[, values])
Id-based (long) columns which reference omero.model.File,
Image, Roi, Well and Plate
instances respectively.
-
class omero.grid.BoolColumn(name, description[, values])
A value column with bool (non-null) values.
-
class omero.grid.LongColumn(name, description[, values])
A value column with long (non-null, 64-bit) values.
-
class omero.grid.DoubleColumn(name, description[, values])
A value column with double (non-null, 64-bit) values.
Parameters: |
- name (string) – The name of the column; each column in a table must
have a unique name.
- description (string) – The column description, may be empty.
- values ([]) – A list of values (one value per row) used to initialize a
column (optional).
|
-
values
A class member holding the list of values stored in the column.
-
class omero.grid.StringColumn(name, description, size[, values])
A value column which holds strings.
Parameters: |
- name (string) – The column name.
- description (string) – The column description.
- size (long) – The maximum string length that can be stored in this
column, >= 1
- values (string[]) – A list of strings (optional).
|
Array value columns
These columns store an array in each row.
-
class omero.grid.FloatArrayColumn(name, description, size[, values])
A value column with fixed-width arrays of float (32 bit) values.
-
class omero.grid.DoubleArrayColumn(name, description, size[, values])
A value column with fixed-width arrays of double (64 bit) values.
-
class omero.grid.LongArrayColumn(name, description, size[, values])
A value column with fixed-width arrays of long (64 bit) values.
Parameters: |
- name (string) – The column name.
- description (string) – The column description.
- size (long) – The width of the array, >= 1
- values ([][]) – A list of arrays, each of length size
(optional).
|
Warning
The OMERO.tables service currently performs only limited validation of
string and array lengths. When adding or modifying data it is essential that
the size parameter of a column matches that of the underlying table.
Warning
Array value columns should be considered experimental for now.
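Because the service validates so little, a client-side check before sending data can catch mismatches early. The following is a minimal sketch, not part of the OMERO API: check_columns is a hypothetical helper that relies only on the name, values and size attributes described above. It assumes that the string size limit applies to the UTF-8 encoded length, which the text above does not state.

```python
def check_columns(columns):
    """Raise ValueError if the columns are inconsistent (hypothetical helper).

    Each column is expected to expose ``name`` and ``values``, plus
    ``size`` for string and array columns.
    """
    # All columns in a table must have the same number of rows.
    row_counts = {len(col.values) for col in columns}
    if len(row_counts) > 1:
        raise ValueError(
            "columns have differing row counts: %s" % sorted(row_counts))

    for col in columns:
        size = getattr(col, "size", None)
        if size is None:
            continue  # single value column without a size parameter
        for row, value in enumerate(col.values):
            if isinstance(value, str):
                # Assumption: the limit counts encoded bytes, not characters.
                fits = len(value.encode("utf-8")) <= size
            else:
                # Array columns are fixed-width: every row has exactly
                # ``size`` elements.
                fits = len(value) == size
            if not fits:
                raise ValueError(
                    "column %r, row %d does not match size %d"
                    % (col.name, row, size))
```

Calling check_columns(columns) immediately before addData() or update() keeps the check close to the point where a mismatch would otherwise go unnoticed.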
Main methods
-
class omero.grid.Data
Holds the data retrieved from a table, also used to update a table.
-
lastModification
The timestamp of the last update to the table.
-
rowNumbers
The row indices of the values retrieved from the table.
-
columns
A list of columns
-
class omero.grid.Table
The main interface to the Tables service.
-
getHeaders()
Returns: | An empty list of columns describing the table. Fill in the
values of these columns to add a new row to the table. |
-
getNumberOfRows()
Returns: | The number of rows in the table. |
-
readCoordinates(rowNumbers)
Read a set of entire rows in the table.
Parameters: | rowNumbers (long[]) – A list of row indices to be retrieved from
the table. |
Returns: | The requested rows as a Data object. |
-
read(colNumbers, start, stop)
Read a subset of columns and consecutive rows from a table.
Parameters: |
- colNumbers (long[]) – A list of column indices to be retrieved
from the table (may be non-consecutive).
- start (long) – The index of the first row to retrieve.
- stop (long) – The index of the last+1 row to retrieve (uses
similar semantics to range()).
|
Returns: | The requested columns and rows as a
Data object.
|
Note
start=0, stop=0 currently returns the first row rather than an
empty result, as would be expected from the normal Python range
semantics. This may change in the future.
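Retrieval via paging can be built on read() and getNumberOfRows(). The sketch below is illustrative only: iter_pages and page_size are hypothetical names, and table is assumed to be an already opened omero.grid.Table proxy.

```python
def iter_pages(table, col_numbers, page_size=1000):
    """Yield one Data object per page of rows (hypothetical helper)."""
    if page_size < 1:
        raise ValueError("page_size must be >= 1")
    total = table.getNumberOfRows()
    for start in range(0, total, page_size):
        # stop is exclusive, as with range().
        stop = min(start + page_size, total)
        # start < stop always holds here, so the start=0, stop=0
        # special case described in the note is never requested.
        yield table.read(col_numbers, start, stop)
```

Each yielded object holds the requested columns for one page, so a caller can process a large table without holding all rows in memory at once.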
-
slice(colNumbers, rowNumbers)
Read a subset of columns and rows (may be non-consecutive) from a
table.
Parameters: |
- colNumbers (long[]) – A list of column indices to be retrieved.
The results will be returned in the same order as these indices.
- rowNumbers (long[]) – A list of row indices to be retrieved.
The results will be returned in the same order as these indices.
|
Returns: | The requested columns and rows as a
Data object.
|
-
getWhereList(condition, variables, start, stop, step)
Run a query on a table, see Query language.
Parameters: |
- condition (string) – The query string
- variables – A mapping of strings and variable values to be
substituted into condition. This can often be left empty.
- start (long) – The index of the first row to consider.
- stop (long) – The index of the last+1 row to consider.
- step (long) – The stepping interval between the start and stop
rows to consider, using the same semantics as range(). Set
to 0 to disable stepping.
|
Returns: | A list of row indices matching the condition, which can be
passed as the first parameter of readCoordinates() or as the rowNumbers
parameter of slice().
|
Note
variables seems to add unnecessary complexity; should it
be removed?
-
initialize(columns)
Initialize a new table. Any column values are ignored; use
addData() to add these values.
Parameters: | columns (Column[]) – A list of columns whose names and types are
used to set up the table. |
-
addData(columns)
Append one or more full rows to the table.
Parameters: | columns (Column[]) – A list of columns, such as those returned by
getHeaders(), whose values are the rows to be added to the
table. |
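Data is added column-wise, while application data is often row-wise. One possible way to bridge the two, using only getHeaders(), the values member and addData(), is sketched below. append_rows is a hypothetical helper, and table is assumed to be an initialized omero.grid.Table proxy.

```python
def append_rows(table, rows):
    """Append rows given as sequences ordered like the table's columns.

    Hypothetical helper; returns the number of rows appended.
    """
    rows = [tuple(row) for row in rows]
    columns = table.getHeaders()  # empty columns describing the table
    for row in rows:
        if len(row) != len(columns):
            raise ValueError(
                "expected %d values per row, got %d"
                % (len(columns), len(row)))
    # Transpose: column i receives the i-th value of every row, so all
    # columns end up with the same number of rows.
    for index, column in enumerate(columns):
        column.values = [row[index] for row in rows]
    if rows:
        table.addData(columns)
    return len(rows)
```

For a table with a LongColumn followed by a DoubleColumn, append_rows(table, [(1, 0.5), (2, 0.75)]) would send two full rows in a single call.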
-
update(data)
Modify one or more columns and/or rows in a table.
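A typical use is to read a Data object, change the values it holds, and pass the same object back. The following sketch assumes that the rowNumbers carried by the Data object tell the service which rows to overwrite. The source describes Data as "also used to update a table" but does not spell this out. transform_column is a hypothetical helper.

```python
def transform_column(table, col_number, start, stop, func):
    """Apply func to one column's values in rows [start, stop) and save.

    Hypothetical helper; returns the number of rows modified.
    """
    if start >= stop:
        return 0  # nothing to do; also avoids a read with start == stop
    data = table.read([col_number], start, stop)
    column = data.columns[0]
    column.values = [func(value) for value in column.values]
    # Assumption: data.rowNumbers, filled in by read(), identifies the
    # rows that update() writes back.
    table.update(data)
    return len(column.values)
```

For example, transform_column(table, 1, 0, 100, lambda v: v * 2.0) would double the second column in the first 100 rows.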
-
setMetadata(key, value)
Store additional properties associated with a Table.
Parameters: |
- key (string) – A key name.
- value (string/int/float/long) – The value of the property.
|
-
setAllMetadata(keyvalues)
Store multiple additional properties associated with a Table. See
setMetadata().
Parameters: | keyvalues (dict) – A dictionary of key-value pairs. |
-
getMetadata(key)
Get the value of a property.
Parameters: | key (string) – The property name. |
Returns: | A property. |
-
getAllMetadata()
Get all additional properties. See getMetadata().
Returns: | All key-value properties. |
You may find the Python and
Java annotated code samples helpful,
in addition to the examples and
documentation on the API.
These are only an introduction to using OMERO.tables and do not show its full
potential; see Going forward for some inspiration.
The implementation
Currently, each table is backed by a single HDF table. Since PyTables
(and HDF in the general case) do not support concurrent access, OMERO.tables
provides a global locking mechanism which permits multiple views of the same
data. Each OMERO.tables file (registered as an OriginalFile in the
database) is composed of a single HDF table with any number of certain
limited column types.
Query language
The query language mentioned above is currently the PyTables
condition syntax.
Columns are referenced by name. The following operators are supported:
- Logical operators: &, |, ~
- Comparison operators: <, <=, ==, !=, >=, >
- Unary arithmetic operators: -
- Binary arithmetic operators: +, -, *, /, **, %
and the following functions:
- where(bool, number1, number2): number — number1 if the bool
condition is true, number2 otherwise.
- {sin,cos,tan}(float|complex): float|complex — trigonometric
sine, cosine or tangent.
- {arcsin,arccos,arctan}(float|complex): float|complex —
trigonometric inverse sine, cosine or tangent.
- arctan2(float1, float2): float — trigonometric inverse tangent of
float1/float2.
- {sinh,cosh,tanh}(float|complex): float|complex — hyperbolic
sine, cosine or tangent.
- {arcsinh,arccosh,arctanh}(float|complex): float|complex —
hyperbolic inverse sine, cosine or tangent.
- {log,log10,log1p}(float|complex): float|complex — natural,
base-10 and log(1+x) logarithms.
- {exp,expm1}(float|complex): float|complex — exponential and
exponential minus one.
- sqrt(float|complex): float|complex — square root.
- {real,imag}(complex): float — real or imaginary part of complex.
- complex(float, float): complex — complex from real and imaginary
parts.
For example, if id is the name of a LongColumn, then
table.getWhereList(condition='(id>x)', variables={'x':omero.rtypes.rint(5)},
start=2, stop=10, step=3)
will extract a subset of rows (2, 5, 8) as indicated by start, stop and
step, substitute 5 in place of x in the condition, and evaluate
condition so as to return the indices of rows where column id is greater
than 5.
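A query is usually followed by a read of the matching rows. The sketch below combines getWhereList() with readCoordinates() over the whole table. query_rows is a hypothetical helper, and returning None for "no match" is a choice made for this sketch, not behavior defined by the API.

```python
def query_rows(table, condition, variables=None):
    """Return a Data object with all rows matching condition, or None.

    Hypothetical helper; table is assumed to be an open Table proxy.
    """
    total = table.getNumberOfRows()
    if total == 0:
        return None
    # start=0 and stop=total consider every row; step=0 disables stepping.
    hits = table.getWhereList(condition, variables or {}, 0, total, 0)
    if not hits:
        return None  # avoid requesting an empty list of rows
    return table.readCoordinates(hits)
```

With the table from the example above, query_rows(table, '(id>5)') would return the complete rows whose id exceeds 5. The rows considered when start, stop and step are given follow Python's range(): list(range(2, 10, 3)) is [2, 5, 8].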
Going forward
The Tables API itself provides little more than a remotely accessible
store; think of it as a server for Excel-like spreadsheets. We are
currently looking into the facilities that can be built on top of it,
and are very open to suggestions. For example, the
IRoi interface
has been extended to filter ROIs by a given
measurement. This allows seeing only those results from a particular
analysis run. The following example shows how to set up such a
measurement and retrieve its results:
iroi.py
For an example of production code that parses out such measurements,
see populate_roi.py.
The IRoi interface has been integrated into OMERO.insight, allowing for
the visualization and export of OMERO.tables.
We are also looking into a NoSQL-style storage mechanism for OMERO, either as
an alternative back-end to OMERO.tables or as an additional key-value type
store. Any suggestions or ideas would be
very welcome.