OMERO.tables
The OMERO.tables API unifies the storage of columnar data from various sources, such as automated analysis results or script-based processing, and makes them available within OMERO.
Large and small volumes of tabular data can be stored via named columns, and retrieved in bulk or via paging. A limited query language provides basic filtering and selecting.
Since 5.6, PyTables is installed when installing omero-py.
The interface
The slice definition file for the OMERO.tables API primarily defines two service interfaces and a type hierarchy.
- class omero.grid.Table
  The central service for dealing with tabular data, described below.
- class omero.grid.Tables
  An internal service used for managing table services, which can be ignored for almost all purposes.
- class omero.grid.Column
  The base class for column types which permit returning arrays of columnar values (Ice doesn’t provide an Any type, so it is necessary to group values of the same type). All columns in a table must have the same number of rows.
Note
Attribute names (including column names) beginning with __ (double underscore) are reserved for internal use. This restriction was introduced in OMERO 5.1; tables created by older versions should continue to work.
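The column-oriented layout and the naming rules above can be illustrated without a server. The following is a minimal sketch in plain Python, not part of the OMERO API: the helper names `validate_column_names` and `rows_to_columns` are hypothetical. It rejects duplicate or reserved (`__`-prefixed) names and pivots row tuples into one value list per column, so that every column ends up with the same number of rows.

```python
def validate_column_names(names):
    """Reject duplicate names and names reserved for internal use."""
    seen = set()
    for name in names:
        if name.startswith("__"):
            raise ValueError(f"column name {name!r} is reserved (starts with '__')")
        if name in seen:
            raise ValueError(f"duplicate column name {name!r}")
        seen.add(name)


def rows_to_columns(names, rows):
    """Pivot row tuples into one list of values per named column."""
    validate_column_names(names)
    for index, row in enumerate(rows):
        if len(row) != len(names):
            raise ValueError(
                f"row {index} has {len(row)} values, expected {len(names)}"
            )
    return {name: [row[i] for row in rows] for i, name in enumerate(names)}


# Hypothetical data: two rows with an image id and a score each.
columns = rows_to_columns(["image_id", "score"], [(101, 0.5), (102, 0.75)])
```

Each resulting list could then be assigned to the values member of the matching column object.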
Single value columns
These columns store a single value in each row.
- class omero.grid.FileColumn(name, description[, values])
- class omero.grid.ImageColumn(name, description[, values])
- class omero.grid.RoiColumn(name, description[, values])
- class omero.grid.WellColumn(name, description[, values])
- class omero.grid.PlateColumn(name, description[, values])
  Id-based (long) columns which reference omero.model.File, Image, Roi, Well and Plate instances respectively.
- class omero.grid.BoolColumn(name, description[, values])
  A value column with bool (non-null) values.
- class omero.grid.LongColumn(name, description[, values])
  A value column with long (non-null, 64-bit) values.
- class omero.grid.DoubleColumn(name, description[, values])
  A value column with double (non-null, 64-bit) values.
  - Parameters
    name (string) – The name of the column. Each column in a table must have a unique name.
    description (string) – The column description, may be empty.
    values ([]) – A list of values (one value per row) used to initialize a column (optional).
  - values
    A class member holding the list of values stored in the column.
- class omero.grid.StringColumn(name, description, size[, values])
  A value column which holds strings.
  - Parameters
    name (string) – The column name.
    description (string) – The column description.
    size (long) – The maximum string length that can be stored in this column, >= 1.
    values (string[]) – A list of strings (optional).
Array value columns
These columns store an array in each row.
- class omero.grid.FloatArrayColumn(name, description, size[, values])
  A value column with fixed-width arrays of float (32 bit) values.
- class omero.grid.DoubleArrayColumn(name, description, size[, values])
  A value column with fixed-width arrays of double (64 bit) values.
- class omero.grid.LongArrayColumn(name, description, size[, values])
  A value column with fixed-width arrays of long (64 bit) values.
  - Parameters
    name (string) – The column name.
    description (string) – The column description.
    size (long) – The width of the array, >= 1.
    values ([][]) – A list of arrays, each of length size (optional).
Warning
The OMERO.tables service currently does limited validation of string and array lengths. When adding or modifying data it is essential that the size parameter of a column matches that of the underlying table.
Warning
Array value columns should be considered experimental for now.
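Because of the limited server-side validation described in the first warning above, a client may want to check values before sending them. The sketch below is one hedged way to do so in plain Python. The function names are hypothetical, and measuring string length as the number of UTF-8 encoded bytes is an assumption rather than documented behaviour.

```python
def check_string_values(values, size):
    """Raise ValueError if any string needs more than `size` bytes.

    Assumption: length is counted in UTF-8 encoded bytes.
    """
    for row, value in enumerate(values):
        length = len(value.encode("utf-8"))
        if length > size:
            raise ValueError(
                f"row {row}: string needs {length} bytes, column size is {size}"
            )


def check_array_values(values, size):
    """Raise ValueError unless every row holds exactly `size` elements."""
    for row, array in enumerate(values):
        if len(array) != size:
            raise ValueError(
                f"row {row}: array has {len(array)} elements, column size is {size}"
            )


# Both calls pass silently because every value fits the declared size.
check_string_values(["nucleus", "cell"], size=8)
check_array_values([[0.1, 0.2, 0.3], [1.0, 2.0, 3.0]], size=3)
```

Running such checks with the same size that was used when the table was initialized is the point of the exercise; a mismatch between the two is what the warning is about.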
Main methods
- class omero.grid.Data
  Holds the data retrieved from a table; also used to update a table.
  - lastModification
    The timestamp of the last update to the table.
  - rowNumbers
    The row indices of the values retrieved from the table.
  - columns
    A list of columns.
- class omero.grid.Table
  The main interface to the Tables service.
  - getHeaders()
    - Returns
      An empty list of columns describing the table. Fill in the values of these columns to add a new row to the table.
  - getNumberOfRows()
    - Returns
      The number of rows in the table.
  - readCoordinates(rowNumbers)
    Read a set of entire rows in the table.
    - Parameters
      rowNumbers (long[]) – A list of row indices to be retrieved from the table.
    - Returns
      The requested rows as a Data object.
  - read(colNumbers, start, stop)
    Read a subset of columns and consecutive rows from a table.
    - Parameters
      colNumbers (long[]) – A list of column indices to be retrieved from the table (may be non-consecutive).
      start (long) – The index of the first row to retrieve.
      stop (long) – The index of the last+1 row to retrieve (uses similar semantics to range()).
    - Returns
      The requested columns and rows as a Data object.
    Note
    start=0, stop=0 currently returns the first row instead of an empty result, as would be expected using the normal Python range semantics. This may change in the future.
  - slice(colNumbers, rowNumbers)
    Read a subset of columns and rows (may be non-consecutive) from a table.
    - Parameters
      colNumbers (long[]) – A list of column indices to be retrieved. The results will be returned in the same order as these indices.
      rowNumbers (long[]) – A list of row indices to be retrieved. The results will be returned in the same order as these indices.
    - Returns
      The requested columns and rows as a Data object.
  - getWhereList(condition, variables, start, stop, step)
    Run a query on a table, see Query language.
    - Parameters
      condition (string) – The query string.
      variables – A mapping of strings and variable values to be substituted into condition. This can often be left empty.
      start (long) – The index of the first row to consider.
      stop (long) – The index of the last+1 row to consider.
      step (long) – The stepping interval between the start and stop rows to consider, using the same semantics as range(). Set to 0 to disable stepping.
    - Returns
      A list of row indices matching the condition, which can be passed as the rowNumbers parameter of readCoordinates() or slice().
    Note
    variables seems to add unnecessary complexity; should it be removed?
  - initialize(columns)
    Initialize a new table. Any column values are ignored; use addData() to add these values.
    - Parameters
      columns (Column[]) – A list of columns whose names and types are used to set up the table.
  - addData(columns)
    Append one or more full rows to the table.
    - Parameters
      columns (Column[]) – A list of columns, such as those returned by getHeaders(), whose values are the rows to be added to the table.
  - update(data)
    Modify one or more columns and/or rows in a table.
    - Parameters
      data (Data) – A Data object previously obtained using read() or readCoordinates(), with column values to be updated.
  - setMetadata(key, value)
    Store additional properties associated with a Table.
    - Parameters
      key (string) – A key name.
      value (string/int/float/long) – The value of the property.
  - setAllMetadata(keyvalues)
    Store multiple additional properties associated with a Table. See setMetadata().
    - Parameters
      keyvalues (dict) – A dictionary of key-value pairs.
  - getMetadata(key)
    Get the value of a property.
    - Parameters
      key (string) – The property name.
    - Returns
      A property.
  - getAllMetadata()
    Get all additional properties. See getMetadata().
    - Returns
      All key-value properties.
You may find the Python and Java annotated code samples helpful, in addition to the examples and documentation on the API. These are only an introduction to using OMERO.tables and do not show its full potential; see Going forward for some inspiration.
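To make the difference between read() and slice() concrete, here is a small in-memory model in plain Python. It is only an illustration of the selection semantics described above, not the service implementation: `model_read` and `model_slice` are hypothetical names, a table is represented as a list of value lists (one per column), and the start=0, stop=0 special case noted for read() is deliberately not modelled.

```python
def model_read(columns, col_numbers, start, stop):
    """Selected columns, consecutive rows start..stop-1 (range() semantics)."""
    return [columns[c][start:stop] for c in col_numbers]


def model_slice(columns, col_numbers, row_numbers):
    """Selected columns and rows, in the order the indices were given."""
    return [[columns[c][r] for r in row_numbers] for c in col_numbers]


# Three columns with four rows each (made-up values).
table = [
    [10, 11, 12, 13],        # column 0
    [0.1, 0.2, 0.3, 0.4],    # column 1
    ["a", "b", "c", "d"],    # column 2
]

# Columns 0 and 2, rows 1 and 2.
consecutive = model_read(table, [0, 2], start=1, stop=3)

# Columns 2 and 0 (in that order), rows 3 and 0 (in that order).
reordered = model_slice(table, [2, 0], [3, 0])
```

The model shows why slice() is the more flexible call: both the column order and the row order of the result follow the index lists, whereas read() always returns a contiguous block of rows.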
Examples
Hello World: examples/OmeroTables/first.py
Creating a Measurement Table: examples/OmeroTables/MeasurementTable.java
Querying a Table: examples/OmeroTables/FindMeasurements.java
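The append pattern documented for getHeaders() and addData() can be sketched as a small helper. This is a hedged illustration rather than a copy of the example files listed above: `append_rows` is a hypothetical helper that relies only on the getHeaders(), addData() and getNumberOfRows() methods described earlier, and the `StandInTable` and `StandInColumn` classes exist purely so the sketch runs without a server. With a real connection you would pass an actual omero.grid.Table proxy instead.

```python
def append_rows(table, rows):
    """Append full rows to a table and return the new row count."""
    columns = table.getHeaders()  # empty columns describing the table
    for index, row in enumerate(rows):
        if len(row) != len(columns):
            raise ValueError(
                f"row {index} has {len(row)} values, expected {len(columns)}"
            )
    # Fill each column with the values found at its position in every row.
    for position, column in enumerate(columns):
        column.values = [row[position] for row in rows]
    table.addData(columns)
    return table.getNumberOfRows()


class StandInColumn:
    """Stand-in for a column object: just a name and a list of values."""

    def __init__(self, name):
        self.name = name
        self.values = []


class StandInTable:
    """In-memory stand-in exposing only the three methods used above."""

    def __init__(self, names):
        self._names = list(names)
        self._stored = {name: [] for name in self._names}

    def getHeaders(self):
        return [StandInColumn(name) for name in self._names]

    def addData(self, columns):
        for column in columns:
            self._stored[column.name].extend(column.values)

    def getNumberOfRows(self):
        return len(self._stored[self._names[0]])


demo = StandInTable(["roi_id", "area"])
append_rows(demo, [(1, 20.5), (2, 31.0)])
row_count = append_rows(demo, [(3, 12.25)])
```

Checking the row width before calling addData() keeps a malformed row from producing columns of unequal length.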
The implementation
Currently, each table is backed by a single HDF table. Since PyTables (and HDF in the general case) do not support concurrent access, OMERO.tables provides a global locking mechanism which permits multiple views of the same data. Each OMERO.tables file (registered as an OriginalFile in the database) is composed of a single HDF table with any number of certain limited column types.
Query language
The query language mentioned above is currently the PyTables condition syntax. Columns are referenced by name. The following operators are supported:
- Logical operators: &, |, ~
- Comparison operators: <, <=, ==, !=, >=, >
- Unary arithmetic operators: -
- Binary arithmetic operators: +, -, *, /, **, %
and the following functions:
- where(bool, number1, number2): number – number1 if the bool condition is true, number2 otherwise.
- {sin,cos,tan}(float|complex): float|complex – trigonometric sine, cosine or tangent.
- {arcsin,arccos,arctan}(float|complex): float|complex – trigonometric inverse sine, cosine or tangent.
- arctan2(float1, float2): float – trigonometric inverse tangent of float1/float2.
- {sinh,cosh,tanh}(float|complex): float|complex – hyperbolic sine, cosine or tangent.
- {arcsinh,arccosh,arctanh}(float|complex): float|complex – hyperbolic inverse sine, cosine or tangent.
- {log,log10,log1p}(float|complex): float|complex – natural, base-10 and log(1+x) logarithms.
- {exp,expm1}(float|complex): float|complex – exponential and exponential minus one.
- sqrt(float|complex): float|complex – square root.
- {real,imag}(complex): float – real or imaginary part of complex.
- complex(float, float): complex – complex from real and imaginary parts.
For example, if id is the name of a LongColumn,
table.getWhereList(condition='(id>x)', variables={'x':omero.rtypes.rint(5)},
                   start=2, stop=10, step=3)
will consider only a subset of rows (2, 5, 8) as indicated by start, stop and step, substitute 5 in place of x in the condition, and evaluate the condition so as to return the indices of those rows where column id is greater than 5.
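The way start, stop and step restrict the candidate rows before the condition is applied can be modelled in a few lines of plain Python. This is a sketch of the semantics described above, not of the server code: `model_where_list` is a hypothetical name, the condition is passed as a Python callable instead of a PyTables condition string, and treating step=0 as a step of 1 is an assumption based on the statement that 0 disables stepping.

```python
def model_where_list(values, predicate, start, stop, step):
    """Return indices in range(start, stop, step) whose value satisfies predicate."""
    if step == 0:
        step = 1  # assumption: 0 means "no stepping", i.e. consider every row
    stop = min(stop, len(values))
    return [row for row in range(start, stop, step) if predicate(values[row])]


# Made-up contents of an "id" column with ten rows.
ids = [9, 9, 7, 9, 9, 2, 9, 9, 6, 9]

# Rows 2, 5 and 8 are considered; their ids are 7, 2 and 6.
matches = model_where_list(ids, lambda value: value > 5, start=2, stop=10, step=3)
```

Here matches is [2, 8]: row 5 is considered, but its id of 2 fails the condition, and rows outside the stepped range are never examined even though most of them would match.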
Going forward
The Tables API itself provides little more than a remotely accessible store; think of it as a server for Excel-like spreadsheets. We are currently looking into the facilities that can be built on top of it, and are very open to suggestions. For example, the IRoi interface has been extended to filter ROIs by a given measurement. This allows seeing only those results from a particular analysis run. The following example shows how to set up such a measurement and retrieve its results:
For an example of production code that parses out such measurements, see populate_roi.py.
The IRoi interface has been integrated into OMERO.insight, allowing for the visualization and export of OMERO.tables.
We are also looking into a NoSQL-style storage mechanism for OMERO, either as an alternative back-end to OMERO.tables or as an additional key-value type store. Any suggestions or ideas would be very welcome.
See also
- PyTables
Software on which OMERO.tables is built.
- Condition Syntax
The PyTables condition syntax.
- slice definition file
The API definition for OMERO.tables.
- The Tables test suite
The test suite for OMERO.tables.