Beginning with 3.0-Beta3, the OMERO server uses Lucene to index all string and timestamp information in the database, as well as all OriginalFiles which can be parsed to simple text (see File parsers for more information). The index is stored under /OMERO/FullText (or the FullText subdirectory of your ${omero.data.dir}) and can be searched with Google-like queries.
Each row in the database becomes a single Lucene Document parsed into several Fields. A field is referenced by prefixing a search term with the field name followed by a colon. For example, name:myImage searches for myImage anywhere in the name field.
Field | Comments |
---|---|
Any unprefixed field searches the combination of all fields together i.e. a search for cell AND name:myImage gets translated to combined_fields:cell AND name:myImage. | |
<field name> | Each string, timestamp, or Details field of the entity also gets its own Field entry, like the name field above |
details.owner.omeName | Login name of the owner of the object |
details.owner.firstName | First name of the owner of the object |
details.owner.lastName | Last name of the owner of the object |
details.group.name | Group name of the owning group of the object |
details.creationEvent.id | Id of the Event of this object's creation |
details.creationEvent.time | When that Event took place |
details.updateEvent.id | Id of the Event of this object's last modification |
details.updateEvent.time | When that Event took place |
details.permissions | Permissions in the form rwrwrw or rw- |
tag | Contents from a TagAnnotation. |
annotation | Contents from any annotations, including TagAnnotation and any TextAnnotation on another TextAnnotation (a.k.a. a description) |
annotation.ns | Namespace (if present) for any annotations on an object |
annotation.type | Short type name, e.g. TextAnnotation or FileAnnotation for any annotations on an object |
file.name | For FileAnnotations and objects they are attached to, the name of the OriginalFile |
file.format | For FileAnnotations and objects they are attached to, the format of the OriginalFile |
file.path | For FileAnnotations and objects they are attached to, the path of the OriginalFile |
file.sha1 | For FileAnnotations and objects they are attached to, the sha1 of the OriginalFile |
file.contents | For FileAnnotations and objects they are attached to as well as the OriginalFile itself, the file contents themselves if their Format is configured with the File parsers. |
Internal fields | |
combined_fields | The default field prefix. |
_hibernate_class | Used by Hibernate Search to record the entity type. The class value, e.g. ome.model.core.Image, is also entered in combined_fields. Unimportant for the casual user. |
id | The primary key of the entity. Unimportant for the casual user. |
Search queries are very similar to Google searches. When search terms are entered without a prefix (such as “name:”), the default field, which combines all available fields, is used. Otherwise, a prefix can be added to restrict the search to a single field.
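The fallback to the default field can be pictured with a small sketch. The class and method names below are hypothetical and are not part of OMERO; the real translation is done by the server's query parser, which handles far more syntax (quotes, grouping, ranges) than this whitespace-based illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

/** Illustrative only: mimics how unprefixed terms fall back to combined_fields. */
final class DefaultFieldSketch {
    // Boolean operators are passed through untouched.
    private static final Set<String> OPERATORS = Set.of("AND", "OR", "NOT");

    /** Prefixes every bare term with "combined_fields:"; prefixed terms are left alone. */
    static String applyDefaultField(String query) {
        List<String> out = new ArrayList<>();
        for (String token : query.trim().split("\\s+")) {
            if (token.isEmpty()) {
                continue; // an empty query yields an empty result
            }
            boolean isOperator = OPERATORS.contains(token);
            boolean hasPrefix = token.contains(":");
            out.add(isOperator || hasPrefix ? token : "combined_fields:" + token);
        }
        return String.join(" ", out);
    }

    public static void main(String[] args) {
        // prints combined_fields:cell AND name:myImage
        System.out.println(applyDefaultField("cell AND name:myImage"));
    }
}
```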
Successful searching depends on understanding how the text is indexed. The default analyzer used is the FullTextAnalyzer. Assuming the following entries for Image.name, the analyzer produces the tokens shown after each arrow:
1. Desktop/image_GFP-H2B_1.dv ---> "desktop", "image", "gfp", "h2b", "1", "dv"
2. Desktop/image_GFP-H2B_2.dv ---> "desktop", "image", "gfp", "h2b", "2", "dv"
3. Desktop/image_GFP_01-H2B.dv ---> "desktop", "image", "gfp", "01", "h2b", "dv"
4. Desktop/image_GFP-CSFV_a.dv ---> "desktop", "image", "gfp", "csfv", "a", "dv"
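The token lists above are consistent with a simple rule: split on every character that is neither a letter nor a digit, then lowercase each piece. The sketch below reproduces the four examples under that assumption. It is not the FullTextAnalyzer source, and the class name is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Illustrative only: approximates the tokenization shown in the examples. */
final class TokenizerSketch {
    /** Splits on runs of non-alphanumeric characters and lowercases each token. */
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String part : text.split("[^\\p{L}\\p{N}]+")) {
            if (!part.isEmpty()) { // split() can yield a leading empty string
                tokens.add(part.toLowerCase(Locale.ROOT));
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // prints [desktop, image, gfp, h2b, 1, dv]
        System.out.println(tokenize("Desktop/image_GFP-H2B_1.dv"));
    }
}
```

One practical consequence under this assumption: a search for name:gfp would match all four entries, while name:csfv would match only the fourth, because matching happens on whole tokens rather than on the original file name.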
Indexing is not driven by the user, but happens automatically in the background. Automatic indexing occurs at the frequency defined in etc/omero.properties:
omero.search.cron=0,30 * * * * ?
omero.search.batch=100
which fires at seconds 0 and 30 of every minute, i.e. every thirty seconds. During each iteration, 100 EventLogs are loaded from the database and processed. Upon successful completion, the persistent count in the configuration table is incremented:
omero3=# select value from configuration where name = 'PersistentEventLogLoader.current_id';
value
-------
30983
(1 row)
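To get a feel for how the cron and batch settings interact, the following sketch estimates an upper bound on the number of EventLogs processed per hour. It assumes the expression uses a six-field layout whose first field is a comma-separated list of seconds and whose remaining fields are wildcards, as in the default shown above. It also assumes each run finishes before the next one is due. The class name is hypothetical.

```java
/** Illustrative only: rough indexing throughput for a simple cron setting. */
final class IndexerThroughputSketch {
    /** Counts runs per hour from the comma-separated seconds field (first field). */
    static int runsPerHour(String cron) {
        String secondsField = cron.trim().split("\\s+")[0];
        int firesPerMinute = secondsField.split(",").length;
        return firesPerMinute * 60;
    }

    /** Upper bound on EventLogs handled per hour for a given batch size. */
    static int maxEventLogsPerHour(String cron, int batch) {
        return runsPerHour(cron) * batch;
    }

    public static void main(String[] args) {
        // prints 12000
        System.out.println(maxEventLogsPerHour("0,30 * * * * ?", 100));
    }
}
```

With the defaults, the indexer therefore handles at most 12000 EventLogs per hour, which is why a larger omero.search.batch can shorten a long re-index.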
If you have more than one PersistentEventLogLoader.* value in your database, then you have run indexing with multiple versions of the server. This is fine. To allow a new server version to force an update, the configuration key may be changed. For example,
PersistentEventLogLoader.current_id
became
PersistentEventLogLoader.v2.current_id
in r2460.
Once an entity is indexed, it is possible to start writing queries against the server via IQuery.findAllByFullText(). Use new Parameters(new Filter().owner()) and .group() to restrict your search. Alternatively, use the ome.api.Search interface (below).
There are a few reasons that you may need to re-index your database, e.g. if the index has become corrupt or you would like to have large files, that were previously skipped, added to the index (see omero.search.max_file_size). Under most circumstances, you should be able to re-index the database while the server is still running.
If you need to make any adjustments to the server configuration or the process heap size, first shut the server down and make these changes before restarting the server. Then, with the server running, use the following steps to initiate a re-indexing:
Disable the search indexer process and stop any currently running indexer processes:
> bin/omero admin ice server disable Indexer-0
> bin/omero admin ice server stop Indexer-0
Remove the existing search indexes by deleting the contents of the FullText subdirectory of your ${omero.data.dir}.
Reset the indexer’s progress counter in the database:
> psql -U <omero-db-user> <omero-db-name> -c "update configuration set value = 0 where name like 'PersistentEventLogLoader%';"
substituting your local OMERO database’s user and name.
Re-enable/restart the indexer process (the Ice grid will handle automatically restarting the process as soon as it is re-enabled)
> bin/omero admin ice server enable Indexer-0
Depending on the size of your database, it may take the indexer some time to finish re-indexing. During this time, your OMERO server will remain available for use; however, search functionality will be degraded until the re-indexing is finished.
It is also possible to re-index the database with the server off-line. First, shut down the OMERO server as normal and make any necessary adjustments to the configuration. Clear the contents of the FullText directory, then run:
> bin/omero admin reindex --full
Re-indexing the database in off-line mode will use a 1 GB heap by default (as opposed to the default 256 MB heap for the indexer process in the running server). You can further adjust the size of the heap by passing an alternate value in the JAVA_OPTS variable on the command line:
> JAVA_OPTS="-Xmx2056M" bin/omero admin reindex --full
You may also want to increase the omero.search.batch size to take advantage of the larger heap. The combination of a larger heap and batch size should enable the re-index to complete sooner in off-line mode than it might in the context of a running server.
Alternatively, you can re-index a specific class of objects in off-line mode, followed by a later re-index with the server running. Start by shutting down the server and clearing the contents of the FullText directory. Then re-index a specific class of object with:
> bin/omero admin reindex --class ome.model.core.Image
Multiple classes can be re-indexed together by appending extra --class ... arguments on the command line. Once this limited re-indexing is completed, you can restart the server and search capabilities will be available in a limited fashion. If you would then like to re-index the remaining objects in the system, follow the steps for on-line re-indexing above, skipping the step that involves clearing the FullText directory.
The current IQuery implementation restricts searches to a single class at a time.
The Search API offers a number of different queries along with various filters and settings which are all maintained on the server.
The matrix below shows which combinations of parameters and queries are supported (S), which throw an exception (X), and which are silently ignored (I).
Query Method –> | byFullText/SomeMustNone | byGroupForTags/byTagsForGroup | byAnnotatedWith |
Parameters | |||
annotated between | S | S | S |
annotated by | S | S | S |
annotated with | S | I | I |
created between | S | S | S |
modified between | S | I (Immutable) | S |
owned by | S | S | S |
all types | X | I | X |
1 type | S | I | S |
N types | X | I | X |
only ids | S | I | S |
Ordering / Fetches | |||
orderBy | S | I | S |
fetchAnnotations | [1] | I | [2] |
Other | |||
setProjections [3] | X | X | X |
current*Metadata [4] | X | X | X |
Footnotes
[1] | Any fetchAnnotation() argument to byFullText() or related queries returns all annotations. |
[2] | byAnnotatedWith() does not accept a fetchAnnotation() argument of Annotation.class. |
[3] | setProjections may need to be removed if Lucene cannot handle OMERO’s security requirements. |
[4] | Not yet implemented. |
Leading wildcard searches are disallowed by default. “?omething” or “*hatever”, for example, would both throw exceptions. They can be enabled with:
Search search = serviceFactory.createSearchService();
search.setAllowLeadingWildcards(true);
There is a performance penalty, however. In addition, wildcard searches get expanded on the server to boolean queries. For example, assuming “ACELL”, “BCELL”, and “CCELL” are all terms in your index, then the query:
*CELL
gets expanded to:
ACELL OR BCELL OR CCELL
If there are more than “omero.search.maxclause” terms in the expansion (default is 4096), then an exception will be thrown. This requires the user to enter a more refined search, but not because there are too many results, only because there is not enough room in memory to search on all terms at once.
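The expansion and the clause limit described above can be sketched as follows. This is a simplified model, not Lucene's or OMERO's implementation: the class and method names are hypothetical, the index is represented as a plain list of terms, and only a single leading “*” is handled.

```java
import java.util.List;
import java.util.stream.Collectors;

/** Illustrative only: models leading-wildcard expansion into an OR query. */
final class WildcardExpansionSketch {
    /**
     * Expands a pattern such as "*CELL" against the indexed terms and
     * fails if the expansion exceeds maxClauses.
     */
    static String expandLeadingWildcard(String pattern, List<String> indexedTerms, int maxClauses) {
        if (!pattern.startsWith("*")) {
            throw new IllegalArgumentException("Expected a leading '*': " + pattern);
        }
        String suffix = pattern.substring(1);
        List<String> matches = indexedTerms.stream()
                .filter(term -> term.endsWith(suffix))
                .sorted()
                .collect(Collectors.toList());
        if (matches.size() > maxClauses) {
            // Mirrors the documented behaviour: too many terms, not too many results.
            throw new IllegalStateException(
                    "Expansion has " + matches.size() + " terms; limit is " + maxClauses);
        }
        return String.join(" OR ", matches);
    }

    public static void main(String[] args) {
        List<String> terms = List.of("ACELL", "BCELL", "CCELL", "NUCLEUS");
        // prints ACELL OR BCELL OR CCELL
        System.out.println(expandLeadingWildcard("*CELL", terms, 4096));
    }
}
```

Note that the limit applies to the number of distinct index terms matched by the wildcard, so a pattern with a longer fixed part usually expands to fewer clauses.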
Two extension points are currently available for searching. The first is File parsers, mentioned above. By configuring the map of Formats (roughly mime-types) of files to parser instances, extracting information from attached binary files can be made quick and straightforward.
Similarly, Search bridges provide a mechanism for parsing all metadata entering the system. One built-in bridge (the FullTextBridge) parses out the fields mentioned above, but by creating your own bridge it is possible to extract more information specific to your site.
See also
Structured annotations, Search bridges, File parsers, Query Parser Syntax