FS configuration options

Background

Users import their image files to the OMERO.fs server. The server keeps the contents of these files intact, and the import process preserves each file’s path and name (at least within the rules of omero.fs.repo.path_rules below), so that OMERO.fs can become a trusted repository for the master copy of users’ data. The default server configuration described in the Configuration properties glossary should typically suffice, but omero config set may be used to adjust the settings related to file uploads. These settings are explained below.

Repository location

Several properties determine where FS-imported files are stored:

  • omero.data.dir - singleton property (i.e. set once globally) which points to the legacy repository location for OMERO. For OMERO to run on multiple systems, the contents of this directory must be on a shared volume.
  • omero.managed.dir - singleton property which points to the default ManagedRepository. In an OMERO install in which there is only one Blitz server, this will be the only repository. This need not be located under omero.data.dir but is by default.
  • omero.repo.dir (experimental) - value passed to all non-legacy, standalone repositories. It is not currently used, but it would allow repositories to be hosted on multiple physical systems without the need for a shared volume. For example, after running bin/omero admin start on the main machine, it would be possible to launch nodes on various machines via bin/omero node start fs-B, bin/omero node start fs-C, etc. Each of these would pass a different omero.repo.dir value to its process.

Template path

When files are uploaded to the managed repository, a parent directory is created to receive the upload. All the files of a multi-file image are stored in the same parent directory, though they may be placed in different subdirectories of that parent to mirror the original directory structure before upload. The omero.fs.repo.path setting is the template from which the path of that parent directory is created. It is this template which makes the ManagedRepository “managed”.
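As a minimal sketch, the property can be set from a Python script by running the CLI with an argument list, which passes the % and / characters through without any shell quoting. The helper config_set_argv is hypothetical, the CLI path bin/omero follows the examples in this page but may differ on your installation, and the template value is only an illustration assembled from the expansion terms described in this section.

```python
import shlex
from typing import List

# Assumed location of the OMERO command-line tool; adjust for your install.
OMERO_CLI = "bin/omero"


def config_set_argv(key: str, value: str) -> List[str]:
    """Build the argument vector for `omero config set KEY VALUE` (hypothetical helper)."""
    return [OMERO_CLI, "config", "set", key, value]


# Illustrative template: one root-owned component, then user-owned components.
template = "%user%_%userId%//%year%-%month%/%day%/%time%"
argv = config_set_argv("omero.fs.repo.path", template)

# Show the equivalent shell command; on a real server you could instead run
# subprocess.run(argv, check=True) to apply the setting.
print(shlex.join(argv))
```

Passing a list rather than a single command string avoids any dependence on how a particular shell treats %, / or spaces in the template.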

Path naming constraints

There is some flexibility in how this parent directory is named. The constraints are:

  • The path components (individual directories in the path) must be separated by / characters.
  • A path component separator may be written as // only if followed by at least one more path component. In this case:
    • The server ensures that the path components preceding the // are owned by the root user.
    • Any newly created path components following the // are owned by the user who owns the images.
  • If no // is present then all newly created path components are owned by the user who owns the images.
  • The path must be unique for each import. It is for this reason that the %time% term expands to a time with millisecond resolution.
  • To avoid confusion with the expansion terms enumerated below, avoid other uses of the % character in path components.

In the above, ownership of path components is in the context of OMERO users accessing the OMERO managed repository through its API. It does not relate to operating system users’ permissions for the underlying filesystem.
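The ownership rule can be illustrated with a minimal Python sketch that splits a template into its root-owned and user-owned components and rejects empty components, including a trailing // with nothing after it. The function split_template is hypothetical, and it assumes a template contains at most one //; it does not model how the server itself parses the setting.

```python
from typing import List, Tuple


def split_template(template: str) -> Tuple[List[str], List[str]]:
    """Split a template path into (root-owned, user-owned) path components.

    Sketch only: assumes at most one "//" separator in the template.
    """
    if template.count("//") > 1:
        raise ValueError("this sketch accepts at most one '//' separator")
    if "//" in template:
        root_part, user_part = template.split("//")
    else:
        # No "//": every newly created component is owned by the user.
        root_part, user_part = "", template
    root = root_part.split("/") if root_part else []
    user = user_part.split("/")
    # Every component must be non-empty; this also rejects a trailing "//",
    # which would not be followed by another path component.
    if any(not component for component in root + user):
        raise ValueError("empty path component in " + repr(template))
    return root, user
```

For the template %user%_%userId%//%year%-%month%/%day%/%time%, the first component is root-owned and the remaining three are user-owned. As noted above, this ownership is within the OMERO API, not the operating system.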

Expansion terms

Special terms may be used within path components: these are replaced with text that depends on the import.

For any directory in the template path

%userId%
expands to the user’s numerical ID
%user%
expands to the user’s name
%institution%
expands to the user’s institution name; this path component is wholly omitted if the user has no institution set
%institution:default%
expands to the user’s institution name, or to the supplied “default” if the user has no institution set; for instance, %institution:State College of Florida, Manatee-Sarasota% is permitted
%groupId%
expands to the OMERO group’s numerical ID
%group%
expands to the OMERO group’s name
%perms%
expands to the group’s six-character permissions string, for example rw---- for a private group
%year%
expands to the current year number, for example 2014
%month%
expands to the current month number, zero-padded, for example 08
%monthname%
expands to the current month name, for example August
%day%
expands to the current day number in the month, zero-padded, for example 04
%sessionId%
expands to the session’s numerical ID
%session%
expands to the session key (UUID) of the session, for example 6c2dae43-cfad-48ce-af6f-025569f9e6df
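The following Python sketch shows how the general-purpose terms might be substituted into a single path component, including the special handling of %institution%. The function expand_component and the ctx dictionary describing the import are hypothetical; they mirror the definitions above but are not the server's implementation. Month names are fixed in English so the output does not depend on the system locale.

```python
import re
from datetime import datetime
from typing import Optional

# Matches %name% or %name:argument%; the argument may contain spaces and commas.
TERM = re.compile(r"%(\w+)(?::([^%]*))?%")

# Fixed English names, so the result does not depend on the system locale.
MONTH_NAMES = ["January", "February", "March", "April", "May", "June", "July",
               "August", "September", "October", "November", "December"]


def expand_component(component: str, ctx: dict, now: datetime) -> Optional[str]:
    """Expand the general-purpose terms in a single path component.

    ctx is a hypothetical dict describing the import: user, userId, group,
    groupId, perms, session, sessionId and, optionally, institution.
    Returns None when the whole component must be omitted (a bare
    %institution% for a user with no institution set).
    """
    values = {
        "userId": str(ctx["userId"]),
        "user": ctx["user"],
        "groupId": str(ctx["groupId"]),
        "group": ctx["group"],
        "perms": ctx["perms"],
        "sessionId": str(ctx["sessionId"]),
        "session": ctx["session"],
        "year": str(now.year),
        "month": f"{now.month:02d}",
        "monthname": MONTH_NAMES[now.month - 1],
        "day": f"{now.day:02d}",
    }
    omit = False

    def replace(match: re.Match) -> str:
        nonlocal omit
        name, arg = match.group(1), match.group(2)
        if name == "institution":
            institution = ctx.get("institution")
            if institution:
                return institution
            if arg is None:
                omit = True  # %institution% with no institution: drop the component
                return ""
            return arg  # %institution:default% falls back to the default
        # Terms not handled here (e.g. %time%) are left untouched.
        return values.get(name, match.group(0))

    expanded = TERM.sub(replace, component)
    return None if omit else expanded
```

For a user alice with ID 2 importing on 4 August 2014, the component %user%_%userId% expands to alice_2 and %year%-%month% to 2014-08.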

For user-owned directories only

These expansion terms may not precede // in the template path.

%time%
expands to the current time, in hours, minutes, seconds, milliseconds, for example 13-49-07.727
%hash%
expands to an eight-digit hexadecimal hash code that is constant for the set of files being imported, for example 0554E3A1
%hash:digits%
expands as %hash%, where digits is a comma-separated list of how many digits of the hash to use in different subdirectories; for example, hash-%hash:3,3,2% expands to a form like hash-123/456/78
%increment%
expands to an integer that increases consecutively so as to create the next new directory; for example, inc-%increment% with preexisting directories up to inc-24 expands to inc-25
%increment:digits%
expands as %increment%, where digits specifies a minimum length to which to zero-pad the integer; for example, inc-%increment:3% with preexisting directories up to inc-024 expands to inc-025
%subdirs%
expands to nothing until the preceding directory has more than one thousand entries, after which it expands to a consecutively increasing integer so that the entry count in the new subdirectories is limited in the same way; this applies recursively, adding path components as needed. For example, with example/below-%subdirs% in the path and example/below-000 to example/below-999 all “full”, three-digit subdirectories are created below those, such as example/below-123/456
%subdirs:digits%
expands as %subdirs%, where digits specifies how many digits %subdirs% may expand to in each path component; for example, example/%subdirs:4%-below allows ten thousand directory entries in example before creating example/1234-below and, much later, example/1234-below/5678
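A minimal Python sketch of three of the user-owned terms follows: formatting %time%, spreading the digits of %hash% over subdirectories, and choosing the next %increment% value from existing directory names. The function names are hypothetical; the sketch assumes the hash digits are taken consecutively from the start, and that an increment starts at 1 when no matching directory exists yet. %subdirs% is not modelled.

```python
import re
from datetime import datetime
from typing import List


def expand_time(now: datetime) -> str:
    """%time%: hours-minutes-seconds.milliseconds, e.g. 13-49-07.727."""
    return now.strftime("%H-%M-%S") + f".{now.microsecond // 1000:03d}"


def split_hash(hash_hex: str, digits: List[int]) -> List[str]:
    """%hash:3,3,2%-style splitting: consecutive slices of the hash, one per subdirectory."""
    parts, start = [], 0
    for n in digits:
        parts.append(hash_hex[start:start + n])
        start += n
    return parts


def next_increment(prefix: str, existing: List[str], width: int = 0) -> str:
    """%increment% (width 0) or %increment:width%: one more than the highest existing number."""
    pattern = re.compile(re.escape(prefix) + r"(\d+)$")
    used = [int(m.group(1)) for name in existing if (m := pattern.match(name))]
    # Starting from 1 when nothing exists yet is an assumption of this sketch.
    return prefix + str(max(used, default=0) + 1).zfill(width)
```

With the example hash 0554E3A1 and digits 3,3,2, the slices are 055, 4E3 and A1, so hash-%hash:3,3,2% would become hash-055/4E3/A1.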

At most one %subdirs% or %increment% term may be used in any one path component, although each may be used many times in the whole path.
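The two placement rules for user-owned terms (they may not precede //, and at most one %subdirs% or %increment% per component) can be checked with a short Python sketch. The function check_template is hypothetical and only illustrates the rules as stated in this section; it treats everything before the last // as root-owned.

```python
import re

# Terms that this section restricts to user-owned directories.
USER_ONLY = {"time", "hash", "increment", "subdirs"}
TERM = re.compile(r"%(\w+)(?::[^%]*)?%")


def check_template(template: str) -> None:
    """Raise ValueError if a template breaks the placement rules for expansion terms."""
    # Everything before the last "//" is root-owned; with no "//", nothing is.
    root_part, _, user_part = template.rpartition("//")
    for component in filter(None, root_part.split("/")):
        for name in TERM.findall(component):
            if name in USER_ONLY:
                raise ValueError(f"%{name}% may not precede //")
    for component in user_part.split("/"):
        names = TERM.findall(component)
        if sum(name in ("subdirs", "increment") for name in names) > 1:
            raise ValueError(f"more than one %subdirs%/%increment% in {component!r}")
```

A template such as %time%//%user% is rejected because %time% is root-owned there, and %user%//%increment%-%subdirs% is rejected because both terms share one component.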

Checksum algorithm

As the client uploads each file to the server, it calculates a checksum for the file. After the upload is complete the client reports that checksum to the server. The server then calculates the checksum for the corresponding file from its local filesystem and checks that it matches what the client reported. File integrity is thus assured because corruption during transmission or writing would be revealed by a checksum mismatch.
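The verification step can be sketched in Python with the standard hashlib module: the client's reported checksum is compared with one recomputed from the stored file, read in chunks so that large images need not fit in memory. The function names are hypothetical, SHA-1 is simply the sketch's choice of algorithm, and the upload is simulated by writing bytes to a temporary file.

```python
import hashlib
import os
import tempfile


def file_checksum(path: str, algorithm: str = "sha1", chunk_size: int = 1 << 20) -> str:
    """Hex checksum of a file, read in chunks so large files need not fit in memory."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_upload(client_reported: str, server_path: str, algorithm: str = "sha1") -> bool:
    """Server side: recompute the checksum from the stored file and compare."""
    return file_checksum(server_path, algorithm) == client_reported.lower()


# Simulated upload: the client checksums its bytes, the "server" stores them.
data = b"example image bytes"
client_sum = hashlib.sha1(data).hexdigest()
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(data)
ok = verify_upload(client_sum, tmp.name)
corrupted = verify_upload(hashlib.sha1(data + b"!").hexdigest(), tmp.name)
os.remove(tmp.name)
```

Here ok is true, while corrupted is false because the reported checksum no longer matches the stored bytes, which is exactly the mismatch that reveals corruption.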

There are various algorithms by which checksums may be calculated; the list of available algorithms is given by omero.checksum.supported. To produce comparable checksums, the client and server must use the same algorithm. The server API permits clients to specify the algorithm, but it is expected that they will typically accept the server default.
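The agreement on an algorithm can be pictured with a small Python sketch. The function choose_algorithm is hypothetical, the algorithm names are illustrative, and it assumes the server's supported list is ordered with its default first: an explicit client request is honoured only if both sides support it, and otherwise the first server-supported algorithm the client can compute is used.

```python
from typing import List, Optional, Set


def choose_algorithm(server_supported: List[str], client_capable: Set[str],
                     requested: Optional[str] = None) -> str:
    """Pick the checksum algorithm that both client and server will use.

    Sketch only: server_supported is assumed to list the server default first.
    """
    if requested is not None:
        if requested in server_supported and requested in client_capable:
            return requested
        raise ValueError(f"{requested!r} is not usable by both sides")
    for name in server_supported:
        if name in client_capable:
            return name  # typically the server default
    raise ValueError("no checksum algorithm in common")
```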

The number at the end of each checksum algorithm name specifies the bit width of the resulting checksum. A larger bit width makes it less likely that different files will have the same checksum by coincidence, but lengthens the checksum hex strings that are reported to the user and stored in the hash column of the originalfile table in the database.
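The trade-off can be quantified with a short Python sketch: each hexadecimal digit carries 4 bits, so the stored string grows with the bit width, while the chance of a coincidental match among n files falls rapidly. The collision estimate uses the standard birthday-bound approximation, an addition of this sketch rather than something the server computes.

```python
import hashlib
import math
import zlib


def hex_length(bits: int) -> int:
    """Number of hexadecimal digits needed for a checksum of the given bit width."""
    return math.ceil(bits / 4)


def collision_probability(n_files: int, bits: int) -> float:
    """Birthday-bound approximation: P ~= 1 - exp(-n(n-1) / 2^(bits+1))."""
    return -math.expm1(-n_files * (n_files - 1) / 2 ** (bits + 1))


crc = format(zlib.crc32(b"example"), "08x")  # 32-bit checksum as 8 hex digits
sha1 = hashlib.sha1(b"example").hexdigest()  # 160-bit checksum as 40 hex digits
```

For a million files, a 32-bit checksum is almost certain to produce at least one coincidental match, whereas for a 160-bit checksum the probability is negligible, at the cost of a 40-character rather than 8-character hex string.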