FS configuration options
Background
Users import their image files to the OMERO.fs server. The contents of
these files are kept intact by the server and the import process
preserves the files’ path and name (at least within the rules of
omero.fs.repo.path_rules below), so that OMERO.fs can become
a trusted repository for the master copy of users’ data. While the
default server configuration described in Configuration properties should typically suffice,
omero config set may be used to adjust settings related to file uploads.
These settings are explained below.
Repository location
Several properties determine where FS-imported files are stored:
- omero.data.dir - singleton property (i.e. set once globally) which
points to the legacy repository location for OMERO. For OMERO to run on
multiple systems, the contents of this directory must be on a shared volume.
- omero.managed.dir - singleton property which points to the default
ManagedRepository. In an OMERO install in which there is only one Blitz
server, this will be the only repository. This need not be located under
omero.data.dir but is by default.
- omero.repo.dir (experimental) - value passed to all non-legacy, standalone
repositories. This is not actively used, but would allow hosting
repositories on multiple physical systems without the need for a shared
volume. For example, after running bin/omero admin start on the main
machine, it would be possible to launch nodes on various machines via
bin/omero node start fs-B, bin/omero node start fs-C, etc. Each of
these would pass a different omero.repo.dir value to its process.
Template path
When files are uploaded to the managed repository, a parent directory is
created to receive the upload. A multi-file image has all its files
stored in the same parent directory, though they may be in different
subdirectories of that parent to mirror the original directory
structure before upload. The omero.fs.repo.path setting
defines how that parent directory is named and created. It is this value which
makes the ManagedRepository “managed”.
Path naming constraints
There is some flexibility in how this parent directory is named. The
constraints are:
- The path components (individual directories in the path) must be
separated by / characters, even on Windows systems.
- A path component separator may be written as // only if
followed by at least one more path component. In this case:
  - The server ensures that the path components preceding the
  // are owned by the root user.
  - Any newly created path components following the // are
  owned by the user who owns the images.
- If no // is present then all newly created path
components are owned by the user who owns the images.
- The path must be unique for each import. It is for this reason that
the %time% term expands to a time with millisecond
resolution.
- To avoid confusion with the expansion terms enumerated below, avoid
other uses of the % character in path components.
In the above, ownership of path components is in the context of OMERO
users accessing the OMERO managed repository through its API. It does
not relate to operating system users’ permissions for the underlying
filesystem.
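The ownership rule around // can be sketched in Python as follows. This is only an illustration: the helper name split_template is hypothetical and not part of the OMERO API, the template string is just an example, and the sketch assumes at most one // in the template.

```python
def split_template(template):
    """Split a template path into (root_owned, user_owned) component lists.

    Hypothetical helper for illustration only; not part of OMERO.
    Assumes the template contains at most one // separator.
    """
    root_part, separator, user_part = template.partition("//")
    if not separator:
        # No // present: every newly created component is user-owned.
        return [], [c for c in template.split("/") if c]
    user_components = [c for c in user_part.split("/") if c]
    if not user_components:
        raise ValueError("// must be followed by at least one path component")
    root_components = [c for c in root_part.split("/") if c]
    return root_components, user_components


# Example template, chosen only for illustration.
root_owned, user_owned = split_template(
    "%user%_%userId%//%year%-%month%/%day%/%time%"
)
print("root-owned:", root_owned)
print("user-owned:", user_owned)
```

Here the single component before // would be owned by root, and the three components after it by the user who owns the images.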
Expansion terms
Special terms may be used within path components: these are replaced
with text that depends on the import.
For any directory in the template path
- %userId%
  expands to the user’s numerical ID
- %user%
  expands to the user’s name
- %institution%
  expands to the user’s institution name; this path component is wholly
  omitted if the user has no institution set
- %institution:default%
  expands to the user’s institution name, or to the supplied “default”
  if the user has no institution set; for instance,
  %institution:State College of Florida, Manatee-Sarasota% is
  permitted
- %groupId%
  expands to the OMERO group’s numerical ID
- %group%
  expands to the OMERO group’s name
- %perms%
  expands to the group’s six-character permissions string, for example
  rw---- for a private group
- %year%
  expands to the current year number, for example 2014
- %month%
  expands to the current month number, zero-padded, for example 08
- %monthname%
  expands to the current month name, for example August
- %day%
  expands to the current day number in the month, zero-padded, for
  example 04
- %sessionId%
  expands to the session’s numerical ID
- %session%
  expands to the session key (UUID) of the session, for example
  6c2dae43-cfad-48ce-af6f-025569f9e6df
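A minimal Python sketch of how such terms might be substituted is given below. It is not the server's implementation: the function names, the ctx dictionary and the user and group values are all hypothetical, and only a few of the terms are covered. It does show the two institution behaviours described above, namely omitting the whole path component and falling back to a supplied default.

```python
import re
from datetime import datetime


def expand_component(component, ctx):
    """Expand %term% markers in one path component.

    Returns None when the component must be omitted entirely.
    Hypothetical helper; ctx maps term names to values.
    """
    institution = ctx.get("institution")
    # A component using bare %institution% is dropped when none is set.
    if "%institution%" in component and not institution:
        return None

    def replace(match):
        name, _, default = match.group(1).partition(":")
        if name == "institution":
            return institution or default
        return str(ctx[name])

    return re.sub(r"%([^%]+)%", replace, component)


def expand_template(template, ctx):
    parts = (expand_component(c, ctx) for c in template.split("/"))
    return "/".join(p for p in parts if p is not None)


# Illustrative values only; the date matches the examples in the list above.
now = datetime(2014, 8, 4)
ctx = {
    "user": "jdoe", "userId": 7, "group": "lab", "groupId": 3,
    "year": now.strftime("%Y"), "month": now.strftime("%m"),
    "day": now.strftime("%d"),
}
print(expand_template("%institution%/%user%_%userId%/%year%-%month%/%day%", ctx))
print(expand_template("%institution:Unknown%/%group%", ctx))
```

Because the hypothetical user has no institution, the first template loses its leading component, while the second falls back to the supplied default.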
For user-owned directories only
These expansion terms may not precede // in the template
path.
- %time%
  expands to the current time, in hours, minutes, seconds, milliseconds,
  for example 13-49-07.727
- %hash%
  expands to an eight-digit hexadecimal hash code that is constant for
  the set of files being imported, for example 0554E3A1
- %hash:digits%
  expands as %hash%, where digits is a comma-separated list of how many
  digits of the hash to use in different subdirectories; for example,
  hash-%hash:3,3,2% expands to a form like hash-123/456/78
- %increment%
  expands to an integer that increases consecutively so as to create the
  next new directory; for example, using inc-%increment% with
  preexisting directories up to inc-24 would expand to inc-25
- %increment:digits%
  expands as %increment%, where digits specifies a minimum length to
  which to zero-pad the integer; for example, using inc-%increment:3%
  with preexisting directories up to inc-024 would expand to inc-025
- %subdirs%
  expands to nothing until the preceding directory has more than one
  thousand entries, at which point it expands to a consecutively
  increasing integer so as to similarly limit the entry count in
  subdirectories. This applies recursively, extending the number of path
  components as needed: using example/below-%subdirs% in the path, once
  example/below-000 to example/below-999 are all “full”, three-digit
  subdirectories are created below those, such as example/below-123/456
- %subdirs:digits%
  expands as %subdirs%, where digits specifies to how many digits
  %subdirs% may expand for each path component; for example,
  example/%subdirs:4%-below allows ten thousand directory entries in
  example before creating example/1234-below and, much later,
  example/1234-below/5678
No more than one of either %subdirs% or
%increment% may be used in any one path component, although
they may each be used many times in the whole path.
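The %hash:digits% and %increment:digits% examples above can be reproduced with the short Python sketch below. The helper names are hypothetical and this is not the server's code. The sketch assumes that hash digits are consumed from left to right and that numbering starts at 1 when no matching directory exists; neither assumption is confirmed here.

```python
import re


def expand_hash(hash_hex, digits=None):
    """Expand %hash% or %hash:digits% for a given hexadecimal hash string."""
    if digits is None:
        return hash_hex
    parts, position = [], 0
    # Assumption: digits are taken from the start of the hash, in order.
    for count in (int(d) for d in digits.split(",")):
        parts.append(hash_hex[position:position + count])
        position += count
    return "/".join(parts)


def next_increment(existing, prefix, width=0):
    """Return the directory name that follows the highest existing prefix-N."""
    pattern = re.compile(re.escape(prefix) + r"(\d+)$")
    numbers = [int(m.group(1)) for m in map(pattern.match, existing) if m]
    # Assumption: start at 1 when there is no preexisting directory.
    return prefix + str(max(numbers, default=0) + 1).zfill(width)


print("hash-" + expand_hash("12345678", "3,3,2"))
print(next_increment(["inc-23", "inc-24"], "inc-"))
print(next_increment(["inc-023", "inc-024"], "inc-", width=3))
```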
Legal file names
Although OMERO.fs attempts to preserve file naming, the server’s
operating system or file system is likely to constrain which
file names may be stored by OMERO.fs. This is of particular concern
when a user may upload from a more permissive system to a server on a
less permissive system, or when it is anticipated that the server
itself may be migrated to a less permissive system. The server never
accepts Unicode control characters in file names.
The omero.fs.repo.path_rules setting defines the combination
of restrictions that the server must apply in accepting file uploads.
The restrictions are grouped into named sets:
- Windows required
  prohibits names containing any of the characters ", *, /, :, <, >, ?,
  \, |; names beginning with $; the names AUX, CLOCK$, CON, NUL, PRN,
  COM1 to COM9, LPT1 to LPT9; and anything beginning with one of those
  names followed by .
- Windows optional
  prohibits names ending with . or a space
- UNIX required
  prohibits names with the character /
- UNIX optional
  prohibits names beginning with . or -
These rules are applied to each separate path component of the file
name on the client’s system. So, for instance, an upload of a file
/tmp/myfile.tif from a Linux system would satisfy the
UNIX required restrictions because neither of the path
components tmp and myfile.tif contains a
/ character.
Applying the “optional” restrictions does not assist OMERO.fs at all;
those restrictions are designed to ease manual maintenance of the
directory specified by the omero.managed.dir setting, which is
where the server stores users’ uploaded files.
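To make the per-component checking concrete, here is a rough Python sketch of the four rule sets plus the control-character rule. It is not OMERO's implementation: the function names are hypothetical, the rule-set labels are just strings, and the case-insensitive comparison of reserved names is an assumption of this sketch.

```python
import unicodedata

WINDOWS_FORBIDDEN_CHARS = set('"*/:<>?\\|')
WINDOWS_RESERVED = (
    {"AUX", "CLOCK$", "CON", "NUL", "PRN"}
    | {f"COM{i}" for i in range(1, 10)}
    | {f"LPT{i}" for i in range(1, 10)}
)


def violations(component):
    """Return the set of rule-set names that one path component violates."""
    found = set()
    # Unicode control characters (category Cc) are never accepted.
    if any(unicodedata.category(ch) == "Cc" for ch in component):
        found.add("control character")
    # Assumption: reserved names are compared case-insensitively.
    stem = component.split(".", 1)[0].upper()
    if (WINDOWS_FORBIDDEN_CHARS & set(component)
            or component.startswith("$")
            or stem in WINDOWS_RESERVED):
        found.add("Windows required")
    if component.endswith((".", " ")):
        found.add("Windows optional")
    if "/" in component:
        found.add("UNIX required")
    if component.startswith((".", "-")):
        found.add("UNIX optional")
    return found


def check_path(path, separator="/"):
    """Apply the rules to each component of a client-side path."""
    return {c: violations(c) for c in path.split(separator) if c}


print(check_path("/tmp/myfile.tif"))
print(violations("CON.txt"))
```

The first call reports no violations for either component of /tmp/myfile.tif, matching the example above, whereas CON.txt falls foul of the Windows required set.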
Checksum algorithm
As the client uploads each file to the server, it calculates a
checksum for the file. After the upload is complete the client reports
that checksum to the server. The server then calculates the checksum
for the corresponding file from its local filesystem and checks that
it matches what the client reported. File integrity is thus
assured because corruption during transmission or writing would be
revealed by a checksum mismatch.
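The exchange described above can be sketched in Python using the standard hashlib module. This is a simplified stand-in, not the OMERO client or server code: the function names are hypothetical, SHA-1 is chosen only for illustration, and a temporary file plays the role of the file written by the server.

```python
import hashlib
import os
import tempfile


def file_checksum(path, algorithm="sha1", chunk_size=65536):
    """Hash a file in chunks so that large images need not fit in memory."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def upload_is_intact(client_checksum, server_path, algorithm="sha1"):
    """Server side: recompute from disk and compare with the client's report."""
    return file_checksum(server_path, algorithm) == client_checksum


# The "client" hashes the bytes it sends and reports the result afterwards.
payload = b"pixel data" * 1000
client_checksum = hashlib.sha1(payload).hexdigest()

# The "server" writes what it received, then verifies it.
with tempfile.NamedTemporaryFile(delete=False) as stored:
    stored.write(payload)
intact = upload_is_intact(client_checksum, stored.name)

# Simulated corruption on disk is revealed by a mismatch.
with open(stored.name, "ab") as handle:
    handle.write(b"\x00")
corruption_detected = not upload_is_intact(client_checksum, stored.name)
os.remove(stored.name)

print(intact, corruption_detected)
```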
There are various algorithms by which checksums may be calculated. The list of
available algorithms is given by omero.checksum.supported. To
calculate comparable checksums the client and server must use the same
algorithm. The server API permits clients to specify the algorithm,
but it is expected that they will typically accept the server default.
The number that suffixes each of the checksum algorithm names
specifies the bit width of the resulting checksum. A larger bit width
makes it less likely that different files will have the same checksum
by coincidence, but lengthens the checksum hex strings that are
reported to the user and stored in the hash column of the
originalfile table in the database.
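The relationship between bit width and hex string length is simply four bits per hexadecimal digit, as the following Python sketch illustrates. The algorithm labels used as dictionary keys are assumed here for illustration and should be checked against the actual values of omero.checksum.supported; the checksums are computed with Python's standard zlib and hashlib modules rather than by OMERO.

```python
import hashlib
import zlib

data = b"example image bytes"

# Keys follow a name-bitwidth pattern; the exact names are an assumption.
checksums = {
    "Adler-32": format(zlib.adler32(data), "08x"),
    "CRC-32": format(zlib.crc32(data), "08x"),
    "MD5-128": hashlib.md5(data).hexdigest(),
    "SHA1-160": hashlib.sha1(data).hexdigest(),
}


def bit_width(name):
    """Read the bit width from the numeric suffix of an algorithm name."""
    return int(name.rsplit("-", 1)[1])


for name, hex_string in checksums.items():
    # Each hexadecimal digit encodes four bits.
    print(name, bit_width(name), "bits ->", len(hex_string), "hex digits")
```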