Bulk imports

The CLI import option --bulk takes a configuration file that describes a batch of imports sharing the same or similar options. The file is written in a simple YAML syntax and can have any name. It does not need to be placed in the folder from which the OMERO commands are run.

A minimal YAML file might look like:

---
path: "my-files.txt"

Assuming that my-files.txt is a list of files such as

fileA
fileB
directoryC

and given that this file sets no option other than path, this is equivalent to:

$ omero import fileA fileB directoryC

where the files fileA and fileB and all the files of directoryC will be imported.

Bulk-only options

Path

The path key specifies a file in which each line is processed as a separate import. In the simplest case, each line names a single file, as above. For more complex usage, path can point to a tab-separated values (TSV) or comma-separated values (CSV) file, where each field is interpreted according to the columns key.

Columns

When importing many files, each file often needs a similar but slightly different configuration. This can be accomplished with the columns key, which specifies how each of the separated fields in the path file should be interpreted.

For example, a bulk.yml file specifying:

---
path: "files.tsv"
columns:
- name
- path

along with a files.tsv of the form:

import-1    fileA
import-2    fileB

would match the two calls:

$ omero import --name import-1 fileA
$ omero import --name import-2 fileB

but in a single call. The same could be achieved with this CSV file:

import-1,fileA
import-2,fileB

Other options like target can also be added as a separate field:

Dataset:name:training-set   import-1        fileA
Dataset:name:training-set   import-2        fileB
Dataset:name:test-set-001   import-3        fileC

by defining columns in your bulk.yml as:

columns:
- target
- name
- path

which will create the named datasets if they do not exist. See Import targets for more information on import targets and see below for more examples of options you can use.
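
The way columns pairs each field with a key can be sketched in a few lines of Python. This is only an illustration of the mapping described above, not the code OMERO runs: parse_rows and as_command are hypothetical helper names, and the rendered command uses only the --name option shown in this section.

```python
import csv
import io


def parse_rows(text, columns, delimiter="\t"):
    """Pair each separated field of every line with its column name."""
    rows = []
    for fields in csv.reader(io.StringIO(text), delimiter=delimiter):
        if not fields:
            continue  # ignore blank lines
        rows.append(dict(zip(columns, fields)))
    return rows


def as_command(row):
    """Render a row using only the --name option shown in this section."""
    args = ["omero", "import"]
    if "name" in row:
        args += ["--name", row["name"]]
    args.append(row["path"])  # the file to import goes last
    return " ".join(args)


columns = ["name", "path"]
tsv_rows = parse_rows("import-1\tfileA\nimport-2\tfileB\n", columns)
csv_rows = parse_rows("import-1,fileA\nimport-2,fileB\n", columns, delimiter=",")

for row in tsv_rows:
    print(as_command(row))
```

Both the TSV and the CSV input produce the same list of rows, which is why the two files above are interchangeable.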

Include

The include key specifies another bulk YAML file that should be included in the current processing. For example, if there is a global configuration file omero-imports.yml that all users should use, such as:

---
checksum_algorithm: "File-Size-64"
exclude: "clientpath"
transfer: "ln_s"

then users can make use of this configuration by adding the following line to their bulk.yml file:

include: /etc/omero-imports.yml
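
As a rough mental model, the effective configuration is the included file combined with the user's own keys. The Python sketch below makes two assumptions that this page does not state: keys in the including file take precedence over keys from the included file, and included files may themselves include others. The resolve function is a hypothetical helper, and plain dictionaries stand in for the parsed YAML files.

```python
def resolve(config, load):
    """Return the effective settings for a bulk configuration.

    config: dict of keys from one bulk file
    load:   callable returning the dict for an included path
            (a stand-in for reading and parsing the YAML file)
    """
    included_path = config.get("include")
    if included_path is None:
        return dict(config)
    # Assumption: an included file may itself use include.
    merged = resolve(load(included_path), load)
    # Assumption: the including file's own keys override included ones.
    merged.update({k: v for k, v in config.items() if k != "include"})
    return merged


# Contents of the global file from the example above.
files = {
    "/etc/omero-imports.yml": {
        "checksum_algorithm": "File-Size-64",
        "exclude": "clientpath",
        "transfer": "ln_s",
    }
}

# A user's bulk.yml that includes the global file.
user = {
    "include": "/etc/omero-imports.yml",
    "path": "files.tsv",
    "columns": ["name", "path"],
}

effective = resolve(user, files.__getitem__)
print(effective)
```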

Dry-run

The dry-run key can be set to true, in which case no import occurs and only the actions that would be taken are shown. Alternatively, it can be set to a file path of the form my_import_%s.sh, where %s is replaced by a number and a file with the resulting name is written out. Each of these scripts can then be used independently.
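
The %s placeholder behaves like ordinary string formatting. The Python sketch below shows which file names a template would yield, assuming one script per import and numbering that starts at 1; neither detail is specified on this page, and script_names is a hypothetical helper.

```python
def script_names(template, count, start=1):
    """Fill the %s placeholder with consecutive numbers.

    Assumption: numbering starts at `start` and increases by one per import.
    """
    return [template % number for number in range(start, start + count)]


names = script_names("my_import_%s.sh", 3)
print(names)
```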

Other options

In addition, all the regular options from the CLI are available for configuration via --bulk, including:

  • checksum_algorithm for faster processing of large files
  • continue for processing all files even if one errors
  • exclude for skipping files that have already been imported
  • parallel_fileset for concurrent imports
  • parallel_upload for concurrent uploads
  • target for placing imported images into specific containers
  • transfer for alternative methods of shipping files to the server

See Import images for more information.