Writing a new file format reader

This document is a brief guide to writing new Bio-Formats file format readers.

All format readers should extend either loci.formats.FormatReader or an existing reader.

Methods to override

  • isSingleFile(java.lang.String) Whether or not the named file is expected to be the only file in the dataset. This only needs to be overridden for formats whose datasets can contain more than one file.

  • isThisType(loci.common.RandomAccessInputStream) Check the first few bytes of a file to determine whether the file can be read by this reader. You can assume that index 0 in the stream corresponds to index 0 in the file. Return true if the file can be read; false if not (or if there is no way of checking).

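  • fileGroupOption(java.lang.String) Returns an indication of whether or not the files in a multi-file dataset can be handled individually. The return value should be one of the following:

      • FormatTools.MUST_GROUP: the files cannot be handled separately
      • FormatTools.CAN_GROUP: the files may be handled separately or as a single unit
      • FormatTools.CANNOT_GROUP: the files must be handled separately

    This method only needs to be overridden for formats whose datasets can contain more than one file.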

  • getSeriesUsedFiles(boolean) You only need to override this if your format uses multiple files in a single dataset. This method should return a list of all files associated with the given file name and the current series (i.e. every file needed to display the current series). If the noPixels flag is set, then none of the files returned should contain pixel data. For an example of how this works, see loci.formats.in.PerkinElmerReader. It is recommended that the first line of this method be FormatTools.assertId(currentId, true, 1) - this ensures that the file name is non-null.

  • openBytes(int, byte[], int, int, int, int) Returns a byte array containing the pixel data for a specified subimage from the given file. The dimensions of the subimage (upper left X coordinate, upper left Y coordinate, width, and height) are specified in the final four int parameters. This should throw a FormatException if the image number is invalid (less than 0 or >= the number of images). The ordering of the array returned by openBytes should correspond to the values returned by isLittleEndian and isInterleaved. Also, the length of the byte array should be [image width * image height * bytes per pixel]. Extra bytes will generally be truncated. It is recommended that the first line of this method be FormatTools.checkPlaneParameters(this, no, buf.length, x, y, w, h) - this ensures that all of the parameters are valid.

  • initFile(java.lang.String) The majority of the file parsing logic should be placed in this method. The idea is to call this method once (and only once!) when the file is first opened. Generally, you will want to start by calling super.initFile(String). You will also need to set up the stream for reading the file, as well as initializing any dimension information and metadata. Most of this logic is up to you; however, you should populate the core variable (see loci.formats.CoreMetadata).

    Note that each variable is initialized to 0 or null when super.initFile(String) is called. Also, super.initFile(String) constructs a Hashtable called metadata where you should store any relevant metadata.

    The most common way to set up the OME-XML metadata for the reader is to initialize the MetadataStore using the makeFilterMetadata() method and populate the Pixels elements of the metadata store from the core variable using the MetadataTools.populatePixels(MetadataStore, FormatReader) method:

    // Initialize the OME-XML metadata from the core variable
    MetadataStore store = makeFilterMetadata();
    MetadataTools.populatePixels(store, this);
    

    If the reader includes metadata at the plane level, you can initialize the Plane elements under the Pixels using MetadataTools.populatePixels(MetadataStore, FormatReader, doPlane):

    MetadataTools.populatePixels(store, this, true);
    

    Once the MetadataStore has been initialized with the core properties, additional metadata can be added to it using the setter methods. Note that for each of the model components, the setObjectID() method should be called before any of the setObjectProperty() methods, e.g.:

    // Add an oil immersion objective
    String objectiveID = MetadataTools.createLSID("Objective", 0, 0);
    store.setObjectiveID(objectiveID, 0, 0);
    store.setObjectiveImmersion(getImmersion("Oil"), 0, 0);
    
  • close(boolean) Cleans up any resources used by the reader. Global variables should be reset to their initial state, and any open files or delegate readers should be closed.
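
The region handling described for openBytes can be sketched without any Bio-Formats classes. The following is a minimal illustration rather than the library's implementation: RegionCopy and copyRegion are hypothetical names, the plane is assumed to be a single uncompressed byte array stored row by row, and IllegalArgumentException stands in for the FormatException a real reader would throw.

```java
import java.util.Arrays;

/** Hypothetical stand-in for the row-by-row copy an openBytes override might perform. */
class RegionCopy {

  /**
   * Copies the w x h region whose upper left corner is (x, y) from a full
   * plane into buf, then returns buf.
   *
   * @param plane full plane of sizeX * sizeY * bpp bytes, stored row by row
   * @param sizeX plane width in pixels
   * @param sizeY plane height in pixels
   * @param bpp bytes per pixel
   */
  static byte[] copyRegion(byte[] plane, int sizeX, int sizeY, int bpp,
    byte[] buf, int x, int y, int w, int h)
  {
    if (x < 0 || y < 0 || w < 0 || h < 0 || x + w > sizeX || y + h > sizeY) {
      throw new IllegalArgumentException("Region lies outside the plane");
    }
    int rowLen = w * bpp;
    if (buf.length < rowLen * h) {
      throw new IllegalArgumentException(
        "Buffer too small: need " + (rowLen * h) + " bytes");
    }
    for (int row = 0; row < h; row++) {
      // offset of the first requested pixel in this source row
      int src = ((y + row) * sizeX + x) * bpp;
      System.arraycopy(plane, src, buf, row * rowLen, rowLen);
    }
    return buf;
  }

  public static void main(String[] args) {
    // 4 x 3 plane, one byte per pixel, holding the values 0..11
    byte[] plane = new byte[12];
    for (int i = 0; i < plane.length; i++) plane[i] = (byte) i;
    byte[] region = copyRegion(plane, 4, 3, 1, new byte[4], 1, 1, 2, 2);
    System.out.println(Arrays.toString(region));
  }
}
```

In a real reader the parameter checks would normally be delegated to FormatTools.checkPlaneParameters, as recommended above, and the source bytes would come from the input stream rather than an in-memory array.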

Note that if the new format is a variant of a format currently supported by Bio-Formats, it is more efficient to make the new reader a subclass of the existing reader (rather than subclassing loci.formats.FormatReader). In this case, it is usually sufficient to override initFile(java.lang.String) and isThisType(loci.common.RandomAccessInputStream).
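
As an illustration of the kind of header check an isThisType override might perform, here is a minimal sketch using only the Java standard library. MagicCheck, hasMagic and the four-byte signature "DEMO" are all made up for this example. The method accepts a java.io.DataInput because RandomAccessInputStream implements that interface; a real override would receive the RandomAccessInputStream itself.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/** Hypothetical stand-in for the signature check inside an isThisType override. */
class MagicCheck {

  /** Made-up four-byte signature; a real format defines its own. */
  static final byte[] MAGIC = "DEMO".getBytes(StandardCharsets.US_ASCII);

  /** Returns true if the next bytes of the input equal MAGIC, false otherwise. */
  static boolean hasMagic(DataInput in) throws IOException {
    byte[] header = new byte[MAGIC.length];
    try {
      in.readFully(header);
    }
    catch (EOFException e) {
      // the input is shorter than the signature, so it cannot match
      return false;
    }
    return Arrays.equals(header, MAGIC);
  }

  public static void main(String[] args) throws IOException {
    byte[] file = "DEMO followed by pixel data".getBytes(StandardCharsets.US_ASCII);
    DataInput in = new DataInputStream(new ByteArrayInputStream(file));
    System.out.println(hasMagic(in));
  }
}
```

Returning false for a truncated file, rather than letting the exception escape, matches the guidance that the method should return false when the file cannot be identified.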

Every reader also has an instance of loci.formats.CoreMetadata. All readers should populate the fields in CoreMetadata, which are essential to reading image planes.
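
To show why the core dimensions matter when reading planes, the sketch below models a few of them in a stand-alone class. PlaneLayout is hypothetical and only loosely modelled on CoreMetadata; consult the Javadocs for the real field list. It assumes one sample per pixel (no RGB planes) and planes stored uncompressed, one after another, following a fixed-size header.

```java
/** Hypothetical stand-in for a handful of the dimension fields a reader populates. */
class PlaneLayout {
  final int sizeX, sizeY, sizeZ, sizeC, sizeT, bytesPerPixel;

  PlaneLayout(int sizeX, int sizeY, int sizeZ, int sizeC, int sizeT,
    int bytesPerPixel)
  {
    this.sizeX = sizeX;
    this.sizeY = sizeY;
    this.sizeZ = sizeZ;
    this.sizeC = sizeC;
    this.sizeT = sizeT;
    this.bytesPerPixel = bytesPerPixel;
  }

  /** Number of planes, assuming one channel per plane. */
  int planeCount() {
    return sizeZ * sizeC * sizeT;
  }

  /** Size of one full plane in bytes. */
  long planeBytes() {
    return (long) sizeX * sizeY * bytesPerPixel;
  }

  /** File offset of plane number no, assuming contiguous uncompressed planes. */
  long planeOffset(long headerBytes, int no) {
    if (no < 0 || no >= planeCount()) {
      throw new IllegalArgumentException("Invalid plane number: " + no);
    }
    return headerBytes + no * planeBytes();
  }

  public static void main(String[] args) {
    PlaneLayout layout = new PlaneLayout(512, 256, 10, 2, 3, 2);
    System.out.println(layout.planeCount() + " planes of "
      + layout.planeBytes() + " bytes each");
    System.out.println("plane 2 starts at byte " + layout.planeOffset(1024, 2));
  }
}
```

If any of these values is left unset or inconsistent, both the buffer size expected by openBytes and the seek position for a plane come out wrong, which is why the fields should be populated during initFile.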

If you read from a file using something other than loci.common.RandomAccessInputStream or loci.common.Location, you must use the file name returned by Location.getMappedId(String), not the file name passed to the reader. Thus, a stub for initFile(String) might look like this:

protected void initFile(String id) throws FormatException, IOException {
  super.initFile(id);

  RandomAccessInputStream in = new RandomAccessInputStream(id);
  // alternatively,
  //FileInputStream in = new FileInputStream(Location.getMappedId(id));

  // read basic file structure and metadata from stream
}

For more details, see loci.common.Location.mapId(java.lang.String, java.lang.String) and loci.common.Location.getMappedId(java.lang.String).
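
The idea behind id mapping can be modelled in a few lines. This is a simplified stand-in, not the Location implementation: IdMap is a hypothetical class, and it assumes that an unmapped id resolves to itself and that mapping an id to null removes the mapping.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical, simplified model of an id-to-filename mapping table. */
class IdMap {
  private static final Map<String, String> MAPPINGS = new HashMap<>();

  /** Maps id to a file name on disk; a null file name removes the mapping. */
  static void mapId(String id, String filename) {
    if (filename == null) MAPPINGS.remove(id);
    else MAPPINGS.put(id, filename);
  }

  /** Returns the file name mapped to id, or id itself if nothing is mapped. */
  static String getMappedId(String id) {
    String mapped = MAPPINGS.get(id);
    return mapped == null ? id : mapped;
  }

  public static void main(String[] args) {
    System.out.println(getMappedId("dataset.img"));
    mapId("dataset.img", "/data/cache/0001.bin");
    System.out.println(getMappedId("dataset.img"));
  }
}
```

Under this model, code that opens the id directly with java.io classes would miss the mapping, which is why the stub above passes the result of getMappedId to FileInputStream.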

Variables to populate

There are a number of global variables defined in loci.formats.FormatReader that should be populated in the constructor of any implemented reader.

These variables are:

  • suffixNecessary Indicates whether or not a file name suffix is required; true by default
  • suffixSufficient Indicates whether or not a specific file name suffix guarantees that this reader can open a particular file; true by default
  • hasCompanionFiles Indicates whether or not there is at least one file in a dataset of this format that contains only metadata (no images); false by default
  • datasetDescription A brief description of the layout of files in datasets of this format; only necessary for multi-file datasets
  • domains An array of imaging domains for which this format is used. Domains are defined in loci.formats.FormatTools.
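
One plausible way the suffixNecessary and suffixSufficient flags interact with the content check is sketched below. SuffixPolicy, Decision and decide are hypothetical names, and the logic is an illustration of the flag descriptions above rather than a copy of the FormatReader code.

```java
import java.util.Locale;

/** Hypothetical sketch of how the two suffix flags might drive format detection. */
class SuffixPolicy {

  enum Decision { ACCEPT, REJECT, CHECK_CONTENTS }

  /** Case-insensitive test for any of the given suffixes (without the dot). */
  static boolean hasSuffix(String name, String[] suffixes) {
    String lower = name.toLowerCase(Locale.ROOT);
    for (String suffix : suffixes) {
      if (lower.endsWith("." + suffix.toLowerCase(Locale.ROOT))) return true;
    }
    return false;
  }

  static Decision decide(String name, String[] suffixes,
    boolean suffixNecessary, boolean suffixSufficient)
  {
    boolean match = hasSuffix(name, suffixes);
    // a required suffix that is missing rules the file out immediately
    if (suffixNecessary && !match) return Decision.REJECT;
    // a matching suffix that is also a guarantee settles the question
    if (suffixSufficient && match) return Decision.ACCEPT;
    // otherwise the file contents have to be inspected
    return Decision.CHECK_CONTENTS;
  }

  public static void main(String[] args) {
    String[] suffixes = {"ims"};
    System.out.println(decide("scan.ims", suffixes, true, true));
    System.out.println(decide("scan.tif", suffixes, true, true));
    System.out.println(decide("scan.ims", suffixes, true, false));
  }
}
```

A format that shares its extension with other formats (a generic container, for example) would typically set suffixSufficient to false in its constructor so that detection falls through to isThisType on the stream.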

Other useful things

  • loci.common.RandomAccessInputStream is a hybrid RandomAccessFile/InputStream class that is generally more efficient than either RandomAccessFile or InputStream, and implements the DataInput interface. It is recommended that you use this for reading files.
  • loci.common.Location provides an API similar to java.io.File, and supports File-like operations on URLs. It is highly recommended that you use this instead of File. See the Javadocs for additional information.
  • loci.common.DataTools provides a number of methods for converting bytes to shorts, ints, longs, etc. It also supports reading most primitive types directly from a RandomAccessInputStream (or other DataInput implementation).
  • loci.formats.ImageTools provides several methods for manipulating primitive type arrays that represent images. Consult the source or Javadocs for more information.
  • If your reader relies on third-party code which may not be available to all users, it is strongly suggested that you make a corresponding service class that interfaces with the third-party code. Please see Bio-Formats service and dependency infrastructure for a description of the service infrastructure, as well as the loci.formats.services package.
  • Several common image compression types are supported through subclasses of loci.formats.codec.BaseCodec. These include JPEG, LZW, LZO, Base64, ZIP and RLE (PackBits).
  • If you wish to convert a file’s metadata to OME-XML (strongly encouraged), please see Bio-Formats metadata processing for further information.
  • Once you have written your file format reader, add a line to the readers.txt file with the fully qualified name of the reader, followed by a ‘#’ and the file extensions associated with the file format. Note that loci.formats.ImageReader, the master file format reader, tries to identify which format reader to use according to the order given in readers.txt, so be sure to place your reader in an appropriate position within the list.
  • The easiest way to test your new reader is by calling “java loci.formats.tools.ImageInfo <file name>”. If all goes well, you should see all of the metadata and dimension information, along with a window showing the images in the file. ImageInfo accepts additional arguments; a brief listing is provided below for reference, but it is recommended that you take a look at the contents of loci.formats.tools.ImageInfo to see exactly what each one does.

    Argument       Action
    -version       print the library version and exit
    file           the image file to read
    -nopix         read metadata only, not pixels
    -nocore        do not output core metadata
    -nometa        do not parse format-specific metadata table
    -nofilter      do not filter metadata fields
    -thumbs        read thumbnails instead of normal pixels
    -minmax        compute min/max statistics
    -merge         combine separate channels into RGB image
    -nogroup       force multi-file datasets to be read as individual files
    -stitch        stitch files with similar names
    -separate      split RGB image into separate channels
    -expand        expand indexed color to RGB
    -omexml        populate OME-XML metadata
    -normalize     normalize floating point images*
    -fast          paint RGB images as quickly as possible*
    -debug         turn on debugging output
    -range         specify range of planes to read (inclusive)
    -series        specify which image series to read
    -swap          override the default input dimension order
    -shuffle       override the default output dimension order
    -map           specify file on disk to which name should be mapped
    -preload       pre-read entire file into a buffer; significantly reduces
                   the time required to read the images, but requires more
                   memory
    -crop          crop images before displaying; argument is ‘x,y,w,h’
    -autoscale     used in combination with ‘-fast’ to automatically adjust
                   brightness and contrast
    -novalid       do not perform validation of OME-XML
    -omexml-only   only output the generated OME-XML
    -format        read file with a particular reader (e.g., ZeissZVI)

    * = may result in loss of precision

  • If you wish to test using TestNG, loci.tests.testng.FormatReaderTest provides several basic tests that work with all Bio-Formats readers. See the FormatReaderTest source code for additional information.
  • For more details, please look at the source code and Javadocs. Studying existing readers is probably the best way to get a feel for the API; we recommend looking first at loci.formats.in.ImarisReader, which is the most straightforward one. loci.formats.in.LIFReader and loci.formats.in.InCellReader are also good references that show off some of the nicer features of Bio-Formats.
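
The readers.txt entry format mentioned above (fully qualified class name, then ‘#’, then the file extensions) can be illustrated with a small parser. This is a sketch under assumptions: ReaderEntry is a hypothetical class, com.example.DemoReader is a made-up reader name, and extensions are assumed to be separated by commas; it is not the code ImageReader uses to load the list.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical parsed form of one line of a readers.txt-style list. */
class ReaderEntry {
  final String className;
  final List<String> suffixes;

  ReaderEntry(String className, List<String> suffixes) {
    this.className = className;
    this.suffixes = suffixes;
  }

  /** Parses "class.Name # ext1, ext2"; returns null if no class name is present. */
  static ReaderEntry parse(String line) {
    int hash = line.indexOf('#');
    String name = (hash < 0 ? line : line.substring(0, hash)).trim();
    if (name.isEmpty()) return null;
    List<String> suffixes = new ArrayList<>();
    if (hash >= 0) {
      for (String part : line.substring(hash + 1).split(",")) {
        String suffix = part.trim();
        if (!suffix.isEmpty()) suffixes.add(suffix);
      }
    }
    return new ReaderEntry(name, suffixes);
  }

  public static void main(String[] args) {
    ReaderEntry entry = parse("com.example.DemoReader   # demo, dmo");
    System.out.println(entry.className + " handles " + entry.suffixes);
  }
}
```

Because readers are tried in list order, an entry for a narrowly defined format generally belongs above readers for more permissive formats that might also claim the same files.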

If you have questions about Bio-Formats, please ask on the forums.