Extracting, processing and validating OME-XML
Extracting the OME-XML from an OME-TIFF file
If you install the Bio-Formats command line tools, you can produce a nicely formatted OME-XML string from an OME-TIFF file with:
$ tiffcomment file.ome.tif | xmlindent
Alternatively, if you have ImageMagick installed, one easy way to extract the OME-XML embedded in the TIFF headers is to use it from the command line:
$ identify -verbose file.ome.tif
If you are working in C/C++, we recommend the open source LibTIFF library or the OME Files C++ implementation.
If you are looking for a solution in Java, there are several options. Bio-Formats can read OME-TIFF files, as well as convert from many third-party formats into OME-TIFF format—see the example source code page for specific examples. Alternatively, the open source ImageJ application reads multi-page TIFF files, storing the TIFF comment into the associated FileInfo object’s “description” field.
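For readers who want to see what this extraction involves without any of the tools above, the following is a minimal Python sketch using only the standard library. It assumes a classic (non-BigTIFF) file in which the OME-XML is stored in the ImageDescription tag (tag 270) of the first IFD. The function name read_tiff_comment is hypothetical, and this is not a description of how tiffcomment itself is implemented.

```python
import struct

IMAGE_DESCRIPTION = 270  # TIFF tag that holds the comment (the OME-XML block)
ASCII_TYPE = 2           # TIFF field type for NUL-terminated text


def read_tiff_comment(data: bytes) -> str:
    """Return the ImageDescription text from the first IFD of a classic TIFF."""
    # The first two bytes give the byte order: "II" little-endian, "MM" big-endian.
    byte_order = {b"II": "<", b"MM": ">"}.get(data[:2])
    if byte_order is None:
        raise ValueError("not a TIFF file")
    magic, ifd_offset = struct.unpack(byte_order + "HI", data[2:8])
    if magic != 42:
        raise ValueError("only classic TIFF (magic number 42) is handled here")

    # The IFD starts with a 2-byte entry count, followed by 12-byte entries.
    (entry_count,) = struct.unpack(byte_order + "H", data[ifd_offset:ifd_offset + 2])
    for index in range(entry_count):
        start = ifd_offset + 2 + 12 * index
        tag, field_type, count = struct.unpack(byte_order + "HHI", data[start:start + 8])
        if tag != IMAGE_DESCRIPTION:
            continue
        if field_type != ASCII_TYPE:
            raise ValueError("ImageDescription is not stored as ASCII text")
        if count <= 4:
            # Values of up to 4 bytes are stored inline in the entry.
            raw = data[start + 8:start + 8 + count]
        else:
            # Longer values are stored elsewhere; the entry holds their offset.
            (value_offset,) = struct.unpack(byte_order + "I", data[start + 8:start + 12])
            raw = data[value_offset:value_offset + count]
        return raw.rstrip(b"\0").decode("utf-8")
    raise ValueError("no ImageDescription tag in the first IFD")
```

Calling read_tiff_comment(open("file.ome.tif", "rb").read()) would return the XML as a string. The sketch loads the whole file into memory, so for large images a library that reads only the bytes it needs is a better choice.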
Processing an OME-XML block
If the XML was stored without line breaks, it can still be difficult to read after being extracted. There are several solutions to this problem, such as using an XML viewer or editor (web browsers work well), or processing the XML with a SAX or DOM library.
On most Linux distributions, you can install the libxml package and use the xmllint program:
$ xmllint --format file.xml
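The DOM route mentioned above can also be sketched in Python with the standard library's xml.dom.minidom module. The helper name pretty_print is an assumption made for illustration. This approach raises an exception if the XML is not well formed, so it is no more tolerant of broken input than a command line formatter.

```python
from xml.dom import minidom


def pretty_print(xml_text: str) -> str:
    """Return xml_text re-serialized with one element per line, indented."""
    document = minidom.parseString(xml_text)
    # toprettyxml adds a newline and the given indent around every node.
    return document.toprettyxml(indent="  ")
```

This works best on XML that was stored without line breaks; input that already contains whitespace between elements keeps that whitespace and gains extra blank lines.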
Here is a Perl script that uses XML::LibXML to “pretty print” an XML document with appropriate whitespace:
formatxml.pl
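use strict;
use warnings;
use XML::LibXML;

# Read from the file named on the command line, or from standard input.
my $file = $ARGV[0];
my $parser = XML::LibXML->new();
die "Cannot create XML parser" unless defined $parser;
$parser->validation(0);  # do not validate while parsing
my $doc = defined $file ? $parser->parse_file($file) : $parser->parse_fh(\*STDIN);
print $doc->toString(1);  # format level 1 adds indentation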
Unfortunately, both xmllint and the above Perl script can be somewhat fragile; if there are any errors or abnormalities in the XML, they generally fail to produce any indentation. Thus, we have also written some Java code to do the same thing; just download the Bio-Formats command line tools and run:
$ xmlindent file.xml
Another option is to feed the XML into our OME-XML Java library, which provides methods for querying and manipulating the OME-XML (using DOM and SAX). This library is what Bio-Formats uses to work with OME-XML.
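For simple queries, a general-purpose XML library can be enough. The sketch below uses Python's xml.etree.ElementTree (not the OME-XML Java library) to list the SizeX and SizeY attributes of every Pixels element. The function name pixel_sizes is hypothetical, and matching on the local element name is a simplification that ignores which schema version the document declares.

```python
import xml.etree.ElementTree as ET


def pixel_sizes(xml_text: str) -> list:
    """Return the SizeX/SizeY attributes of each Pixels element as dicts."""
    root = ET.fromstring(xml_text)
    sizes = []
    for element in root.iter():
        # Namespaced tags look like "{namespace}Pixels"; keep the local part.
        local_name = element.tag.rsplit("}", 1)[-1]
        if local_name == "Pixels":
            # A missing attribute is reported as None rather than raising.
            sizes.append({name: element.get(name) for name in ("SizeX", "SizeY")})
    return sizes
```

A None value in the result points at a missing attribute, which is the kind of problem a schema validator would also report.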
Validating OME-XML
We have created a command line tool in Java for validating OME-XML and included it in the Bio-Formats bftools.zip download. The Bio-Formats command line tools documentation has the full details; in brief, download and unzip the tools to produce a collection of command line scripts for Unix/Mac and batch files for Windows. The two commands we will use are:
- xmlvalid: A command line XML validation tool.
- tiffcomment: Extracts the OME-XML block in an OME-TIFF file from the comment in the TIFF’s first IFD entry.
All scripts require bioformats_package.jar to be downloaded into the same directory as the command line tools. Then, to validate an OME-XML file sample.ome, use:
$ xmlvalid sample.ome
This validates the XML directly.
To validate an OME-TIFF file sample.ome.tif, use:
$ tiffcomment sample.ome.tif | xmlvalid
This extracts the OME-XML from the TIFF then passes it to the validator. Typical successful output is:
$ ./xmlvalid sample.ome
Parsing schema path
http://www.openmicroscopy.org/Schemas/OME/2010-06/ome.xsd
Validating sample.ome
No validation errors found.
$
If any errors are found, they are reported. When correcting errors, it is usually best to work from the top of the file, as errors higher up can cause extra errors further down. In this example the output shows 3 errors but there are only 2 mistakes in the file: the missing 'SizeY' attribute on 'Pixels', and the invalid 'Non Zero' value of 'FillRule', which is reported twice.
$ ./xmlvalid broken.ome
Parsing schema path
http://www.openmicroscopy.org/Schemas/OME/2010-06/ome.xsd
Validating broken.ome
cvc-complex-type.4: Attribute 'SizeY' must appear on element 'Pixels'.
cvc-enumeration-valid: Value 'Non Zero' is not facet-valid with respect
to enumeration '[EvenOdd, NonZero]'. It must be a value from the enumeration.
cvc-attribute.3: The value 'Non Zero' of attribute 'FillRule' on element
'ROI:Shape' is not valid with respect to its type, 'null'.
Error validating document: 3 errors found
$
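When validation is run over many files, it can help to summarize the validator's output programmatically. The following Python sketch parses text in the format shown in the two examples above. That format is assumed from those examples only, and the function name parse_xmlvalid_output is hypothetical.

```python
import re

# Summary line as it appears in the failing example above.
SUMMARY = re.compile(r"Error validating document: (\d+) errors? found")


def parse_xmlvalid_output(output: str):
    """Return (reported_error_count, messages) from captured validator output.

    reported_error_count is None if no summary line was recognized.
    """
    count = None
    messages = []
    for raw_line in output.splitlines():
        line = raw_line.strip()
        summary = SUMMARY.match(line)
        if line.startswith("cvc-"):
            # Each schema error starts with a "cvc-" rule identifier.
            messages.append(line)
        elif line == "No validation errors found.":
            count = 0
        elif summary:
            count = int(summary.group(1))
        elif messages and count is None and line and not line.startswith("$"):
            # A wrapped line continues the previous error message.
            messages[-1] += " " + line
    return count, messages
```

Comparing the reported count with the distinct attribute values named in the messages is one way to spot a single mistake that has been reported more than once.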
Alternatively, you can use the more general online W3C XML validator to validate your OME-XML blocks. For best results, be sure to check the “keep going” option.
Another option is to use a commercial XML application such as Turbo XML to work with and validate your OME-XML documents.