Extracting, processing and validating OME-XML
=============================================

Extracting the OME-XML from an OME-TIFF file
--------------------------------------------

If you install the :bf_doc:`Bio-Formats command line tools `, you can
produce a nicely formatted OME-XML string from an OME-TIFF file with:

::

    $ tiffcomment file.ome.tif | xmlindent

Alternatively, if you have ImageMagick installed, one easy way to extract
the OME-XML embedded in the TIFF headers is to use it from the command line:

::

    $ identify -verbose

If you are working in C/C++, we recommend the open source `LibTIFF `_
library or the :cpp_downloads:`OME Files C++ implementation <>`.

If you are looking for a solution in Java, there are several options.
Bio-Formats can read OME-TIFF files, as well as convert from many
third-party formats into OME-TIFF format; see the
:doc:`example source code page ` for specific examples. Alternatively,
the open source `ImageJ `_ application reads multi-page TIFF files,
storing the TIFF comment in the associated FileInfo object's
"description" field.

Processing an OME-XML block
---------------------------

If the XML was stored without line breaks, it can still be difficult to
read after being extracted. There are several solutions to this problem,
such as using an XML viewer or editor (web browsers work well), or
processing the XML with a SAX or DOM library.

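As an illustration of the DOM route mentioned above, here is a minimal
sketch in Python using the standard library's ``xml.dom.minidom`` module.
The function name ``pretty_print_xml`` and the sample fragment are
hypothetical and only serve the example; the fragment is not a complete or
schema-valid OME-XML document.

```python
from xml.dom import minidom


def pretty_print_xml(xml_data, indent="  "):
    """Return the XML in xml_data (str or bytes) re-indented."""
    document = minidom.parseString(xml_data)
    pretty = document.toprettyxml(indent=indent)
    # toprettyxml() emits whitespace-only lines when the input already
    # contained line breaks; drop them so the result stays compact.
    return "\n".join(line for line in pretty.splitlines() if line.strip())


# Hypothetical one-line fragment standing in for an extracted OME-XML block.
sample = '<OME><Image ID="Image:0"><Pixels/></Image></OME>'
print(pretty_print_xml(sample))
```

Like any DOM-based approach, this requires the input to be well-formed;
malformed XML makes ``parseString`` raise an exception rather than produce
partial output.
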
On most Linux distributions, you can install the libxml package and use
the ``xmllint`` program:

::

    $ xmllint --format file.xml

Here is a Perl script that uses `XML::LibXML `_ to "pretty print" an XML
document with appropriate whitespace:

::

    # formatxml.pl
    use strict;
    use warnings;
    use XML::LibXML;

    my $file = $ARGV[0];
    my $parser = XML::LibXML->new();
    die "Cannot create XML parser" unless defined $parser;
    $parser->validation(0);  # do not validate while parsing

    # Parse the named file, or standard input if no file was given.
    my $doc;
    if (defined $file) {
        $doc = $parser->parse_file($file);
    }
    else {
        $doc = $parser->parse_fh(\*STDIN);
    }

    # A format argument of 1 adds indentation to the serialized output.
    print $doc->toString(1);

Unfortunately, both ``xmllint`` and the above Perl script can be somewhat
fragile; if there are any errors or abnormalities in the XML, they
generally fail to produce any indentation. Thus, we have also written some
Java code to do the same thing; just download the
:bf_doc:`Bio-Formats command line tools ` and run:

::

    $ xmlindent file.xml

Another option is to feed the XML into our :doc:`OME-XML Java library `,
which provides methods for querying and manipulating the OME-XML (using
DOM and SAX). This library is what Bio-Formats uses to work with OME-XML.

Validating OME-XML
------------------

We have created a command line tool in Java for validating OME-XML, and
included it as part of the Bio-Formats
:bf_downloads:`bftools.zip download <>`. Please refer to the
:bf_doc:`Bio-Formats command line tools ` documentation for more details.
In brief, downloading and unzipping the tools produces a collection of
command line scripts for Unix/Mac and batch files for Windows. The two
commands we will use are:

.. glossary::

    xmlvalid
        A command line XML validation tool

    tiffcomment
        Extracts the OME-XML block in an OME-TIFF file from the comment in
        the TIFF's first IFD entry.

All scripts require :file:`bioformats_package.jar` to be downloaded into
the same directory as the command line tools.

To validate an OME-XML file :file:`sample.ome`, use:

::

    $ xmlvalid sample.ome

This validates the XML directly.

To validate an OME-TIFF file :file:`sample.ome.tif`, use:

::

    $ tiffcomment sample.ome.tif | xmlvalid

This extracts the OME-XML from the TIFF, then passes it to the validator.

Typical successful output is:

::

    $ ./xmlvalid sample.ome
    Parsing schema path
    http://www.openmicroscopy.org/Schemas/OME/2010-06/ome.xsd
    Validating sample.ome
    No validation errors found.
    $

If any errors are found, they are reported. When correcting errors, it is
usually best to work from the top of the file, as errors higher up can
cause extra errors further down. In this example the output shows 3 errors
but there are only 2 mistakes in the file: the missing ``SizeY`` attribute
is reported once, and the invalid ``FillRule`` value is reported twice.

::

    $ ./xmlvalid broken.ome
    Parsing schema path
    http://www.openmicroscopy.org/Schemas/OME/2010-06/ome.xsd
    Validating broken.ome
    cvc-complex-type.4: Attribute 'SizeY' must appear on element 'Pixels'.
    cvc-enumeration-valid: Value 'Non Zero' is not facet-valid with respect to enumeration '[EvenOdd, NonZero]'. It must be a value from the enumeration.
    cvc-attribute.3: The value 'Non Zero' of attribute 'FillRule' on element 'ROI:Shape' is not valid with respect to its type, 'null'.
    Error validating document: 3 errors found
    $

..  Also available is a web-based `OME-XML validator `_ for checking files
    in OME-XML or OME-TIFF formats, including:

    -  conformance to the OME-XML schema
    -  listing missing internal references
    -  listing external references
    -  correct TiffData block usage
    -  frame counts in TIFF files

Alternatively, you can use the more general online `W3C XML validator `_
to validate your OME-XML blocks. For best results, be sure to check the
"keep going" option.

Another option is to use a commercial XML application such as Turbo XML to
work with and validate your OME-XML documents.
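
Schema validation does not catch every problem. A reference that points
at an ID defined nowhere in the document is one example. The following
Python sketch shows one way such a check could be approximated with the
standard library. It rests on a simplifying assumption: that references
are elements whose name ends in ``Ref`` and whose ``ID`` attribute names
the target element. The function names and the sample fragment are
hypothetical, the fragment is not schema-valid, and this is not how
``xmlvalid`` itself works.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip the '{namespace}' prefix ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]


def missing_references(xml_text):
    """Return the sorted IDs used by *Ref elements but defined nowhere."""
    root = ET.fromstring(xml_text)
    defined = set()
    referenced = set()
    for element in root.iter():
        identifier = element.get("ID")
        if identifier is None:
            continue
        # Assumption: an element named "...Ref" refers to an ID, while any
        # other element carrying an ID attribute defines it.
        if local_name(element.tag).endswith("Ref"):
            referenced.add(identifier)
        else:
            defined.add(identifier)
    return sorted(referenced - defined)


# Hypothetical fragment: Image:1 refers to an instrument that is not defined.
sample = """<OME xmlns="http://www.openmicroscopy.org/Schemas/OME/2010-06">
  <Instrument ID="Instrument:0"/>
  <Image ID="Image:0"><InstrumentRef ID="Instrument:0"/></Image>
  <Image ID="Image:1"><InstrumentRef ID="Instrument:1"/></Image>
</OME>"""
print(missing_references(sample))
```

A check like this complements schema validation rather than replacing it,
since it says nothing about element order, required attributes or value
types.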