C++ conversion details

The C++ codebase has been primarily a conversion of the original Java codebase, with some additional helper functions and classes added where needed. The intention is that the basic interfaces and classes should be identical between the two languages unless this is prevented by fundamental differences between the languages.

This section is intended to be useful for

  • Users of the existing Java interface, who wish to understand the differences between the two implementations
  • Developers who wish to work on the C++ interface

In addition to documenting the specific language and class compatibility issues, this section also documents the idioms in use in the C++ code which might not be immediately clear by looking at the API reference, and which may not be familiar to Java developers.

C++ and Java type incompatibility

While C++ and Java have some basic syntactical similarities, there are several basic differences in their type systems.

Java types

Java has primitive types and classes.

int i;
double d;
  • No unsigned primitive integer types
Pixels pixels = new Pixels();
  • All classes are derived from root Object
  • Objects are by reference only
  • Objects and arrays are always allocated with new
  • Destruction is non-deterministic
  • All passing is by value (primitives and object references)
Pixels[] array = new Pixels[5];
  • Arrays have an intrinsic size.
  • Indexing an array out of bounds is safe (an exception is thrown).

C++ types

C++ has primitive types, structures and classes.

int16_t i1;
uint32_t i2;
double d;
  • Primitive integer types may be signed or unsigned.
  • Integer types are of defined size.
// Allocate on the stack, or as a struct or class member:
Pixels         pixels;

// Allocate on the heap
Pixels        *pixelsptr1 = new Pixels();

// Pointer to existing instance
const Pixels  *pixelsptr2 = &pixels;

// Reference to existing instance
Pixels&        pixelsref(pixels);
  • Classes have no common root
  • All types may be instances, pointers or references
  • Object construction may be on the stack, on the heap using new or in place using placement new.
  • Pointers and references may refer to const type
  • Pointers may be const
  • References cannot be reseated once initialized, so they are implicitly const (similar to final)
  • Destruction is deterministic
  • new should never be used in modern C++ code (see below)
Pixels array[5];
  • Arrays “decay” to bare pointers
  • Arrays are not safe to index out of bounds
  • Size information lost at runtime
  • Never use arrays outside static initializers

Simplified type names

typedef is used to create an alias for an existing type.

typedef std::vector<std::string> string_list;
string_list l;
string_list::const_iterator i = l.begin();
// NOT std::vector<std::string>::const_iterator

typedef std::vector<Pixels> plist;
plist pl(6);
plist::size_type idx = 2;
// size_type NOT unsigned int or uint32_t
pl.at(idx) = ...;

Type aliases of this kind are used throughout the standard container types (e.g. size_type, value_type) and in classes and class templates. Consistency is needed for generic programming: use the standard type names to enable interoperability with standard algorithms.
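
As a minimal sketch of why the standard nested type names matter, the function template below works with any standard container because it only refers to size_type, value_type and const_iterator. The name count_equal is hypothetical and is not part of the OME-Files API.

```cpp
#include <string>
#include <vector>

typedef std::vector<std::string> string_list;

// Count the elements equal to value in any standard container.
// Only the container's own nested type names are used, so nothing
// here depends on what the element or index types actually are.
template <typename Container>
typename Container::size_type
count_equal(const Container&                      c,
            const typename Container::value_type& value)
{
  typename Container::size_type n = 0;
  for (typename Container::const_iterator i = c.begin();
       i != c.end(); ++i)
    {
      if (*i == value)
        ++n;
    }
  return n;
}
```

The same template can be called with a string_list or with a std::vector<int>; hard-coding unsigned int or uint32_t for the counter would tie it to one particular container implementation.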

Exception handling

Java

The throws clause details which exceptions are thrown by a method. Java exceptions are also “checked”, requiring the caller to catch and handle all exceptions which might be thrown, aside from RuntimeException and its subclasses.

C++

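C++ has exception specifications like Java, but in practice they are useless apart from noexcept (or the older empty throw() form). If a function throws an exception which does not match its dynamic exception specification, std::unexpected() is called, which by default terminates the program, so such specifications are unusable in practice. Dynamic exception specifications were deprecated in C++11 and removed in C++17.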

Exceptions can be thrown at any point, with one caveat: they should never be allowed to escape from a destructor. It is neither necessary nor typical to catch exceptions except where they can be handled. All code must be exception-safe, given that an exception could be thrown at any point; the design considerations for exception safety are covered below.
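
The following sketch illustrates these points under stated assumptions: the function names plane_count, checked_plane and describe_plane are hypothetical and exist only for this example. One function promises not to throw using noexcept, one may throw, and the exception is caught only at the point where something useful can be done with it.

```cpp
#include <stdexcept>
#include <string>

// Declared noexcept: a promise that no exception will escape.
int plane_count() noexcept
{
  return 3;
}

// No specification: like most C++ functions, this may throw.
int checked_plane(int index)
{
  if (index < 0 || index >= plane_count())
    throw std::out_of_range("plane index out of range");
  return index;
}

// The exception is caught only where it can be handled usefully,
// and by const reference to avoid slicing and copying.
std::string describe_plane(int index)
{
  try
    {
      return "plane " + std::to_string(checked_plane(index));
    }
  catch (const std::out_of_range& e)
    {
      return std::string("error: ") + e.what();
    }
}
```

The noexcept operator can be used to query the promise at compile time: noexcept(plane_count()) is true, while noexcept(checked_plane(0)) is false.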

Interfaces

Java supports single-inheritance, plus interfaces. C++ supports true multiple-inheritance, which is rather more flexible, at the expense of being rather more complicated and dangerous. However, the Java single-inheritance-plus-interfaces model can be implemented in C++ using a subset of the facilities provided by multiple inheritance. Rather than being enforced by the language, it is a set of idioms. These must be rigorously followed or else things will fail horribly!

C++ interfaces are classes with:

  • No instance variables
  • Pure virtual methods
  • protected default constructor
  • public virtual destructor
  • Deleted copy constructor and assignment operator

C++ classes implementing interfaces:

  • Use public inheritance for parent class
  • Use virtual public inheritance for implemented interfaces
  • Have a virtual destructor

When compiled with optimization enabled, the interface classes should have zero storage overhead. If implementing classes do not use virtual public inheritance, compilation will fail as soon as a second class in the inheritance hierarchy also implements the interface.
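
A minimal sketch of these idioms follows. The class names Readable, Named, BaseReader and TiffReader are hypothetical and chosen only for illustration; they are not the OME-Files class hierarchy. Note that TiffReader lists Readable again even though its parent already implements it, which is only possible because the inheritance is virtual.

```cpp
#include <string>

// Interface: no instance variables, pure virtual methods only.
class Readable
{
protected:
  Readable() {}                     // only implementers may construct
public:
  virtual ~Readable() {}            // safe deletion via the interface
  Readable(const Readable&) = delete;
  Readable& operator=(const Readable&) = delete;

  virtual std::string read() const = 0;
};

// A second, unrelated interface following the same rules.
class Named
{
protected:
  Named() {}
public:
  virtual ~Named() {}
  Named(const Named&) = delete;
  Named& operator=(const Named&) = delete;

  virtual std::string name() const = 0;
};

// Parent class implementing one interface (virtual public).
class BaseReader : virtual public Readable
{
public:
  virtual ~BaseReader() {}
  std::string read() const override { return "pixels"; }
};

// Derived class: public inheritance for the parent class,
// virtual public inheritance for every implemented interface.
class TiffReader : public BaseReader,
                   virtual public Readable,
                   virtual public Named
{
public:
  virtual ~TiffReader() {}
  std::string name() const override { return "tiff"; }
};
```

A TiffReader instance converts unambiguously to const Readable& and const Named&, since there is exactly one Readable subobject in the hierarchy.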

Reference handling and memory management

Pointer problems

Plain (or “dumb”) C++ pointers can be dangerous if used incorrectly. The OME-Files API makes a point of never using them unless absolutely necessary. For automatic objects allocated on the stack, allocation and deallocation are automatic and safe:

{
  Image i(filename);
  i.read_plane();

  // Object destroyed when i goes out of scope
}

In this case, the object’s destructor was run and the memory freed automatically.

Looking at the case where a pointer is used to reference manually-allocated memory on the heap:

{
  Image *i = new Image(filename);

  i->read_plane();

  // Memory not freed when pointer i goes out of scope
}

In this case new was not paired with the corresponding delete, resulting in a memory leak. This is the code with the “leak” fixed:

{
  Image *i = new Image(filename);

  i->read_plane(); // throws exception; memory leaked

  delete i; // never called
}

new and delete are now paired, but the code is not exception-safe. If an exception is thrown, memory will still be leaked. Manual memory management requires correct clean up at every exit point of the function, including all return statements and all thrown exceptions. Here, we handle this correctly:

{
  Image *i = new Image(filename);

  try {
    i->read_plane(); // throws exception
  } catch (...) { // any exception type, not only std::runtime_error
    delete i; // clean up
    throw; // rethrow
  }

  delete i; // reached only when no exception was thrown
}

However, this approach is painful and error-prone when scaled to an entire codebase. This simple function has only a single variable, a single exception and a single return to deal with. Imagine the combinatorial explosion when there are several variables with different lifetimes and scopes, multiple return points and several exceptions to handle. This is easy to get wrong, so a more robust approach is needed.

In the general case, use of new is neither safe nor sensible. The OME-Files API never passes pointers allocated with new, nor does it require any manual memory management. Instead, “smart” pointers are used throughout to manage memory safely and automatically.

std::shared_ptr as a “smart” pointer

The unsafe example above has been rewritten to use std::shared_ptr:

// Start of block
{
  std::shared_ptr<Image> i(std::make_shared<Image>(filename));

  i->read_plane(); // throws exception

  // Memory freed when i's destructor is
  // run at exit of block scope
}

Rather than managing the memory by hand, responsibility for this is delegated to a “smart” pointer, std::shared_ptr. The memory is freed by the std::shared_ptr destructor which is run at the end of the block scope, on explicit return, or when cleaned up by exception stack unwinding.

  • shared_ptr object lifetime manages the resource
  • new replaced with std::make_shared
  • May be used as class members; lifetime is tied to class instance
  • Clean up for all exit points is automatic and safe
  • Allows ownership transfer and sharing
  • Allows reference without ownership using weak_ptr
  • weak_ptr references the object but does not prevent it being freed when the last shared_ptr reference is lost; this is useful for cycle breaking and is used by the OME XML model objects for references
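
The weak_ptr behavior described in the list above can be sketched as follows. The Image struct here is a hypothetical stand-in with a single member, and filename_or is a hypothetical helper; neither is taken from the OME-Files API.

```cpp
#include <memory>
#include <string>

// Hypothetical stand-in for a managed object.
struct Image
{
  explicit Image(const std::string& filename): filename(filename) {}
  std::string filename;
};

// Return the filename if the image still exists, otherwise the
// fallback.  The weak_ptr itself cannot be dereferenced; lock()
// converts it to a shared_ptr, which is null if the object has
// already been freed.
std::string filename_or(const std::weak_ptr<Image>& ref,
                        const std::string&          fallback)
{
  if (std::shared_ptr<Image> image = ref.lock())
    return image->filename;
  return fallback;
}
```

While an owning shared_ptr exists, filename_or() returns the stored filename without the weak_ptr contributing to the owner count; once the last shared_ptr is reset, the weak_ptr reports expired() and the fallback is returned.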

Resource Acquisition Is Initialization

Resource Acquisition Is Initialization (RAII) is a programming idiom used throughout modern C++ libraries and applications, including the Standard Library.

  • A class is a proxy for a resource
  • The resource is acquired when object is initialised
  • The resource is released when object is destroyed
  • Any resource may be managed (e.g. memory, files, locks, mutexes)
  • The C++ language and runtime guarantees make resource management deterministic and reliable
  • Safe for use in any scope
  • Exception safe
  • Used throughout modern C++ libraries and applications

Because this relies implicitly upon the deterministic object destruction guarantees made by the C++ language, this is not used widely in Java APIs which often require manual management of resources such as open files. Used carefully, RAII will prevent resource leaks and result in robust, safe code.
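
As a minimal sketch of the idiom, the hypothetical OpenHandle class below treats "one open handle" as the resource, modelled by a simple counter so that the example is self-contained. The names OpenHandle and use_handle are assumptions made for this illustration only.

```cpp
#include <stdexcept>

// Proxy for a resource: here, one entry in a count of open handles.
class OpenHandle
{
public:
  // Acquire the resource on initialization.
  explicit OpenHandle(int& open_count): count(open_count) { ++count; }
  // Release the resource on destruction.
  ~OpenHandle() { --count; }

  OpenHandle(const OpenHandle&) = delete;
  OpenHandle& operator=(const OpenHandle&) = delete;

private:
  int& count;
};

// Returns the number of handles open while the handle is in scope.
// The handle is released on normal return and when an exception
// unwinds the stack; no explicit clean up code is needed.
int use_handle(int& open_count, bool fail)
{
  OpenHandle handle(open_count);
  if (fail)
    throw std::runtime_error("read failed");
  return open_count;
}
```

Whether use_handle() returns normally or throws, the counter is back at its original value afterwards, which is the guarantee the manual new and delete examples above could not provide without extra code.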

The FormatReader API is currently not using RAII due to the use of the FormatHandler::setId() interface.

C++ reference variants

 //              Non-constant                                 Constant
 // -----------------------------  --------------------------------------
 // Pointer
                        Image *i;                         const Image *i;
                 Image * const i;                  const Image * const i;

 // Reference
                        Image& i;                         const Image& i;

 // Shared pointer
        std::shared_ptr<Image> i;         std::shared_ptr<const Image> i;
  const std::shared_ptr<Image> i;   const std::shared_ptr<const Image> i;

 // Shared pointer reference
       std::shared_ptr<Image>& i;        std::shared_ptr<const Image>& i;
 const std::shared_ptr<Image>& i;  const std::shared_ptr<const Image>& i;

 // Weak pointer
          std::weak_ptr<Image> i;           std::weak_ptr<const Image> i;
    const std::weak_ptr<Image> i;     const std::weak_ptr<const Image> i;

 // Weak pointer reference
         std::weak_ptr<Image>& i;          std::weak_ptr<const Image>& i;
   const std::weak_ptr<Image>& i;    const std::weak_ptr<const Image>& i;

Java has one reference type. Here, we have 22. Clearly, not all of these will typically be used. Below, a subset is shown, grouped by purpose.

Class member types:

Image i;                     // Concrete instance
std::shared_ptr<Image> i;    // Reference
std::weak_ptr<Image> i;      // Weak reference

Wherever possible, a concrete instance should be preferred. This is not possible for polymorphic types, where a reference is required. In this situation, an std::shared_ptr is preferred if the class owns the member and/or needs control over its lifetime. If the class does not have ownership then an std::weak_ptr will allow safe access to the object if it still exists. In circumstances where manual lifetime management is required, e.g. for performance, and the member is guaranteed to exist for the duration of the object’s lifetime, a plain pointer or reference may be used. A pointer will be used if it is possible for it to be null, or if it may be reassigned more than once, or if it is assigned after initial construction. If properly using RAII, using references should be possible and preferred over bare pointers in all cases.

Argument types:

// Ownership retained
void read_plane(const Image& image);
// Ownership shared or transferred
void read_plane(const std::shared_ptr<Image>& image);

Passing primitive types by value is acceptable. However, passing a struct or class by value will implicitly copy the object into the callee’s stack frame, which may be expensive (and requires a copy constructor, which is not guaranteed to exist and may not even be possible for polymorphic types). Passing by reference avoids the need for any copying, and passing by const reference will prevent the callee from modifying the object, also making it clear that there is no transfer of ownership. Passing an std::shared_ptr by value is possible but not recommended: the copy involves reference counting overhead, which can hurt multi-threaded performance since the count must be synchronized between all threads; use a const reference to an std::shared_ptr to avoid the overhead. If ownership should be transferred or shared with the callee, use a non-const reference.

To be absolutely clear, plain pointers are never used and are not acceptable for ownership transfer. A plain reference also makes it clear there is no ownership transfer.
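
The reference counting cost mentioned above can be observed directly with use_count(). This is only a sketch: the Image struct and the three functions are hypothetical and exist to show what each argument type does to the owner count.

```cpp
#include <memory>

// Hypothetical stand-in for an image with a plane count.
struct Image
{
  int planes = 1;
};

// Ownership retained by the caller: plain const reference.
int count_planes(const Image& image)
{
  return image.planes;
}

// shared_ptr by value: the argument is a copy, so the reference
// count is incremented for the duration of the call.
long owners_by_value(std::shared_ptr<Image> image)
{
  return image.use_count();
}

// shared_ptr by const reference: no copy, no count update.
long owners_by_reference(const std::shared_ptr<Image>& image)
{
  return image.use_count();
}
```

With a single owning std::shared_ptr<Image>, owners_by_reference() reports one owner while owners_by_value() reports two, and the count returns to one once the call has finished.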

Return types:

                 Image get_image(); // Ownership transferred
                Image& get_image(); // Ownership retained
std::shared_ptr<Image> get_image(); // Ownership shared/transferred

If the callee does not retain a copy of the original object, it cannot return a reference, since it cannot guarantee that the object will remain in scope after it returns; it must therefore create a temporary value and return it by value. If the callee does retain a copy, it has the option of returning a reference, which is preferred when possible. Returning by value implies ownership transfer. Returning by reference implies ownership retention. Returning an std::shared_ptr by value or reference implies sharing ownership, since the caller can retain a reference; when returning by value, ownership may be transferred, since this implies the callee is not retaining a reference to it (but this is not guaranteed).

Again, to be absolutely clear, plain pointers are never used and are not acceptable for ownership transfer. A plain reference also makes it clear there is no ownership transfer.

  • Safety: References cannot be null
  • Storing polymorphic types requires use of a shared_ptr
  • Referencing polymorphic types may require use of a shared_ptr
  • Safety: To avoid cyclic dependencies, use weak_ptr
  • Safety: To allow object destruction while maintaining a safe reference, use weak_ptr
  • weak_ptr is not directly usable
  • weak_ptr is convertible back to shared_ptr for use if the object is still in existence
  • C++11 move semantics (&&) improve the performance of ownership transfer
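
The points above about return types and move semantics can be sketched with a small container class. ImageStore and its methods are hypothetical names introduced for this illustration and do not correspond to an OME-Files class.

```cpp
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

struct Image {};

class ImageStore
{
public:
  // Ownership transferred to the store: the caller's shared_ptr is
  // moved from, so no reference count update is required.
  void add(std::shared_ptr<Image>&& image)
  {
    images.push_back(std::move(image));
  }

  // Ownership shared: the caller receives its own shared_ptr.
  std::shared_ptr<Image> get(std::size_t index) const
  {
    return images.at(index);
  }

  // Ownership retained by the store: plain reference, never null.
  const Image& peek(std::size_t index) const
  {
    return *images.at(index);
  }

private:
  std::vector<std::shared_ptr<Image>> images;
};
```

After store.add(std::move(image)) the caller's shared_ptr is empty and the store is the sole owner; get() then yields a second owner, while peek() refers to the same object without affecting its lifetime.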

Containers

Safe array passing

C++ arrays are not safe to pass in or out of functions since the size is not known unless passed separately.

class Image
{
  // Unsafe; size unknown
  uint8_t *getLUT();
      void setLUT(const uint8_t lut[]);
};

C++ arrays “decay” to “bare” pointers, and pointers have no associated size information. std::array is a safe alternative.

class Image
{
  typedef std::array<uint8_t, 256> LUT;

  // Safe; size defined
  const LUT& getLUT() const;
        void setLUT(const LUT&);
};

std::array is an array-like object (a class which behaves like an array). Its type and size are defined by its template parameters, and it may be passed around like any other object. Its array::at() method provides strict bounds checking, while its index operator, array::operator[](), provides unchecked access.
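
A short sketch of the difference between checked and unchecked access follows. The helper names make_inverted_lut and lookup are hypothetical; only the LUT typedef mirrors the example above.

```cpp
#include <array>
#include <cstdint>
#include <stdexcept>

typedef std::array<std::uint8_t, 256> LUT;

// Build an inverted greyscale look-up table.  The size is part of
// the type, so it is known here without being passed separately.
LUT make_inverted_lut()
{
  LUT lut;
  for (LUT::size_type i = 0; i < lut.size(); ++i)
    lut[i] = static_cast<std::uint8_t>(255 - i); // unchecked access
  return lut;
}

// Checked lookup: at() throws std::out_of_range for a bad index
// instead of reading past the end of the array.
bool lookup(const LUT& lut, LUT::size_type index, std::uint8_t& value)
{
  try
    {
      value = lut.at(index);
      return true;
    }
  catch (const std::out_of_range&)
    {
      return false;
    }
}
```

Unchecked operator[] is appropriate in the loop because the index is bounded by size(); at() is the safer choice when the index comes from outside the function.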

Storing and passing unrelated types

Types with a common base

std::vector<std::shared_ptr<Base>> v;
v.push_back(std::make_shared<Derived>());

This can store any type derived from Base. An std::shared_ptr is essential. Without it, bare pointers to the base would be stored, and memory would be leaked when elements are removed from the container (unless externally managed [generally unsafe]). The same applies to passing polymorphic types.
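
To illustrate that elements are freed when they leave the container, the sketch below adds an instance counter to a hypothetical Base class. The counter and the first_name helper are assumptions made purely for demonstration.

```cpp
#include <memory>
#include <string>
#include <vector>

struct Base
{
  static int live;                  // instances currently alive
  Base() { ++live; }
  virtual ~Base() { --live; }
  virtual std::string name() const { return "Base"; }
};

int Base::live = 0;

struct Derived : public Base
{
  std::string name() const override { return "Derived"; }
};

typedef std::vector<std::shared_ptr<Base>> base_list;

// Virtual dispatch works through the stored shared_ptr.
std::string first_name(const base_list& v)
{
  return v.at(0)->name();
}
```

Pushing a Derived and a Base into a base_list raises Base::live to two; pop_back() and clear() bring it back down with no explicit delete, because each shared_ptr destroys its object when it is removed.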

Java containers can be problematic:

  • Java can store root Object in containers
  • Java can pass and return root Object in methods.
  • This is not possible in C++: there is no root object.
  • An alternative approach is needed.

Arbitrary types

boost::any may be used to store any type:

std::vector<boost::any> v;
v.push_back(Anything);
  • Assign and store any type
  • Type erasure (similar to Java generics)
  • Use for containers of arbitrary types
  • Flexible, but need to cast to each type used to extract
  • Code will not be able to handle all possible types meaningfully

This is the most flexible solution, but getting a value back out requires casting it to its specific type. As a result, values may be stored whose types cannot be handled, since it is not possible to write code for every possibility ahead of time. However, if open-ended flexibility is needed, this is available.

A fixed set of types

boost::variant may be used to store a limited set of different types. This avoids the boost::any problem of not being able to handle all possible types, since the scope is limited to a set of allowed types, and a static_visitor can ensure that all types are supported by the code at compile time.

typedef boost::variant<int, std::string> variants;
std::vector<variants> v;
v.push_back(43);
v.push_back("ATTO 647N");
  • Store a set of discriminated types
  • “External polymorphism” via static_visitor
  • Used to store original metadata
  • Used to store nD pixel data of different pixel types

This is not an alternative to a common root object. Instead, this is a discriminated union, which can store one of a defined set of “variant” types. A static visitor pattern may be used to generate code to operate on all of the supported types. The variant type may be used as a class member, passed by value, passed by reference or stored in a container like any other type. Due to the way it is implemented to store values, it does not necessarily need wrapping in an std::shared_ptr since it can behave as a value type (depending upon the context).

Java uses polymorphism to store and pass the root Object around. The boost::variant and boost::any approaches use templates to (internally) create a common base and manage the stored objects. However, the end user does not need to deal with this complexity directly—the use of the types is quite transparent.

Variant example: MetadataMap

This example demonstrates the use of variants with a simple expansion for two different categories of type (scalars and vectors of scalars).

The MetadataMap class stores key-value pairs, where the value can be either a string, Boolean, or several integer and floating point types, or vectors of any of these types. When converting the data to other forms, it is necessary to flatten the vector types to a set of separate key-value pairs with the key having a numbered suffix, one for each element in the vector.

{
  MetadataMap map;
  MetadataMap flat_map (map.flatten());
}

A flattened map is created using the following method:

MetadataMap MetadataMap::flatten() const {
  MetadataMap newmap;

  for (MetadataMap::const_iterator i = oldmap.begin();
       i != oldmap.end(); ++i) {
    MetadataMapFlattenVisitor v(newmap, i->first);
    boost::apply_visitor(v, i->second);
  }

  return newmap;
}

The MetadataMapFlattenVisitor is implemented as follows:

// Flatten MetadataMap vector values
struct MetadataMapFlattenVisitor : public boost::static_visitor<> {
  MetadataMap& map; // Map of flattened elements
  const MetadataMap::key_type& key; // Current key

  MetadataMapFlattenVisitor
    (MetadataMap&                 map,
     const MetadataMap::key_type& key):
    map(map), key(key) {}

  // Output a scalar value of arbitrary type.
  template <typename T>
  void operator() (const T& v) const {
    map.set(key, v);
  }

  // Output a vector value of arbitrary type.
  template <typename T>
  void operator() (const std::vector<T>& c) const {
    typename std::vector<T>::size_type idx = 1;
    for (typename std::vector<T>::const_iterator i = c.begin();
         i != c.end(); ++i, ++idx) {
      std::ostringstream os;
      os << key << " #" << idx;
      map.set(os.str(), *i);
    }
  }
};

The MetadataMapFlattenVisitor is derived from boost::static_visitor, and its templated operator method is specialized and expanded once for each type supported by the variant type used by the map. In the above example, two separate overloaded operators are provided, one for scalar values which is a simple copy, and one for vector values which splits the elements into separate keys in the new map. The important part is the call to apply_visitor(), which takes as arguments the visitor object and the variant to apply it to.

This could be done with a large set of conditionals using boost::get<T>(value) for each supported type. The benefit of the boost::static_visitor approach is that it ensures that all the types are supported at compile time, and in effect results in the same code. If any types are not supported, the code will fail to compile.

Variant example: VariantPixelBuffer equality comparison

This example demonstrates the use of variants with a combinatorial expansion of types.

The VariantPixelBuffer class can contain PixelBuffer classes of various pixel types. Comparing for equality is only performed if the pixel types of the two objects are the same:

{
  VariantPixelBuffer a, b;
  if (a == b) {
    // Buffers are the same.
  }
}

This is implemented using an overloaded equality operator:

bool VariantPixelBuffer::operator ==
    (const VariantPixelBuffer& rhs) const
{
  return boost::apply_visitor(PBCompareVisitor(),
                              buffer, rhs.buffer);
}

As before, this is implemented in terms of a boost::static_visitor, but note that this time it is specialized for bool, meaning that the return type of apply_visitor() will also be bool, and the operator methods must also return this type.

struct PBCompareVisitor : public boost::static_visitor<bool> {
  template <typename T, typename U>
  bool operator() (const T& /* lhs */,
                   const U& /* rhs */) const {
    return false;
  }

  template <typename T>
  bool operator() (const T& lhs,
                   const T& rhs) const {
    return lhs && rhs && (*lhs == *rhs);
  }
};

Unlike the last example, the operator methods now have two arguments, both of which are variant types, and the apply_visitor() call is passed two variant objects in addition to the visitor object. This causes the templates to be expanded for all pairwise combinations of the possible types. When the types are not equal, the first templated operator is called, which always returns false. When the types are equal the second operator is called; this checks both operands are not null and then performs an equality comparison using the buffer contents. Given that all the operators are inline, we would hope that a good compiler would cause all the false cases to be optimized out after expansion.

Variant example: VariantPixelBuffer SFINAE

This example demonstrates the use of variants with SFINAE.

C++ has a concept known as Substitution Failure Is Not An Error (SFINAE), which refers to it not being an error for a candidate template to fail argument substitution during overload resolution. While this is in and of itself a fairly obscure language detail, it enables overloading of a method not just on type, but different categories of type, for example integer and floating point types, signed and unsigned integer types, simple and complex types, or combinations of all of these. This is particularly useful when writing algorithms to process pixel data.

Use of SFINAE has been made accessible through the creation of boost::enable_if (std::enable_if in C++11), and type traits (type category checking classes such as is_integral). The following code is an example of how one might write a visitor for adapting an algorithm to separate integer, floating point, complex floating point and bitmask cases.

struct TypeCategoryVisitor : public boost::static_visitor<>
{
  typedef ::ome::files::PixelProperties<::ome::xml::model::enums::PixelType::BIT>::std_type bit_type;

  TypeCategoryVisitor()
  {}

  // Integer pixel types
  template <typename T>
  typename boost::enable_if_c<
    boost::is_integral<T>::value, void
    >::type
  operator() (std::shared_ptr<::ome::files::PixelBuffer<T>>& buf)
  {
    // Integer algorithm.
  }

  // Floating point pixel types
  template <typename T>
  typename boost::enable_if_c<
    boost::is_floating_point<T>::value, void
    >::type
  operator() (std::shared_ptr<::ome::files::PixelBuffer<T>>& buf)
  {
    // Floating point algorithm.
  }

  // Complex floating point pixel types
  template <typename T>
  typename boost::enable_if_c<
    boost::is_complex<T>::value, void
    >::type
  operator() (std::shared_ptr<::ome::files::PixelBuffer<T>>& buf)
  {
    // Complex floating point algorithm.
  }

  // BIT/bool pixel type.  Note this is a simple overload since it is
  // a simple type, not a category of different types.
  void
  operator() (std::shared_ptr<::ome::files::PixelBuffer<bit_type>>& buf)
  {
    // Boolean algorithm.
  }
};

This visitor may be used with apply_visitor() in a similar manner to the previously demonstrated visitors.

enable_if_c has two parameters, the first being a conditional, the second being the return type (in this example, all the methods return void). If the conditional is true, then the type expands to the return type and the template is successfully substituted. If the conditional is false (the type is not in the category), then the substitution fails and the template will not be used. Note that the plain boost::enable_if form takes the conditional as a type rather than a bool value (enable_if_c and std::enable_if take a value, hence the ::value above), which can be confusing, since all this logic is driven by conditional template expansion.

Normal templates are specialized for a type. This approach allows specialization for different categories of type. Without this approach it would be necessary to write separate overloads for each individual type (each integer type, each floating point type, each complex type, etc.), even when the logic would be identical for e.g. the different integer types. This approach therefore removes the need for unnecessary code duplication, and the type traits checks make each type category explicit to the reader.
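
The same technique can be tried without Boost or the OME-Files headers. The sketch below uses std::enable_if from C++11 on free functions instead of a visitor; the function name category and the local is_complex trait are hypothetical stand-ins (the visitor above uses boost::is_complex).

```cpp
#include <complex>
#include <string>
#include <type_traits>

// Minimal local trait for complex types (assumption: stands in for
// boost::is_complex so that the example needs no Boost headers).
template <typename T>
struct is_complex : std::false_type {};
template <typename T>
struct is_complex<std::complex<T>> : std::true_type {};

// Selected for every integer type.
template <typename T>
typename std::enable_if<std::is_integral<T>::value, std::string>::type
category(const T&) { return "integer"; }

// Selected for every floating point type.
template <typename T>
typename std::enable_if<std::is_floating_point<T>::value, std::string>::type
category(const T&) { return "floating point"; }

// Selected for every std::complex type.
template <typename T>
typename std::enable_if<is_complex<T>::value, std::string>::type
category(const T&) { return "complex"; }

// Plain overload for one specific type.  bool is also an integral
// type, but a non-template function is preferred over a template
// when both match equally well.
inline std::string category(const bool&) { return "bit"; }
```

Each call resolves to exactly one overload: for a double argument, substitution fails for the integer and complex templates, which silently removes them from consideration rather than causing a compile error.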

Behavior differences

Pixel data buffering

Pixel data is handled differently between the Java and C++ implementations. The primary reason for the difference is that the Java code uses raw byte[] arrays to contain pixel data. This could not be implemented in C++ due to the limitations of C++ arrays discussed above, as well as having a number of additional limitations:

  • they are not using the native pixel type, requiring conversion to the required type, potentially also including endian conversion
  • they are unstructured, having no dimension ordering or dimension size information

The solution was to create a dedicated PixelBuffer template class which could represent pixels of any type. This is contained by a VariantPixelBuffer class which can contain any of the supported pixel types. This is therefore both flexible and strongly-typed. The C++ code is slightly more complex as a result, but it is safer, and the buffer can be passed around without the need for any additional metadata to describe its type, size and ordering. This can make passing pixel data between different libraries much more transparent.

Pixel sample interleaving

Additional differences include the semantics of how the FormatReader::openBytes() and FormatWriter::saveBytes() methods are implemented. The API is the same, but the default behavior is a little different. All well-written code should cope with the differences, but code making assumptions may require some attention.

The Java TIFF reader classes’ FormatReader::isInterleaved() method will always return false, irrespective of the TIFF PlanarConfiguration tag. As a result, FormatReader::openBytes() will always return pixel data with samples on separate contiguous planes. In contrast, the C++ TIFF reader classes’ FormatReader::isInterleaved() method will return true if the TIFF PlanarConfiguration is CONTIG and false if SEPARATE, and the FormatReader::openBytes() method will return pixel data with the appropriate interleaving, matching the same format in the TIFF file. The Java behavior is due to the implementation details of its TIFF reading code; the C++ code uses libtiff and simply passes back the pixel data without any rearrangement. Java code which assumes it will never receive interleaved data will need to be updated to cope with it when porting to C++.

The Java TIFF writer will always set interleaving if the number of samples per pixel is one (which is the recommended behavior), overriding FormatWriter::setInterleaved(). The C++ TIFF writer will always set interleaving based upon FormatWriter::setInterleaved(), and will not override the request of the caller. This discrepancy will be rectified in a future release to match the behavior of the Java writer; in practice there is no difference in the pixel ordering since interleaving is irrelevant when there is only one sample per pixel.

To obtain the Java TIFF reader behavior in C++, i.e. to obtain non-interleaved pixel data, create a VariantPixelBuffer with the desired pixel type and interleaving (use the PixelBufferBase::make_storage_order() helper method to create the dimension order without interleaving), and then assign the buffer filled by FormatReader::openBytes() to this buffer; the data will be transparently converted to the desired ordering on assignment.
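
To make the two sample layouts concrete, the following standalone sketch converts interleaved samples to separate contiguous planes by hand. It deliberately does not use the OME-Files API: deinterleave is a hypothetical helper operating on a plain std::vector, and it only illustrates the reordering that the buffer assignment described above is said to perform.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Convert interleaved samples (e.g. RGBRGB...) to planar order
// (RR...GG...BB...).  Precondition: samples is non-zero and the
// input size is a whole number of pixels.
std::vector<std::uint8_t>
deinterleave(const std::vector<std::uint8_t>& interleaved,
             std::size_t                      samples)
{
  assert(samples > 0);
  assert(interleaved.size() % samples == 0);

  const std::size_t pixels = interleaved.size() / samples;
  std::vector<std::uint8_t> planar(interleaved.size());

  for (std::size_t p = 0; p < pixels; ++p)
    for (std::size_t s = 0; s < samples; ++s)
      planar[s * pixels + p] = interleaved[p * samples + s];

  return planar;
}
```

For two RGB pixels stored as 1 2 3 4 5 6, the planar result is 1 4 2 5 3 6: both red samples first, then both green, then both blue. With one sample per pixel the two layouts are identical.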