C++ conversion details¶
The C++ codebase has been primarily a conversion of the original Java codebase, with some additional helper functions and classes added where needed. The intention is that the basic interfaces and classes should be identical between the two languages unless this is prevented by fundamental differences between the languages.
This section is intended to be useful for:
- Users of the existing Java interface, who wish to understand the differences between the two implementations
- Developers who wish to work on the C++ interface
In addition to documenting the specific language and class compatibility issues, this section also documents the idioms in use in the C++ code which might not be immediately clear by looking at the API reference, and which may not be familiar to Java developers.
C++ and Java type incompatibility¶
While C++ and Java have some basic syntactical similarities, there are several basic differences in their type systems.
Java types¶
Java has primitive types and classes.
int i;
double d;
- No unsigned primitive integer types
Pixels pixels = new Pixels();
- All classes are derived from root Object
- Objects are by reference only
- Objects and arrays are always allocated with new
- Destruction is non-deterministic
- All passing is by value (primitives and object references)
Pixels[] array = new Pixels[5];
- Arrays have an intrinsic size.
- Arrays are safe to index out of bounds (an exception is thrown).
C++ types¶
C++ has primitive types, structures and classes.
int16_t i1;
uint32_t i2;
double d;
- Primitive integer types may be signed or unsigned.
- Integer types are of defined size.
// Allocate on the stack, or as a struct or class member:
Pixels pixels;
// Allocate on the heap
Pixels *pixelsptr1 = new Pixels();
// Pointer to existing instance
const Pixels *pixelsptr2 = &pixels;
// Reference to existing instance
Pixels& pixelsref(pixels);
- Classes have no common root
- All types may be instances, pointers or references
- Object construction may be on the stack, on the heap using new, or in place using placement new
- Pointers and references may refer to const type
- Pointers may be const
- References are implicitly const (similar to final)
- Destruction is deterministic
- new should never be used in modern C++ code (see below)
Pixels array[5];
- Arrays “decay” to bare pointers
- Arrays are not safe to index out of bounds
- Size information lost at runtime
- Never use arrays outside static initializers
Simplified type names¶
typedef is used to create an alias for an existing type.
typedef std::vector<std::string> string_list;
string_list l;
string_list::const_iterator i = l.begin();
// NOT std::vector<std::string>::const_iterator
typedef std::vector<Pixels> plist;
plist pl(6);
plist::size_type idx = 2;
// size_type NOT unsigned int or uint32_t
pl.at(idx) = ...;
Used in standard container types, e.g. size_type and value_type, and in classes and class templates.
Consistency is needed for generic programming; use the standard type names to enable interoperability with standard algorithms.
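As a minimal sketch of why the member type names matter, the function below is written only in terms of the alias and its member types. The total_length name is hypothetical and chosen for illustration; it is not part of any library.

```cpp
#include <string>
#include <vector>

// Alias in the style of the string_list typedef shown above.
typedef std::vector<std::string> string_list;

// Sum the lengths of all strings using only the container's own
// member types (size_type, const_iterator), rather than hard-coding
// unsigned int or a specific iterator type.
string_list::size_type total_length(const string_list& l)
{
  string_list::size_type total = 0;
  for (string_list::const_iterator i = l.begin(); i != l.end(); ++i)
    total += i->size();
  return total;
}
```

Because nothing in the function names the underlying container directly, only the typedef would need to change if a different standard sequence container were used.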
Exception handling¶
Java¶
throws details which exceptions are thrown by a method. Java exceptions are also “checked”, requiring the caller to either catch or declare (with throws) all exceptions which might be thrown, aside from RuntimeException, Error and their subclasses.
C++¶
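C++ has had exception specifications similar to Java's throws; however, they are useless aside from the non-throwing form (throw(), or noexcept since C++11). This is because if an exception is thrown which does not match the specification, std::unexpected() is called, which by default terminates the program (a std::bad_exception is substituted only in limited circumstances), which makes them unusable in practice. Dynamic exception specifications were deprecated in C++11 and removed in C++17.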
Exceptions can be thrown at any point, except that they should never be thrown from a destructor. It is neither necessary nor typical to catch exceptions except where they can be handled. All code must be exception-safe given that an exception could be thrown at any point; the design considerations for exception safety are covered below.
Interfaces¶
Java supports single-inheritance, plus interfaces. C++ supports true multiple-inheritance, which is rather more flexible, at the expense of being rather more complicated and dangerous. However, the Java single-inheritance-plus-interfaces model can be implemented in C++ using a subset of the facilities provided by multiple inheritance. Rather than being enforced by the language, it is a set of idioms. These must be rigorously followed or else things will fail horribly!
C++ interfaces are classes with:
- No instance variables
- Pure virtual methods
- protected default constructor
- public virtual destructor
- Deleted copy constructor and assignment operator
C++ classes implementing interfaces:
- Use public inheritance for parent class
- Use virtual public inheritance for implemented interfaces
- Have a virtual destructor
When compiled with optimization enabled, the interface classes should have zero storage overhead. If implementing classes do not use virtual public inheritance, compilation will fail as soon as a second class in the inheritance hierarchy also implements the interface.
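The idioms above can be sketched as follows. The Named, Shape and Circle classes and the describe function are hypothetical names invented for this illustration; they are not OME-Files classes.

```cpp
#include <string>

// Hypothetical interface: no instance variables, pure virtual methods,
// protected default constructor, public virtual destructor, no copying.
class Named
{
protected:
  Named() = default;

public:
  virtual ~Named() = default;
  Named(const Named&) = delete;
  Named& operator=(const Named&) = delete;

  virtual std::string name() const = 0;
};

// A class implementing the interface with virtual public inheritance.
class Shape : virtual public Named
{
public:
  std::string name() const override { return "shape"; }
};

// A second class in the hierarchy implementing the same interface.
// Public inheritance is used for the parent class and virtual public
// inheritance for the interface, so there is a single Named subobject.
class Circle : public Shape, virtual public Named
{
public:
  std::string name() const override { return "circle"; }
};

// Code written against the interface only.
std::string describe(const Named& n) { return n.name(); }
```

If the virtual keyword were dropped from both inheritance lists, Circle would contain two Named subobjects and passing a Circle to describe() would be expected to fail to compile as an ambiguous conversion.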
Reference handling and memory management¶
Pointer problems¶
Plain (or “dumb”) C++ pointers can be dangerous if used incorrectly. The OME-Files API makes a point of never using them unless absolutely necessary. For automatic objects allocated on the stack, allocation and deallocation are automatic and safe:
{
Image i(filename);
i.read_plane();
// Object destroyed when i goes out of scope
}
In this case, the object’s destructor was run and the memory freed automatically.
Looking at the case where a pointer is used to reference manually-allocated memory on the heap:
{
Image *i = new Image(filename);
i->read_plane();
// Memory not freed when pointer i goes out of scope
}
In this case new was not paired with the corresponding delete, resulting in a memory leak. This is the code with the “leak” fixed:
{
Image *i = new Image(filename);
i->read_plane(); // throws exception; memory leaked
delete i; // never called
}
new and delete are now paired, but the code is not exception-safe. If an exception is thrown, memory will still be leaked. Manual memory management requires correct clean up for every exit point in the function, including all return statements and thrown exceptions. Here, we handle this correctly:
{
Image *i = new Image(filename);
try {
i->read_plane(); // throws exception
} catch (const std::runtime_error& e) {
delete i; // clean up
throw; // rethrow
}
delete i; // never called for exceptions
}
However, this does not scale: it is painful and error prone when applied to an entire codebase. This simple function has only a single variable, a single exception and a single return to deal with. Imagine the combinatorial explosion when there are several variables with different lifetimes and scopes, multiple return points and several exceptions to handle. This is easy to get wrong, so a more robust approach is needed.
Use of new is not in the general case safe or sensible. The OME-Files API never passes pointers allocated with new, nor requires any manual memory management. Instead, “smart” pointers are used throughout to manage memory safely and automatically.
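For comparison with the manual versions above, here is a minimal sketch of the same pattern using a smart pointer. The Image struct is a hypothetical stand-in which counts live instances so that a leak would be observable; it is not the OME-Files class, and read_with_smart_pointer is an illustrative name.

```cpp
#include <memory>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for the Image class used in the examples above;
// it counts live instances so that leaks can be detected.
struct Image
{
  static int live;
  explicit Image(const std::string&) { ++live; }
  ~Image() { --live; }
  void read_plane() { throw std::runtime_error("read failed"); }
};
int Image::live = 0;

// Returns true if an exception escaped read_plane().
bool read_with_smart_pointer()
{
  try
    {
      std::shared_ptr<Image> i = std::make_shared<Image>("test.tiff");
      i->read_plane(); // throws; i is destroyed during stack unwinding
    }
  catch (const std::runtime_error&)
    {
      return true;
    }
  return false;
}
```

No delete appears anywhere: the shared_ptr releases the object on every exit path, so after the call Image::live is back to zero even though an exception was thrown.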
Resource Acquisition Is Initialization¶
Resource Acquisition Is Initialization (RAII) is a programming idiom used throughout modern C++ libraries and applications, including the Standard Library:
- A class is a proxy for a resource
- The resource is acquired when the object is initialised
- The resource is released when the object is destroyed
- Any resource may be managed (e.g. memory, files, locks, mutexes)
- The C++ language and runtime guarantees make resource management deterministic and reliable
- Safe for use in any scope
- Exception safe
- Used throughout modern C++ libraries and applications
Because this relies implicitly upon the deterministic object destruction guarantees made by the C++ language, this is not used widely in Java APIs which often require manual management of resources such as open files. Used carefully, RAII will prevent resource leaks and result in robust, safe code.
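The idiom can be sketched with a resource other than memory. In this illustration the "resource" is simply a count of open handles; HandleTable, ScopedHandle and use_handles are hypothetical names, standing in for files, locks or similar.

```cpp
#include <stdexcept>

// Hypothetical resource pool: tracks how many handles are open.
struct HandleTable
{
  int open = 0;
};

// RAII proxy: acquires a handle on construction, releases it on
// destruction. Copying is disabled so each handle has one owner.
class ScopedHandle
{
public:
  explicit ScopedHandle(HandleTable& t) : table(t) { ++table.open; }
  ~ScopedHandle() { --table.open; }
  ScopedHandle(const ScopedHandle&) = delete;
  ScopedHandle& operator=(const ScopedHandle&) = delete;

private:
  HandleTable& table;
};

// Opens two handles; both are released on normal return and when an
// exception propagates out of the function.
int use_handles(HandleTable& t, bool fail)
{
  ScopedHandle a(t);
  ScopedHandle b(t);
  if (fail)
    throw std::runtime_error("failure while both handles are open");
  return t.open; // evaluated before a and b are destroyed
}
```

There is no explicit clean-up code on either exit path; the destructors run in reverse order of construction as the scope is left.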
The FormatReader API is currently not using RAII due to the use of the FormatHandler::setId() interface.
C++ reference variants¶
// Non-constant Constant
// ----------------------------- --------------------------------------
// Pointer
Image *i; const Image *i;
Image * const i; const Image * const i;
// Reference
Image& i; const Image& i;
// Shared pointer
std::shared_ptr<Image> i; std::shared_ptr<const Image> i;
const std::shared_ptr<Image> i; const std::shared_ptr<const Image> i;
// Shared pointer reference
std::shared_ptr<Image>& i; std::shared_ptr<const Image>& i;
const std::shared_ptr<Image>& i; const std::shared_ptr<const Image>& i;
// Weak pointer
std::weak_ptr<Image> i; std::weak_ptr<const Image> i;
const std::weak_ptr<Image> i; const std::weak_ptr<const Image> i;
// Weak pointer reference
std::weak_ptr<Image>& i; std::weak_ptr<const Image>& i;
const std::weak_ptr<Image>& i; const std::weak_ptr<const Image>& i;
Java has one reference type. Here, we have 22. Clearly, not all of these will typically be used. Below, a subset of these is shown, grouped by purpose.
Class member types:
Image i; // Concrete instance
std::shared_ptr<Image> i; // Reference
std::weak_ptr<Image> i; // Weak reference
Wherever possible, a concrete instance should be preferred. This is not possible for polymorphic types, where a reference is required. In this situation, an std::shared_ptr is preferred if the class owns the member and/or needs control over its lifetime. If the class does not have ownership then an std::weak_ptr will allow safe access to the object if it still exists. In circumstances where manual lifetime management is required, e.g. for performance, and the member is guaranteed to exist for the duration of the object’s lifetime, a plain pointer or reference may be used. A pointer will be used if it is possible for it to be null, if it may be reassigned, or if it is assigned after initial construction. If properly using RAII, using references should be possible and preferred over bare pointers in all cases.
Argument types:
// Ownership retained
void read_plane(const Image& image);
// Ownership shared or transferred
void read_plane(const std::shared_ptr<Image>& image);
Passing primitive types by value is acceptable. However, passing a struct or class by value will implicitly copy the object into the callee’s stack frame, which may be expensive (and requires a copy constructor, which is not guaranteed to exist and may not even be possible for polymorphic types). Passing by reference avoids the need for any copying, and passing by const reference will prevent the callee from modifying the object, also making it clear that there is no transfer of ownership. Passing an std::shared_ptr by value is possible but not recommended: the copy involves reference counting overhead which can kill multi-threaded performance since it requires synchronization between all threads; use a const reference to an std::shared_ptr to avoid the overhead. If ownership should be transferred or shared with the callee, use a non-const reference.
To be absolutely clear, plain pointers are never used and are not acceptable for ownership transfer. A plain reference also makes it clear there is no ownership transfer.
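The two argument styles can be sketched as follows. Image here is a hypothetical stand-in struct, and name_length and Viewer are illustrative names rather than OME-Files API.

```cpp
#include <memory>
#include <string>

// Hypothetical stand-in for Image; not the OME-Files class.
struct Image
{
  std::string name;
};

// Ownership retained by the caller: the callee only inspects the
// object through a const reference and cannot modify or keep it.
std::string::size_type name_length(const Image& image)
{
  return image.name.size();
}

// Ownership shared: the shared_ptr is passed by const reference (no
// reference count change for the call itself) and the callee takes a
// share of ownership only when it stores a copy.
class Viewer
{
public:
  void show(const std::shared_ptr<Image>& image) { current = image; }
  long owners() const { return current.use_count(); }

private:
  std::shared_ptr<Image> current;
};
```

After Viewer::show() the reference count is two: one share held by the caller and one by the viewer. Calling name_length() leaves the count unchanged.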
Return types:
Image get_image(); // Ownership transferred
Image& get_image(); // Ownership retained
std::shared_ptr<Image> get_image(); // Ownership shared/trans
If the callee does not retain a copy of the original object, it can’t return by reference since it can’t guarantee the object remaining in scope after it returns, hence it must create a temporary value and return by value. If the callee does retain a copy, it has the option of returning by reference. Returning by reference is preferred when possible. Returning by value implies ownership transfer. Returning by reference implies ownership retention. Returning an std::shared_ptr by value or reference implies sharing ownership since the caller can retain a reference; if returning by value, ownership may be transferred since this implies the callee is not retaining a reference to it (but this is not guaranteed).
Again, to be absolutely clear, plain pointers are never used and are not acceptable for ownership transfer. A plain reference also makes it clear there is no ownership transfer.
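A minimal sketch of the three return styles follows. Image and ImageStore are hypothetical classes written for this illustration only.

```cpp
#include <memory>

// Hypothetical stand-in for Image; not the OME-Files class.
struct Image
{
  explicit Image(int id_) : id(id_) {}
  int id;
};

// Hypothetical owner illustrating the three return styles.
class ImageStore
{
public:
  ImageStore() : cached(1), shared(std::make_shared<Image>(2)) {}

  // By value: a new object is created and the caller owns it.
  Image make_image() const { return Image(3); }

  // By reference: the store keeps ownership; the reference is valid
  // only for as long as the store exists.
  const Image& get_cached() const { return cached; }

  // By shared_ptr: ownership is shared between store and caller.
  std::shared_ptr<Image> get_shared() const { return shared; }

private:
  Image cached;
  std::shared_ptr<Image> shared;
};
```

The signature alone tells the caller what it may do with the result: keep the value indefinitely, use the reference only while the store is alive, or hold the shared pointer for as long as needed.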
- Safety: References cannot be null
- Storing polymorphic types requires use of a shared_ptr
- Referencing polymorphic types may require use of a shared_ptr
- Safety: To avoid cyclic dependencies, use weak_ptr
- Safety: To allow object destruction while maintaining a safe reference, use weak_ptr
- weak_ptr is not directly usable
- weak_ptr is convertible back to shared_ptr for use if the object is still in existence
- C++11 move semantics (&&) improve the performance of ownership transfer
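The conversion of a weak_ptr back to a shared_ptr can be sketched as below, using the standard std::weak_ptr::lock() member. Image is a hypothetical stand-in struct and planes_if_alive is an illustrative name.

```cpp
#include <memory>

// Hypothetical stand-in for Image; not the OME-Files class.
struct Image
{
  int planes;
};

// Returns the plane count if the image still exists, or -1 otherwise.
int planes_if_alive(const std::weak_ptr<Image>& ref)
{
  // lock() yields an empty shared_ptr if the object has been
  // destroyed; otherwise it keeps the object alive while in use.
  if (std::shared_ptr<Image> image = ref.lock())
    return image->planes;
  return -1;
}
```

The weak reference does not keep the object alive, so the owner remains free to destroy it; the holder of the weak_ptr simply finds out on the next lock().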
Containers¶
Safe array passing¶
C++ arrays are not safe to pass in or out of functions since the size is not known unless passed separately.
class Image
{
// Unsafe; size unknown
uint8_t *getLUT();
void setLUT(const uint8_t lut[]);
};
C++ arrays “decay” to “bare” pointers, and pointers have no associated size information. std::array is a safe alternative.
class Image
{
typedef std::array<uint8_t, 256> LUT;
// Safe; size defined
const LUT& getLUT() const;
void setLUT(const LUT&);
};
std::array is an array-like object (a class which behaves like an array). Its type and size are defined in the template, and it may be passed around like any other object. Its array::at() method provides strict bounds checking, while its index operator array::operator[]() provides unchecked access.
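The difference between the two access methods can be sketched as follows. The LUT typedef mirrors the one above; index_rejected is a hypothetical helper written for this illustration.

```cpp
#include <array>
#include <cstdint>
#include <stdexcept>

typedef std::array<std::uint8_t, 256> LUT;

// Returns true if reading the given index with at() is rejected.
// at() throws std::out_of_range for an invalid index, whereas
// operator[] with the same index would be undefined behavior.
bool index_rejected(const LUT& lut, LUT::size_type idx)
{
  try
    {
      (void) lut.at(idx);
    }
  catch (const std::out_of_range&)
    {
      return true;
    }
  return false;
}
```

The size travels with the type, so lut.size() is always available to the callee without a separate length argument.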
Behavior differences¶
Pixel data buffering¶
Pixel data is handled differently between the Java and C++ implementations. The primary reason for the difference is that the Java code uses raw byte[] arrays to contain pixel data. This could not be implemented in C++ due to the limitations of C++ arrays discussed above. Raw byte arrays also have a number of additional limitations:
- they are not using the native pixel type, requiring conversion to the required type, potentially also including endian conversion
- they are unstructured, having no dimension ordering or dimension size information
The solution was to create a dedicated PixelBuffer template class which could represent pixels of any type. This is contained by a VariantPixelBuffer class which can contain any of the supported pixel types. This is therefore both flexible and strongly-typed. The C++ code is slightly more complex as a result, but it is safer, and the buffer can be passed around without the need for any additional metadata to describe its type, size and ordering. This can make passing pixel data between different libraries much more transparent.
Pixel sample interleaving¶
Additional differences include the semantics of how the FormatReader::openBytes() and FormatWriter::saveBytes() methods are implemented. The API is the same, but the default behavior is a little different. All well-written code should cope with the differences, but code making assumptions may require some attention.
The Java TIFF reader classes’ FormatReader::isInterleaved() method will always return false, irrespective of the TIFF PlanarConfiguration tag. As a result, FormatReader::openBytes() will always return pixel data with samples on separate contiguous planes. In contrast, the C++ TIFF reader classes’ FormatReader::isInterleaved() method will return true if the TIFF PlanarConfiguration is CONTIG and false if SEPARATE, and the FormatReader::openBytes() method will return pixel data with the appropriate interleaving, matching the same format in the TIFF file. The Java behavior is due to the implementation details of its TIFF reading code; the C++ code uses libtiff and simply passes back the pixel data without any rearrangement. Java code which assumes it will never receive interleaved data will need to be updated to cope with it when porting to C++.
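To make the two layouts concrete, the sketch below rearranges interleaved samples (RGBRGB...) into separate planes (RR...GG...BB...). It operates on a plain std::vector of single-byte samples and does not use the OME-Files API; the deinterleave function is a hypothetical helper for illustration only.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Convert interleaved samples into separate contiguous planes.
// Assumes one byte per sample and a whole number of pixels.
std::vector<std::uint8_t>
deinterleave(const std::vector<std::uint8_t>& interleaved,
             std::size_t samples_per_pixel)
{
  if (samples_per_pixel == 0 ||
      interleaved.size() % samples_per_pixel != 0)
    throw std::invalid_argument("buffer size is not a whole number of pixels");

  const std::size_t pixels = interleaved.size() / samples_per_pixel;
  std::vector<std::uint8_t> planar(interleaved.size());

  // Sample s of pixel p moves from index (p * samples + s) to
  // index (s * pixels + p).
  for (std::size_t p = 0; p < pixels; ++p)
    for (std::size_t s = 0; s < samples_per_pixel; ++s)
      planar[s * pixels + p] = interleaved[p * samples_per_pixel + s];

  return planar;
}
```

For two RGB pixels stored as {1, 2, 3, 4, 5, 6}, the planar result is {1, 4, 2, 5, 3, 6}. With one sample per pixel the two layouts are identical.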
The Java TIFF writer will always set interleaving if the number of samples per pixel is one (which is the recommended behavior), overriding FormatWriter::setInterleaved(). The C++ TIFF writer will always set interleaving based upon FormatWriter::setInterleaved(), and will not override the request of the caller. This discrepancy will be rectified in a future release to match the behavior of the Java writer; in practice there is no difference in the pixel ordering since interleaving is irrelevant when there is only one sample per pixel.
To obtain the Java TIFF reader behavior in C++, i.e. to obtain non-interleaved pixel data, create a VariantPixelBuffer with the desired pixel type and interleaving (use the PixelBufferBase::make_storage_order() helper method to create the dimension order without interleaving), and then assign the buffer filled by FormatReader::openBytes() to this buffer; the data will be transparently converted to the desired ordering on assignment.