Serialization
Special Considerations
This could cause problems in programs where copies of different objects are saved from the same address.
template<class Archive>
void save(Archive & ar, const unsigned int version) const
{
    for(int i = 0; i < 10; ++i){
        A x = a[i];
        ar << x;
    }
}
In this case, the data to be saved exists on the stack. Each iteration
of the loop updates the value on the stack. So although the data changes
each iteration, the address of the data doesn't. If the elements of a are
objects being tracked by memory address, the library will skip storing
every object after the first, as it will assume that objects at the same
address are really the same object.
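The effect of the loop above can be reproduced with a toy model that uses only the standard library. The sketch below is not the library's implementation: toy_tracking_archive, save_via_local_copy, save_in_place and check_tracking_demo are hypothetical names, and int stands in for the class A. It also declares the local copy outside the loop so that its address is guaranteed to stay the same, which is what typically happens with the stack variable in the loop above.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <vector>

// Toy stand-in for an output archive that tracks objects by address.
// It only mimics the "same address means same object" rule.
struct toy_tracking_archive {
    std::set<const void *> seen;   // addresses already saved
    std::vector<int> stored;       // values actually written

    void save(const int & value) {
        // insert() reports whether this address was seen for the first time
        if (seen.insert(&value).second)
            stored.push_back(value);
        // otherwise the object is assumed to be stored already and is skipped
    }
};

// Saves each element through a copy held in one local variable.
std::size_t save_via_local_copy(const int (&a)[10]) {
    toy_tracking_archive ar;
    int x = 0;                     // one stack slot reused for every element
    for (int i = 0; i < 10; ++i) {
        x = a[i];                  // the value changes, the address does not
        ar.save(x);
    }
    return ar.stored.size();
}

// Saves each element in place, so every address is distinct.
std::size_t save_in_place(const int (&a)[10]) {
    toy_tracking_archive ar;
    for (int i = 0; i < 10; ++i)
        ar.save(a[i]);
    return ar.stored.size();
}

// Usage: only the first copy is stored, while all ten originals are.
void check_tracking_demo() {
    int a[10] = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9};
    assert(save_via_local_copy(a) == 1);
    assert(save_in_place(a) == 10);
}
```

In this model the copy-based loop stores a single value, while saving the elements in place stores all ten.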
To help detect such cases, output archive operators expect to be passed
const reference arguments.
Given this, the above code will invoke a compile time assertion. The obvious fix in this example is to use
template<class Archive>
void save(Archive & ar, const unsigned int version) const
{
    for(int i = 0; i < 10; ++i){
        ar << a[i];
    }
}
which will compile and run without problem.
The use of const by the output archive operators ensures that the
process of serialization doesn't change the state of the objects being
serialized. An attempt to do this would amount to augmenting the concept
of saving state with some sort of non-obvious side effect. This would
almost surely be a mistake and a likely source of very subtle bugs.
Unfortunately, implementation issues currently prevent the detection of this kind of error when the data item is wrapped as a name-value pair.
A similar problem can occur when different objects are loaded to an address which is different from the final location:
template<class Archive>
void load(Archive & ar, const unsigned int version)
{
    for(int i = 0; i < 10; ++i){
        A x;
        ar >> x;
        m_set.insert(x);
    }
}
In this case, the address of x is the one that is tracked rather than
the address of the new item added to the set. Left unaddressed, this will
break the features that depend on tracking, such as loading objects
through a pointer, and subtle bugs will be introduced into the program.
This can be addressed by altering the above code as follows:
template<class Archive>
void load(Archive & ar, const unsigned int version)
{
    for(int i = 0; i < 10; ++i){
        A x;
        ar >> x;
        std::pair<std::set<A>::const_iterator, bool> result;
        result = m_set.insert(x);
        ar.reset_object_address(& (*result.first), &x);
    }
}
This will adjust the tracking information to reflect the final resting place of
the moved variable and thereby rectify the above problem.
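Why the tracked address goes stale can be checked with the standard library alone. The following is a minimal sketch in which int stands in for the class A; the function names are hypothetical. It shows only that the copy held by the set lives at a different address than the local variable it was loaded into, which is the pair of addresses handed to reset_object_address above.

```cpp
#include <cassert>
#include <set>
#include <utility>

// Returns true when the element stored in the set lives at a different
// address than the local variable it was copied from.
bool inserted_copy_has_new_address() {
    std::set<int> s;
    int x = 42;                                    // temporary load target
    std::pair<std::set<int>::const_iterator, bool> result = s.insert(x);
    const int * final_address = &(*result.first);  // where the set keeps its copy
    return result.second && final_address != &x;
}

// Usage: the insertion succeeds and the two addresses differ.
void check_address_demo() {
    assert(inserted_copy_has_new_address());
}
```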
If it is known a priori that no pointer values are duplicated, overhead associated with object tracking can be eliminated by setting the object tracking class serialization trait appropriately.
By default, data types designated primitive by the Implementation Level
class serialization trait are never tracked. If it is desired to
track a shared primitive object through a pointer (e.g. a long used as
a reference count), it should be wrapped in a class/struct so that it is
an identifiable type. The alternative of changing the implementation
level of long would affect every long serialized in the whole program,
which is probably not what one would intend.
It is possible that we may want to track addresses even though
the object is never serialized through a pointer. For example,
a virtual base class need be saved/loaded only once. By setting
this serialization trait to track_always, we can suppress
redundant save/load operations.
BOOST_CLASS_TRACKING(my_virtual_base_class, boost::serialization::track_always)
BOOST_CLASS_EXPORT is used to make the serialization library aware
that code should be instantiated for serialization of a given class even though the
class hasn't been otherwise referred to by the program. This functionality
is necessary to implement serialization of pointers through a virtual base
class pointer, that is, a polymorphic pointer.
This macro specifies a "Globally Unique IDentifier" (GUID).
This is a string which identifies the class to be created when data is loaded.
Generally a text representation of the class name is sufficient for this purpose,
but in certain cases it may be necessary to specify a different string by using
BOOST_CLASS_EXPORT_GUID rather than a simple BOOST_CLASS_EXPORT.
BOOST_CLASS_EXPORT would usually
be specified in the same header file as the class declaration to which it
corresponds. That is, BOOST_CLASS_EXPORT(T)
is a "trait" of the class T. So a program using this class will look
something like:
#include <boost/archive/xml_oarchive.hpp>
.... // any other archive classes
#include "my_class.hpp" // which contains BOOST_CLASS_EXPORT(my_class)
These headers can be in any order. (In boost versions 1.34
and earlier, the archive headers had to go before any headers which
contain BOOST_CLASS_EXPORT.)
Any code required to serialize types specified by BOOST_CLASS_EXPORT
will be instantiated for each archive whose header is included. (Note that
the code is instantiated regardless of whether or not it is actually invoked.)
If no archive headers are included, no code should be instantiated.
This will permit BOOST_CLASS_EXPORT to be a permanent part of my_class.hpp.
Note that the implementation of this functionality depends upon vendor specific extensions to the C++ language. So, there is no guaranteed portability of programs which use this facility. However, all C++ compilers which are tested with boost provide the required extensions. The library includes the extra declarations required by each of these compilers. It's reasonable to expect that future C++ compilers will support these extensions or something equivalent.
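The overhead of storing class information can be eliminated by setting the
implementation level class trait to boost::serialization::object_serializable.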
Turning off tracking and class information serialization will result in pure template inline code that in principle could be optimised down to a simple stream write/read. Elimination of all serialization overhead in this manner comes at a cost. Once archives are released to users, the class serialization traits cannot be changed without invalidating the old archives. Including the class information in the archive assures us that they will be readable in the future even if the class definition is revised. A lightweight structure such as a display pixel might be declared in a header like this:
#include <boost/serialization/serialization.hpp>
#include <boost/serialization/level.hpp>
#include <boost/serialization/tracking.hpp>

// a pixel is a lightweight struct which is used in great numbers.
struct pixel
{
    unsigned char red, green, blue;
    template<class Archive>
    void serialize(Archive & ar, const unsigned int /* version */){
        // operator& saves or loads depending on the archive type
        ar & red & green & blue;
    }
};

// eliminate serialization overhead at the cost of
// never being able to increase the version.
BOOST_CLASS_IMPLEMENTATION(pixel, boost::serialization::object_serializable)

// eliminate object tracking (even if serialized through a pointer)
// at the risk of a programming error creating duplicate objects.
BOOST_CLASS_TRACKING(pixel, boost::serialization::track_never)
Some compilers (gcc on Linux, for example) reserve 4 bytes to store a variable
of type wchar_t while other compilers reserve only 2 bytes.
So it's possible that a value could be written that couldn't be represented by
the loading program. This is a fairly obvious situation and easily handled by
using the numeric types in <boost/cstdint.hpp>.
A special integral type is std::size_t, which is a typedef
of an integral type guaranteed to be large enough
to hold the size of any collection, but whose actual size can differ depending
on the platform. The collection_size_type
wrapper exists to enable portable serialization of collection sizes by an archive.
Recommended choices for portable serialization of collection sizes are to
use either a 64-bit or a variable length integer representation.
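As an illustration of the variable length option, here is a minimal sketch of one common encoding (7 payload bits per byte, least significant group first, as in LEB128). It is not taken from the library and does not use collection_size_type; put_size, get_size and check_size_encoding are hypothetical names, and <cstdint> assumes a C++11 compiler. The size is widened to a fixed 64-bit type before encoding, so the bytes produced do not depend on the width of std::size_t on the writing platform.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Append `size` as a variable length integer: 7 payload bits per byte,
// least significant group first, high bit set on every byte but the last.
void put_size(std::vector<unsigned char> & out, std::size_t size) {
    std::uint64_t v = size;                 // widen to a fixed 64-bit type first
    while (v >= 0x80) {
        out.push_back(static_cast<unsigned char>((v & 0x7F) | 0x80));
        v >>= 7;
    }
    out.push_back(static_cast<unsigned char>(v));
}

// Decode a value written by put_size, starting at `pos`; `pos` is advanced
// past the bytes consumed. The input is assumed to be well formed.
std::uint64_t get_size(const std::vector<unsigned char> & in, std::size_t & pos) {
    std::uint64_t v = 0;
    int shift = 0;
    for (;;) {
        unsigned char byte = in[pos++];
        v |= static_cast<std::uint64_t>(byte & 0x7F) << shift;
        if ((byte & 0x80) == 0)
            break;                          // last byte of this value
        shift += 7;
    }
    return v;
}

// Usage: 300 takes two bytes and decodes back to 300.
void check_size_encoding() {
    std::vector<unsigned char> buf;
    put_size(buf, 300);
    assert(buf.size() == 2);
    std::size_t pos = 0;
    assert(get_size(buf, pos) == 300);
}
```

A loading program whose std::size_t is narrower than 64 bits would still need to check that the decoded value fits before using it as a collection size.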
template<class T>
struct my_wrapper {
    template<class Archive>
    Archive & serialize ...
};
...
class my_class {
    wchar_t a;
    short unsigned b;
    template<class Archive>
    Archive & serialize(Archive & ar, unsigned int version){
        ar & my_wrapper(a);
        ar & my_wrapper(b);
        return ar;
    }
};
If my_wrapper uses default serialization
traits there could be a problem. With the default traits, each time a new type is
added to the archive, bookkeeping information is added. So in this example, the
archive would include such bookkeeping information for
my_wrapper<wchar_t> and for my_wrapper<unsigned short>.
Or would it? What about compilers that treat wchar_t as a
synonym for unsigned short?
In this case there is only one distinct type, not two. If archives are passed between
programs built with compilers that differ in their treatment of wchar_t,
the load operation will fail in a catastrophic way.
One remedy for this is to assign serialization traits to the template
my_wrapper such that class
information for instantiations of this template is never serialized. This
process is described above and has been used for Name-Value Pairs.
Wrappers would typically be assigned such traits.
Another way to avoid this problem is to assign serialization traits
to all specializations of the template my_wrapper
for all primitive types so that class information is never saved. This is what has
been done for our implementation of serialization for STL collections.
Streams used for binary archives should be opened in binary mode by using the
flag ios::binary. If this is not done, the archive generated
will be unreadable.
Unfortunately, no way has been found to detect this error before loading the archive. Debug builds will assert when this is detected so that may be helpful in catching this error.
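Opening streams with ios::binary can be sketched with the standard library alone. The example below is not archive code: round_trip_binary and check_binary_round_trip are hypothetical names, the file name is made up for illustration, and the scratch file is removed afterwards. It writes raw bytes, including newline and carriage-return characters, and reads them back unchanged because both streams are opened in binary mode.

```cpp
#include <cassert>
#include <cstdio>
#include <fstream>
#include <string>

// Write raw bytes to `path` and read them back, with both streams opened
// in binary mode so that no newline translation takes place.
std::string round_trip_binary(const char * path, const std::string & bytes) {
    {
        std::ofstream out(path, std::ios::out | std::ios::binary);
        out.write(bytes.data(), static_cast<std::streamsize>(bytes.size()));
    }   // stream closed here, flushing the data

    std::string result;
    {
        std::ifstream in(path, std::ios::in | std::ios::binary);
        char c;
        while (in.get(c))
            result.push_back(c);
    }
    std::remove(path);      // clean up the scratch file
    return result;
}

// Usage: bytes that text mode might translate survive the round trip.
void check_binary_round_trip() {
    const std::string bytes("a\nb\r\nc", 6);
    assert(round_trip_binary("binary_mode_demo.bin", bytes) == bytes);
}
```

The same open-mode flags would be passed to the stream handed to a binary archive.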
See demo_pimpl.cpp, demo_pimpl_A.cpp and demo_pimpl_A.hpp,
where the implementation of serialization is completely separate
from the main program.
Well, not quite.
There are a couple of global data structures for holding
information about serializable types. These structures are
used to dispatch to the correct code to handle each pair
of serializable type and archive type. Since this
information is shared among all archives, there is
potential for problems. This has been addressed by
carefully implementing the library so that these
structures are all initialized before main(...)
is called. From then on they are never altered. So
there SHOULD be no problem having multiple archives
open simultaneously, be it from the same or different
threads.
Well, almost.
With dynamically loaded code (DLLs or shared libraries),
these global data structures can be altered when a library
is loaded or unloaded. That is, in this case, these
global data structures can be altered after main(...)
is called. So if a thread is dynamically loading/unloading
modules which contain serialization code while an
archive is open, there could be problems. Also, if
such loading/unloading is happening concurrently
in different threads, there could also be problems.
It might not be easy to control this. It is possible that some systems may not actually load modules until they are needed. So even though we think that there is no dynamic loading/unloading of such code, it could be occurring as "help" to manage resources. On such systems, access to archive code would have to be synchronized with some multi-threading construct in order to be functional.
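One possible shape for such a construct is a single process-wide mutex that callers agree to hold around archive use and around any loading or unloading of modules that contain serialization code. The sketch below assumes a C++11 compiler; serialization_mutex, with_serialization_lock and check_lock_demo are hypothetical names and are not part of the library.

```cpp
#include <cassert>
#include <mutex>

// One process-wide lock that callers agree to hold around any archive use
// and any loading or unloading of modules containing serialization code.
inline std::mutex & serialization_mutex() {
    static std::mutex m;   // initialized once, on first use
    return m;
}

// Runs `f` while holding the lock and returns whatever `f` returns.
template<class F>
auto with_serialization_lock(F f) -> decltype(f()) {
    std::lock_guard<std::mutex> guard(serialization_mutex());
    return f();
}

// Usage: the callable runs exactly once, under the lock.
void check_lock_demo() {
    int calls = 0;
    int r = with_serialization_lock([&calls] { ++calls; return 7; });
    assert(r == 7 && calls == 1);
}
```

A single coarse lock like this serializes all archive access, which trades concurrency for simplicity; whether that is acceptable depends on the application.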
array
wrapper.
Serialization functions for data types containing contiguous arrays of homogeneous
types, such as std::vector, std::valarray or boost::multi_array,
should serialize them using an array wrapper to make use of
these optimizations.
Archive types that can provide optimized serialization for contiguous arrays of
homogeneous types should implement these by overloading the serialization of
the array wrapper, as is done for the binary archives.
© Copyright Robert Ramey 2002-2004. Distributed under the Boost Software License, Version 1.0. (See accompanying file LICENSE_1_0.txt or copy at http://www.boost.org/LICENSE_1_0.txt)