C++ Boost

Serialization

Rationale


The term "serialization" is preferred to "persistence"
Archives are not streams
Strings are treated specially in text archives
typeid information is not included in archives
Compile time trap when saving a non-const value

The term "serialization" is preferred to "persistence"

I found that persistence is often used to refer to something quite different. Examples are storage of class instances (objects) in database schema [4]. This library will be useful in other contexts besides implementing persistence. The most obvious case is that of marshalling data for transmission to another system.

Archives are not streams

Archive classes are NOT derived from streams even though they have similar syntax rules.
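
As a rough illustration of "similar syntax, different type", here is a minimal sketch. It is not the library's code: toy_text_oarchive and demo_archive_is_not_a_stream are hypothetical names invented for this example. The toy archive holds a reference to a stream and defines its own operator<<, so it reads like a stream without being one.

```cpp
#include <cassert>
#include <ostream>
#include <sstream>
#include <string>
#include <type_traits>

// Hypothetical toy archive: it USES a stream but is not derived from one.
class toy_text_oarchive {
public:
    explicit toy_text_oarchive(std::ostream& os) : os_(os) {}

    // Stream-like syntax, but the archive decides the format:
    // here every value is followed by a single space.
    template <class T>
    toy_text_oarchive& operator<<(const T& value) {
        os_ << value << ' ';
        return *this;
    }

private:
    std::ostream& os_;  // composition rather than inheritance
};

// The toy archive is unrelated to std::ostream in the class hierarchy.
static_assert(!std::is_base_of<std::ostream, toy_text_oarchive>::value,
              "the toy archive must not be a stream");

// Writes two values through the toy archive and returns the text produced.
std::string demo_archive_is_not_a_stream() {
    std::ostringstream buffer;
    toy_text_oarchive ar(buffer);
    ar << 42 << std::string("abc");
    assert(buffer.good());  // the underlying stream is still usable
    return buffer.str();
}
```

Because the toy archive is a separate type, stream manipulators and stream state functions are not available on it unless the archive chooses to provide them.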

Archive Members are Templates Rather than Virtual Functions

The previous version of this library defined virtual functions for all primitive types. These were overridden by each archive class. There were two issues related to this:
  • Some disliked virtual functions because of the added execution time overhead.
  • This caused implementation difficulties since the set of primitive data types varies between platforms. Attempting to define the correct set of virtual functions (think long long, __int64, etc.) resulted in messy and fragile code. Replacing this with templates, and letting the compiler generate the code for the primitive types actually used, resolved this problem. Of course, the ripple effects of this design change were significant, but in the end it led to smaller, faster, more maintainable code.
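
The trade-off described above can be sketched as follows. This is only an illustration under assumed names (toy_oarchive, save and demo_template_members are hypothetical, not the library's interface): one member template stands in for a whole family of per-type virtual functions, and the compiler instantiates it only for the types a program actually saves.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical toy archive, not the library's implementation.
// A single member template replaces one virtual function per primitive type.
class toy_oarchive {
public:
    template <class T>
    void save(const T& value) {
        out_ << value << ';';  // each value is followed by ';'
    }

    std::string str() const { return out_.str(); }

private:
    std::ostringstream out_;
};

// long long needs no dedicated overload or virtual function here;
// save<long long> and save<char> are generated because they are used.
std::string demo_template_members() {
    toy_oarchive ar;
    ar.save(static_cast<long long>(7));
    ar.save('x');
    return ar.str();
}
```

No list of primitive types has to be maintained in the archive class, which is the portability problem the virtual function design ran into.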

    std::strings are treated specially in text files

    Treating strings as STL vectors would result in minimal code size. This was not done because:

    typeid information is not included in archives

    I originally thought that I had to save the name of the class returned by std::type_info::name() in the archive. This created difficulties, as std::type_info::name() is not portable and not guaranteed to return the class name. This makes it almost useless for implementing archive portability. This topic is explained in much more detail in [7] page 206. It turned out that saving the name was not necessary. As long as objects are loaded in exactly the same sequence in which they were saved, the type is available when loading. The only exception to this is the case of polymorphic pointers never before loaded/saved. This is addressed with the register_type() and/or export facilities described in the reference. In effect, export generates a portable equivalent to typeid information.
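
To make the idea of a "portable equivalent to typeid information" concrete, here is a minimal sketch that assumes nothing about how the library implements export. toy_type_registry, register_key, key_of and my_class are hypothetical names. The point is only that the program, not the compiler, chooses the string associated with each type.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <typeindex>
#include <typeinfo>

// Hypothetical registry: maps a type to a key chosen by the programmer.
class toy_type_registry {
public:
    template <class T>
    void register_key(const std::string& key) {
        keys_[std::type_index(typeid(T))] = key;
    }

    // Returns the registered key, or an empty string if T was never registered.
    template <class T>
    std::string key_of() const {
        std::map<std::type_index, std::string>::const_iterator it =
            keys_.find(std::type_index(typeid(T)));
        return it == keys_.end() ? std::string() : it->second;
    }

private:
    std::map<std::type_index, std::string> keys_;
};

struct my_class {};  // hypothetical user type

std::string demo_portable_key() {
    toy_type_registry registry;
    registry.register_key<my_class>("my_class");
    // typeid(my_class).name() is implementation-defined (often a mangled
    // name), so it is deliberately not used as the key written to an archive.
    return registry.key_of<my_class>();
}
```

The key "my_class" is the same on every compiler, whereas the string returned by std::type_info::name() may differ from one implementation to another.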

    Compile time trap when saving a non-const value

    The following code will fail to compile. The failure will occur on a line with a BOOST_STATIC_ASSERT. Here, we refer to this as a compile time trap.
    T t;
    ar << t;
    
    unless the tracking_level serialization trait is set to "track_never". The following will compile without problem:
    const T t;
    ar << t;
    
    Likewise, the following code will trap at compile time:
    T * t;
    ar >> t;
    
    if the tracking_level serialization trait is set to "track_never".
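
The save-side trap can be imitated with a static_assert. The sketch below is not the library's implementation; toy_tracking_level, toy_save_allowed, trap_oarchive, log_entry and plain_value are all hypothetical names, and only the rule described above is modelled: saving a non-const object is rejected unless its type is marked as never tracked. The load-side trap for pointers is not modelled.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical stand-ins for the tracking trait.
enum toy_tracking { toy_track_never, toy_track_selectively };

template <class T>
struct toy_tracking_level {
    static constexpr toy_tracking value = toy_track_selectively;  // default
};

struct log_entry {};    // user type left at the default tracking level
struct plain_value {};  // user type marked "never track" below

template <>
struct toy_tracking_level<plain_value> {
    static constexpr toy_tracking value = toy_track_never;
};

// True when saving an object of (possibly const) type T passes the trap.
template <class T>
struct toy_save_allowed {
    typedef typename std::remove_const<T>::type bare_type;
    static constexpr bool value =
        std::is_const<T>::value ||
        toy_tracking_level<bare_type>::value == toy_track_never;
};

struct trap_oarchive {
    int saved = 0;  // counts accepted saves; a real archive would write data

    template <class T>
    trap_oarchive& operator<<(T& t) {
        // T is deduced as "const X" for const objects and "X" otherwise.
        static_assert(toy_save_allowed<T>::value,
                      "saving a non-const object of a tracked type");
        (void)t;
        ++saved;
        return *this;
    }
};

int demo_trap() {
    trap_oarchive ar;
    const log_entry a{};
    plain_value b;
    ar << a;  // accepted: the object is const
    ar << b;  // accepted: non-const, but the type is never tracked
    // log_entry c; ar << c;  // would fail to compile at the static_assert
    return ar.saved;
}
```

Uncommenting the last save in demo_trap reproduces the kind of compile time failure described above.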

    This behavior has been controversial and may be revised in the future. The criticism is that it will flag code that is in fact correct and force users to insert const_cast. My view is that:

    The following case illustrates my position. It was originally used as an example in the mailing list by Peter Dimov.
    class construct_from 
    { 
        ... 
    }; 
    
    int main(){ 
        ... 
        Y y; 
        construct_from x(y); 
        ar << x; 
    } 
    
    Suppose that there is no trap as described above.
    1. This example compiles and executes fine. No tracking is done because construct_from has never been serialized through a pointer. Now some time later, the next programmer (2) comes along and makes an enhancement. He wants the archive to be sort of a log.
      int main(){ 
          ... 
          Y y; 
          construct_from x(y); 
          ar << x; 
          ... 
          x.f(); // change x in some way 
          ... 
          ar << x; 
      } 
      

      Again no problem. He gets two copies in the archive, each one different. That is, he gets exactly what he expects and is naturally delighted.

    2. Now sometime later, a third programmer (3) sees construct_from and says - oh cool, just what I need. He writes a function in a totally disjoint module (the project is so big, he doesn't even realize the existence of the original usage) and writes something like:
      class K { 
          shared_ptr <construct_from> z; 
          template <class Archive> 
          void serialize(Archive & ar, const unsigned version){ 
              ar << z; 
          } 
      }; 
      

      He builds and runs the program and tests his new functionality. It works great and he's delighted.

    3. Things continue smoothly as before. A month goes by and it's discovered that when loading the archives made in the last month (reading the log), things don't work. The second log entry is always the same as the first. After a series of very long and increasingly acrimonious email exchanges, it's discovered that programmer (3) accidentally broke programmer (2)'s code. This is because, by serializing via a pointer, the "log" object is now being tracked. The default tracking behavior is "track_selectively", which means that class instances are tracked only if they are serialized through pointers anywhere in the program. Now multiple saves from the same address result in only the first one being written to the archive. Subsequent saves only add the address - even though the data might have been changed. When it comes time to load the data, all instances of the log record show the same data. In this way, the behavior of a functioning piece of code is changed due to the side effect of a change in an otherwise disjoint module. Worse yet, the data has been lost and cannot now be recovered from the archives. People are really upset and disappointed with boost (at least the serialization system).

    4. After a lot of investigation, the source of the problem is discovered and class construct_from is marked "track_never" by including:
      BOOST_SERIALIZATION_TRACKING(construct_from, track_never) 
      
    5. Now everything works again. Or - so it seems.

    6. With tracking turned off, shared_ptr<construct_from> is not going to have a single raw pointer shared amongst the instances. Each loaded shared_ptr<construct_from> is going to have its own distinct raw pointer. This will break shared_ptr and cause a memory leak. Again, the cause of this problem is very far removed from the point of discovery. It could well be that the problem is not even discovered until after the archives are loaded. Now we not only have a program bug that is difficult to find and fix, but we also have a bunch of invalid archives and lost data.
    Now consider what happens when the trap is enabled:
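
The failure in this first scenario comes down to address-based tracking. The sketch below imitates it with a toy archive; it is not the library's implementation, and log_record, tracking_oarchive and demo_log are hypothetical names. When tracking is on, a second save from the same address writes only a back-reference (shown here as "ref"), so the changed value never reaches the archive.

```cpp
#include <cassert>
#include <set>
#include <sstream>
#include <string>

struct log_record { int value; };  // hypothetical stand-in for construct_from

// Hypothetical toy archive that mimics address-based object tracking.
class tracking_oarchive {
public:
    explicit tracking_oarchive(bool tracked) : tracked_(tracked) {}

    tracking_oarchive& operator<<(const log_record& r) {
        if (tracked_ && !seen_.insert(&r).second) {
            out_ << "ref;";  // address already saved: back-reference only
        } else {
            out_ << r.value << ';';  // first save, or tracking is off
        }
        return *this;
    }

    std::string str() const { return out_.str(); }

private:
    bool tracked_;
    std::set<const log_record*> seen_;  // addresses saved so far
    std::ostringstream out_;
};

// Saves the same object twice, changing it in between, as in the "log" example.
std::string demo_log(bool tracked) {
    tracking_oarchive ar(tracked);
    log_record x{1};
    ar << x;
    x.value = 2;  // change x in some way
    ar << x;
    return ar.str();
}
```

Under these assumptions, demo_log(false) yields "1;2;" while demo_log(true) yields "1;ref;": the second log entry is lost as soon as the type becomes tracked.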

    1. Right away, the program traps at
      ar << x; 
      

    2. The programmer curses (another %^&*&* hoop to jump through). He's in a hurry (and who isn't) and would prefer not to const_cast - because it looks bad. So he'll just make the following change and move on.
      Y y; 
      const construct_from x(y); 
      ar << x; 
      

      Things work fine and he moves on.

    3. Now programmer (2) wants to make his change - and again another annoying const issue:
      Y y; 
      const construct_from x(y); 
      ... 
      x.f(); // change x in some way ; compile error f() is not const 
      ... 
      ar << x; 
      

      He's mildly annoyed now, so he tries the following:

      • He considers making f() const - but presumably that shifts the const error to somewhere else. And he doesn't want to fiddle with "his" code to work around a quirk in the serialization system.

      • He removes the const from const construct_from above - damn, now he gets the trap. If he looks at the commented code where the BOOST_STATIC_ASSERT occurs, he'll do one of two things:

        1. This is just crazy. It's making my life needlessly difficult and flagging code that is just fine. So I'll fix this with a const_cast and fire off a complaint to the list and maybe they will fix it. In this case, the story branches off to the previous scenario.

        2. Oh, this trap is suggesting that the default serialization isn't really what I want. Of course in this particular program it doesn't matter. But then the code in the trap can't really evaluate code in other modules (which might not even be written yet). OK, I'll add the following to my construct_from.hpp to solve the problem.
          BOOST_SERIALIZATION_TRACKING(construct_from, track_never) 
          

    4. Now programmer (3) comes along and makes his change. The behavior of the original (and distant) module remains unchanged because the construct_from trait has been set to "track_never", so he should always get copies and the log should be what we expect.

    5. But now he gets another trap - trying to save an object of a class marked "track_never" through a pointer. So he goes back to construct_from.hpp and comments out the BOOST_SERIALIZATION_TRACKING that was inserted. Now the second trap is avoided, but damn - the first trap is popping up again. Eventually, after some code restructuring, the differing requirements of serializing construct_from are reconciled.
    Note that in this second scenario the conflicting requirements surface as compile time errors, before any archive is written. It's true that these traps may sometimes flag code that is currently correct and that this may be annoying to some programmers. However, this example illustrates my view that these traps are useful and that any such annoyance is a small price to pay to avoid particularly vexing programming errors.

    © Copyright Robert Ramey 2002-2004. Distributed under the Boost Software License, Version 1.0. (See accompanying file LICENSE_1_0.txt or copy at http://www.boost.org/LICENSE_1_0.txt)