The multi pass iterator

Backtracking in Spirit.Qi requires the use of the following types of iterator: forward, bidirectional, or random access. Because of backtracking, input iterators cannot be used. Therefore, the standard library classes std::istreambuf_iterator and std::istream_iterator, that fall under the category of input iterators, cannot be used. Another input iterator that is of interest is one that wraps a lexer, such as LEX.

Note

	Note
In general, Spirit.Qi generates recursive descent parser which require backtracking parsers by design. For this reason we need to provide at least forward iterators to any of Spirit.Qi's API functions. This is not an absolute requirement though. In the future, we shall see more deterministic parsers that require no more than 1 character (token) of lookahead. Such parsers allow us to use input iterators such as the `std::istream_iterator` as is.

In general, Spirit.Qi generates recursive descent parser which require backtracking parsers by design. For this reason we need to provide at least forward iterators to any of Spirit.Qi's API functions. This is not an absolute requirement though. In the future, we shall see more deterministic parsers that require no more than 1 character (token) of lookahead. Such parsers allow us to use input iterators such as the std::istream_iterator as is.

Backtracking can be implemented only if we are allowed to save an iterator position, i.e. making a copy of the current iterator. Unfortunately, with an input iterator, there is no way to do so, and thus input iterators will not work with backtracking in Spirit.Qi. One solution to this problem is to simply load all the data to be parsed into a container, such as a vector or deque, and then pass the begin and end of the container to Spirit.Qi. This method can be too memory intensive for certain applications, which is why the multi_pass iterator was created.

Using the multi_pass

The multi_pass iterator will convert any input iterator into a forward iterator suitable for use with Spirit.Qi. multi_pass will buffer data when needed and will discard the buffer when its contents is not needed anymore. This happens either if only one copy of the iterator exists or if no backtracking can occur.

A grammar must be designed with care if the multi_pass iterator is used. Any rule that may need to backtrack, such as one that contains an alternative, will cause data to be buffered. The rules that are optimal to use are repetition constructs (as kleene and plus).

Sequences of the form a >> b will buffer data as well. This is different from the behavior of Spirit.Classic but for a good reason. Sequences need to reset the current iterator to its initial state if one of the components of a sequence fails to match. To compensate for this behavior we added functionality to the expect parsers (i.e. constructs like a > b). Expectation points introduce deterministic points into the grammar ensuring no backtracking can occur if they match. For this reason we clear the buffers of any multi_pass iterator on each expectation point, ensuring minimal buffer content even for large grammars.

Important

	Important
If you use an error handler in conjunction with the `expect` parser while utilizing a `multi_pass` iterator and you intend to use the error handler to force a `retry` or a `fail` (see the description of error handlers - FIXME: insert link), then you need to instantiate the error handler using `retry` or `fail`, for instance: rule r<iterator_type> r; on_error<retry>(r, std::cout << phoenix::val("Error!")); If you fail to do so the resulting code will trigger an assert statement at runtime.

If you use an error handler in conjunction with the expect parser while utilizing a multi_pass iterator and you intend to use the error handler to force a retry or a fail (see the description of error handlers - FIXME: insert link), then you need to instantiate the error handler using retry or fail, for instance:

rule r<iterator_type> r;
on_error<retry>(r, std::cout << phoenix::val("Error!"));

If you fail to do so the resulting code will trigger an assert statement at runtime.

Any rule that repeats, such as kleene_star (*a) or positive such as (+a), will only buffer the data for the current repetition.

In typical grammars, ambiguity and therefore lookahead is often localized. In fact, many well designed languages are fully deterministic and require no lookahead at all. Peeking at the first character from the input will immediately determine the alternative branch to take. Yet, even with highly ambiguous grammars, alternatives are often of the form *(a | b | c | d). The input iterator moves on and is never stuck at the beginning. Let's look at a Pascal snippet for example:

program =
        programHeading >> block >> '.'
    ;

block =
       *(   labelDeclarationPart
        |   constantDefinitionPart
        |   typeDefinitionPart
        |   variableDeclarationPart
        |   procedureAndFunctionDeclarationPart
        )
    >>  statementPart
    ;

Notice the alternatives inside the Kleene star in the rule block . The rule gobbles the input in a linear manner and throws away the past history with each iteration. As this is fully deterministic LL(1) grammar, each failed alternative only has to peek 1 character (token). The alternative that consumes more than 1 character (token) is definitely a winner. After which, the Kleene star moves on to the next.

Now, after the lecture on the features to be careful with when using multi_pass, you may think that multi_pass is way too restrictive to use. That's not the case. If your grammar is deterministic, you can make use of the flush_multi_pass pseudo parser in your grammar to ensure that data is not buffered when unnecessary (flush_multi_pass is available from the Spirit.Qi parser Repository).

Here we present a minimal example showing a minimal use case. The multi_pass iterator is highly configurable, but the default policies have been chosen so that its easily usable with input iterators such as std::istreambuf_iterator. For the complete source code of this example please refer to multi_pass.cpp.

int main()
{
    namespace spirit = boost::spirit;
    using spirit::ascii::space;
    using spirit::ascii::char_;
    using spirit::qi::double_;
    using spirit::qi::eol;

    std::ifstream in("multi_pass.txt");    // we get our input from this file
    if (!in.is_open()) {
        std::cout << "Could not open input file: 'multi_pass.txt'" << std::endl;
        return -1;
    }

    typedef std::istreambuf_iterator<char> base_iterator_type;
    spirit::multi_pass<base_iterator_type> first =
        spirit::make_default_multi_pass(base_iterator_type(in));

    std::vector<double> v;
    bool result = spirit::qi::phrase_parse(first
      , spirit::make_default_multi_pass(base_iterator_type())
      , double_ >> *(',' >> double_)              // recognize list of doubles
      , space | '#' >> *(char_ - eol) >> eol      // comment skipper
      , v);                                       // data read from file

    if (!result) {
        std::cout << "Failed parsing input file!" << std::endl;
        return -2;
    }

    std::cout << "Successfully parsed input file!" << std::endl;
    return 0;
}

Using the flush_multi_pass parser

The Spirit Repository contains the flush_multi_pass parser component. This is usable in conjunction with the multi_pass iterator to minimize the buffering. It allows to insert explicit synchronization points into your grammar where it is safe to clear any stored input as it is ensured that no backtracking can occur at this point anymore.

When the flush_multi_pass parser is used with multi_pass, it will call multi_pass::clear_queue(). This will cause any buffered data to be erased. This also will invalidate all other copies of multi_pass and they should not be used. If they are, an boost::illegal_backtracking exception will be thrown.

The multi_pass Policies

The multi_pass iterator is a templated class configurable using policies. The description of multi_pass above is how it was originally implemented (before it used policies), and is the default configuration now. But, multi_pass is capable of much more. Because of the open-ended nature of policies, you can write your own policy to make multi_pass behave in a way that we never before imagined.

The multi_pass class has two template parameters:

The multi_pass template parameters

Input: The type multi_pass uses to acquire it's input. This is typically an input iterator, or functor.
Policies: The combined policies to use to create an instance of a multi_pass iterator. This combined policy type is described below

It is possible to implement all of the required functionality of the combined policy in a single class. But it has shown to be more convenient to split this into four different groups of functions, i.e. four separate, but well coordinated policies. For this reason the multi_pass library implements a template iterator_policies::default_policy allowing to combine several different policies, each implementing one of the functionality groups:

Table 12. Policies needed for default_policy template

Template Parameter	Description
`OwnershipPolicy`	This policy determines how `multi_pass` deals with it's shared components.
`CheckingPolicy`	This policy determines how checking for invalid iterators is done.
`InputPolicy`	A class that defines how `multi_pass` acquires its input. The `InputPolicy` is parameterized by the `Input` template parameter to the `multi_pass`.
`StoragePolicy`	The buffering scheme used by `multi_pass` is determined and managed by the StoragePolicy.

The multi_pass library contains several predefined policy implementations for each of the policy types as described above. First we will describe those predefined types. Afterwards we will give some guidelines how you can write your own policy implementations.

Predefined policies

All predefined multi_pass policies are defined in the namespace boost::spirit::iterator_policies.

Table 13. Predefined policy classes

Class name	Description
InputPolicy classes
`input_iterator`	This policy directs `multi_pass` to read from an input iterator of type `Input`.
`buffering_input_iterator`	This policy directs `multi_pass` to read from an input iterator of type `Input`. Additionally it buffers the last character received from the underlying iterator. This allows to wrap iterators not buffering the last character on their own (as `std::istreambuf_iterator`).
`istream`	This policy directs `multi_pass` to read from an input stream of type `Input` (usually a `std::basic_istream`).
`lex_input`	This policy obtains it's input by calling yylex(), which would typically be provided by a scanner generated by Flex. If you use this policy your code must link against a Flex generated scanner.
`functor_input`	This input policy obtains it's data by calling a functor of type `Input`. The functor must meet certain requirements. It must have a typedef called `result_type` which should be the type returned from `operator()`. Also, since an input policy needs a way to determine when the end of input has been reached, the functor must contain a static variable named `eof` which is comparable to a variable of `result_type`.
`split_functor_input`	This is essentially the same as the `functor_input` policy except that the (user supplied) function object exposes separate `unique` and `shared` sub classes, allowing to integrate the functors unique data members with the `multi_pass` data items held by each instance and its shared data members will be integrated with the `multi_pass` members shared by all copies.
OwnershipPolicy classes
`ref_counted`	This class uses a reference counting scheme. The `multi_pass` will delete it's shared components when the count reaches zero.
`first_owner`	When this policy is used, the first `multi_pass` created will be the one that deletes the shared data. Each copy will not take ownership of the shared data. This works well for Spirit, since no dynamic allocation of iterators is done. All copies are made on the stack, so the original iterator has the longest lifespan.
CheckingPolicy classes
`no_check`	This policy does no checking at all.
`buf_id_check`	This policy keeps around a buffer id, or a buffer age. Every time `clear_queue()` is called on a `multi_pass` iterator, it is possible that all other iterators become invalid. When `clear_queue()` is called, `buf_id_check` increments the buffer id. When an iterator is dereferenced, this policy checks that the buffer id of the iterator matches the shared buffer id. This policy is most effective when used together with the `split_std_deque` StoragePolicy. It should not be used with the `fixed_size_queue` StoragePolicy, because it will not detect iterator dereferences that are out of range.
full_check	This policy has not been implemented yet. When it is, it will keep track of all iterators and make sure that they are all valid. This will be mostly useful for debugging purposes as it will incur significant overhead.
StoragePolicy classes
`split_std_deque`	Despite its name this policy keeps all buffered data in a `std::vector`. All data is stored as long as there is more than one iterator. Once the iterator count goes down to one, and the queue is no longer needed, it is cleared, freeing up memory. The queue can also be forcibly cleared by calling `multi_pass::clear_queue()`.
`fixed_size_queue<N>`	This policy keeps a circular buffer that is size `N+1` and stores `N` elements. `fixed_size_queue` is a template with a `std::size_t` parameter that specified the queue size. It is your responsibility to ensure that `N` is big enough for your parser. Whenever the foremost iterator is incremented, the last character of the buffer is automatically erased. Currently there is no way to tell if an iterator is trailing too far behind and has become invalid. No dynamic allocation is done by this policy during normal iterator operation, only on initial construction. The memory usage of this `StoragePolicy` is set at `N+1` bytes, unlike `split_std_deque`, which is unbounded.

Combinations: How to specify your own custom multi_pass

The beauty of policy based designs is that you can mix and match policies to create your own custom iterator by selecting the policies you want. Here's an example of how to specify a custom multi_pass that wraps an std::istream_iterator<char>, and is slightly more efficient than the default multi_pass (as generated by the make_default_multi_pass() API function) because it uses the iterator_policies::first_owner OwnershipPolicy and the iterator_policies::no_check CheckingPolicy:

typedef multi_pass<
    std::istream_iterator<char>
  , iterator_policies::default_policy<
        iterator_policies::first_owner
      , iterator_policies::no_check
      , iterator_policies::buffering_input_iterator
      , iterator_policies::split_std_deque
    >
> first_owner_multi_pass_type;

The default template parameters for iterator_policies::default_policy are:

iterator_policies::ref_counted OwnershipPolicy
iterator_policies::no_check CheckingPolicy, if BOOST_SPIRIT_DEBUG is defined: iterator_policies::buf_id_check CheckingPolicy
iterator_policies::buffering_input_iterator InputPolicy, and
iterator_policies::split_std_deque StoragePolicy.

So if you use multi_pass<std::istream_iterator<char> > you will get those pre-defined behaviors while wrapping an std::istream_iterator<char>.

Dealing with constant look ahead

There is one other pre-defined class called look_ahead. The class look_ahead is another predefine multi_pass iterator type. It has two template parameters: Input, the type of the input iterator to wrap, and a std::size_t N, which specifies the size of the buffer to the fixed_size_queue policy. While the default multi_pass configuration is designed for safety, look_ahead is designed for speed. look_ahead is derived from a multi_pass with the following policies: input_iterator InputPolicy, first_owner OwnershipPolicy, no_check CheckingPolicy, and fixed_size_queue<N> StoragePolicy.

This iterator is defined by including the files:

// forwards to <boost/spirit/home/support/look_ahead.hpp>
#include <boost/spirit/include/support_look_ahead.hpp>

Also, see Include Structure.

Reading from standard input streams

Yet another predefined iterator for wrapping standard input streams (usually a std::basic_istream<>) is called basic_istream_iterator<Char, Traits>. This class is usable as a drop in replacement for std::istream_iterator<Char, Traits>. Its only difference is that it is a forward iterator (instead of the std::istream_iterator, which is an input iterator). basic_istream_iterator is derived from a multi_pass with the following policies: istream InputPolicy, ref_counted OwnershipPolicy, no_check CheckingPolicy, and split_std_deque StoragePolicy.

There exists an additional predefined typedef:

typedef basic_istream_iterator<char, std::char_traits<char> > istream_iterator;

This iterator is defined by including the files:

// forwards to <boost/spirit/home/support/istream_iterator.hpp>
#include <boost/spirit/include/support_istream_iterator.hpp>

Also, see Include Structure.

How to write a functor for use with the `functor_input` InputPolicy

If you want to use the functor_input InputPolicy, you can write your own function object that will supply the input to multi_pass. The function object must satisfy several requirements. It must have a typedef result_type which specifies the return type of its operator(). This is standard practice in the STL. Also, it must supply a static variable called eof which is compared against to know whether the input has reached the end. Last but not least the function object must be default constructible. Here is an example:

#include <iostream>
#include <boost/spirit/home/qi.hpp>
#include <boost/spirit/home/support.hpp>
#include <boost/spirit/home/support/multi_pass.hpp>
#include <boost/spirit/home/support/iterators/detail/functor_input_policy.hpp>

// define the function object
class iterate_a2m
{
public:
    typedef char result_type;

    iterate_a2m() : c_('A') {}
    iterate_a2m(char c) : c_(c) {}

    result_type operator()()
    {
        if (c_ == 'M')
            return eof;
        return c_++;
    }

    static result_type eof;

private:
    char c_;
};

iterate_a2m::result_type iterate_a2m::eof = iterate_a2m::result_type('M');

using namespace boost::spirit;

// create two iterators using the define function object, one of which is 
// an end iterator
typedef multi_pass<iterate_a2m
  , iterator_policies::first_owner
  , iterator_policies::no_check
  , iterator_policies::functor_input
  , iterator_policies::split_std_deque>
functor_multi_pass_type;

int main()
{
    functor_multi_pass_type first = functor_multi_pass_type(iterate_a2m());
    functor_multi_pass_type last;

    // use the iterators: this will print "ABCDEFGHIJKL"
    while (first != last) {
        std::cout << *first;
        ++first;
    }
    std::cout << std::endl;
    return 0;
}

How to write policies for use with multi_pass

All policies to be used with the default_policy template need to have two embedded classes: unique and shared. The unique class needs to implement all required functions for a particular policy type. In addition it may hold all member data items being unique for a particular instance of a multi_pass (hence the name). The shared class does not expose any member functions (except sometimes a constructor), but it may hold all member data items to be shared between all copies of a particular multi_pass.

InputPolicy

An InputPolicy must have the following interface:

struct input_policy
{
    // Input is the same type used as the first template parameter
    // while instantiating the multi_pass
    template <typename Input>
    struct unique
    {
        // these typedef's will be exposed as the multi_pass iterator
        // properties
        typedef __unspecified_type__ value_type;
        typedef __unspecified_type__ difference_type;
        typedef __unspecified_type__ distance_type;
        typedef __unspecified_type__ pointer;
        typedef __unspecified_type__ reference;

        unique() {}
        explicit unique(Input) {}

        // destroy is called whenever the last copy of a multi_pass is
        // destructed (ownership_policy::release() returned true)
        //
        //   mp:    is a reference to the whole multi_pass instance
        template <typename MultiPass>
        static void destroy(MultiPass& mp);

        // swap is called by multi_pass::swap()
        void swap(unique&);

        // get_input is called whenever the next input character/token
        // should be fetched. 
        //
        //   mp:    is a reference to the whole multi_pass instance
        //
        // This method is expected to return a reference to the next 
        // character/token
        template <typename MultiPass>
        static typename MultiPass::reference get_input(MultiPass& mp);

        // advance_input is called whenever the underlying input stream 
        // should be advanced so that the next call to get_input will be 
        // able to return the next input character/token
        //
        //   mp:    is a reference to the whole multi_pass instance
        template <typename MultiPass>
        static void advance_input(MultiPass& mp);

        // input_at_eof is called to test whether this instance is a 
        // end of input iterator.
        //
        //   mp:    is a reference to the whole multi_pass instance
        //
        // This method is expected to return true if the end of input is 
        // reached. It is often used in the implementation of the function
        // storage_policy::is_eof.
        template <typename MultiPass>
        static bool input_at_eof(MultiPass const& mp);

        // input_is_valid is called to verify if the parameter t represents 
        // a valid input character/token
        //
        //   mp:    is a reference to the whole multi_pass instance
        //   t:     is the character/token to test for validity
        // 
        // This method is expected to return true if the parameter t 
        // represents a valid character/token.
        template <typename MultiPass>
        static bool input_is_valid(MultiPass const& mp, value_type const& t);
    };

    // Input is the same type used as the first template parameter passed
    // while instantiating the multi_pass
    template <typename Input>
    struct shared
    {
        explicit shared(Input) {}
    };
};

It is possible to derive the struct unique from the type boost::spirit::detail::default_input_policy. This type implements a minimal sufficient interface for some of the required functions, simplifying the task of writing a new input policy.

This class may implement a function destroy() being called during destruction of the last copy of a multi_pass. This function should be used to free any of the shared data items the policy might have allocated during construction of its shared part. Because of the way multi_pass is implemented any allocated data members in shared should not be deep copied in a copy constructor of shared.

OwnershipPolicy

The OwnershipPolicy must have the following interface:

struct ownership_policy
{
    struct unique
    {
        // destroy is called whenever the last copy of a multi_pass is
        // destructed (ownership_policy::release() returned true)
        //
        //   mp:    is a reference to the whole multi_pass instance
        template <typename MultiPass>
        static void destroy(MultiPass& mp);

        // swap is called by multi_pass::swap()
        void swap(unique&);

        // clone is called whenever a multi_pass is copied
        //
        //   mp:    is a reference to the whole multi_pass instance
        template <typename MultiPass>
        static void clone(MultiPass& mp);

        // release is called whenever a multi_pass is destroyed
        //
        //   mp:    is a reference to the whole multi_pass instance
        //
        // The method is expected to return true if the destructed 
        // instance is the last copy of a particular multi_pass. 
        template <typename MultiPass>
        static bool release(MultiPass& mp);

        // is_unique is called to test whether this instance is the only 
        // existing copy of a particular multi_pass
        //
        //   mp:    is a reference to the whole multi_pass instance
        //
        // The method is expected to return true if this instance is unique
        // (no other copies of this multi_pass exist).
        template <typename MultiPass>
        static bool is_unique(MultiPass const& mp);
    };

    struct shared {};
};

It is possible to derive the struct unique from the type boost::spirit::detail::default_ownership_policy. This type implements a minimal sufficient interface for some of the required functions, simplifying the task of writing a new ownership policy.

CheckingPolicy

The CheckingPolicy must have the following interface:

struct checking_policy
{
    struct unique
    {
        // swap is called by multi_pass::swap()
        void swap(unique&);

        // destroy is called whenever the last copy of a multi_pass is
        // destructed (ownership_policy::release() returned true)
        //
        //   mp:    is a reference to the whole multi_pass instance
        template <typename MultiPass>
        static void destroy(MultiPass& mp);

        // docheck is called before the multi_pass is dereferenced or 
        // incremented. 
        //
        //   mp:    is a reference to the whole multi_pass instance
        //
        // This method is expected to make sure the multi_pass instance is
        // still valid. If it is invalid an exception should be thrown.
        template <typename MultiPass>
        static void docheck(MultiPass const& mp);

        // clear_queue is called whenever the function 
        // multi_pass::clear_queue is called on this instance
        //
        //   mp:    is a reference to the whole multi_pass instance
        template <typename MultiPass>
        static void clear_queue(MultiPass& mp);
    };

    struct shared {};
};

It is possible to derive the struct unique from the type boost::spirit::detail::default_checking_policy. This type implements a minimal sufficient interface for some of the required functions, simplifying the task of writing a new checking policy.

StoragePolicy

A StoragePolicy must have the following interface:

struct storage_policy
{
    // Value is the same type as typename MultiPass::value_type
    template <typename Value>
    struct unique
    {
        // destroy is called whenever the last copy of a multi_pass is
        // destructed (ownership_policy::release() returned true)
        //
        //   mp:    is a reference to the whole multi_pass instance
        template <typename MultiPass>
        static void destroy(MultiPass& mp);

        // swap is called by multi_pass::swap()
        void swap(unique&);

        // dereference is called whenever multi_pass::operator*() is invoked
        //
        //   mp:    is a reference to the whole multi_pass instance
        //
        // This function is expected to return a reference to the current
        // character/token.
        template <typename MultiPass>
        static typename MultiPass::reference dereference(MultiPass const& mp);

        // increment is called whenever multi_pass::operator++ is invoked
        //
        //   mp:    is a reference to the whole multi_pass instance
        template <typename MultiPass>
        static void increment(MultiPass& mp);

        //
        //   mp:    is a reference to the whole multi_pass instance
        template <typename MultiPass>
        static void clear_queue(MultiPass& mp);

        // is_eof is called to test whether this instance is a end of input 
        // iterator.
        //
        //   mp:    is a reference to the whole multi_pass instance
        //
        // This method is expected to return true if the end of input is 
        // reached. 
        template <typename MultiPass>
        static bool is_eof(MultiPass const& mp);

        // less_than is called whenever multi_pass::operator==() is invoked
        //
        //   mp:    is a reference to the whole multi_pass instance
        //   rhs:   is the multi_pass reference this instance is compared 
        //          to
        //
        // This function is expected to return true if the current instance
        // is equal to the right hand side multi_pass instance
        template <typename MultiPass>
        static bool equal_to(MultiPass const& mp, MultiPass const& rhs);

        // less_than is called whenever multi_pass::operator<() is invoked
        //
        //   mp:    is a reference to the whole multi_pass instance
        //   rhs:   is the multi_pass reference this instance is compared 
        //          to
        //
        // This function is expected to return true if the current instance
        // is less than the right hand side multi_pass instance
        template <typename MultiPass>
        static bool less_than(MultiPass const& mp, MultiPass const& rhs);
    };

    // Value is the same type as typename MultiPass::value_type
    template <typename Value>
    struct shared {};
};

It is possible to derive the struct unique from the type boost::spirit::detail::default_storage_policy. This type implements a minimal sufficient interface for some of the required functions, simplifying the task of writing a new storage policy.

Generally, a StoragePolicy is the trickiest policy to implement. You should study and understand the existing StoragePolicy classes before you try and write your own.