...one of the most highly
regarded and expertly designed C++ library projects in the
world.
— Herb Sutter and Andrei
Alexandrescu, C++
Coding Standards
Backtracking in Spirit.Qi requires the use of the following
types of iterator: forward, bidirectional, or random access. Because of backtracking,
input iterators cannot be used. Therefore, the standard library classes
std::istreambuf_iterator
and std::istream_iterator
,
that fall under the category of input iterators, cannot be used. Another
input iterator that is of interest is one that wraps a lexer, such as LEX.
Note | |
---|---|
In general, Spirit.Qi generates recursive descent
parser which require backtracking parsers by design. For this reason we
need to provide at least forward iterators to any of Spirit.Qi's
API functions. This is not an absolute requirement though. In the future,
we shall see more deterministic parsers that require no more than 1 character
(token) of lookahead. Such parsers allow us to use input iterators such
as the |
Backtracking can be implemented only if we are allowed to save an iterator
position, i.e. making a copy of the current iterator. Unfortunately, with
an input iterator, there is no way to do so, and thus input iterators will
not work with backtracking in Spirit.Qi. One solution
to this problem is to simply load all the data to be parsed into a container,
such as a vector or deque, and then pass the begin and end of the container
to Spirit.Qi. This method can be too memory intensive
for certain applications, which is why the multi_pass
iterator was created.
The multi_pass
iterator will
convert any input iterator into a forward iterator suitable for use with
Spirit.Qi. multi_pass
will buffer data when needed and will discard the buffer when its contents
is not needed anymore. This happens either if only one copy of the iterator
exists or if no backtracking can occur.
A grammar must be designed with care if the multi_pass
iterator is used. Any rule that may need to backtrack, such as one that contains
an alternative, will cause data to be buffered. The rules that are optimal
to use are repetition constructs (as kleene and plus).
Sequences of the form a >> b
will buffer data as well. This is different from the behavior of Spirit.Classic
but for a good reason. Sequences need to reset the current iterator to its
initial state if one of the components of a sequence fails to match. To compensate
for this behavior we added functionality to the expect
parsers (i.e. constructs like a
> b
).
Expectation points introduce deterministic points into the grammar ensuring
no backtracking can occur if they match. For this reason we clear the buffers
of any multi_pass iterator on each expectation point, ensuring minimal buffer
content even for large grammars.
Important | |
---|---|
If you use an error handler in conjunction with the rule r<iterator_type> r; on_error<retry>(r, std::cout << phoenix::val("Error!")); If you fail to do so the resulting code will trigger an assert statement at runtime. |
Any rule that repeats, such as kleene_star (*a
) or positive such as (+a
), will only buffer the data for the current
repetition.
In typical grammars, ambiguity and therefore lookahead is often localized.
In fact, many well designed languages are fully deterministic and require
no lookahead at all. Peeking at the first character from the input will immediately
determine the alternative branch to take. Yet, even with highly ambiguous
grammars, alternatives are often of the form *(a | b
| c | d)
.
The input iterator moves on and is never stuck at the beginning. Let's look
at a Pascal snippet for example:
program = programHeading >> block >> '.' ; block = *( labelDeclarationPart | constantDefinitionPart | typeDefinitionPart | variableDeclarationPart | procedureAndFunctionDeclarationPart ) >> statementPart ;
Notice the alternatives inside the Kleene star in the rule block . The rule gobbles the input in a linear manner and throws away the past history with each iteration. As this is fully deterministic LL(1) grammar, each failed alternative only has to peek 1 character (token). The alternative that consumes more than 1 character (token) is definitely a winner. After which, the Kleene star moves on to the next.
Now, after the lecture on the features to be careful with when using multi_pass
, you may think that multi_pass
is way too restrictive to use.
That's not the case. If your grammar is deterministic, you can make use of
the flush_multi_pass
pseudo
parser in your grammar to ensure that data is not buffered when unnecessary
(flush_multi_pass
is available
from the Spirit.Qi parser Repository).
Here we present a minimal example showing a minimal use case. The multi_pass
iterator is highly configurable,
but the default policies have been chosen so that its easily usable with
input iterators such as std::istreambuf_iterator
.
For the complete source code of this example please refer to multi_pass.cpp.
int main() { namespace spirit = boost::spirit; using spirit::ascii::space; using spirit::ascii::char_; using spirit::qi::double_; using spirit::qi::eol; std::ifstream in("multi_pass.txt"); // we get our input from this file if (!in.is_open()) { std::cout << "Could not open input file: 'multi_pass.txt'" << std::endl; return -1; } typedef std::istreambuf_iterator<char> base_iterator_type; spirit::multi_pass<base_iterator_type> first = spirit::make_default_multi_pass(base_iterator_type(in)); std::vector<double> v; bool result = spirit::qi::phrase_parse(first , spirit::make_default_multi_pass(base_iterator_type()) , double_ >> *(',' >> double_) // recognize list of doubles , space | '#' >> *(char_ - eol) >> eol // comment skipper , v); // data read from file if (!result) { std::cout << "Failed parsing input file!" << std::endl; return -2; } std::cout << "Successfully parsed input file!" << std::endl; return 0; }
The Spirit Repository
contains the flush_multi_pass
parser component. This is usable in conjunction with the multi_pass
iterator to minimize the buffering. It allows to insert explicit synchronization
points into your grammar where it is safe to clear any stored input as it
is ensured that no backtracking can occur at this point anymore.
When the flush_multi_pass
parser is used with multi_pass
,
it will call multi_pass::clear_queue()
.
This will cause any buffered data to be erased. This also will invalidate
all other copies of multi_pass and they should not be used. If they are,
an boost::illegal_backtracking
exception will be
thrown.
The multi_pass
iterator is
a templated class configurable using policies. The description of multi_pass
above is how it was originally
implemented (before it used policies), and is the default configuration now.
But, multi_pass
is capable
of much more. Because of the open-ended nature of policies, you can write
your own policy to make multi_pass
behave in a way that we never before imagined.
The multi_pass class has two template parameters:
The multi_pass template parameters
The type multi_pass uses to acquire it's input. This is typically an input iterator, or functor.
The combined policies to use to create an instance of a multi_pass iterator. This combined policy type is described below
It is possible to implement all of the required functionality of the combined
policy in a single class. But it has shown to be more convenient to split
this into four different groups of functions, i.e. four separate, but well
coordinated policies. For this reason the multi_pass
library implements a template iterator_policies::default_policy
allowing to combine several different policies, each implementing one of
the functionality groups:
Table 12. Policies needed for default_policy template
Template Parameter |
Description |
---|---|
|
This policy determines how |
|
This policy determines how checking for invalid iterators is done. |
|
A class that defines how |
|
The buffering scheme used by |
The multi_pass
library contains
several predefined policy implementations for each of the policy types as
described above. First we will describe those predefined types. Afterwards
we will give some guidelines how you can write your own policy implementations.
All predefined multi_pass
policies are defined in the namespace boost::spirit::iterator_policies
.
Table 13. Predefined policy classes
Class name |
Description |
---|---|
InputPolicy classes |
|
|
This policy directs |
|
This policy directs |
|
This policy directs |
|
This policy obtains it's input by calling yylex(), which would typically be provided by a scanner generated by Flex. If you use this policy your code must link against a Flex generated scanner. |
|
This input policy obtains it's data by calling a functor of type
|
|
This is essentially the same as the |
OwnershipPolicy classes |
|
|
This class uses a reference counting scheme. The |
|
When this policy is used, the first |
CheckingPolicy classes |
|
|
This policy does no checking at all. |
|
This policy keeps around a buffer id, or a buffer age. Every time
|
full_check |
This policy has not been implemented yet. When it is, it will keep track of all iterators and make sure that they are all valid. This will be mostly useful for debugging purposes as it will incur significant overhead. |
StoragePolicy classes |
|
|
Despite its name this policy keeps all buffered data in a |
|
This policy keeps a circular buffer that is size |
The beauty of policy based designs is that you can mix and match policies
to create your own custom iterator by selecting the policies you want. Here's
an example of how to specify a custom multi_pass
that wraps an std::istream_iterator<char>
,
and is slightly more efficient than the default multi_pass
(as generated by the make_default_multi_pass()
API function) because it uses the iterator_policies::first_owner
OwnershipPolicy and the iterator_policies::no_check
CheckingPolicy:
typedef multi_pass< std::istream_iterator<char> , iterator_policies::default_policy< iterator_policies::first_owner , iterator_policies::no_check , iterator_policies::buffering_input_iterator , iterator_policies::split_std_deque > > first_owner_multi_pass_type;
The default template parameters for iterator_policies::default_policy
are:
iterator_policies::ref_counted
OwnershipPolicy
iterator_policies::no_check
CheckingPolicy, if BOOST_SPIRIT_DEBUG
is defined: iterator_policies::buf_id_check
CheckingPolicy
iterator_policies::buffering_input_iterator
InputPolicy,
and
iterator_policies::split_std_deque
StoragePolicy.
So if you use multi_pass<std::istream_iterator<char>
>
you will get those pre-defined
behaviors while wrapping an std::istream_iterator<char>
.
There is one other pre-defined class called look_ahead
.
The class look_ahead
is another
predefine multi_pass
iterator
type. It has two template parameters: Input
,
the type of the input iterator to wrap, and a std::size_t N
, which specifies the size of the buffer
to the fixed_size_queue
policy.
While the default multi_pass configuration is designed for safety, look_ahead
is designed for speed. look_ahead
is derived from a multi_pass
with the following policies: input_iterator
InputPolicy, first_owner
OwnershipPolicy, no_check
CheckingPolicy, and fixed_size_queue<N>
StoragePolicy.
This iterator is defined by including the files:
// forwards to <boost/spirit/home/support/look_ahead.hpp> #include <boost/spirit/include/support_look_ahead.hpp>
Also, see Include Structure.
Yet another predefined iterator for wrapping standard input streams (usually
a std::basic_istream<>
)
is called basic_istream_iterator<Char, Traits>
. This class is usable as a drop in replacement
for std::istream_iterator<Char, Traits>
.
Its only difference is that it is a forward iterator (instead of the std::istream_iterator
,
which is an input iterator). basic_istream_iterator
is derived from a multi_pass with the following policies: istream
InputPolicy, ref_counted
OwnershipPolicy, no_check
CheckingPolicy, and split_std_deque
StoragePolicy.
There exists an additional predefined typedef:
typedef basic_istream_iterator<char, std::char_traits<char> > istream_iterator;
This iterator is defined by including the files:
// forwards to <boost/spirit/home/support/istream_iterator.hpp> #include <boost/spirit/include/support_istream_iterator.hpp>
Also, see Include Structure.
functor_input
InputPolicy
If you want to use the functor_input
InputPolicy, you can write your own function object that will supply the
input to multi_pass
. The
function object must satisfy several requirements. It must have a typedef
result_type
which specifies
the return type of its operator()
. This is standard practice in the STL.
Also, it must supply a static variable called eof which is compared against
to know whether the input has reached the end. Last but not least the function
object must be default constructible. Here is an example:
#include <iostream> #include <boost/spirit/home/qi.hpp> #include <boost/spirit/home/support.hpp> #include <boost/spirit/home/support/multi_pass.hpp> #include <boost/spirit/home/support/iterators/detail/functor_input_policy.hpp> // define the function object class iterate_a2m { public: typedef char result_type; iterate_a2m() : c_('A') {} iterate_a2m(char c) : c_(c) {} result_type operator()() { if (c_ == 'M') return eof; return c_++; } static result_type eof; private: char c_; }; iterate_a2m::result_type iterate_a2m::eof = iterate_a2m::result_type('M'); using namespace boost::spirit; // create two iterators using the define function object, one of which is // an end iterator typedef multi_pass<iterate_a2m , iterator_policies::first_owner , iterator_policies::no_check , iterator_policies::functor_input , iterator_policies::split_std_deque> functor_multi_pass_type; int main() { functor_multi_pass_type first = functor_multi_pass_type(iterate_a2m()); functor_multi_pass_type last; // use the iterators: this will print "ABCDEFGHIJKL" while (first != last) { std::cout << *first; ++first; } std::cout << std::endl; return 0; }
All policies to be used with the default_policy
template need to have two embedded classes: unique
and shared
. The unique
class needs to implement all required
functions for a particular policy type. In addition it may hold all member
data items being unique for a particular instance of
a multi_pass
(hence the name).
The shared
class does not
expose any member functions (except sometimes a constructor), but it may
hold all member data items to be shared between all
copies of a particular multi_pass
.
An InputPolicy
must have
the following interface:
struct input_policy { // Input is the same type used as the first template parameter // while instantiating the multi_pass template <typename Input> struct unique { // these typedef's will be exposed as the multi_pass iterator // properties typedef __unspecified_type__ value_type; typedef __unspecified_type__ difference_type; typedef __unspecified_type__ distance_type; typedef __unspecified_type__ pointer; typedef __unspecified_type__ reference; unique() {} explicit unique(Input) {} // destroy is called whenever the last copy of a multi_pass is // destructed (ownership_policy::release() returned true) // // mp: is a reference to the whole multi_pass instance template <typename MultiPass> static void destroy(MultiPass& mp); // swap is called by multi_pass::swap() void swap(unique&); // get_input is called whenever the next input character/token // should be fetched. // // mp: is a reference to the whole multi_pass instance // // This method is expected to return a reference to the next // character/token template <typename MultiPass> static typename MultiPass::reference get_input(MultiPass& mp); // advance_input is called whenever the underlying input stream // should be advanced so that the next call to get_input will be // able to return the next input character/token // // mp: is a reference to the whole multi_pass instance template <typename MultiPass> static void advance_input(MultiPass& mp); // input_at_eof is called to test whether this instance is a // end of input iterator. // // mp: is a reference to the whole multi_pass instance // // This method is expected to return true if the end of input is // reached. It is often used in the implementation of the function // storage_policy::is_eof. template <typename MultiPass> static bool input_at_eof(MultiPass const& mp); // input_is_valid is called to verify if the parameter t represents // a valid input character/token // // mp: is a reference to the whole multi_pass instance // t: is the character/token to test for validity // // This method is expected to return true if the parameter t // represents a valid character/token. template <typename MultiPass> static bool input_is_valid(MultiPass const& mp, value_type const& t); }; // Input is the same type used as the first template parameter passed // while instantiating the multi_pass template <typename Input> struct shared { explicit shared(Input) {} }; };
It is possible to derive the struct unique
from the type boost::spirit::detail::default_input_policy
. This type implements
a minimal sufficient interface for some of the required functions, simplifying
the task of writing a new input policy.
This class may implement a function destroy()
being called during destruction of the
last copy of a multi_pass
.
This function should be used to free any of the shared data items the policy
might have allocated during construction of its shared
part. Because of the way multi_pass
is implemented any allocated data members in shared
should not be deep copied in a copy
constructor of shared
.
The OwnershipPolicy
must
have the following interface:
struct ownership_policy { struct unique { // destroy is called whenever the last copy of a multi_pass is // destructed (ownership_policy::release() returned true) // // mp: is a reference to the whole multi_pass instance template <typename MultiPass> static void destroy(MultiPass& mp); // swap is called by multi_pass::swap() void swap(unique&); // clone is called whenever a multi_pass is copied // // mp: is a reference to the whole multi_pass instance template <typename MultiPass> static void clone(MultiPass& mp); // release is called whenever a multi_pass is destroyed // // mp: is a reference to the whole multi_pass instance // // The method is expected to return true if the destructed // instance is the last copy of a particular multi_pass. template <typename MultiPass> static bool release(MultiPass& mp); // is_unique is called to test whether this instance is the only // existing copy of a particular multi_pass // // mp: is a reference to the whole multi_pass instance // // The method is expected to return true if this instance is unique // (no other copies of this multi_pass exist). template <typename MultiPass> static bool is_unique(MultiPass const& mp); }; struct shared {}; };
It is possible to derive the struct unique
from the type boost::spirit::detail::default_ownership_policy
. This type implements
a minimal sufficient interface for some of the required functions, simplifying
the task of writing a new ownership policy.
This class may implement a function destroy()
being called during destruction of the
last copy of a multi_pass
.
This function should be used to free any of the shared data items the policy
might have allocated during construction of its shared
part. Because of the way multi_pass
is implemented any allocated data members in shared
should not be deep copied in a copy
constructor of shared
.
The CheckingPolicy
must have
the following interface:
struct checking_policy { struct unique { // swap is called by multi_pass::swap() void swap(unique&); // destroy is called whenever the last copy of a multi_pass is // destructed (ownership_policy::release() returned true) // // mp: is a reference to the whole multi_pass instance template <typename MultiPass> static void destroy(MultiPass& mp); // docheck is called before the multi_pass is dereferenced or // incremented. // // mp: is a reference to the whole multi_pass instance // // This method is expected to make sure the multi_pass instance is // still valid. If it is invalid an exception should be thrown. template <typename MultiPass> static void docheck(MultiPass const& mp); // clear_queue is called whenever the function // multi_pass::clear_queue is called on this instance // // mp: is a reference to the whole multi_pass instance template <typename MultiPass> static void clear_queue(MultiPass& mp); }; struct shared {}; };
It is possible to derive the struct unique
from the type boost::spirit::detail::default_checking_policy
. This type implements
a minimal sufficient interface for some of the required functions, simplifying
the task of writing a new checking policy.
This class may implement a function destroy()
being called during destruction of the
last copy of a multi_pass
.
This function should be used to free any of the shared data items the policy
might have allocated during construction of its shared
part. Because of the way multi_pass
is implemented any allocated data members in shared
should not be deep copied in a copy
constructor of shared
.
A StoragePolicy
must have
the following interface:
struct storage_policy { // Value is the same type as typename MultiPass::value_type template <typename Value> struct unique { // destroy is called whenever the last copy of a multi_pass is // destructed (ownership_policy::release() returned true) // // mp: is a reference to the whole multi_pass instance template <typename MultiPass> static void destroy(MultiPass& mp); // swap is called by multi_pass::swap() void swap(unique&); // dereference is called whenever multi_pass::operator*() is invoked // // mp: is a reference to the whole multi_pass instance // // This function is expected to return a reference to the current // character/token. template <typename MultiPass> static typename MultiPass::reference dereference(MultiPass const& mp); // increment is called whenever multi_pass::operator++ is invoked // // mp: is a reference to the whole multi_pass instance template <typename MultiPass> static void increment(MultiPass& mp); // // mp: is a reference to the whole multi_pass instance template <typename MultiPass> static void clear_queue(MultiPass& mp); // is_eof is called to test whether this instance is a end of input // iterator. // // mp: is a reference to the whole multi_pass instance // // This method is expected to return true if the end of input is // reached. template <typename MultiPass> static bool is_eof(MultiPass const& mp); // less_than is called whenever multi_pass::operator==() is invoked // // mp: is a reference to the whole multi_pass instance // rhs: is the multi_pass reference this instance is compared // to // // This function is expected to return true if the current instance // is equal to the right hand side multi_pass instance template <typename MultiPass> static bool equal_to(MultiPass const& mp, MultiPass const& rhs); // less_than is called whenever multi_pass::operator<() is invoked // // mp: is a reference to the whole multi_pass instance // rhs: is the multi_pass reference this instance is compared // to // // This function is expected to return true if the current instance // is less than the right hand side multi_pass instance template <typename MultiPass> static bool less_than(MultiPass const& mp, MultiPass const& rhs); }; // Value is the same type as typename MultiPass::value_type template <typename Value> struct shared {}; };
It is possible to derive the struct unique
from the type boost::spirit::detail::default_storage_policy
. This type implements
a minimal sufficient interface for some of the required functions, simplifying
the task of writing a new storage policy.
This class may implement a function destroy()
being called during destruction of the
last copy of a multi_pass
.
This function should be used to free any of the shared data items the policy
might have allocated during construction of its shared
part. Because of the way multi_pass
is implemented any allocated data members in shared
should not be deep copied in a copy
constructor of shared
.
Generally, a StoragePolicy
is the trickiest policy to implement. You should study and understand the
existing StoragePolicy
classes
before you try and write your own.