...one of the most highly
regarded and expertly designed C++ library projects in the
world.
— Herb Sutter and Andrei
Alexandrescu, C++
Coding Standards
This section is not for the faint of heart. In here, are distilled the inner workings of Spirit.Qi parsers, using real code from the Spirit library as examples. On the other hand, here is no reason to fear reading on, though. We tried to explain things step by step while highlighting the important insights.
The
class is the base
class for all parsers.
Parser
template <typename Derived> struct parser { struct parser_id; typedef Derived derived_type; typedef qi::domain domain; // Requirement: p.parse(f, l, context, skip, attr) -> bool // // p: a parser // f, l: first/last iterator pair // context: enclosing rule context (can be unused_type) // skip: skipper (can be unused_type) // attr: attribute (can be unused_type) // Requirement: p.what(context) -> info // // p: a parser // context: enclosing rule context (can be unused_type) // Requirement: P::template attribute<Ctx, Iter>::type // // P: a parser type // Ctx: A context type (can be unused_type) // Iter: An iterator type (can be unused_type) Derived const& derived() const { return *static_cast<Derived const*>(this); } };
The
class does not really
know how to parse anything but instead relies on the template parameter
Parser
Derived
to do the actual
parsing. This technique is known as the "Curiously Recurring Template
Pattern" in template meta-programming circles. This inheritance strategy
gives us the power of polymorphism without the virtual function overhead.
In essence this is a way to implement compile time polymorphism.
The Derived parsers,
, PrimitiveParser
, UnaryParser
and BinaryParser
provide the
necessary facilities for parser detection, introspection, transformation
and visitation.
NaryParser
Derived parsers must support the following:
bool parse(f, l, context, skip, attr)
f
, l
first/last iterator pair
context
enclosing rule context (can be unused_type)
skip
skipper (can be unused_type)
attr
attribute (can be unused_type)
The parse is the main parser entry point. skipper
can be an unused_type
.
It's a type used every where in Spirit
to signify "don't-care". There is an overload for skip
for unused_type
that is
simply a no-op. That way, we do not have to write multiple parse functions
for phrase and character level parsing.
Here are the basic rules for parsing:
true
if successful, false
otherwise.
first
is incremented N number of times, where N is the number of characters
parsed. N can be zero --an empty (epsilon) match.
first
is reset to its position before entering the parser function. attr
is untouched.
void what(context)
context
enclosing rule context (can be unused_type
)
The what function should be obvious. It provides some information about “what” the parser is. It is used as a debugging aid, for example.
P::template attribute<context>::type
P
a parser type
context
A context type (can be unused_type)
The attribute metafunction returns the expected attribute type of the parser. In some cases, this is context dependent.
In this section, we will dissect two parser types:
Parsers
PrimitiveParser
A parser for primitive data (e.g. integer parsing).
UnaryParser
A parser that has single subject (e.g. kleene star).
For our dissection study, we will use a Spirit
primitive, the any_int_parser
in the boost::spirit::qi namespace.
[primitive_parsers_any_int_parser]
The any_int_parser
is derived
from a
,
which in turn derives from PrimitiveParser
<Derived>parser<Derived>
. Therefore, it supports the following
requirements:
parse
member function
what
member function
attribute
metafunction
parse is the main entry point. For primitive parsers, our first thing to do is call:
qi::skip(first, last, skipper);
to do a pre-skip. After pre-skipping, the parser proceeds to do its thing.
The actual parsing code is placed in extract_int<T, Radix, MinDigits, MaxDigits>::call(first, last, attr);
This simple no-frills protocol is one of the reasons why Spirit is fast. If you know the internals of Spirit.Classic and perhaps even wrote some parsers with it, this simple Spirit mechanism is a joy to work with. There are no scanners and all that crap.
The what function just tells us that it is an integer parser. Simple.
The attribute metafunction returns the T template
parameter. We associate the any_int_parser
to some placeholders for short_
,
int_
, long_
and long_long
types. But,
first, we enable these placeholders in namespace boost::spirit:
template <> // enables short_ struct use_terminal<qi::domain, tag::short_> : mpl::true_ {};
template <> // enables int_ struct use_terminal<qi::domain, tag::int_> : mpl::true_ {};
template <> // enables long_ struct use_terminal<qi::domain, tag::long_> : mpl::true_ {};
template <> // enables long_long struct use_terminal<qi::domain, tag::long_long> : mpl::true_ {};
Notice that any_int_parser
is placed in the namespace boost::spirit::qi while these enablers
are in namespace boost::spirit. The reason is that these placeholders are
shared by other Spirit domains.
Spirit.Qi, the parser is one domain. Spirit.Karma,
the generator is another domain. Other parser technologies may be developed
and placed in yet another domain. Yet, all these can potentially share
the same placeholders for interoperability. The interpretation of these
placeholders is domain-specific.
Now that we enabled the placeholders, we have to write generators for them. The make_xxx stuff (in boost::spirit::qi namespace):
template < typename T , unsigned Radix = 10 , unsigned MinDigits = 1 , int MaxDigits = -1> struct make_int { typedef any_int_parser<T, Radix, MinDigits, MaxDigits> result_type; result_type operator()(unused_type, unused_type) const { return result_type(); } };
This one above is our main generator. It's a simple function object with 2 (unused) arguments. These arguments are
no_case[p]
to pass information to inner parser nodes. We'll see how that works
later.
Now:
template <typename Modifiers> struct make_primitive<tag::short_, Modifiers> : make_int<short> {};
template <typename Modifiers> struct make_primitive<tag::int_, Modifiers> : make_int<int> {};
template <typename Modifiers> struct make_primitive<tag::long_, Modifiers> : make_int<long> {};
template <typename Modifiers> struct make_primitive<tag::long_long, Modifiers> : make_int<boost::long_long_type> {};
These, specialize qi:make_primitive
for specific tags. They
all inherit from make_int
which does the actual work.
Let me present the kleene star (also in namespace spirit::qi):
template <typename Subject> struct kleene : unary_parser<kleene<Subject> > { typedef Subject subject_type; template <typename Context, typename Iterator> struct attribute { // Build a std::vector from the subject's attribute. Note // that build_std_vector may return unused_type if the // subject's attribute is an unused_type. typedef typename traits::build_std_vector< typename traits:: attribute_of<Subject, Context, Iterator>::type >::type type; }; kleene(Subject const& subject_) : subject(subject_) {} template <typename F> bool parse_container(F f) const { while (!f (subject)) ; return true; } template <typename Iterator, typename Context , typename Skipper, typename Attribute> bool parse(Iterator& first, Iterator const& last , Context& context, Skipper const& skipper , Attribute& attr_) const { // ensure the attribute is actually a container type traits::make_container(attr_); typedef detail::fail_function<Iterator, Context, Skipper> fail_function; Iterator iter = first; fail_function f(iter, last, context, skipper); parse_container(detail::make_pass_container(f, attr_)); first = f.first; return true; } template <typename Context> info what(Context& context) const { return info("kleene", subject.what(context)); } Subject subject; };
Looks similar in form to its primitive cousin, the int_parser
.
And, again, it has the same basic ingredients required by Derived
.
kleene is a composite parser. It is a parser that composes another parser,
its “subject”. It is a
and subclasses
from it. Like UnaryParser
, PrimitiveParser
derives from UnaryParser
<Derived>parser<Derived>
.
unary_parser<Derived>, has these expression requirements on Derived:
UnaryParser
parser.)
UnaryParser
type.)
parse is the main parser entry point. Since this is
not a primitive parser, we do not need to call qi::skip(first, last, skipper)
. The subject, if
it is a primitive, will do the pre-skip. If if it is another composite
parser, it will eventually call a primitive parser somewhere down the line
which will do the pre-skip. This makes it a lot more efficient than Spirit.Classic.
Spirit.Classic
puts the skipping business into the so-called "scanner" which
blindly attempts a pre-skip every time we increment the iterator.
What is the attribute of the kleene? In general, it
is a std::vector<T>
where T
is the attribute
of the subject. There is a special case though. If T
is an unused_type
, then
the attribute of kleene is also unused_type
.
traits::build_std_vector
takes care of that minor
detail.
So, let's parse. First, we need to provide a local attribute of for the subject:
typename traits::attribute_of<Subject, Context>::type val;
traits::attribute_of<Subject, Context>
simply calls the subject's struct
attribute<Context>
nested metafunction.
val starts out default initialized. This val is the one we'll pass to the subject's parse function.
The kleene repeats indefinitely while the subject parser is successful.
On each successful parse, we push_back
the parsed attribute to the kleene's attribute, which is expected to be,
at the very least, compatible with a std::vector
.
In other words, although we say that we want our attribute to be a std::vector
, we try to be more lenient than
that. The caller of kleene's parse may pass a different attribute type.
For as long as it is also a conforming STL container with push_back
, we are ok. Here is the kleene
loop:
while (subject.parse(first, last, context, skipper, val)) { // push the parsed value into our attribute traits::push_back(attr, val); traits::clear(val); } return true;
Take note that we didn't call attr.push_back(val). Instead, we called a Spirit provided function:
traits::push_back(attr, val);
This is a recurring pattern. The reason why we do it this way is because
attr can be unused_type
.
traits::push_back
takes care of that detail.
The overload for unused_type is a no-op. Now, you can imagine why Spirit is fast! The parsers are so
simple and the generated code is as efficient as a hand rolled loop. All
these parser compositions and recursive parse invocations are extensively
inlined by a modern C++ compiler. In the end, you get a tight loop when
you use the kleene. No more excess baggage. If the attribute is unused,
then there is no code generated for that. That's how Spirit
is designed.
The what function simply wraps the output of the subject in a "kleene“... "”".
Ok, now, like the int_parser
,
we have to hook our parser to the qi
engine. Here's how we do it:
First, we enable the prefix star operator. In proto, it's called the "dereference":
template <> struct use_operator<qi::domain, proto::tag::dereference> // enables *p : mpl::true_ {};
This is done in namespace boost::spirit
like its friend, the use_terminal
specialization for our int_parser
.
Obviously, we use use_operator to enable the dereference
for the qi::domain.
Then, we need to write our generator (in namespace qi):
template <typename Elements, typename Modifiers> struct make_composite<proto::tag::dereference, Elements, Modifiers> : make_unary_composite<Elements, kleene> {};
This essentially says; for all expressions of the form: *p
, to build a kleene parser. Elements
is a Boost.Fusion
sequence. For the kleene, which is a unary operator, expect only one element
in the sequence. That element is the subject of the kleene.
We still don't care about the Modifiers. We'll see how the modifiers is all about when we get to deep directives.