...one of the most highly
regarded and expertly designed C++ library projects in the
world.
— Herb Sutter and Andrei
Alexandrescu, C++
Coding Standards
utree
is a dynamically-typed
hierarchical data structure that can represent abstract syntax trees. It's
well integrated with Spirit.Qi and Spirit.Karma.
utree
can be passed as an
attribute to almost any grammar. utree
's
type system is implemented through the use of a discriminated union and type
punning.
utree
has a minimal memory
footprint. The data structure size is 16 bytes on a 32-bit platform, and
32 bytes on 64-bit a platform (4*sizeof(void*)
). Being
a container of itself, it can represent tree structures.
Each instance of an utree
data structure can store exactly one of the following data types at a time:
struct utree_type { enum info { invalid_type, // the utree has not been initialized (it's // default constructed) nil_type, // nil is the sentinel (empty) utree type. list_type, // A doubly linked list of utrees. range_type, // A range of list::iterators. reference_type, // A reference to another utree. any_type, // A pointer or reference to any C++ type. function_type, // A utree holding a stored_function<F> object, // where F is an unary function object taking a // utree as it's parameter and returning a // utree. // numeric atoms bool_type, // An utree holding a boolean value int_type, // An utree holding a integer (int) value double_type, // An utree holding a floating point (double) value // text atoms (utf8) string_type, // An UTF-8 string string_range_type, // A pair of iterators into an UTF-8 string symbol_type, // An UTF-8 symbol name binary_type // Arbitrary binary data }; typedef boost::uint_t<sizeof(info)*8>::exact exact_integral_type; typedef boost::uint_t<sizeof(info)*8>::fast fast_integral_type; };
The UTF-8 string, UTF-8 symbol, and binary data types are internally stored
either directly as the node data (small string optimization applied), or
they are allocated from the heap, storing the pointer to the allocated data
in the utree
. The maximum
possible length of the data to be stored in the node data depends on the
platform the utree
is compiled
for. It is 14 bytes for a 32-bit platform and 30 bytes for a 64-bit platform.
The utree
data structure
is very versatile and can be used as an attribute for all possible Spirit.Qi
parsers and Spirit.Karma generators. For this reason,
it exposes a set of typedef's making it compatible with STL containers:
typedef utree value_type; typedef utree& reference; typedef utree const& const_reference; typedef std::ptrdiff_t difference_type; typedef std::size_t size_type; typedef detail::list::node_iterator<utree> iterator; typedef detail::list::node_iterator<utree const> const_iterator;
The utree
data type exposes
the functional interface of a bidirectional STL container. The iterators
returned from begin()
et.al. conform to the Standard requirements of a bidirectional iterator.
// STL Container interface // insertion template <class T> void push_back(T const&); template <class T> void push_front(T const&); template <class T> iterator insert(iterator, T const&); template <class T> void insert(iterator, std::size_t, T const&); template <class Iterator> void insert(iterator, Iterator, Iterator); // erasure void pop_front(); void pop_back(); iterator erase(iterator); iterator erase(iterator, iterator); // front access reference front(); const_reference front() const; iterator begin(); const_iterator begin() const; ref_iterator ref_begin(); // back access reference back(); const_reference back() const; iterator end(); const_iterator end() const; ref_iterator ref_end();
The exposed container interface makes the utree
usable with all Spirit.Qi parser and Spirit.Karma
generator components, which are compatible with an STL container attribute
type.
A utree
can be constructed
or initialized from a wide range of data types, allowing to create utree
instances for every possible node
type (see the description of utree_type::info
above). For this reason it exposes a constructor and an assignment operator
for each of the allowed node types as shown below. All constructors are non-explicit
on purpose, allowing to use an utree instance as the attribute to almost
any Qi parser.
// This constructs an `invalid_type` node. When used in places // where a boost::optional is expected (i.e. as an attribute for the // optional component), this represents the 'empty' state. utree(invalid_type = invalid_type()); // This initializes a `nil_type` node, which represents a valid, // 'initialized empty' utree (different from invalid_type!). utree(nil_type); reference operator=(nil_type); // This initializes a `boolean_type` node, which can hold 'true' or // 'false' only. explicit utree(bool); reference operator=(bool); // This initializes an `integer_type` node, which can hold arbitrary // integers. For convenience these functions are overloaded for signed // and unsigned integer types. utree(unsigned int); utree(int); reference operator=(unsigned int); reference operator=(int); // This initializes a `double_type` node, which can hold arbitrary // floating point (double) values. utree(double); reference operator=(double); // This initializes a `string_type` node, which can hold a narrow // character sequence (usually an UTF-8 string). utree(char); utree(char const*); utree(char const*, std::size_t); utree(std::string const&); reference operator=(char); reference operator=(char const*); reference operator=(std::string const&); // This constructs a `string_range_type` node, which does not copy the // data but stores the iterator range to the character sequence the // range has been initialized from. utree(utf8_string_range_type const&, shallow_tag); // This initializes a `reference_type` node, which holds a reference to // another utree node. All operations on such a node are automatically // forwarded to the referenced utree instance. utree(boost::reference_wrapper<utree>); reference operator=(boost::reference_wrapper<utree>); // This initializes an `any_type` node, which can hold a pointer to an // instance of any type together with the typeid of that type. When // accessing that pointer the typeid will be checked, causing a // std::bad_cast to be thrown if the typeids do not match. utree(any_ptr const&); reference operator=(any_ptr const&); // This initializes a `range_type` node, which holds an utree list node // the elements of which are copy constructed (assigned) from the // elements referenced by the given range of iterators. template <class Iterator> utree(boost::iterator_range<Iterator>); template <class Iterator> reference operator=(boost::iterator_range<Iterator>); // This initializes a `function_type` node from a polymorphic function // object pointer (takes ownership) or reference. utree(function_base const&); reference operator=(function_base const&); utree(function_base*); reference operator=(function_base*); // This initializes either a `string_type`, a `symbol_type`, or a // `binary_type` node (depending on the template parameter `type_`), // which will hold the corresponding narrow character sequence (usually // an UTF-8 string). template <class Base, utree_type::info type_> utree(basic_string<Base, type_> const&); template <class Base, utree_type::info type_> reference operator=(basic_string<Base, type_> const&);
The utree
data type exposes
the functional interface compatible to Boost.Variant
as well. Its very nature is to hold different data types, one at each point
in time, making it functionally very similar to Boost.Variant.
// return the data type (`utree_type::info`) of the currently stored // data item utree_type::info which() const; // access the currently stored data in a type safe manner, this will // throw a `std::bad_cast()` if the currently stored data item is not // default convertible to `T`. template <class T> T get() const;
The exposed variant-like interface makes the utree
usable with all Spirit.Qi parser and Spirit.Karma
generator components, which are compatible with an Boost.Variant
attribute type.
The utree
string types described
below are used by the utree
API only. These are not used to store information in the utree
itself. Their purpose is to refer to different internal utree
node types only. For instance, creating a utree
from a binary data type will create a binary_type
utree node (see above).
The binary data type can be represented either verbatim as a sequence of
bytes or as a pair of iterators into some other stored binary data sequence.
Use this string type to access/create a binary_type
utree
.
typedef basic_string< boost::iterator_range<char const*>, utree_type::binary_type > binary_range_type; typedef basic_string< std::string, utree_type::binary_type > binary_string_type;
The UTF-8 string can be represented either verbatim as a sequence of characters
or as a pair of iterators into some other stored binary data sequence. Use
this string type to access/create a string_type
utree
.
typedef basic_string< boost::iterator_range<char const*>, utree_type::string_type > utf8_string_range_type; typedef basic_string< std::string, utree_type::string_type > utf8_string_type;
The UTF-8 symbol can be represented either verbatim as a sequence of characters
or as a pair of iterators into some other stored binary data sequence. Use
this string type to access/create a symbol_type
utree
.
typedef basic_string< boost::iterator_range<char const*>, utree_type::symbol_type > utf8_symbol_range_type; typedef basic_string< std::string, utree_type::symbol_type > utf8_symbol_type;
The stored_function template class can to store a unary function objects with a signature of utree(scope const&) as a utree node.
struct function_base { virtual ~function_base() {} virtual utree operator()(utree const& env) const = 0; virtual utree operator()(utree& env) const = 0; // Calling f.clone() must return a newly allocated function_base // instance that is equal to f. virtual function_base* clone() const = 0; }; template <typename F> struct stored_function : function_base { F f; stored_function(F f = F()); virtual ~stored_function(); virtual utree operator()(utree const& env) const; virtual utree operator()(utree& env) const; virtual function_base* clone() const; }; template <typename F> struct referenced_function : function_base { F& f; referenced_function(F& f); virtual ~referenced_function(); virtual utree operator()(utree const& env) const; virtual utree operator()(utree& env) const; virtual function_base* clone() const; };
All exceptions thrown by utree are derived from utree_exception.
struct utree_exception : std::exception {};
The bad_type_exception
is
thrown whenever somebody calls a member function, which applies to certain
stored utree_type's only, but this precondition is violated as the utree
instance holds some other type.
struct bad_type_exception /*: utree_exception*/;
The empty_exception
is thrown
whenever a precondition of a list or range utree method is violated due to
the list or range being empty.
struct empty_exception /*: utree_exception*/;
Our first example demonstrates how to use utree
to write a parser for symbolic
expressions. While utree
is capable of representing just about any AST, utree
's
design is based on the simple yet powerful nature of symbolic expressions.
This example introduces a number of basic and intermediate utree
development techniques: using Spirit.Qi
and Spirit.Karma integration, tracking source code locations
and taking advantage of UTF8 support.
The source for this example can be found here: ../../example/support/utree.
namespace sexpr { template <typename Iterator, typename ErrorHandler = error_handler<Iterator> > struct parser : qi::grammar<Iterator, utree(), whitespace<Iterator> > { qi::rule<Iterator, utree(), whitespace<Iterator> > start, element, list; qi::rule<Iterator, utree()> atom; qi::rule<Iterator, int()> integer; qi::rule<Iterator, utf8_symbol_type()> symbol; qi::rule<Iterator, utree::nil_type()> nil_; qi::rule<Iterator, binary_string_type()> binary; utf8::parser<Iterator> string; px::function<ErrorHandler> const error; tagger<Iterator, save_line_pos> pos; parser(std::string const& source_file = "<string>"): parser::base_type(start), error(ErrorHandler(source_file)) { using standard::char_; using qi::unused_type; using qi::lexeme; using qi::hex; using qi::oct; using qi::no_case; using qi::real_parser; using qi::strict_real_policies; using qi::uint_parser; using qi::bool_parser; using qi::on_error; using qi::fail; using qi::int_; using qi::lit; using qi::_val; using qi::_1; using qi::_2; using qi::_3; using qi::_4; real_parser<double, strict_real_policies<double> > strict_double; uint_parser<unsigned char, 16, 2, 2> hex2; bool_parser<bool, sexpr::bool_input_policies> boolean; start = element.alias(); element = atom | list; list = pos(_val, '(') > *element > ')'; atom = nil_ | strict_double | integer | boolean | string | symbol | binary; nil_ = qi::attr_cast(lit("nil")); integer = lexeme[ no_case["#x"] > hex] | lexeme[ no_case["#o"] >> oct] | lexeme[-no_case["#d"] >> int_]; std::string exclude = std::string(" ();\"\x01-\x1f\x7f") + '\0'; symbol = lexeme[+(~char_(exclude))]; binary = lexeme['#' > *hex2 > '#']; start.name("sexpr"); element.name("element"); list.name("list"); atom.name("atom"); nil_.name("nil"); integer.name("integer"); symbol.name("symbol"); binary.name("binary"); on_error<fail>(start, error(_1, _2, _3, _4)); } }; } // sexpr