libs/spirit/doc/qi/char.qbk

[/==============================================================================
    Copyright (C) 2001-2011 Joel de Guzman
    Copyright (C) 2001-2011 Hartmut Kaiser

    Distributed under the Boost Software License, Version 1.0. (See accompanying
    file LICENSE_1_0.txt or copy at http://www.boost.org/LICENSE_1_0.txt)
===============================================================================/]
[section:char Character Parsers]

This module includes parsers for single characters. Currently, this
module includes literal chars (e.g. `'x'`, `L'x'`), `char_` (single
characters, ranges and character sets) and the encoding specific
character classifiers (`alnum`, `alpha`, `digit`, `xdigit`, etc.).

[heading Module Header]

    // forwards to <boost/spirit/home/qi/char.hpp>
    #include <boost/spirit/include/qi_char.hpp>

Also, see __include_structure__.

[/------------------------------------------------------------------------------]
[section:char Character Parser (`char_`, `lit`)]

[heading Description]

The `char_` parser matches single characters. The `char_` parser has an
associated __char_encoding_namespace__. This is needed when doing basic
operations such as inhibiting case sensitivity and dealing with
character ranges.

There are various forms of `char_`. 

[heading char_]

The no argument form of `char_` matches any character in the associated
__char_encoding_namespace__.

    char_               // matches any character

[heading char_(ch)]

The single argument form of `char_` (with a character argument) matches
the supplied character. 

    char_('x')          // matches 'x'
    char_(L'x')         // matches L'x'
    char_(x)            // matches x (a char)

[heading char_(first, last)]

The two-argument form of `char_` matches a range of characters.

    char_('a','z')      // lowercase alphabetic characters
    char_(L'0',L'9')    // digits

A range of characters is created from a low-high character pair. Such a
parser matches a single character that is in the range, including both
endpoints. Note that the first character must come /before/ the second,
according to the underlying __char_encoding_namespace__.

Character mapping is inherently platform dependent. The standard does
not guarantee, for example, that `'A' < 'Z'`. That is why Spirit2
purposely attaches a specific __char_encoding_namespace__ (such as ASCII
or ISO-8859-1) to the `char_` parser: doing so eliminates such
ambiguities.

[note *Sparse bit vectors*

To accommodate 16-, 32- and 64-bit characters, the char-set selects its
implementation at compile time: a `std::bitset` when the character type
is no wider than 8 bits, and otherwise a sparse bit/boolean set that
uses a sorted vector of disjoint ranges (`range_run`). The set is
constructed from ranges such that adjacent or overlapping ranges are
coalesced.

`range_runs` are very space-economical in situations where there are lots
of ranges and a few individual disjoint values. Searching is O(log n)
where n is the number of ranges.]
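The sorted-range idea described in the note can be sketched in standard
C++. This is only an illustration under stated assumptions, not the
library's `range_run`: the class name `range_set` and its members `set`,
`test` and `size` are hypothetical, and the arithmetic assumes a
character type narrower than 64 bits.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical illustration of a sparse character set kept as a sorted
// vector of disjoint, inclusive ranges. Not the Spirit implementation.
template <typename Char>
class range_set
{
public:
    struct range { Char first; Char last; };    // inclusive on both ends

    // Insert [first, last], merging it with any stored range that
    // overlaps it or sits directly next to it.
    void set(Char first, Char last)
    {
        assert(first <= last);
        long long lo = first, hi = last;         // assumes Char < 64 bits

        // First stored range that ends at lo - 1 or later.
        auto it = std::lower_bound(runs_.begin(), runs_.end(), lo - 1,
            [](range const& r, long long value)
            { return static_cast<long long>(r.last) < value; });

        // Absorb every stored range that starts at hi + 1 or earlier.
        auto end = it;
        while (end != runs_.end()
            && static_cast<long long>(end->first) <= hi + 1)
        {
            lo = std::min<long long>(lo, end->first);
            hi = std::max<long long>(hi, end->last);
            ++end;
        }

        it = runs_.erase(it, end);
        runs_.insert(it, range{static_cast<Char>(lo), static_cast<Char>(hi)});
    }

    // Binary search over the ranges: O(log n) in the number of ranges.
    bool test(Char ch) const
    {
        auto it = std::lower_bound(runs_.begin(), runs_.end(), ch,
            [](range const& r, Char value) { return r.last < value; });
        return it != runs_.end() && it->first <= ch;
    }

    std::size_t size() const { return runs_.size(); }

private:
    std::vector<range> runs_;                    // sorted and disjoint
};
```

With this sketch, setting `'a'` to `'f'` and then `'g'` to `'z'` leaves a
single stored range, because adjacent ranges are coalesced on insertion.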

[heading char_(def)]

Lastly, when given a string (a plain C string, a `std::basic_string`,
etc.), the string is regarded as a char-set definition string. Its
syntax resembles POSIX-style regular expression character sets, except
that double quotes delimit the set elements instead of square brackets
and there is no special negation character (`^`). Examples:

    char_("a-zA-Z")     // alphabetic characters
    char_("0-9a-fA-F")  // hexadecimal characters
    char_("actgACTG")   // DNA identifiers
    char_("\x7f\x7e")   // Hexadecimal 0x7F and 0x7E
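One way to read such a definition string is sketched below for 8-bit
characters, using a `std::bitset`. The helper names `make_char_set` and
`matches` are hypothetical. The treatment of a trailing `-` as an
ordinary character is an assumption of this sketch, and the library's
own handling of edge cases may differ.

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper, for illustration only: expands a definition
// string such as "0-9a-fA-F" into a 256-entry membership table.
inline std::bitset<256> make_char_set(std::string const& def)
{
    std::bitset<256> set;
    for (std::size_t i = 0; i < def.size(); ++i)
    {
        unsigned char first = static_cast<unsigned char>(def[i]);
        if (i + 2 < def.size() && def[i + 1] == '-')
        {
            // "x-y" denotes the inclusive range from x to y.
            unsigned char last = static_cast<unsigned char>(def[i + 2]);
            assert(first <= last);
            for (unsigned c = first; c <= last; ++c)
                set.set(c);
            i += 2;                              // skip '-' and y
        }
        else
        {
            // A single element; a trailing '-' also ends up here.
            set.set(first);
        }
    }
    return set;
}

// Hypothetical helper: membership test for a plain char.
inline bool matches(std::bitset<256> const& set, char ch)
{
    return set.test(static_cast<unsigned char>(ch));
}
```

Under these assumptions, `make_char_set("0-9a-fA-F")` marks 22
characters and `make_char_set("actgACTG")` marks 8.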

[heading lit(ch)]

`lit`, when passed a single character, behaves like the single argument
`char_` except that `lit` does not synthesize an attribute. A plain
`char` or `wchar_t` is equivalent to a `lit`.

[note `lit` is reused by both the [qi_lit_string string parsers] and the
char parsers. In general, a char parser is created when you pass in a
character and a string parser is created when you pass in a string. The
exception is when you pass a single element literal string, e.g.
`lit("x")`. In this case, we optimize this to create a char parser
instead of a string parser.] 

Examples:

    'x'
    lit('x')
    lit(L'x')
    lit(c) // c is a char

[heading Header]

    // forwards to <boost/spirit/home/qi/char/char.hpp>
    #include <boost/spirit/include/qi_char_.hpp>

Also, see __include_structure__.

[heading Namespace]

[table
    [[Name]]
    [[`boost::spirit::lit // alias: boost::spirit::qi::lit` ]]
    [[`ns::char_`]]
]

In the table above, `ns` represents a __char_encoding_namespace__. 

[heading Model of]

[:__primitive_parser_concept__]

[variablelist Notation
    [[`c`, `f`, `l`]    [A literal char, e.g. `'x'`, `L'x'` or anything that can be
                        converted to a `char` or `wchar_t`, or a __qi_lazy_argument__ 
                        that evaluates to anything that can be converted to a `char` 
                        or `wchar_t`.]]
    [[`ns`]             [A __char_encoding_namespace__.]]
    [[`cs`]             [A __string__ or a __qi_lazy_argument__ that evaluates to a __string__
                        that specifies a char-set definition string following a syntax
                        that resembles POSIX-style regular expression character sets
                        (without the square brackets and the negation `^` character).]]
    [[`cp`]             [A char parser, a char range parser or a char set parser.]]
]

[heading Expression Semantics]

The semantics of an expression are defined only where they differ from,
or are not defined in, __primitive_parser_concept__.

[table
    [[Expression]       [Semantics]]
    [[`c`]              [Create a char parser from a char, `c`.]]
    [[`lit(c)`]         [Create a char parser from a char, `c`.]]
    [[`ns::char_`]      [Create a char parser that matches any character in the
                        `ns` encoding.]]
    [[`ns::char_(c)`]   [Create a char parser with `ns` encoding from a char, `c`.]]
    [[`ns::char_(f, l)`][Create a char-range parser that matches characters from
                        range (`f` to `l`, inclusive) with `ns` encoding.]]
    [[`ns::char_(cs)`]  [Create a char-set parser with `ns` encoding from a char-set
                        definition string, `cs`.]]
    [[`~cp`]            [Negate `cp`. The result is a negated char parser that
                        matches any character in the `ns` encoding except the
                        characters matched by `cp`.]]
]

[heading Attributes]

[table
    [[Expression]       [Attribute]]
    [[`c`]              [__unused__ or, if `c` is a __qi_lazy_argument__, the character 
                        type returned by invoking it.]]
    [[`lit(c)`]         [__unused__ or, if `c` is a __qi_lazy_argument__, the character 
                        type returned by invoking it.]]
    [[`ns::char_`]      [The character type of the __char_encoding_namespace__, `ns`.]]
    [[`ns::char_(c)`]   [The character type of the __char_encoding_namespace__, `ns`.]]
    [[`ns::char_(f, l)`][The character type of the __char_encoding_namespace__, `ns`.]]
    [[`ns::char_(cs)`]  [The character type of the __char_encoding_namespace__, `ns`.]]
    [[`~cp`]            [The attribute of `cp`.]]
]

[heading Complexity]

[:*O(N)*, except for char-sets with 16-bit (or more) characters (e.g.
`wchar_t`). These have *O(log N)* complexity, where N is the number of
distinct character ranges in the set.]

[heading Example]

[note The test harness for the example(s) below is presented in the
__qi_basics_examples__ section.]

Some using declarations:

[reference_using_declarations_lit_char]

Basic literals:

[reference_char_literals]

Range:

[reference_char_range]

Character set:

[reference_char_set]

Lazy `char_` using __phoenix__:

[reference_char_phoenix]

[endsect] [/ Char]

[/------------------------------------------------------------------------------]
[section:char_class Character Classification Parsers (`alnum`, `digit`, etc.)]

[heading Description]

The library has the full repertoire of single character parsers for
character classification. This includes the usual `alnum`, `alpha`,
`digit`, `xdigit`, etc. parsers. These parsers have an associated
__char_encoding_namespace__. This is needed when doing basic operations
such as inhibiting case sensitivity.

[heading Header]

    // forwards to <boost/spirit/home/qi/char/char_class.hpp>
    #include <boost/spirit/include/qi_char_class.hpp>

Also, see __include_structure__.

[heading Namespace]

[table
    [[Name]]
    [[`ns::alnum`]]
    [[`ns::alpha`]]
    [[`ns::blank`]]
    [[`ns::cntrl`]]
    [[`ns::digit`]]
    [[`ns::graph`]]
    [[`ns::lower`]]
    [[`ns::print`]]
    [[`ns::punct`]]
    [[`ns::space`]]
    [[`ns::upper`]]
    [[`ns::xdigit`]]
]

In the table above, `ns` represents a __char_encoding_namespace__. 

[heading Model of]

[:__primitive_parser_concept__]

[variablelist Notation
    [[`ns`]             [A __char_encoding_namespace__.]]
]

[heading Expression Semantics]

The semantics of an expression are defined only where they differ from,
or are not defined in, __primitive_parser_concept__.

[table
    [[Expression]       [Semantics]]
    [[`ns::alnum`]      [Matches alpha-numeric characters]]
    [[`ns::alpha`]      [Matches alphabetic characters]]
    [[`ns::blank`]      [Matches spaces or tabs]]
    [[`ns::cntrl`]      [Matches control characters]]
    [[`ns::digit`]      [Matches numeric digits]]
    [[`ns::graph`]      [Matches non-space printing characters]]
    [[`ns::lower`]      [Matches lower case letters]]
    [[`ns::print`]      [Matches printable characters]]
    [[`ns::punct`]      [Matches punctuation symbols]]
    [[`ns::space`]      [Matches spaces, tabs, returns, and newlines]]
    [[`ns::upper`]      [Matches upper case letters]]
    [[`ns::xdigit`]     [Matches hexadecimal digits]]
]

[heading Attributes]

[:The character type of the __char_encoding_namespace__, `ns`.]

[heading Complexity]

[:O(N)]

[heading Example]

[note The test harness for the example(s) below is presented in the
__qi_basics_examples__ section.]

Some using declarations:

[reference_using_declarations_char_class]

Basic usage:

[reference_char_class]

[endsect] [/ Char Classification]

[endsect]