...one of the most highly regarded and expertly designed C++ library projects in the world.
(Herb Sutter and Andrei Alexandrescu, C++ Coding Standards)
The character generators described in this section are char_ and lit.
The char_ generator emits single characters. It has an associated Character Encoding Namespace, which is needed for basic operations such as forcing lower or upper case and dealing with character ranges. There are various forms of char_.
The no-argument form of char_ emits any character in the associated Character Encoding Namespace.
char_ // emits any character as supplied by the attribute
The single-argument form of char_ (with a character argument) emits the supplied character.
char_('x')  // emits 'x'
char_(L'x') // emits L'x'
char_(x)    // emits x (a char)
char_ with two arguments emits any character from a range of characters, as supplied by the attribute.
char_('a','z')   // lowercase alphabetic characters
char_(L'0',L'9') // digits
A range of characters is created from a low-high character pair. Such a generator emits a single character that lies in the range, including both endpoints. Note that the first character must not come after the second according to the ordering of the underlying Character Encoding Namespace.
Character mapping is inherently platform dependent. For example, the C++ standard does not guarantee that 'A' < 'Z'. That is why Spirit2 deliberately attaches a specific Character Encoding Namespace (such as ASCII or ISO-8859-1) to the char_ generator to eliminate such ambiguities.
Note (Sparse bit vectors): To accommodate 16-, 32- and 64-bit characters, the char-set statically switches from a
Lastly, when given a string (a plain C string, a std::basic_string, etc.), the string is regarded as a char-set definition string following a syntax that resembles POSIX-style regular expression character sets (except that double quotes delimit the set elements instead of square brackets, and there is no special negation ^ character). Examples:
char_("a-zA-Z")    // alphabetic characters
char_("0-9a-fA-F") // hexadecimal characters
char_("actgACTG")  // DNA identifiers
char_("\x7f\x7e")  // hexadecimal 0x7F and 0x7E
These generators emit the character supplied by the attribute, provided that it belongs to the specified character set.
lit, when passed a single character, behaves like the single-argument char_, except that lit does not consume an attribute. A plain char or wchar_t literal is equivalent to lit.
Examples:
'x'
lit('x')
lit(L'x')
lit(c) // c is a char
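To make the difference in attribute handling concrete, here is a minimal sketch in plain standard C++ that does not use Spirit at all. The functions emit_lit and emit_char are hypothetical stand-ins: emit_lit models lit(ch), which never looks at an attribute, and emit_char models char_(ch), which fails without emitting anything when it is given an attribute that differs from ch. That failure behavior follows the char_('A') lines of the example at the end of this section.

```cpp
#include <cassert>
#include <optional>
#include <string>

// Hypothetical stand-ins, NOT the Spirit API: each function appends the
// generated character to `out` and returns false (emitting nothing) on failure.

// Models lit(ch): emits ch and never consumes an attribute.
bool emit_lit(std::string& out, char ch) {
    out += ch;
    return true;
}

// Models char_(ch): emits ch; if an attribute is supplied, it must equal ch.
bool emit_char(std::string& out, char ch, std::optional<char> attr = std::nullopt) {
    if (attr && *attr != ch) {
        return false;  // attribute does not match the literal: fail
    }
    out += ch;
    return true;
}

// Mirrors the lit('A') and char_('A') lines of the example section.
void check_examples() {
    std::string a, b, c;
    const bool ok_lit = emit_lit(a, 'A');        // like lit('A')
    const bool ok_same = emit_char(b, 'A', 'A'); // char_('A') with attribute 'A'
    const bool ok_diff = emit_char(c, 'A', 'B'); // char_('A') with attribute 'B'
    assert(ok_lit && a == "A");
    assert(ok_same && b == "A");
    assert(!ok_diff && c.empty());               // fails, nothing emitted
}
```

Passing the attribute as std::optional keeps the "attribute is optional for char_(ch)" rule visible in the signature, while emit_lit has no attribute parameter at all.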
// forwards to <boost/spirit/home/karma/char/char.hpp>
#include <boost/spirit/include/karma_char_.hpp>
Also, see Include Structure.
Name |
---|
boost::spirit::karma::lit |
boost::spirit::ns::char_ |

In the table above, ns represents a Character Encoding Namespace.
Notation
ch, ch1, ch2
Character-class specific character (see Character Class Types), or a Lazy Argument that evaluates to a character-class specific character value.
cs
Character-set specifier string (see Character Class Types), or a Lazy Argument that evaluates to a character-set specifier string, or a pointer/reference to a null-terminated array of characters. This string specifies a char-set definition string following a syntax that resembles POSIX-style regular expression character sets (except that square brackets and the negation ^ character are not used).
ns
A Character Encoding Namespace.
cg
A char generator, a char range generator, or a char set generator.
Semantics of an expression are defined only where they differ from, or are not defined in, PrimitiveGenerator.
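Expression | Description |
---|---|
ch | Generate the character literal ch. |
lit(ch) | Generate the character literal ch. |
ns::char_ | Generate the character provided by a mandatory attribute interpreted in the character set defined by ns. |
ns::char_(ch) | Generate the character ch. If an attribute is supplied, generation fails unless the attribute equals ch. |
ns::char_("c") | Generate the character c. If an attribute is supplied, generation fails unless the attribute equals c. |
ns::char_(ch1, ch2) | Generate the character provided by a mandatory attribute interpreted in the character set defined by ns. Generation fails unless the attribute belongs to the range ch1 to ch2. |
ns::char_(cs) | Generate the character provided by a mandatory attribute interpreted in the character set defined by ns. Generation fails unless the attribute belongs to the character set cs. |
~cg | Negate cg (invert the test condition of the character generator cg). |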
A character ch is assumed to belong to the character range defined by ns::char_(ch1, ch2) if its character value (binary representation), interpreted in the character set defined by ns, is not smaller than the character value of ch1 and not larger than the character value of ch2 (i.e. ch1 <= ch <= ch2).
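The range test above amounts to a pair of inclusive comparisons. The sketch below is plain C++ rather than Spirit code: in_range is a hypothetical helper, and it assumes the character values are unsigned 8-bit code points of an ASCII-compatible encoding, which is exactly the kind of encoding choice the Character Encoding Namespace pins down.

```cpp
#include <cassert>

// Hypothetical helper modeling the membership test of ns::char_(ch1, ch2):
// ch belongs to the range iff ch1 <= ch <= ch2 (both endpoints inclusive).
// Assumes characters are compared as unsigned 8-bit code points.
bool in_range(unsigned char ch, unsigned char ch1, unsigned char ch2) {
    assert(ch1 <= ch2);  // precondition: the first endpoint must not come after the second
    return ch1 <= ch && ch <= ch2;
}

void check_range_examples() {
    assert(in_range('A', 'A', 'Z'));   // lower endpoint is included
    assert(in_range('Z', 'A', 'Z'));   // upper endpoint is included
    assert(!in_range('a', 'A', 'Z'));  // 'a' (0x61) lies above 'Z' (0x5A) in ASCII
}
```

The last check corresponds to the char_('A', 'Z') line with attribute 'a' in the example section, which fails for the same reason.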
The charset parameter passed to ns::char_(charset) must be a string containing more than one character. Every character in this string is assumed to belong to the character set defined by this expression. The only exception is the '-' character, which has a special meaning when it is neither the first nor the last character in charset.
If '-' is used between two characters, it is interpreted as spanning a character range. A character ch is considered to belong to the character set defined by charset if it matches one of the characters specified by the string parameter described above. For example:
Example | Description |
---|---|
"abc" | 'a', 'b', and 'c' |
"a-z" | all characters from 'a' to 'z' (inclusive) |
"a-zA-Z" | all characters from 'a' to 'z' and from 'A' to 'Z' (inclusive) |
"-1-9" | '-' and all characters from '1' to '9' (inclusive) |
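One way to realize these rules for 8-bit characters is to expand the definition string into a 256-entry membership table. This is a hypothetical sketch (parse_charset is not a Spirit function, and Spirit's actual implementation may differ), assuming an ASCII-compatible execution character set: a '-' with a character on each side spans an inclusive range, while a leading or trailing '-' is taken literally.

```cpp
#include <bitset>
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical sketch, not Spirit's implementation: expand a char-set
// definition string into a membership table for 8-bit characters.
// A descending pair such as "z-a" contributes nothing in this sketch.
std::bitset<256> parse_charset(const std::string& spec) {
    std::bitset<256> members;
    for (std::size_t i = 0; i < spec.size(); ++i) {
        const unsigned char lo = static_cast<unsigned char>(spec[i]);
        // "x-y" with a character on both sides of '-' spans x..y inclusive;
        // a '-' at the very beginning or end is an ordinary character.
        if (i + 2 < spec.size() && spec[i + 1] == '-') {
            const unsigned char hi = static_cast<unsigned char>(spec[i + 2]);
            for (unsigned c = lo; c <= hi; ++c) {
                members.set(c);
            }
            i += 2;  // skip the '-' and the upper endpoint
        } else {
            members.set(lo);
        }
    }
    return members;
}

// The rows of the table above, checked against the sketch.
void check_charset_examples() {
    const auto abc = parse_charset("abc");       // 'a', 'b', and 'c'
    assert(abc.test('a') && abc.test('c') && !abc.test('d'));
    const auto alpha = parse_charset("a-zA-Z");  // 'a'..'z' and 'A'..'Z'
    assert(alpha.test('k') && alpha.test('K') && !alpha.test('5'));
    const auto digits = parse_charset("-1-9");   // '-' and '1'..'9'
    assert(digits.test('-') && digits.test('5') && !digits.test('0'));
}
```

A bitset keeps each membership check to a single indexed lookup, which is only practical because an 8-bit character type has just 256 possible values.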
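Expression | Attribute |
---|---|
ch | unused |
lit(ch) | unused |
ns::char_ | character (mandatory) |
ns::char_(ch) | character (optional; if supplied, it must equal ch) |
ns::char_("c") | character (optional; if supplied, it must equal c) |
ns::char_(ch1, ch2) | character (mandatory) |
ns::char_(cs) | character (mandatory) |
~cg | Attribute of cg |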
Note: In addition to their usual attribute of type
O(1)
The complexity of ch, lit(ch), ns::char_, ns::char_(ch), and ns::char_("c") is constant, as all generators emit exactly one character per invocation.
The character range generator (ns::char_(ch1, ch2)) additionally requires constant time to verify whether the attribute belongs to the character range.

The character set generator (ns::char_(cs)) additionally requires O(log N) time to verify whether the attribute belongs to the character set, where N is the number of characters in the character set.
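For wide character types a full lookup table is impractical. A common alternative, assumed here purely for illustration and not necessarily what Spirit does internally, stores the set as sorted, disjoint, inclusive ranges and binary-searches them, so a membership test costs O(log N) in the number of stored ranges. The names Range and contains are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <utility>
#include <vector>

// Hypothetical sketch: a char set stored as sorted, disjoint, inclusive
// [first, last] ranges of 32-bit code points.
using Range = std::pair<char32_t, char32_t>;

// Binary search for the first range whose upper bound is >= ch; ch is a
// member iff that range also starts at or below ch.
bool contains(const std::vector<Range>& ranges, char32_t ch) {
    auto it = std::lower_bound(ranges.begin(), ranges.end(), ch,
                               [](const Range& r, char32_t c) { return r.second < c; });
    return it != ranges.end() && it->first <= ch;
}

void check_sorted_range_examples() {
    // Roughly "0-9A-Za-z" plus the single code point U+00E9.
    const std::vector<Range> set = {
        {U'0', U'9'}, {U'A', U'Z'}, {U'a', U'z'}, {U'\u00E9', U'\u00E9'}};
    assert(contains(set, U'5'));
    assert(contains(set, U'\u00E9'));
    assert(!contains(set, U'_'));       // '_' (0x5F) falls between 'Z' and 'a'
    assert(!contains(set, U'\u4E2D'));  // beyond the last range
}
```

Because the ranges are disjoint and sorted, their upper bounds are sorted too, which is what makes the std::lower_bound comparator valid.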
Note: The test harness for the example(s) below is presented in the Basics Examples section.
Some includes:
#include <boost/spirit/include/karma.hpp>
#include <boost/spirit/include/support_utree.hpp>
#include <boost/spirit/include/phoenix_core.hpp>
#include <boost/spirit/include/phoenix_operator.hpp>
#include <boost/fusion/include/std_pair.hpp>
#include <iostream>
#include <string>
Some using declarations:
using boost::spirit::karma::lit;
using boost::spirit::ascii::char_;
Basic usage of char_ generators:
test_generator("A", 'A');
test_generator("A", lit('A'));
test_generator_attr("a", char_, 'a');
test_generator("A", char_('A'));
test_generator_attr("A", char_('A'), 'A');
test_generator_attr("", char_('A'), 'B');        // fails (as 'A' != 'B')
test_generator_attr("A", char_('A', 'Z'), 'A');
test_generator_attr("", char_('A', 'Z'), 'a');   // fails (as 'a' does not belong to 'A'...'Z')
test_generator_attr("k", char_("a-z0-9"), 'k');
test_generator_attr("", char_("a-z0-9"), 'A');   // fails (as 'A' does not belong to "a-z0-9")