...one of the most highly
regarded and expertly designed C++ library projects in the
world.
— Herb Sutter and Andrei
Alexandrescu, C++
Coding Standards
In the next section, we will deal with Parsing Expression Grammars (PEG) [3], a variant of Extended Backus-Naur Form (EBNF) [4] with a different interpretation. It is easier to understand PEG using Syntax Diagrams. Syntax diagrams represent a grammar graphically. It was used extensively by Niklaus Wirth [5] in the "Pascal User Manual". Syntax Diagrams are easily understandable by programmers due to their similarity to flow charts. The isomorphism of the diagrams and functions make them ideal for representing Recursive Descent parsers which are essentially mutually recursive functions.
Historically, Parsing Expression Grammars have been used for describing grammars for parsers only (hence the name). In Spirit we use a very similar notation for output generation as well. Almost all the concepts described here are equally applicable both to Spirit.Qi parsers and to Spirit.Karma generators.
All diagrams have one entry and one exit point. Arrows connect all possible paths through the grammar from the entry point to the exit point.
Terminals are represented by round boxes. Terminals are atomic and usually represent plain characters, strings or tokens.
Non-terminals are represented by boxes. Diagrams are modularized using named non-terminals. A complex diagram can be broken down into a set of non-terminals. Non-terminals also allow recursion (i.e. a non-terminal can call itself).
The most basic composition is the Sequence. B follows A:
The ordered choice henceforth we will call alternatives. In PEG, ordered choice and alternatives are not quite the same. PEG allows ambiguity of choice where one or more branches can succeed. In PEG, in case of ambiguity, the first one always wins.
The optional (zero-or-one):
Now, the loops. We have the zero-or-more and one-or-more:
Take note that, as in PEG, these loops behave greedily. If there is another 'A' just before the end-point, it will always fail because the preceding loop has already exhausted all 'A's and there is nothing more left. This is a crucial difference between PEG and general Context Free Grammars (CFGs). This behavior is quite obvious with syntax diagrams as they resemble flow-charts.
Now, the following are Syntax Diagram versions of PEG predicates. These are not traditionally found in Syntax Diagrams. These are special extensions we invented to closely follow PEGs.
First, we introduce a new element, the Predicate:
This is similar to the conditionals in flow charts where the 'No' branch is absent and always signals a failed parse.
We have two versions of the predicate, the And-Predicate and the Not-Predicate:
The And-Predicate tries the predicate, P, and succeeds if P succeeds, or otherwise fail. The opposite is true with the Not-Predicate. It tries the predicate, P, and fails if P succeeds, or otherwise succeeds. Both versions do a look-ahead but do not consume any input regardless if P succeeds or not.
[3] Bryan Ford: Parsing Expression Grammars: A Recognition-Based Syntactic Foundation, http://pdos.csail.mit.edu/~baford/packrat/popl04/
[4] Richard E. Pattis: EBNF: A Notation to Describe Syntax, http://www.cs.cmu.edu/~pattis/misc/ebnf.pdf
[5] Niklaus Wirth: The Programming Language Pascal. (July 1973)