In this section I show how to construct Dual-Use Filters from finite state machines. For purposes of this section, a finite state machine consists of a collection of states, represented as int
s, a distinguished initial state, a transition table, a collection of event handlers and a stack of characters. These finite state machines were inspired by the finite state machine examples that accompany the Boost Metaprogamming library
. See, e.g., <libs/mpl/example/player1.cpp>
For us, an event is simply a character. A transition table is an MPL Forward Sequence of rows. Each row is a 4-tuple consisting of a current state, a character class, a next state and an event handler. During the operation of a finite state machine, its transition table is consulted to determine how the machine's state should be updated and which event handler to call. When an event e
occurs, the transition table is searched for the first row whose current state is equal to the current state of the machine, and whose character class matches e
. The machine's state is then updated to the row's next state, and the event handler is called with e
as an argument.
A character class is a class type with a static
member function test
taking a character and a std::locale
as arguments. E.g.,
struct is_any { static bool test(char, const std::locale&); };
There are several built in character classes. The character class is_any
matches any character. For any character c
, the character class is<c>
only matches c
. The character classes alnum
, is_alpha
, is_cntrl
, is_digit
, is_graph
, is_lower
, is_print
, is_punct
, is_space
, is_upper
, is_xdigit
implement locale
-sensitive character classification.
Event handlers are member functions of a finite state machine which take a single character as argument and return void
. The implementation of an event handler typically examines the given character and pushes zero or more characters of filtered data onto the stack. A special event handler on_eof
takes no arguments and is called automatically after the last character in a sequence is processed.
There are three built-in event handlers, on_any
, push
and skip
. The handler on_any
is called automatically when an event occurs which is not covered by any row in the transition table. The default behavior of on_any
is to do nothing. The handler push
pushes the given character onto the stack. The handler skip
ignores the current character. The handlers push
and skip
are not defined by default; in order to make them available, the macro BOOST_IOSTREAM_FSM
should be invoked within the class definition of a finite state machine, passing the state machine's name as the macro argument.
Finite state machines must derive from the class template boost::iostreams::finite_state_machine
, defined in the header <libs/iostreams/example/finite_state_filter.hpp>
. The first template argument to finite_state_machine
should be the derived class itself; the second template argument should be the character type.
A finite state machine's transition table should be declared as a member type named transition_table
. Its event handlers should be implemented as member functions. A finite state machine may define a static
integral constant named initial_state
, which will be treated as the machine's initial state. Alternatively, the definition of the initial state can be omitted, in which case it will default to 0
.
Given a finite state machine my_fsm
, you can define a Dual-Use Filter as follows:
namespace io = boost::iostreams; typedef io::finite_state_filter<my_fsm> my_finite_state_filter;
dos2unix_fsm
The following state machine can be used to convert line endings from DOS
to UNIX
. The constant initial_state
, the class template row
and the character classes is
and is_any
are members of the base class boost::iostreams::finite_state_machine
.
#include <boost/mpl/vector> #include <libs/iostreams/example/finite_state_filter.hpp> namespace io = boost::iostreams; struct dos2unix_fsm : io::finite_state_machine<dos2unix_fsm> { BOOST_IOSTREAMS_FSM(dos2unix_fsm) // Define skip and push. typedef dos2unix_fsm self; typedef boost::mpl::vector< row<initial_state, is<'\r'>, initial_state, &self::skip>, row<initial_state, is_any, initial_state, &self::push> > transition_table; };
This machine has just a single state. Its transition table can be understood as follows: if the current character is '\r'
, ignore it; otherwise, forward it unchanged.
unix2dos_fsm
The following state machine can be used to convert line endings from UNIX
to DOS
:
#include <boost/mpl/vector> #include <libs/iostreams/example/finite_state_filter.hpp> namespace io = boost::iostreams; struct unix2dos_fsm : io::finite_state_machine<unix2dos_fsm> { BOOST_IOSTREAMS_FSM(unix2dos_fsm) // Define skip and push. typedef unix2dos_fsm self; void on_lf(char) { push('\r'); push('\n'); } typedef boost::mpl::vector< row<initial_state, is<'\n'>, initial_state, &self::on_lf>, row<initial_state, is_any, initial_state, &self::push> > transition_table; };
This machine also has just a single state. The event handler on_lf
pushes a DOS
line-ending sequence onto the stack. The transition table can be understood as follows: if the current character is '\n'
, write a DOS
line-ending sequence; otherwise, forward it unchanged.
uncommenting_fsm
The following state machine removes C-style comments. Although it's not quite sophisticated enough for processing source code, it's a good illustration of a multi-state machine.
#include <boost/mpl/vector> #include <libs/iostreams/example/finite_state_filter.hpp> namespace io = boost::iostreams; struct uncommenting_fsm : io::finite_state_machine<uncommenting_fsm> { BOOST_IOSTREAMS_FSM(uncommenting_fsm) // Define skip and push. typedef uncommenting_fsm self; static const int no_comment = initial_state; static const int pre_comment = no_comment + 1; static const int comment = pre_comment + 1; static const int post_comment = comment + 1; void push_slash(char c) { push('/'); push(c); } typedef boost::mpl::vector< row<no_comment, is<'/'>, pre_comment, &self::skip>, row<no_comment, is_any, no_comment, &self::push>, row<pre_comment, is<'*'>, comment, &self::skip>, row<pre_comment, is<'/'>, pre_comment, &self::push>, row<pre_comment, is_any, no_comment, &self::push_slash>, row<comment, is<'*'>, post_comment, &self::skip>, row<comment, is_any, comment, &self::skip>, row<post_comment, is<'/'>, no_comment, &self::skip>, row<post_comment, is<'*'>, post_comment, &self::skip>, row<post_comment, is_any, comment, &self::skip> > transition_table; };
This machine has four states and one user-defined event handler. I'll leave it as an exercise to discover how it works.
© Copyright 2008 CodeRage, LLC
© Copyright 2004-2007 Jonathan Turkanis
Use, modification, and distribution are subject to the Boost Software License, Version 2.0. (See accompanying file LICENSE_1_0.txt or copy at http://www.boost.org/LICENSE_1_0.txt)