...one of the most highly
regarded and expertly designed C++ library projects in the
world.
— Herb Sutter and Andrei
Alexandrescu, C++
Coding Standards
New Features:
(?R)
construct
match_flag_type::format_perl
, match_flag_type::format_sed
,
and match_flag_type::format_all
operator+(std::string, sub_match<>)
and variants
tolower()
and toupper()
Bugs Fixed:
~(set='a')
works.
Bugs Fixed:
This is the version that shipped as part of Boost 1.34.
Bugs Fixed:
match_results::position()
works for nested results.
Version 1.0!
The version reviewed for acceptance into Boost. The review began September 8, 2005. Xpressive was accepted into Boost on September 28, 2005.
New Features:
syntax_option_type::ignore_white_space
New Features:
Announcement of xpressive: http://lists.boost.org/Archives/boost/2003/11/56312.php
The following features are planned for xpressive 2.X:
syntax_option_type::collate
[.a.]
[=a=]
syntax_option_type::nosubs
,
and a nosubs()
modifier for static xpressive.
Here are some wish-list features. You or your company should consider hiring me to implement them!
std::locale
.
Since many of xpressive's users are likely to be familiar with the Boost.Regex library, I would be remiss if
I failed to point out some important differences between xpressive and Boost.Regex. In particular:
xpressive::basic_regex<>
is a template on the iterator type, not the character type.
xpressive::basic_regex<>
cannot be constructed directly from a string; rather, you must use basic_regex::compile()
or regex_compiler<>
to build a regex object from a string.
xpressive::basic_regex<>
does not have an imbue()
member function; rather, the imbue()
member function is in the xpressive::regex_compiler<>
factory.
boost::basic_regex<>
has a subset of std::basic_string<>
's
members. xpressive::basic_regex<>
does not. The members lacking are: assign()
, operator[]()
, max_size()
, begin()
, end()
, size()
, compare()
, and operator=(std::basic_string<>)
.
boost::basic_regex<>
but do not exist in xpressive::basic_regex<>
are: set_expression()
,
get_allocator()
,
imbue()
,
getloc()
,
getflags()
,
and str()
.
xpressive::basic_regex<>
does not have a RegexTraits template parameter. Customization of regex
syntax and localization behavior will be controlled by regex_compiler<>
and a custom regex facet for std::locale
.
xpressive::basic_regex<>
and xpressive::match_results<>
do not have an Allocator template parameter. This is by design.
match_not_dot_null
and
match_not_dot_newline
have
moved from the match_flag_type
enum to the syntax_option_type
enum, and they have changed names to not_dot_null
and not_dot_newline
.
syntax_option_type
enumeration values are not supported: escape_in_lists
,
char_classes
, intervals
, limited_ops
,
newline_alt
, bk_plus_qm
, bk_braces
,
bk_parens
, bk_refs
, bk_vbar
,
use_except
, failbit
, literal
,
perlex
, basic
,
extended
, emacs
, awk
,
grep
,egrep
,
sed
, JavaScript
,
JScript
.
match_flag_type
enumeration values are not supported: match_not_bob
,
match_not_eob
, match_perl
, match_posix
,
and match_extra
.
Also, in the current implementation, the regex algorithms in xpressive will not detect pathological behavior and abort by throwing an exception. It is up to you to write efficient patterns that do not behave pathologically.
The performance of xpressive is competitive with Boost.Regex. I have run performance benchmarks comparing static xpressive, dynamic xpressive and Boost.Regex on two platforms: gcc (Cygwin) and Visual C++. The tests include short matches and long searches. For both platforms, xpressive comes off well on short matches and roughly on par with Boost.Regex on long searches.
<disclaimer> As with all benchmarks, the true test is how xpressive performs with your patterns, your input, and your platform, so if performance matters in your application, it's best to run your own tests. </disclaimer>
Below are the results of a performance comparison between:
Test Specifications
hyper-threaded 3GHz Xeon with 1Gb RAM
Windows XP Pro + Cygwin
GNU C++ version 3.4.4 (Cygwin special)
GNU libstdc++ version 3.4.4
1.33+, BOOST_REGEX_USE_CPP_LOCALE, BOOST_REGEX_RECURSIVE
0.9.6a
The following tests evaluate the time taken to match the expression to the input string. For each result, the top number has been normalized relative to the fastest time, so 1.0 is as good as it gets. The bottom number (in parentheses) is the actual time in seconds. The best time has been marked in green.
static xpressive | dynamic xpressive | Boost | Text | Expression |
---|---|---|---|---|
1(8.79e‑07s) | 1.08(9.54e‑07s) | 2.51(2.2e‑06s) | 100- this is a line of ftp response which contains a message string | ^([0-9]+)(\-| |$)(.*)$ |
1.06(1.07e‑06s) | 1(1.01e‑06s) | 4.01(4.06e‑06s) | 1234-5678-1234-456 | ([[:digit:]]{4}[- ]){3}[[:digit:]]{3,4} |
1(1.4e‑06s) | 1.13(1.58e‑06s) | 2.89(4.05e‑06s) | john_maddock@compuserve.com | ^([a-zA-Z0-9_\-\.]+)@((\[[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.)|(([a-zA-Z0-9\-]+\.)+))([a-zA-Z]{2,4}|[0-9]{1,3})(\]?)$ |
1(1.28e‑06s) | 1.16(1.49e‑06s) | 3.07(3.94e‑06s) | foo12@foo.edu | ^([a-zA-Z0-9_\-\.]+)@((\[[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.)|(([a-zA-Z0-9\-]+\.)+))([a-zA-Z]{2,4}|[0-9]{1,3})(\]?)$ |
1(1.22e‑06s) | 1.2(1.46e‑06s) | 3.22(3.93e‑06s) | bob.smith@foo.tv | ^([a-zA-Z0-9_\-\.]+)@((\[[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.)|(([a-zA-Z0-9\-]+\.)+))([a-zA-Z]{2,4}|[0-9]{1,3})(\]?)$ |
1.04(8.64e‑07s) | 1(8.34e‑07s) | 2.5(2.09e‑06s) | EH10 2QQ | ^[a-zA-Z]{1,2}[0-9][0-9A-Za-z]{0,1} {0,1}[0-9][A-Za-z]{2}$ |
1.11(9.09e‑07s) | 1(8.19e‑07s) | 2.47(2.03e‑06s) | G1 1AA | ^[a-zA-Z]{1,2}[0-9][0-9A-Za-z]{0,1} {0,1}[0-9][A-Za-z]{2}$ |
1.12(9.38e‑07s) | 1(8.34e‑07s) | 2.5(2.08e‑06s) | SW1 1ZZ | ^[a-zA-Z]{1,2}[0-9][0-9A-Za-z]{0,1} {0,1}[0-9][A-Za-z]{2}$ |
1(7.9e‑07s) | 1.06(8.34e‑07s) | 2.49(1.96e‑06s) | 4/1/2001 | ^[[:digit:]]{1,2}/[[:digit:]]{1,2}/[[:digit:]]{4}$ |
1(8.19e‑07s) | 1.04(8.49e‑07s) | 2.4(1.97e‑06s) | 12/12/2001 | ^[[:digit:]]{1,2}/[[:digit:]]{1,2}/[[:digit:]]{4}$ |
1.09(8.95e‑07s) | 1(8.19e‑07s) | 2.4(1.96e‑06s) | 123 | ^[-+]?[[:digit:]]*\.?[[:digit:]]*$ |
1.11(8.79e‑07s) | 1(7.9e‑07s) | 2.57(2.03e‑06s) | +3.14159 | ^[-+]?[[:digit:]]*\.?[[:digit:]]*$ |
1.09(8.94e‑07s) | 1(8.19e‑07s) | 2.47(2.03e‑06s) | -3.14159 | ^[-+]?[[:digit:]]*\.?[[:digit:]]*$ |
The next test measures the time to find all matches in a long English text. The text is the complete works of Mark Twain, from Project Gutenberg. The text is 19Mb long. As above, the top number is the normalized time and the bottom number is the actual time. The best time is in green.
static xpressive | dynamic xpressive | Boost | Expression |
---|---|---|---|
1(0.0263s) | 1(0.0263s) | 1.78(0.0469s) | Twain |
1(0.0234s) | 1(0.0234s) | 1.79(0.042s) | Huck[[:alpha:]]+ |
1.84(1.26s) | 2.21(1.51s) | 1(0.687s) | [[:alpha:]]+ing |
1.09(0.192s) | 2(0.351s) | 1(0.176s) | ^[^
]*?Twain |
1.41(0.08s) | 1.21(0.0684s) | 1(0.0566s) | Tom|Sawyer|Huckleberry|Finn |
1.56(0.195s) | 1.12(0.141s) | 1(0.125s) | (Tom|Sawyer|Huckleberry|Finn).{0,30}river|river.{0,30}(Tom|Sawyer|Huckleberry|Finn) |
Below are the results of a performance comparison between:
Test Specifications
hyper-threaded 3GHz Xeon with 1Gb RAM
Windows XP Pro
Visual C++ .NET 2003 (7.1)
Dinkumware, version 313
1.33+, BOOST_REGEX_USE_CPP_LOCALE, BOOST_REGEX_RECURSIVE
0.9.6a
The following tests evaluate the time taken to match the expression to the input string. For each result, the top number has been normalized relative to the fastest time, so 1.0 is as good as it gets. The bottom number (in parentheses) is the actual time in seconds. The best time has been marked in green.
static xpressive | dynamic xpressive | Boost | Text | Expression |
---|---|---|---|---|
1(3.2e‑007s) | 1.37(4.4e‑007s) | 2.38(7.6e‑007s) | 100- this is a line of ftp response which contains a message string | ^([0-9]+)(\-| |$)(.*)$ |
1(6.4e‑007s) | 1.12(7.15e‑007s) | 1.72(1.1e‑006s) | 1234-5678-1234-456 | ([[:digit:]]{4}[- ]){3}[[:digit:]]{3,4} |
1(9.82e‑007s) | 1.3(1.28e‑006s) | 1.61(1.58e‑006s) | john_maddock@compuserve.com | ^([a-zA-Z0-9_\-\.]+)@((\[[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.)|(([a-zA-Z0-9\-]+\.)+))([a-zA-Z]{2,4}|[0-9]{1,3})(\]?)$ |
1(8.94e‑007s) | 1.3(1.16e‑006s) | 1.7(1.52e‑006s) | foo12@foo.edu | ^([a-zA-Z0-9_\-\.]+)@((\[[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.)|(([a-zA-Z0-9\-]+\.)+))([a-zA-Z]{2,4}|[0-9]{1,3})(\]?)$ |
1(9.09e‑007s) | 1.28(1.16e‑006s) | 1.67(1.52e‑006s) | bob.smith@foo.tv | ^([a-zA-Z0-9_\-\.]+)@((\[[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.)|(([a-zA-Z0-9\-]+\.)+))([a-zA-Z]{2,4}|[0-9]{1,3})(\]?)$ |
1(3.06e‑007s) | 1.07(3.28e‑007s) | 1.95(5.96e‑007s) | EH10 2QQ | ^[a-zA-Z]{1,2}[0-9][0-9A-Za-z]{0,1} {0,1}[0-9][A-Za-z]{2}$ |
1(3.13e‑007s) | 1.09(3.42e‑007s) | 1.86(5.81e‑007s) | G1 1AA | ^[a-zA-Z]{1,2}[0-9][0-9A-Za-z]{0,1} {0,1}[0-9][A-Za-z]{2}$ |
1(3.2e‑007s) | 1.09(3.5e‑007s) | 1.86(5.96e‑007s) | SW1 1ZZ | ^[a-zA-Z]{1,2}[0-9][0-9A-Za-z]{0,1} {0,1}[0-9][A-Za-z]{2}$ |
1(2.68e‑007s) | 1.22(3.28e‑007s) | 2(5.36e‑007s) | 4/1/2001 | ^[[:digit:]]{1,2}/[[:digit:]]{1,2}/[[:digit:]]{4}$ |
1(2.76e‑007s) | 1.16(3.2e‑007s) | 1.94(5.36e‑007s) | 12/12/2001 | ^[[:digit:]]{1,2}/[[:digit:]]{1,2}/[[:digit:]]{4}$ |
1(2.98e‑007s) | 1.03(3.06e‑007s) | 1.85(5.51e‑007s) | 123 | ^[-+]?[[:digit:]]*\.?[[:digit:]]*$ |
1(3.2e‑007s) | 1.12(3.58e‑007s) | 1.81(5.81e‑007s) | +3.14159 | ^[-+]?[[:digit:]]*\.?[[:digit:]]*$ |
1(3.28e‑007s) | 1.11(3.65e‑007s) | 1.77(5.81e‑007s) | -3.14159 | ^[-+]?[[:digit:]]*\.?[[:digit:]]*$ |
The next test measures the time to find all matches in a long English text. The text is the complete works of Mark Twain, from Project Gutenberg. The text is 19Mb long. As above, the top number is the normalized time and the bottom number is the actual time. The best time is in green.
static xpressive | dynamic xpressive | Boost | Expression |
---|---|---|---|
1(0.019s) | 1(0.019s) | 2.98(0.0566s) | Twain |
1(0.0176s) | 1(0.0176s) | 3.17(0.0556s) | Huck[[:alpha:]]+ |
3.62(1.78s) | 3.97(1.95s) | 1(0.492s) | [[:alpha:]]+ing |
2.32(0.344s) | 3.06(0.453s) | 1(0.148s) | ^[^
]*?Twain |
1(0.0576s) | 1.05(0.0606s) | 1.15(0.0664s) | Tom|Sawyer|Huckleberry|Finn |
1.24(0.164s) | 1.44(0.191s) | 1(0.133s) | (Tom|Sawyer|Huckleberry|Finn).{0,30}river|river.{0,30}(Tom|Sawyer|Huckleberry|Finn) |