Header <boost/regex/icu.hpp>
provides a regular expression traits
class that handles UTF-32 characters:
class icu_regex_traits;
and a regular expression type based upon that:
typedef basic_regex<UChar32,icu_regex_traits> u32regex;
The type u32regex is the regular expression type to use for all Unicode
regular expressions; internally it uses UTF-32 code points, but it can be
created from, and used to search, UTF-8 or UTF-16 encoded strings as well as
UTF-32 ones.
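To make the internal representation concrete, the sketch below decodes a UTF-8 string into the UTF-32 code points that a u32regex works with. It is an illustration only: utf8_to_utf32 is a hypothetical helper written for this example, not part of Boost.Regex or ICU, and it assumes well-formed input rather than validating it.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper for illustration; not part of Boost.Regex or ICU.
// Assumes well-formed UTF-8 and performs no validation.
std::u32string utf8_to_utf32(const std::string& in) {
    std::u32string out;
    std::size_t i = 0;
    while (i < in.size()) {
        const unsigned char lead = static_cast<unsigned char>(in[i++]);
        // Number of continuation bytes implied by the lead byte.
        const std::size_t extra =
            lead < 0x80 ? 0 : lead < 0xE0 ? 1 : lead < 0xF0 ? 2 : 3;
        // Keep only the payload bits of the lead byte.
        char32_t cp = extra == 0 ? lead : (lead & (0x3F >> extra));
        for (std::size_t k = 0; k < extra && i < in.size(); ++k) {
            const unsigned char next = static_cast<unsigned char>(in[i++]);
            cp = (cp << 6) | (next & 0x3F);  // each continuation byte adds 6 bits
        }
        out.push_back(cp);
    }
    return out;
}
```

For instance, the three-byte UTF-8 sequence E2 82 AC decodes to the single code point U+20AC, so a pattern or subject text containing it occupies one UTF-32 position rather than three.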
The constructors and assign member functions of u32regex require UTF-32
encoded strings, but there is a series of overloaded algorithms called
make_u32regex which allow regular expressions to be created from UTF-8,
UTF-16, or UTF-32 encoded strings:
template <class InputIterator> u32regex make_u32regex(InputIterator i, InputIterator j, boost::regex_constants::syntax_option_type opt);
Effects: Creates a regular expression object from the iterator sequence [i,j). The character encoding of the sequence is determined based upon sizeof(*i): 1 implies UTF-8, 2 implies UTF-16, and 4 implies UTF-32.
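The sizeof-based rule can be sketched with the standard library alone. The names Encoding, encoding_for_width, and encoding_of below are hypothetical, chosen for this example; the sketch shows only the selection rule described above and does not claim to reproduce how Boost.Regex implements it.

```cpp
#include <cstddef>
#include <iterator>

// Hypothetical names for illustration; not part of Boost.Regex.
enum class Encoding { utf8, utf16, utf32, unknown };

// Maps the width of one code unit, in bytes, to the encoding it implies.
constexpr Encoding encoding_for_width(std::size_t width) {
    return width == 1 ? Encoding::utf8
         : width == 2 ? Encoding::utf16
         : width == 4 ? Encoding::utf32
                      : Encoding::unknown;
}

// Applies the rule to an iterator range [i, j): only the iterator's
// value type matters, so the range itself is never read.
template <class InputIterator>
Encoding encoding_of(InputIterator /*i*/, InputIterator /*j*/) {
    using unit = typename std::iterator_traits<InputIterator>::value_type;
    return encoding_for_width(sizeof(unit));
}

static_assert(encoding_for_width(sizeof(char)) == Encoding::utf8,
              "a one-byte code unit implies UTF-8");
```

Under this rule a const wchar_t* argument is treated as UTF-16 where wchar_t is two bytes wide (as on Windows) and as UTF-32 where it is four bytes wide (as on most Unix-like systems).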
u32regex make_u32regex(const char* p, boost::regex_constants::syntax_option_type opt = boost::regex_constants::perl);
Effects: Creates a regular expression object from the null-terminated UTF-8 character sequence p.
u32regex make_u32regex(const unsigned char* p, boost::regex_constants::syntax_option_type opt = boost::regex_constants::perl);
Effects: Creates a regular expression object from the null-terminated UTF-8 character sequence p.
u32regex make_u32regex(const wchar_t* p, boost::regex_constants::syntax_option_type opt = boost::regex_constants::perl);
Effects: Creates a regular expression object from the null-terminated character sequence p. The character encoding of the sequence is determined based upon sizeof(wchar_t): 1 implies UTF-8, 2 implies UTF-16, and 4 implies UTF-32.
u32regex make_u32regex(const UChar* p, boost::regex_constants::syntax_option_type opt = boost::regex_constants::perl);
Effects: Creates a regular expression object from the null-terminated UTF-16 character sequence p.
template<class C, class T, class A> u32regex make_u32regex(const std::basic_string<C, T, A>& s, boost::regex_constants::syntax_option_type opt = boost::regex_constants::perl);
Effects: Creates a regular expression object from the string s. The character encoding of the string is determined based upon sizeof(C): 1 implies UTF-8, 2 implies UTF-16, and 4 implies UTF-32.
u32regex make_u32regex(const UnicodeString& s, boost::regex_constants::syntax_option_type opt = boost::regex_constants::perl);
Effects: Creates a regular expression object from the UTF-16 encoded string s.