A dictionary filter is a Filter which performs text substitution in the following manner. It maintains a collection of pairs of strings whose first components are words and whose second components represent replacement text — I'll call such a collection a dictionary, and refer to the pairs it contains as definitions. When a dictionary filter encounters a word which appears as the first component of a definition, it forwards the replacement text instead of the original word. Other words, whitespace and punctuation are forwarded unchanged.
The basic algorithm is as follows: You examine characters one at a time, appending them to a string which I'll call the current word. When you encounter a non-alphabetic character, you consult the dictionary to determine whether the current word appears as the first component of a definition. If it does, you forward the replacement text followed by the non-alphabetic character. Otherwise, you forward the current word followed by the non-alphabetic character. When the end-of-stream is reached, you consult the dictionary again and forward either the curent word or its replacement, as appropriate.
In the following sections, I'll express this algorithm as a stdio_filter
, an InputFilter and an OutputFilter. The source code can be found in the header <libs/iostreams/example/dictionary_filter.hpp>
.
dictionary
You can represent a dictionary using the following class:
#include <map> #include <string> namespace boost { namespace iostreams { namespace example { class dictionary { public: void add(std::string key, const std::string& value); void replace(std::string& key); /* ... */ }; } } } // End namespace boost::iostreams:example
The member function add
converts key
to lower case and adds the pair key
, value
to the dictionary. The member function replace
searches for a definition whose first component is equal to the result of converting key
to lower case. If it finds such a definition, it assigns the replacement text to key
, adjusting the case of the first character to match the case of the first character of key
. Otherwise, it does nothing.
dictionary_stdio_filter
You can express a dictionary filter as a stdio_filter
as follows:
#include <cstdio> // EOF #include <iostream> // cin, cout #include <boost/iostreams/filter/stdio.hpp> namespace boost { namespace iostreams { namespace example { class dictionary_stdio_filter : public stdio_filter { public: dictionary_stdio_filter(dictionary& d) : dictionary_(d) { } private: void do_filter() { using namespace std; while (true) { int c = std::cin.get(); if (c == EOF || !std::isalpha((unsigned char) c)) { dictionary_.replace(current_word_); cout.write( current_word_.data(), static_cast<streamsize>(current_word_.size()) ); current_word_.erase(); if (c == EOF) break; cout.put(c); } else { current_word_ += c; } } } dictionary& dictionary_; std::string current_word_; }; } } } // End namespace boost::iostreams:example
The implementation of do_filter
simply loops, reading characters from std::cin
and appending
them to the member variable current_word_
until a non-alphabetic character or end-of-stream indication is encountered. When this occurs it uses its dictionary, stored in the member variable dictionary_
, to replace the current word if necessary. Finally, it writes the current word, followed by the non-alphabetic character, if any, to std::cout
.
dictionary_input_filter
You can express a dictionary filter as an InputFilter as follows:
#include <boost/iostreams/char_traits.hpp> // EOF, WOULD_BLOCK #include <boost/iostreams/concepts.hpp> // input_filter #include <boost/iostreams/operations.hpp> // get namespace boost { namespace iostreams { namespace example { class dictionary_input_filter : public input_filter { public: dictionary_input_filter(dictionary& d) : dictionary_(d), off_(std::string::npos), eof_(false) { } template<typename Source> int get(Source& src); template<typename Source> void close(Source&); private: dictionary& dictionary_; std::string current_word_; std::string::size_type off_; bool eof_; }; } } } // End namespace boost::iostreams:example
The function get
is implemented as follows:
template<typename Source> int get(Source& src) { // Handle unfinished business. if (off_ != std::string::npos && off_ < current_word_.size()) return current_word_[off_++]; if (off_ == current_word_.size()) { current_word_.erase(); off_ = std::string::npos; } if (eof_) return EOF; // Compute curent word. while (true) { int c; if ((c = iostreams::get(src)) == WOULD_BLOCK) return WOULD_BLOCK; if (c == EOF || !std::isalpha((unsigned char) c)) { dictionary_.replace(current_word_); off_ = 0; if (c == EOF) eof_ = true; else current_word_ += c; break; } else { current_word_ += c; } } return this->get(src); // Note: current_word_ is not empty. }
You first check to see whether there are any characters which remain from a previous invocation of get
. If so, you update some book keeping information and return the first such character.
The while
loop is very similar to that of dictionary_stdio_filter::do_filter
: it reads characters from the Source src
, appending them to current_word_
until a non-alphabetic character, EOF
or WOULD_BLOCK
is encountered. The value WOULD_BLOCK
is passed on to the caller. In the remaining cases, the dictionary is consulted to determine the appropriate replacement text.
Finally, get
is called recursively to return the first character of the current word.
As usual, the function close
resets the Filter's state:
template<typename Source> void close(Source&) { current_word_.erase(); off_ = std::string::npos; eof_ = false; }
dictionary_output_filter
You can express a dictionary filter as an OutputFilter as follows:
#include <algorithm> // swap #include <boost/iostreams/concepts.hpp> // output_filter #include <boost/iostreams/operations.hpp> // write namespace boost { namespace iostreams { namespace example { class dictionary_output_filter : public output_filter { public: typedef std::map<std::string, std::string> map_type; dictionary_output_filter(dictionary& d) : dictionary_(d), off_(std::string::npos) { } template<typename Sink> bool put(Sink& dest, int c); template<typename Sink> void close(Sink& dest); private: template<typename Sink> bool write_current_word(Sink& dest); dictionary& dictionary_; std::string current_word_; std::string::size_type off_; }; } } } // End namespace boost::iostreams:example
Let's look first at the helper function write_current_word
:
template<typename Sink> bool write_current_word(Sink& dest) { using namespace std; streamsize amt = static_cast<streamsize>(current_word_.size() - off_); streamsize result = iostreams::write(dest, current_word_.data() + off_, amt); if (result == amt) { current_word_.erase(); off_ = string::npos; return true; } else { off_ += static_cast<string::size_type>(result); return false; } }
This function attempts to write current_word_
, beginning at the offset off_
, to the provided Sink. If the entire sequence is successfully written, current_word_
is cleared and the function returns true
. Otherwise the member variable off_
is updated to point to the first unwritten character and the function fails.
Using write_current_word
you can implement put
as follows:
template<typename Sink> bool put(Sink& dest, int c) { if (off_ != std::string::npos && !write_current_word(dest)) return false; if (!std::isalpha((unsigned char) c)) { dictionary_.replace(current_word_); off_ = 0; } current_word_ += c; return true; }
As in the implementation of dictionary_input_filter::get
, you first check to see whether there are any characters from a previous invocation of put
which remain to be written. If so, you attempt to write these characters using write_current_word
. If successful, you next examine the given character c
. If it is a non-alphabetic character, you consult the dictionary to determine the appropriate replacement text. In any case, you append c
to current_word_
and return true
.
The function close
has more work to do in this case than simply reseting the Filter's state. Unless the last character of the unfiltered sequence happened to be a non-alphabetic character, the contents of current_word_ will not yet have been written:
template<typename Sink> void close(Sink& dest) { // Reset current_word_ and off_, saving old values. std::string current_word; std::string::size_type off = 0; current_word.swap(current_word_); std::swap(off, off_); // Write remaining characters to dest. if (off == std::string::npos) { dictionary_.replace(current_word); off = 0; } if (!current_word.empty()) iostreams::write( dest, current_word.data() + off, static_cast<std::streamsize>(current_word.size() - off) ); }
Note that you may assume that the template argument is a Blocking Sink, and that you must reset the values of current_word_
and off_
before calling write
, in case write
throws an exception.
© Copyright 2008 CodeRage, LLC
© Copyright 2004-2007 Jonathan Turkanis
Use, modification, and distribution are subject to the Boost Software License, Version 2.0. (See accompanying file LICENSE_1_0.txt or copy at http://www.boost.org/LICENSE_1_0.txt)