Tutorial

2.2.7. UNIX-to-DOS Filters

Suppose you want to write a Filter to convert UNIX line endings to DOS line endings. The basic idea is simple: you process the characters in a sequence one at a time, and whenever you encounter the character '\n' you replace it with the two-character sequence '\r', '\n'. In the following sections I'll implement this algorithm as a stdio_filter, an InputFilter and an OutputFilter. The source code can be found in the header <libs/iostreams/example/unix2dos_filter.hpp>.

unix2dos_stdio_filter

You can express a UNIX-to-DOS Filter as a stdio_filter by deriving from stdio_filter and overriding the private virtual function do_filter as follows:

#include <cstdio>    // EOF
#include <iostream>  // cin, cout
#include <boost/iostreams/filter/stdio.hpp>

namespace boost { namespace iostreams { namespace example {

class unix2dos_stdio_filter : public stdio_filter {
private:
    void do_filter()
    {
        int c;
        while ((c = std::cin.get()) != EOF) {
            if (c == '\n')
                std::cout.put('\r');
            std::cout.put(c);
        }
    }
};

} } } // End namespace boost::iostreams::example

The function do_filter consists of a straightforward implementation of the algorithm I described above: it reads characters from standard input and writes them to standard output unchanged, except that when it encounters '\n' it writes '\r', '\n'.
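
Because do_filter is hard-wired to std::cin and std::cout, it is awkward to try out in isolation. The following sketch is not part of the library: it restates the same loop as a free function over caller-supplied standard streams, so that it can be run on in-memory strings. The names unix2dos_copy, unix2dos_string and unix2dos_copy_example are hypothetical.

```cpp
#include <cassert>
#include <istream>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical helper (not part of Boost.Iostreams): the same loop as
// do_filter, but reading from and writing to caller-supplied streams.
inline void unix2dos_copy(std::istream& in, std::ostream& out)
{
    std::istream::int_type c;
    while ((c = in.get()) != std::istream::traits_type::eof()) {
        if (c == '\n')
            out.put('\r');              // Insert '\r' before every '\n'
        out.put(static_cast<char>(c));
    }
}

// Convenience wrapper for in-memory strings.
inline std::string unix2dos_string(const std::string& text)
{
    std::istringstream in(text);
    std::ostringstream out;
    unix2dos_copy(in, out);
    return out.str();
}

// Usage: the assertion documents the expected result.
inline void unix2dos_copy_example()
{
    assert(unix2dos_string("one\ntwo\n") == "one\r\ntwo\r\n");
}
```

Like do_filter, this sketch does not check whether a '\n' is already preceded by '\r', so input that already uses DOS line endings comes out with a doubled '\r'.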

unix2dos_input_filter

Now, let's express a UNIX-to-DOS Filter as an InputFilter.

#include <boost/iostreams/categories.hpp> // input_filter_tag
#include <boost/iostreams/operations.hpp> // get

namespace boost { namespace iostreams { namespace example {

class unix2dos_input_filter {
public:
    typedef char              char_type;
    typedef input_filter_tag  category;

    unix2dos_input_filter() : has_linefeed_(false) { }

    template<typename Source>
    int get(Source& src)
    {
        // Handle unfinished business
        if (has_linefeed_) {
            has_linefeed_ = false;
            return '\n';
        }

        // Forward all characters except '\n'
        int c;
        if ((c = iostreams::get(src)) == '\n') {
            has_linefeed_ = true;
            return '\r';
        }

        return c;
    }

    template<typename Source>
    void close(Source&);
private:
    bool has_linefeed_;
};

} } } // End namespace boost::iostreams::example

The implementation of get can be described as follows. Most of the time, you simply read a character from src and return it. The special values EOF and WOULD_BLOCK are treated the same way: they are simply forwarded as-is. The exception is when iostreams::get returns '\n'. In this case, you return '\r' instead and make a note to return '\n' the next time get is called.

As usual, the member function close resets the Filter's state:

    template<typename Source>
    void close(Source&) { has_linefeed_ = false; }
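
To watch the has_linefeed_ flag at work without setting up a filter chain, you can model the Source by hand. The following is a minimal sketch under that assumption, not library code: string_source, unix2dos_reader, read_all and unix2dos_reader_example are hypothetical names, and src.get() stands in for iostreams::get(src). WOULD_BLOCK is not modelled.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>   // EOF
#include <string>

// Hypothetical stand-in for a Source: hands out the characters of a string
// one at a time, then EOF. Not part of Boost.Iostreams.
class string_source {
public:
    explicit string_source(const std::string& s) : data_(s), pos_(0) { }
    int get()
    {
        return pos_ < data_.size() ?
            static_cast<unsigned char>(data_[pos_++]) :
            EOF;
    }
private:
    std::string  data_;
    std::size_t  pos_;
};

// Same logic as unix2dos_input_filter::get, with src.get() in place of
// iostreams::get(src).
class unix2dos_reader {
public:
    unix2dos_reader() : has_linefeed_(false) { }

    int get(string_source& src)
    {
        if (has_linefeed_) {        // Second half of a pending "\r\n"
            has_linefeed_ = false;
            return '\n';
        }
        int c = src.get();
        if (c == '\n') {            // Return '\r' now, '\n' on the next call
            has_linefeed_ = true;
            return '\r';
        }
        return c;                   // Ordinary characters and EOF pass through
    }
private:
    bool has_linefeed_;
};

// Pulls characters through the filter until EOF.
inline std::string read_all(const std::string& text)
{
    string_source    src(text);
    unix2dos_reader  filter;
    std::string      result;
    for (int c; (c = filter.get(src)) != EOF; )
        result += static_cast<char>(c);
    return result;
}

// Usage: the assertion documents the expected result.
inline void unix2dos_reader_example()
{
    assert(read_all("a\nb") == "a\r\nb");
}
```

For the input "a\nb" the filter is called five times and reads from the source only four times: the call that returns '\n' is answered from the flag alone.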

unix2dos_output_filter

You can express a UNIX-to-DOS Filter as an OutputFilter as follows:

#include <boost/iostreams/concepts.hpp>   // output_filter
#include <boost/iostreams/operations.hpp> // put

namespace boost { namespace iostreams { namespace example {

class unix2dos_output_filter : public output_filter {
public:
    unix2dos_output_filter() : has_linefeed_(false) { }

    template<typename Sink>
    bool put(Sink& dest, int c);

    template<typename Sink>
    void close(Sink&) { has_linefeed_ = false; }
private:
    template<typename Sink>
    bool put_char(Sink& dest, int c);

    bool has_linefeed_;
};

} } } // End namespace boost::iostreams::example

Here I've derived from the helper class output_filter, which provides a member type char_type equal to char and a category tag convertible to output_filter_tag and to closable_tag.

Let's look first at the helper function put_char:

    template<typename Sink>
    bool put_char(Sink& dest, int c)
    {
        bool result;
        if ((result = iostreams::put(dest, c)) == true) {
            has_linefeed_ =
                c == '\r' ?
                    true : 
                    c == '\n' ? 
                        false :
                        has_linefeed_;
        }
        return result;
    }

This function attempts to write a single character to the Sink dest, returning true for success. If successful, it updates the flag has_linefeed_: writing '\r' sets it and writing '\n' clears it. When the flag is still set at the start of a call to put, it indicates that an attempt to write a DOS line-ending sequence failed after the first character was written.

Using put_char you can implement put as follows:

    template<typename Sink>
    bool put(Sink& dest, int c)
    {
        if (c == '\n') 
            return has_linefeed_ ?
                put_char(dest, '\n') :
                put_char(dest, '\r') ?
                    this->put(dest, '\n') :
                    false;
        return iostreams::put(dest, c);
    }

The implementation works like so:

  1. If you're at the beginning of a DOS line-ending sequence (that is, if c is '\n' and has_linefeed_ is false), you attempt to write '\r' and then '\n' to dest.
  2. If you're in the middle of a DOS line-ending sequence (that is, if c is '\n' and has_linefeed_ is true), you attempt to complete it by writing '\n'.
  3. Otherwise, you attempt to write c to dest.

There are two subtle points. First, why does c == '\n' and has_linefeed_ == true mean that you're in the middle of a DOS line-ending sequence? Because when you attempt to write '\r', '\n' but only the first character succeeds, you set has_linefeed_ and return false. This causes the user of the Filter to resend the character '\n' which triggered the line-ending sequence. Second, note that to write the second character of a line-ending sequence you call put recursively instead of calling put_char.
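
The resend behaviour is easiest to see with a Sink that refuses a write at a chosen moment. The following is a minimal sketch under that assumption, not library code: flaky_sink, unix2dos_writer, write_all and unix2dos_writer_example are hypothetical names, dest.put(c) stands in for iostreams::put(dest, c), and the nested conditional expressions are written out as if statements.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for a non-blocking Sink: refuses exactly one write,
// the one with index fail_at (counting from 0), and accepts all others.
// Not part of Boost.Iostreams.
class flaky_sink {
public:
    explicit flaky_sink(int fail_at) : fail_at_(fail_at), calls_(0) { }
    bool put(char c)
    {
        if (calls_++ == fail_at_)
            return false;           // Simulate a write that would block
        data_ += c;
        return true;
    }
    const std::string& str() const { return data_; }
private:
    int          fail_at_;
    int          calls_;
    std::string  data_;
};

// Same logic as unix2dos_output_filter, with dest.put(c) in place of
// iostreams::put(dest, c).
class unix2dos_writer {
public:
    unix2dos_writer() : has_linefeed_(false) { }

    bool put(flaky_sink& dest, int c)
    {
        if (c != '\n')
            return dest.put(static_cast<char>(c));
        if (has_linefeed_)          // '\r' was written by an earlier call
            return put_char(dest, '\n');
        return put_char(dest, '\r') && put(dest, '\n');
    }
private:
    bool put_char(flaky_sink& dest, int c)
    {
        bool result = dest.put(static_cast<char>(c));
        if (result) {
            if (c == '\r')
                has_linefeed_ = true;
            else if (c == '\n')
                has_linefeed_ = false;
        }
        return result;
    }

    bool has_linefeed_;
};

// Client loop: resend a character until the filter accepts it.
inline std::string write_all(const std::string& text, int fail_at)
{
    flaky_sink       dest(fail_at);
    unix2dos_writer  filter;
    for (std::string::size_type i = 0; i < text.size(); ++i)
        while (!filter.put(dest, text[i]))
            ;
    return dest.str();
}

// Usage: the assertion documents the expected result.
inline void unix2dos_writer_example()
{
    // Write 0 is 'a', write 1 is '\r', write 2 is the first attempt at '\n'.
    assert(write_all("a\nb", 2) == "a\r\nb");
}
```

In the usage example the sink accepts '\r' and then refuses '\n', so put returns false and the client resends '\n'. Because has_linefeed_ is set, the second call writes only '\n'; without the flag it would write '\r' a second time.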

Comparing the implementations of unix2dos_input_filter and unix2dos_output_filter, you can see that this is a case where a filtering algorithm is much easier to express as an InputFilter than as an OutputFilter. If you wanted to avoid the complexity of the above definition, you could use the class template inverse to construct an OutputFilter from unix2dos_input_filter:

#include <boost/iostreams/invert.hpp>   // inverse   

namespace io = boost::iostreams;
namespace ex = boost::iostreams::example;

typedef io::inverse<ex::unix2dos_input_filter> unix2dos_output_filter;

Even this is more work than necessary, however, since line-ending conversions can be handled easily with the built-in component newline_filter.