...one of the most highly
regarded and expertly designed C++ library projects in the
world.
— Herb Sutter and Andrei
Alexandrescu, C++
Coding Standards
Boost.Intrusive also offers hashed containers
that can be very useful to implement fast-lookup containers. These containers
(unordered_set
and unordered_multiset
)
are semi-intrusive containers: they need additional memory apart from the hook
stored in the value_type
. This
additional memory must be passed in the constructor of the container.
Unlike C++ TR1 unordered associative containers (which are also hashed containers), the contents of these semi-intrusive containers are not rehashed to maintain a load factor: that would require memory management and intrusive containers don't implement any memory management at all. However, the user can request an explicit rehashing passing a new bucket array. This also offers an additional guarantee over TR1 unordered associative containers: iterators are not invalidated when inserting an element in the container.
As with TR1 unordered associative containers, rehashing invalidates iterators, changes ordering between elements and changes which buckets elements appear in, but does not invalidate pointers or references to elements.
Apart from expected hash and equality function objects, Boost.Intrusive unordered associative containers' constructors take an argument specifying an auxiliary bucket vector to be used by the container.
The size overhead for a hashed container is moderate: 1 pointer per value
plus a bucket array per container. The size of an element of the bucket array
is usually one pointer. To obtain a good performance hashed container, the
bucket length is usually the same as the number of elements that the container
contains, so a well-balanced hashed container (bucket_count()
is equal to size()
) will have an equivalent overhead of two
pointers per element.
An empty, non constant-time size unordered_set
or unordered_multiset
has also the size of bucket_count()
pointers.
Insertions, erasures, and searches, have amortized constant-time complexity
in hashed containers. However, some worst-case guarantees are linear. See
unordered_set
or unordered_multiset
for complexity guarantees of each operation.
Be careful with non constant-time size hashed containers:
some operations, like empty()
, have linear complexity, unlike other
Boost.Intrusive containers.
unordered_set
and unordered_multiset
share the same hooks. This is an advantage, because the same user type can
be inserted first in a unordered_multiset
and after that in unordered_set
without changing the definition of the user class.
template <class ...Options> class unordered_set_base_hook;
unordered_set_base_hook
:
the user class derives publicly from unordered_set_base_hook
to make it unordered_set
/unordered_multiset
-compatible.
template <class ...Options> class unordered_set_member_hook;
unordered_set_member_hook
:
the user class contains a public unordered_set_member_hook
to make it unordered_set
/unordered_multiset
-compatible.
unordered_set_base_hook
and unordered_set_member_hook
receive the same options explained in the section How
to use Boost.Intrusive:
tag<class Tag>
(for base hooks only): This argument serves as a tag, so you can derive
from more than one base hook. Default: tag<default_tag>
.
link_mode<link_mode_type
LinkMode>
:
The linking policy. Default: link_mode<safe_link>
.
void_pointer<class VoidPointer>
:
The pointer type to be used internally in the hook and propagated to
the container. Default: void_pointer<void*>
.
Apart from them, these hooks offer additional options:
store_hash<bool Enabled>
:
This option reserves additional space in the hook to store the hash value
of the object once it's introduced in the container. When this option
is used, the unordered container will store the calculated hash value
in the hook and rehashing operations won't need to recalculate the hash
of the value. This option will improve the performance of unordered containers
when rehashing is frequent or hashing the value is a slow operation.
Default: store_hash<false>
.
optimize_multikey<bool Enabled>
:
This option reserves additional space in the hook that will be used to
group equal elements in unordered multisets, improving significantly
the performance when many equal values are inserted in these containers.
Default: optimize_multikey<false>
.
template<class T, class ...Options> class unordered_set; template<class T, class ...Options> class unordered_multiset;
As mentioned, unordered containers need an auxiliary array to work. Boost.Intrusive unordered containers receive this
auxiliary array packed in a type called bucket_traits
(which can be also customized by a container option). All unordered containers
receive a bucket_traits
object
in their constructors. The default bucket_traits
class is initialized with a pointer to an array of buckets and its size:
#include <boost/intrusive/unordered_set.hpp> using namespace boost::intrusive; struct MyClass : public unordered_set_base_hook<> {}; typedef unordered_set<MyClass>::bucket_type bucket_type; typedef unordered_set<MyClass>::bucket_traits bucket_traits; int main() { bucket_type buckets[100]; unordered_set<MyClass> uset(bucket_traits(buckets, 100)); return 0; }
Each hashed container needs its own bucket traits,
that is, its own bucket vector. Two hashed
containers can't share the same bucket_type
elements. The bucket array
must be destroyed after
the container using it is destroyed, otherwise, the result is undefined.
unordered_set
and unordered_multiset
receive the same options explained in the section How
to use Boost.Intrusive:
base_hook<class Hook>
/ member_hook<class T, class Hook, Hook T::* PtrToMember>
/ value_traits<class ValueTraits>
:
To specify the hook type or value traits used to configure the container.
(To learn about value traits go to the section Containers
with custom ValueTraits.)
constant_time_size<bool Enabled>
:
To activate the constant-time size()
operation. Default: constant_time_size<true>
size_type<bool Enabled>
:
To specify the type that will be used to store the size of the container.
Default: size_type<std::size_t>
And they also can receive additional options:
equal<class Equal>
:
Equality function for the objects to be inserted in containers. Default:
equal<
std::equal_to<T> >
hash<class Hash>
:
Hash function to be used in the container. Default: hash< boost::hash<T> >
bucket_traits<class BucketTraits>
:
A type that wraps the bucket vector to be used by the unordered container.
Default: a type initialized by the address and size of a bucket array
and stores both variables internally.
power_2_buckets<bool Enabled>
:
The user guarantees that only bucket arrays with power of two length
will be used. The container will then use masks instead of modulo operations
to obtain the bucket number from the hash value. Masks are faster than
modulo operations and for some applications modulo operations can impose
a considerable overhead. In debug mode an assertion will be raised if
the user provides a bucket length that is not power of two. Default:
power_2_buckets<false>
.
cache_begin<bool Enabled>
:
Note: this option is not compatible with auto_unlink
hooks. Due to
its internal structure, finding the first element of an unordered container
(begin()
operation) is amortized constant-time. It's possible to speed up begin()
and other operations related to it (like clear()
) if the container caches internally
the position of the first element. This imposes the overhead of one pointer
to the size of the container. Default: cache_begin<false>
.
compare_hash<bool Enabled>
:
Note: this option requires store_hash<true>
option in the hook. When
the comparison function is expensive, (e.g. strings with a long common
predicate) sometimes (specially when the load factor is high or we have
many equivalent elements in an unordered_multiset
and no optimize_multikey<>
is activated in the hook) the
equality function is a performance problem. Two equal values must have
equal hashes, so comparing the hash values of two elements before using
the comparison functor can speed up some implementations.
incremental<bool Enabled>
:
Activates incremental hashing (also known as Linear Hashing). This option
implies power_2_buckets<true>
and the container will require power
of two buckets. For more information on incremental hashing, see Linear
hash
on Wikipedia Default:
incremental<false>
Now let's see a small example using both hooks and both containers:
#include <boost/intrusive/unordered_set.hpp> #include <vector> #include <algorithm> #include <boost/functional/hash.hpp> using namespace boost::intrusive; class MyClass : public unordered_set_base_hook<> { //This is a derivation hook int int_; public: unordered_set_member_hook<> member_hook_; //This is a member hook MyClass(int i) : int_(i) {} friend bool operator== (const MyClass &a, const MyClass &b) { return a.int_ == b.int_; } friend std::size_t hash_value(const MyClass &value) { return std::size_t(value.int_); } }; //Define an unordered_set that will store MyClass objects using the base hook typedef unordered_set<MyClass> BaseSet; //Define an unordered_multiset that will store MyClass using the member hook typedef member_hook<MyClass, unordered_set_member_hook<>, &MyClass::member_hook_> MemberOption; typedef unordered_multiset< MyClass, MemberOption> MemberMultiSet; int main() { typedef std::vector<MyClass>::iterator VectIt; typedef std::vector<MyClass>::reverse_iterator VectRit; //Create a vector with 100 different MyClass objects std::vector<MyClass> values; for(int i = 0; i < 100; ++i) values.push_back(MyClass(i)); //Create a copy of the vector std::vector<MyClass> values2(values); //Create a bucket array for base_set BaseSet::bucket_type base_buckets[100]; //Create a bucket array for member_multi_set MemberMultiSet::bucket_type member_buckets[200]; //Create unordered containers taking buckets as arguments BaseSet base_set(BaseSet::bucket_traits(base_buckets, 100)); MemberMultiSet member_multi_set (MemberMultiSet::bucket_traits(member_buckets, 200)); //Now insert values's elements in the unordered_set for(VectIt it(values.begin()), itend(values.end()); it != itend; ++it) base_set.insert(*it); //Now insert values's and values2's elements in the unordered_multiset for(VectIt it(values.begin()), itend(values.end()), it2(values2.begin()),itend2(values2.end()) ; it != itend; ++it, ++it2){ member_multi_set.insert(*it); member_multi_set.insert(*it2); } //Now find every element { VectIt it(values.begin()), itend(values.end()); for(; it != itend; ++it){ //base_set should contain one element for each key if(base_set.count(*it) != 1) return 1; //member_multi_set should contain two elements for each key if(member_multi_set.count(*it) != 2) return 1; } } return 0; }
Instead of using the default bucket_traits
class to store the bucket array, a user can define his own class to store
the bucket array using the bucket_traits<>
option. A user-defined bucket-traits must fulfill the following interface:
class my_bucket_traits { bucket_ptr bucket_begin(); const_bucket_ptr bucket_begin() const; std::size_t bucket_count() const; };
The following bucket traits just stores a pointer to the bucket array but
the size is a compile-time constant. Note the use of the auxiliary unordered_bucket
and
unordered_bucket_ptr
utilities to obtain the type of the bucket and its pointer before defining
the unordered container:
#include <boost/intrusive/unordered_set.hpp> #include <boost/functional/hash.hpp> #include <vector> using namespace boost::intrusive; //A class to be inserted in an unordered_set class MyClass : public unordered_set_base_hook<> { int int_; public: MyClass(int i = 0) : int_(i) {} friend bool operator==(const MyClass &l, const MyClass &r) { return l.int_ == r.int_; } friend std::size_t hash_value(const MyClass &v) { return boost::hash_value(v.int_); } }; //Define the base hook option typedef base_hook< unordered_set_base_hook<> > BaseHookOption; //Obtain the types of the bucket and the bucket pointer typedef unordered_bucket<BaseHookOption>::type BucketType; typedef unordered_bucket_ptr<BaseHookOption>::type BucketPtr; //The custom bucket traits. class custom_bucket_traits { public: static const int NumBuckets = 100; custom_bucket_traits(BucketPtr buckets) : buckets_(buckets) {} //Functions to be implemented by custom bucket traits BucketPtr bucket_begin() const { return buckets_; } std::size_t bucket_count() const { return NumBuckets;} private: BucketPtr buckets_; }; //Define the container using the custom bucket traits typedef unordered_set<MyClass, bucket_traits<custom_bucket_traits> > BucketTraitsUset; int main() { typedef std::vector<MyClass>::iterator VectIt; std::vector<MyClass> values; //Fill values for(int i = 0; i < 100; ++i) values.push_back(MyClass(i)); //Now create the bucket array and the custom bucket traits object BucketType buckets[custom_bucket_traits::NumBuckets]; custom_bucket_traits btraits(buckets); //Now create the unordered set BucketTraitsUset uset(btraits); //Insert the values in the unordered set for(VectIt it(values.begin()), itend(values.end()); it != itend; ++it) uset.insert(*it); return 0; }