...one of the most highly
regarded and expertly designed C++ library projects in the
world.
— Herb Sutter and Andrei
Alexandrescu, C++
Coding Standards
"Dynamic memory allocation has been a fundamental part of most computer systems since roughly 1960..."1
Everyone uses dynamic memory allocation. If you have ever called malloc or new, then you have used dynamic memory allocation. Most programmers have a tendency to treat the heap as a "magic bag": we ask it for memory, and it magically creates some for us. Sometimes we run into problems because the heap is not magic.
The heap is limited. Even on large systems (i.e., not embedded) with huge amounts of virtual memory available, there is a limit. Everyone is aware of the physical limit, but there is a more subtle, "virtual" limit, that limit at which your program (or the entire system) slows down due to the use of virtual memory. This virtual limit is much closer to your program than the physical limit, especially if you are running on a multitasking system. Therefore, when running on a large system, it is considered "nice" to make your program use as few resources as necessary, and release them as soon as possible. When using an embedded system, programmers usually have no memory to waste.
The heap is complicated. It has to satisfy any type of memory request, for any size, and do it fast. The common approaches to memory management have to do with splitting the memory up into portions, and keeping them ordered by size in some sort of a tree or list structure. Add in other factors, such as locality and estimating lifetime, and heaps quickly become very complicated. So complicated, in fact, that there is no known "perfect" answer to the problem of how to do dynamic memory allocation. The diagrams below illustrate how most common memory managers work: for each chunk of memory, it uses part of that memory to maintain its internal tree or list structure. Even when a chunk is malloc'ed out to a program, the memory manager must "save" some information in it — usually just its size. Then, when the block is free'd, the memory manager can easily tell how large it is.
Memory not belonging to process |
Memory used internally by memory allocator algorithm (usually 8-12 bytes) |
Unused memory |
Memory not belonging to process |
Memory not belonging to process |
Memory used internally by memory allocator algorithm (usually 4 bytes) |
Memory usable by program |
Memory not belonging to process |
Because of the complication of dynamic memory allocation, it is often inefficient in terms of time and/or space. Most memory allocation algorithms store some form of information with each memory block, either the block size or some relational information, such as its position in the internal tree or list structure. It is common for such "header fields" to take up one machine word in a block that is being used by the program. The obvious problem, then, is when small objects are dynamically allocated. For example, if ints were dynamically allocated, then automatically the algorithm will reserve space for the header fields as well, and we end up with a 50% waste of memory. Of course, this is a worst-case scenario. However, more modern programs are making use of small objects on the heap; and that is making this problem more and more apparent. Wilson et. al. state that an average-case memory overhead is about ten to twenty percent2. This memory overhead will grow higher as more programs use more smaller objects. It is this memory overhead that brings programs closer to the virtual limit.
In larger systems, the memory overhead is not as big of a problem (compared to the amount of time it would take to work around it), and thus is often ignored. However, there are situations where many allocations and/or deallocations of smaller objects are taking place as part of a time-critical algorithm, and in these situations, the system-supplied memory allocator is often too slow.
Simple segregated storage addresses both of these issues. Almost all memory overhead is done away with, and all allocations can take place in a small amount of (amortized) constant time. However, this is done at the loss of generality; simple segregated storage only can allocate memory chunks of a single size.
Simple Segregated Storage is the basic idea behind the Boost Pool library. Simple Segregated Storage is the simplest, and probably the fastest, memory allocation/deallocation algorithm. It begins by partitioning a memory block into fixed-size chunks. Where the block comes from is not important until implementation time. A Pool is some object that uses Simple Segregated Storage in this fashion. To illustrate:
Memory not belonging to process |
Chunk 0 |
Chunk 1 |
Chunk 2 |
Chunk 3 |
Memory not belonging to process |
Each of the chunks in any given block are always the same size. This is the fundamental restriction of Simple Segregated Storage: you cannot ask for chunks of different sizes. For example, you cannot ask a Pool of integers for a character, or a Pool of characters for an integer (assuming that characters and integers are different sizes).
Simple Segregated Storage works by interleaving a free list within the unused chunks. For example:
Memory not belonging to process |
Chunk 0; points to Chunk 1 |
Chunk 1; points to Chunk 2 |
Chunk 2; points to Chunk 3 |
Chunk 3; end-of-list |
Memory not belonging to process |
Memory not belonging to process |
Chunk 0; points to Chunk 2 |
Chunk 1 (in use by process) |
Chunk 2; end-of-list |
Chunk 3 (in use by process) |
Memory not belonging to process |
By interleaving the free list inside the chunks, each Simple Segregated Storage only has the overhead of a single pointer (the pointer to the first element in the list). It has no memory overhead for chunks that are in use by the process.
Simple Segregated Storage is also extremely fast. In the simplest case, memory allocation is merely removing the first chunk from the free list, a O(1) operation. In the case where the free list is empty, another block may have to be acquired and partitioned, which would result in an amortized O(1) time. Memory deallocation may be as simple as adding that chunk to the front of the free list, a O(1) operation. However, more complicated uses of Simple Segregated Storage may require a sorted free list, which makes deallocation O(N).
Simple Segregated Storage gives faster execution and less memory
overhead than a system-supplied allocator, but at the loss of generality. A
good place to use a Pool is in situations where many (noncontiguous) small
objects may be allocated on the heap, or if allocation and deallocation of
the same-sized objects happens repeatedly.
Pool allocators are found in many programming languages, and in many variations. The beginnings of many implementations may be found in common programming literature; some of these are given below. Note that none of these are complete implementations of a Pool; most of these leave some aspects of a Pool as a user exercise. However, in each case, even though some aspects are missing, these examples use the same underlying concept of a Simple Segregated Storage described in this document.
Revised 05 December, 2006
Copyright © 2000, 2001 Stephen Cleary (scleary AT jerviswebb DOT com)
Distributed under the Boost Software License, Version 1.0. (See accompanying file LICENSE_1_0.txt or copy at http://www.boost.org/LICENSE_1_0.txt)