Review the concepts document if you are not already familiar with it. Remember that a block is a contiguous section of memory that is partitioned (segregated) into fixed-size chunks. These chunks are what the user allocates and deallocates.
Each Pool has a single free list that can extend over a number of memory blocks. Thus, Pool also has a linked list of allocated memory blocks. Each memory block, by default, is allocated using new[], and all memory blocks are freed on destruction. It is the use of new[] that allows us to guarantee alignment.
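As a rough illustration of a free list threaded through the chunks of a single block, here is a minimal sketch. The helper names `thread_free_list` and `next_of` are hypothetical and are not the library's API; the sketch assumes each chunk is at least `sizeof(void *)` bytes and uses `std::memcpy` to store the links so that it makes no alignment assumptions of its own.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical helper (not part of the library): writes a "next" pointer into
// the first bytes of every chunk so that chunk i links to chunk i + 1, and the
// last chunk links to `end`. Returns the head of the resulting free list.
inline void* thread_free_list(char* block, std::size_t block_bytes,
                              std::size_t chunk_size, void* end) {
    assert(chunk_size >= sizeof(void*));  // a chunk must be able to hold a pointer
    assert(block_bytes >= chunk_size);    // at least one chunk must fit
    const std::size_t n = block_bytes / chunk_size;
    void* next = end;
    // Walk backwards so each chunk ends up pointing at the chunk after it.
    for (std::size_t i = n; i-- > 0;) {
        char* chunk = block + i * chunk_size;
        std::memcpy(chunk, &next, sizeof next);
        next = chunk;
    }
    return next;  // address of the first chunk
}

// Hypothetical helper: reads the link stored in a free chunk.
inline void* next_of(void* chunk) {
    void* next;
    std::memcpy(&next, chunk, sizeof next);
    return next;
}
```

Allocating a chunk then amounts to taking the head of the list and making `next_of(head)` the new head; deallocating pushes the chunk back onto the list.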
Each block of memory is allocated as a POD type (specifically, an array of characters) through operator new[]. Let POD_size be the number of characters allocated.
This follows from the following quote:
[5.3.3/2] (Expressions::Unary expressions::Sizeof) "... When applied to an array, the result is the total number of bytes in the array. This implies that the size of an array of n elements is n times the size of an element."
Therefore, arrays cannot contain padding, though the elements within the arrays may contain padding.
This follows from:
Note that an object of that size can exist. One object of that size is an array of the "actual" objects.
Note that the block is properly aligned for an Element. This directly follows from Predicate 2.
This follows from Predicates 1 and 2, and the following quote:
[3.9/9] (Basic concepts::Types) "An object type is a (possibly cv-qualified) type that is not a function type, not a reference type, and not a void type." (Specifically, array types are object types.)
There are no quotes from the Standard to directly support this argument, but it fits the common conception of the meaning of "alignment".
Note that the conditions for p + i being well-defined are outlined in [5.7/5]. We do not quote that here, but only make note that it is well-defined if p and p + i both point into or one past the same array.
This follows naturally, since the memory block is an array of Elements, and for each n, sizeof(Element) % sizeof(Tn) == 0; thus, the boundary of each element in the array of Elements is also a boundary of each element in each array of Tn.
Since pe + i is well-defined, then by Corollary 3, pn + jn is well-defined. It is properly aligned from Predicate 2 and Corollaries 1 and 2.
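The divisibility argument can be checked numerically. The sketch below is only an illustration under the same simplification the text makes (sizes stand in for alignment requirements); the function name `offsets_on_all_boundaries` is hypothetical. It tests whether every offset `i * sizeof(Element)` is also a multiple of each `sizeof(Tn)`.

```cpp
#include <cassert>
#include <cstddef>
#include <initializer_list>

// Hypothetical check: returns true when every element offset i * element_size,
// for i in [0, count), falls on a boundary of every size in type_sizes.
inline bool offsets_on_all_boundaries(std::size_t element_size, std::size_t count,
                                      std::initializer_list<std::size_t> type_sizes) {
    for (std::size_t i = 0; i < count; ++i) {
        for (std::size_t t : type_sizes) {
            assert(t != 0);
            if ((i * element_size) % t != 0) {
                return false;  // offset i * element_size splits an object of size t
            }
        }
    }
    return true;
}
```

With an element size of 105 and type sizes 7, 3, and 5 the check succeeds for any count, because 105 is a common multiple of all three; with an element size of 8 and a type size of 3 it fails at the second element.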
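The proof above covers alignment requirements for cutting chunks out of a block. The implementation uses actual object sizes of requested_size (the chunk size requested by the user), void * (because the free list is interleaved through the chunks), and size_type (because the size of the next block is stored within each memory block).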
Each block also contains a pointer to the next block; but that is stored as a pointer to void and cast when necessary, to simplify alignment requirements to the three types above.
Therefore, alloc_size is defined to be the lcm of the sizes of the three types above.
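A minimal sketch of that computation follows. The helper names (`gcd_sz`, `lcm_sz`, `alloc_size_for`) are hypothetical, and `size_type` is assumed to be `std::size_t` when no explicit size is passed. The helpers are hand-written so the sketch compiles as C++11; in C++17 `std::lcm` from `<numeric>` could be used instead.

```cpp
#include <cassert>
#include <cstddef>

// Greatest common divisor (Euclid), written recursively so it is valid
// as a C++11 constexpr function.
constexpr std::size_t gcd_sz(std::size_t a, std::size_t b) {
    return b == 0 ? a : gcd_sz(b, a % b);
}

// Least common multiple; divides first to limit overflow risk.
constexpr std::size_t lcm_sz(std::size_t a, std::size_t b) {
    return a / gcd_sz(a, b) * b;
}

// Hypothetical helper: the chunk size actually used for a given requested_size.
// The pointer and size_type sizes are parameters so unusual platforms can be modeled.
constexpr std::size_t alloc_size_for(std::size_t requested_size,
                                     std::size_t ptr_size = sizeof(void*),
                                     std::size_t size_type_size = sizeof(std::size_t)) {
    return lcm_sz(lcm_sz(requested_size, ptr_size), size_type_size);
}

static_assert(alloc_size_for(8, 4, 4) == 8, "lcm(8, 4, 4) is 8");
static_assert(alloc_size_for(7, 3, 5) == 105, "lcm(7, 3, 5) is 105");
```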
Each memory block consists of three main sections. The first section is the part that chunks are cut out of, and contains the interleaved free list. The second section is the pointer to the next block, and the third section is the size of the next block.
Each of these sections may contain padding as necessary to guarantee alignment for each of the next sections. The size of the first section is number_of_chunks * lcm(requested_size, sizeof(void *), sizeof(size_type)); the size of the second section is lcm(sizeof(void *), sizeof(size_type)); and the size of the third section is sizeof(size_type).
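The three section sizes can be computed as in the following sketch. `BlockLayout` and `layout_for` are hypothetical names used only for illustration; the pointer and `size_type` sizes are passed as parameters so that hypothetical platforms with unusual sizes can be modeled.

```cpp
#include <cassert>
#include <cstddef>

inline std::size_t gcd_of(std::size_t a, std::size_t b) {
    while (b != 0) {
        const std::size_t t = a % b;
        a = b;
        b = t;
    }
    return a;
}

inline std::size_t lcm_of(std::size_t a, std::size_t b) {
    return a / gcd_of(a, b) * b;
}

// Hypothetical description of one memory block, in bytes.
struct BlockLayout {
    std::size_t chunks_bytes;     // first section: the chunks (and interleaved free list)
    std::size_t next_ptr_bytes;   // second section: pointer to the next block
    std::size_t next_size_bytes;  // third section: size of the next block
    std::size_t total_bytes() const {
        return chunks_bytes + next_ptr_bytes + next_size_bytes;
    }
};

// Applies the three size formulas from the text.
inline BlockLayout layout_for(std::size_t number_of_chunks, std::size_t requested_size,
                              std::size_t ptr_size, std::size_t size_type_size) {
    assert(requested_size != 0 && ptr_size != 0 && size_type_size != 0);
    BlockLayout l;
    l.chunks_bytes = number_of_chunks *
                     lcm_of(lcm_of(requested_size, ptr_size), size_type_size);
    l.next_ptr_bytes = lcm_of(ptr_size, size_type_size);
    l.next_size_bytes = size_type_size;
    return l;
}
```

For four chunks with all three sizes equal to 4, this gives sections of 16, 4, and 4 bytes; for two chunks with sizes 7, 3, and 5, it gives 210, 15, and 5 bytes.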
Here's an example memory block, where requested_size == sizeof(void *) == sizeof(size_type) == 4 (FLP stands for interleaved free list pointer):
| Sections | size_type alignment | void * alignment | requested_size alignment |
|---|---|---|---|
| Memory not belonging to process | | | |
| Chunks section (16 bytes) | (4 bytes) | FLP for Chunk 1 (4 bytes) | Chunk 1 (4 bytes) |
| | (4 bytes) | FLP for Chunk 2 (4 bytes) | Chunk 2 (4 bytes) |
| | (4 bytes) | FLP for Chunk 3 (4 bytes) | Chunk 3 (4 bytes) |
| | (4 bytes) | FLP for Chunk 4 (4 bytes) | Chunk 4 (4 bytes) |
| Pointer to next Block (4 bytes) | (4 bytes) | Pointer to next Block (4 bytes) | |
| Size of next Block (4 bytes) | Size of next Block (4 bytes) | | |
| Memory not belonging to process | | | |
To show a visual example of possible padding, here's an example memory block where requested_size == 8 and sizeof(void *) == sizeof(size_type) == 4:
| Sections | size_type alignment | void * alignment | requested_size alignment |
|---|---|---|---|
| Memory not belonging to process | | | |
| Chunks section (32 bytes) | (4 bytes) | FLP for Chunk 1 (4 bytes) | Chunk 1 (8 bytes) |
| | (4 bytes) | (4 bytes) | |
| | (4 bytes) | FLP for Chunk 2 (4 bytes) | Chunk 2 (8 bytes) |
| | (4 bytes) | (4 bytes) | |
| | (4 bytes) | FLP for Chunk 3 (4 bytes) | Chunk 3 (8 bytes) |
| | (4 bytes) | (4 bytes) | |
| | (4 bytes) | FLP for Chunk 4 (4 bytes) | Chunk 4 (8 bytes) |
| | (4 bytes) | (4 bytes) | |
| Pointer to next Block (4 bytes) | (4 bytes) | Pointer to next Block (4 bytes) | |
| Size of next Block (4 bytes) | Size of next Block (4 bytes) | | |
| Memory not belonging to process | | | |
Finally, here is a convoluted example where the requested_size is 7, sizeof(void *) == 3, and sizeof(size_type) == 5, showing how the least common multiple guarantees alignment requirements even in the oddest of circumstances:
| Sections | size_type alignment | void * alignment | requested_size alignment |
|---|---|---|---|
| Memory not belonging to process | | | |
| Chunks section (210 bytes) | (5 bytes) | Interleaved free list pointer for Chunk 1 (15 bytes; 3 used) | Chunk 1 (105 bytes; 7 used) |
| | (5 bytes) | | |
| | (5 bytes) | | |
| | (5 bytes) | (15 bytes) | |
| | (5 bytes) | | |
| | (5 bytes) | | |
| | (5 bytes) | (15 bytes) | |
| | (5 bytes) | | |
| | (5 bytes) | | |
| | (5 bytes) | (15 bytes) | |
| | (5 bytes) | | |
| | (5 bytes) | | |
| | (5 bytes) | (15 bytes) | |
| | (5 bytes) | | |
| | (5 bytes) | | |
| | (5 bytes) | (15 bytes) | |
| | (5 bytes) | | |
| | (5 bytes) | | |
| | (5 bytes) | (15 bytes) | |
| | (5 bytes) | | |
| | (5 bytes) | | |
| | (5 bytes) | Interleaved free list pointer for Chunk 2 (15 bytes; 3 used) | Chunk 2 (105 bytes; 7 used) |
| | (5 bytes) | | |
| | (5 bytes) | | |
| | (5 bytes) | (15 bytes) | |
| | (5 bytes) | | |
| | (5 bytes) | | |
| | (5 bytes) | (15 bytes) | |
| | (5 bytes) | | |
| | (5 bytes) | | |
| | (5 bytes) | (15 bytes) | |
| | (5 bytes) | | |
| | (5 bytes) | | |
| | (5 bytes) | (15 bytes) | |
| | (5 bytes) | | |
| | (5 bytes) | | |
| | (5 bytes) | (15 bytes) | |
| | (5 bytes) | | |
| | (5 bytes) | | |
| | (5 bytes) | (15 bytes) | |
| | (5 bytes) | | |
| | (5 bytes) | | |
| Pointer to next Block (15 bytes; 3 used) | (5 bytes) | Pointer to next Block (15 bytes; 3 used) | |
| | (5 bytes) | | |
| | (5 bytes) | | |
| Size of next Block (5 bytes; 5 used) | Size of next Block (5 bytes; 5 used) | | |
| Memory not belonging to process | | | |
The theorem above guarantees all alignment requirements for allocating chunks and also implementation details such as the interleaved free list. However, it does so by adding padding when necessary; therefore, we have to treat allocations of contiguous chunks in a different way.
Using array arguments similar to the above, we can translate any request for contiguous memory for n objects of requested_size into a request for m contiguous chunks. m is simply ceil(n * requested_size / alloc_size), where alloc_size is the actual size of the chunks. To illustrate:
Here's an example memory block, where requested_size == 1 and sizeof(void *) == sizeof(size_type) == 4:
| Sections | size_type alignment | void * alignment | requested_size alignment |
|---|---|---|---|
| Memory not belonging to process | | | |
| Chunks section (16 bytes) | (4 bytes) | FLP to Chunk 2 (4 bytes) | Chunk 1 (4 bytes) |
| | (4 bytes) | FLP to Chunk 3 (4 bytes) | Chunk 2 (4 bytes) |
| | (4 bytes) | FLP to Chunk 4 (4 bytes) | Chunk 3 (4 bytes) |
| | (4 bytes) | FLP to end-of-list (4 bytes) | Chunk 4 (4 bytes) |
| Pointer to next Block (4 bytes) | (4 bytes) | Ptr to end-of-list (4 bytes) | |
| Size of next Block (4 bytes) | 0 (4 bytes) | | |
| Memory not belonging to process | | | |

After the user requests 7 contiguous elements of requested_size (7 bytes, which requires ceil(7 / 4) == 2 chunks), the same block looks like this:
| Sections | size_type alignment | void * alignment | requested_size alignment |
|---|---|---|---|
| Memory not belonging to process | | | |
| Chunks section (16 bytes) | (4 bytes) | (4 bytes) | 4 bytes in use by program |
| | (4 bytes) | (4 bytes) | 3 bytes in use by program (1 byte unused) |
| | (4 bytes) | FLP to Chunk 4 (4 bytes) | Chunk 3 (4 bytes) |
| | (4 bytes) | FLP to end-of-list (4 bytes) | Chunk 4 (4 bytes) |
| Pointer to next Block (4 bytes) | (4 bytes) | Ptr to end-of-list (4 bytes) | |
| Size of next Block (4 bytes) | 0 (4 bytes) | | |
| Memory not belonging to process | | | |
Then, when the user deallocates the contiguous memory, we can split it up into chunks again.
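The chunk-count formula above, m = ceil(n * requested_size / alloc_size), can be evaluated with integer arithmetic only. The following is a minimal sketch; the function name `chunks_needed` is hypothetical, and overflow of `n * requested_size` is not handled.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: number of alloc_size-byte chunks needed to hold
// n contiguous objects of requested_size bytes each (rounded up).
inline std::size_t chunks_needed(std::size_t n, std::size_t requested_size,
                                 std::size_t alloc_size) {
    assert(alloc_size != 0);
    const std::size_t total_bytes = n * requested_size;
    // Integer ceiling division: add one chunk if there is a remainder.
    return total_bytes / alloc_size + (total_bytes % alloc_size != 0 ? 1 : 0);
}
```

In the example above, `chunks_needed(7, 1, 4)` yields 2: the first chunk is fully used and the second holds 3 bytes with 1 byte unused.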
Note that the implementation provided for allocating contiguous chunks uses a linear rather than a quadratic algorithm. This means that it may not find contiguous free chunks if the free list is not ordered. Thus, it is recommended to always use an ordered free list when dealing with contiguous allocation of chunks. (In the example above, if Chunk 1 pointed to Chunk 3, which pointed to Chunk 2, which pointed to Chunk 4, instead of being in order, the contiguous allocation algorithm would have failed to find any of the contiguous chunks.)
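To see why ordering matters, consider the following sketch of a single-pass search. It is an illustration written for this note, not the library's code: the names `store_next`, `load_next`, and `find_contiguous` are hypothetical, and the function only locates a run of m adjacent free chunks without unlinking it from the list.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical helpers: write and read the link kept in a free chunk.
inline void store_next(void* chunk, void* next) {
    std::memcpy(chunk, &next, sizeof next);
}
inline void* load_next(void* chunk) {
    void* next;
    std::memcpy(&next, chunk, sizeof next);
    return next;
}

// Single pass over the free list. A run is extended only while each chunk's
// successor in the list is also its neighbor in memory, so runs are found
// reliably only when the list is ordered by address.
inline void* find_contiguous(void* head, std::size_t m, std::size_t chunk_size) {
    assert(m > 0);
    void* run_start = head;
    std::size_t run_len = (head != nullptr) ? 1 : 0;
    for (void* cur = head; cur != nullptr;) {
        if (run_len == m) {
            return run_start;  // found m adjacent free chunks
        }
        void* next = load_next(cur);
        if (next == static_cast<char*>(cur) + chunk_size) {
            ++run_len;         // successor is adjacent: extend the run
        } else {
            run_start = next;  // gap or out-of-order link: start over
            run_len = (next != nullptr) ? 1 : 0;
        }
        cur = next;
    }
    return nullptr;            // no run of m chunks seen in this pass
}
```

With four adjacent chunks linked in address order, a request for two chunks returns the first chunk. If the same chunks are linked as 1, 3, 2, 4, the search returns a null pointer even though all four chunks are free and adjacent in memory.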
Revised 05 December, 2006
Copyright © 2000, 2001 Stephen Cleary (scleary AT jerviswebb DOT com)
Distributed under the Boost Software License, Version 1.0. (See accompanying file LICENSE_1_0.txt or copy at http://www.boost.org/LICENSE_1_0.txt)