"Eric Sosman" <Eric.Sosman@[EMAIL PROTECTED]
> wrote in message
news:1222959200.617375@[EMAIL PROTECTED]
> Chris M. Thomasson wrote:
>> I was thinking of an idea of expanding a current limitation in some of
my
>> allocators which sup****t the use of header-less blocks. The way I
>> currently do it is to simply align chunks on a fairly large boundary,
say
>> 16 pages. These chunks contain a header at the head. Layout is simple:
>>
>> [H][CHUNK]
>>
>> A 16-page chunk would be aligned on a 16-page chunk boundary. If a
>> header-less block needs to access its header, it simple rounds its
>> address up to a chunk boundary, and subtracts the size of the chunk
plus
>> the size of the header. This has a limitation that the largest memory
>> allocation can't be larger than the size of a chunk. To overcome this,
I
>> was thinking about generating a unique GUID per-computer, and using
that
>> as a header identifier. The header would store the GUID right after it.
>> So, the layout for a multi-chunk allocation would look like:
>>
>>
>> [H][GUID][CHUNK1][CHUNK2]
>>
>>
>> Now, when a header-less block needs to find its chunk header it simply,
>> rounds its address up to a chunk boundary; subtracts the size of the
>> chunk plus the size of the GUID; compares the existing memory contents
>> with the GUID; if it does not find a match, it subtracts the size of
the
>> a chunk minus the size of the GUID and repeats the comparison. This
>> process is repeated until a match is hit. After that, it subtracts the
>> size of the header, and bingo, it has a pointer to its chunk header.
>>
>> The process is simple but is has a caveat! If for some reason, the
memory
>> contents of a chunk that is not the true header just happens to contain
>> data which matches the GUID, well, BOOOOOM!!!!!!! Oh ****%!!! The
program
>> will be royally screwed... OUCH!
>>
>>
>> First of all, what do you think of the scheme?
>
> A couple questions arise right away. First, what information
> does this header provide; why do you need it; what's it for?
The most im****tant thing, is that it contains a pointer to what thread
currently owns the chunk. I do this so that allocation's on a non-empty
chunk do no need any synchronization whatsoever. Also, local frees do not
need any sync. Remote frees need a single atomic RMW, and local allocation
boundary conditions, such as chunk-empty, needs a single atomic RMW. Its
highly scaleable design. Here is brief pseudo-code for such an algorithm:
http://groups.google.com/group/comp.programming.threads/browse_frm/thread/8245f4b48591fc69
This is not optimized because each block has a header. I can get rid of
the
header by using the following:
http://groups.google.com/group/comp.programming.threads/msg/bc019dc04362be41
http://groups.google.com/group/comp.programming.threads/msg/1d3aeaa6ebf5181f
As you can see, this technique gives 100% header-less blocks.
> Second,
> how does a chunk discover its own size? Are all chunks the same
> fixed size,
Small chunks would all be the same size. Large chunks would be any size >
the small chunk size. I would use small chunks in per-thread heap, and
large
chunks in global heap. When a thread is created it requests some chunks
from
the "global chunk server" and builds a segerated bucket based heap. Or, I
could do a region allocator and have a single doubly linked list of heaps.
BTW, here is fully working example code of such a per-thread region
allocator:
http://webpages.charter.net/appcore/vzoom/malloc/sergey_vzmem_thread.html
In this case, std::malloc is the global chunk server. This allocator has
100% header-less blocks. Blocks can be variant sizes, as long as their
less
than the region size; the `THREAD_ALLOCATOR_HEAP_SIZE' macro determines
this
fixed size. This is the problem I am trying to overcome. I want to be able
to allocate blocks larger than the region size in this allocator. I can do
it like this:
http://groups.google.com/group/comp.programming.threads/msg/12f4a1fc1adbe993
But it has limitations. I think I can use the GUID search HACK to go ahead
and get around those...
> or is the size stored somewhere in the chunk -- and if
> you're already storing metadata in the chunk, why not store the
> rest of the header there, too?
Typically, I would store the following in a chunk header for a slab-based
allocator with fixed sized chunks:
struct chunk {
struct chunk* next;
struct chunk* prev;
per_thread* thread;
void* blocks;
size_t blksize;
};
`chunk::thread' points to the thread which owns it. `chunk::blocks' is a
LIFO of unused memory. `chunk::blksize' points to the block sizes. For a
region allocator the header can be like:
struct chunk {
struct chunk* next;
struct chunk* prev;
per_thread* thread;
unsigned char* base_mem;
size_t offset;
};
where `chunk::base_mem + chunk::offset' is the next available memory. The
advantage is that blocks can be any size < chunk size. The advantage the
slab-allocator has over region is that freed blocks can be reused
immediately. A disadvantage slabs have is that memory sizes requested by
the
application will be rounded up to `sizeof(void*). That is any size that is
less than sizeof(void*) will be set to sizeof(void*), all sizes greater
than
that stay the same. But, you already know the differences between slab and
region allocation, so enough with that.
> An observation: The GUID doesn't seem to add much value. Any
> unlikely signature would do just as well -- that is to say, just as
> poorly.
Its not bullet-proof in any way shape or form! However, I can't really
think
of any other way except the method I laied out for Dmitriy. Humm... I need
some clever suggestions!
> A counter-question: Why must headers be adjacent to chunks?
A block of memory needs to be able to get at this header using just the
address to itself. Remember, its blocks are header-less. However, turns
out
that an address to a block is all the data it needs in order to get its
containing chunks header. The algorithm is simple... The block rounds its
address up to the size of a chunk boundary, then subtracts the size of the
chunk plus the size of chunks header. Bingo! It has a pointer to the
chunks
header. For slab-allocator the block can then figure out its size by using
the `chunk::blksize' member. For region allocator, well, the block could
care less how big it is.
> Would
> it help if they lived in their own dedicated data structure? I've used
> skip-lists in a separate area for this purpose as part of a malloc()
> intended for debugging; keeping the metadata apart from the allocated
> data helps limit the damage from common programming errors.
How could a header-less block get at these external chunk headers without
any synchronization whatsoever? Or, better question, how would you
efficiently map a chunk to these external headers such that no sync was
needed? In my scheme small chunk's (16 pages) headers are owned on a
per-thread basis; thus eliminating the need for synchronization on nearly
all cases. The exception is when a block grabs its chunks header during
the
freeing process and realizes that the thread which owns the chunk is not
the
one calling `free()'. This requires the use of a single atomic RMW. For
slab-allocator this would be single-width CAS, for regions it would be
fetch-and-add.
>> Also, are there better ways to sup****t 100% header-less blocks in a
>> memory allocator? IMVHO, header-less blocks are a MAJOR plus when
>> designing highly efficient memory allocation implementations. Think
about
>> it... A 4-byte allocation actually means 4-bytes are allocated
>> internally. No 4-bytes plus internal block header information! That's
>> crap and totally wastes memory on a per-allocation basis; which can
>> quickly add up indeed.
>
> Well, you need *some* way to store the size information, and maybe
> other information as well. I suppose you could use the block address
> to encode size: A block address of binary ...xxx100 would indicate a
> size of four bytes, ...xx1000 would indicate eight bytes, and so on.
> Fragmentation would be horrific, though.
I am rounding block address and perform subtraction to get at its
containing
chunks header. It works. I posted some working example code, perhaps you
should take a quick look at it. Any questions are welcome indeed.
>> There is a way to minimize the GUID scheme, but it requires two
sections
>> of virtual memory which can be wasteful. It involves block sizes <=
chunk
>> size to be allocated from a separate memory pool than block sizes that
>> are > chunk size. The former can use the normal scheme of simple
rounding
>> up and subtracting to get chunk header... However, the latter would
need
>> to use the GUID search algorithm.
>
> You haven't described your allocator's purposes and constraints
> fully enough for me to follow this paragraph.
I hope I have begun to start describing it in some more detail; how am I
doing? Should I give full high-level pseudo-code of entire allocator
algorithm I am thinking of?
;^|


|