I have developed a non-blocking algorithm that allows one to transform an
inherently single-threaded memory allocator into a scaleable
multi-threaded
allocator. This invention allows a programmer to just create an allocator
algorithm without worrying about any multi-threading issues. Once your
done
with your single-threaded allocator, well, when you apply my non-blocking
algorithm, and PRESTO! Its now a scaleable multi-threaded allocator! Is
that
cool or what?
Okay, here is an initial proof of concept, I have transformed the Windows
Heap object created with the HEAP_NO_SERIALIZE flag. This flag disables
the
internal lock used by the heap, which makes it a heck of a lot more
efficient. However, since the heap is no longer serialized, one cannot use
it in a multi-threaded application as a general allocator; until now...
The
algorithm I created is fairly simple; basically, it goes like this:
___________________________________________________________________________
I create thread local user allocator in every thread.
Alloc request is forwarded to this thread-local user allocator directly.
If free request goes from thread that allocate block, then free request is
forwarded to this thread-local user allocator.
If free request goes from another thread, then you ac***ulate this block
in
per-thread stack-based freelist.
Blocks from this freelist are actually freed in batches when thread
allocates/deallocates another block.
___________________________________________________________________________
I mentioned this algorithm a while back:
http://groups.google.com/group/comp.programming.threads/browse_frm/thread/eb87f9cb2cb62d84
Well, I have released a pre-alpha version of it which demonstrates how to
transform the Windows Heap object under GPL license; the example
source-code
can be found here:
http://webpages.charter.net/appcore/vzoom/malloc/vzmalloc_win_v000_c.html
Can you please run the test as-is, which uses stock malloc impl, and
record
the output... Then uncomment the following line:
#define USE_MEM_ALLOC
which engages my algorithm; run the test again; and post the differences
in
timings between the two. I think you will be fairly surprised at the
performance enhancement. The lock in the Windows Heap object that `malloc'
is based on severely damages its scalability when used in a multi-threaded
environment. When `USE_MEM_ALLOC' is defined, ALL of the locking is
removed
and the heaps are created on a per-thread basis. This distribution and
clever sup****t algorithm I invented allows for the Windows Heap to scale
quite nicely indeed.
I am going to tweak the algorithm such that one could plug in different
allocators on a per-thread basis! In other words, thread A can use a
different allocation algorithm than thread B, and so on. My algorithm
actually sup****ts this. I think the possibilities are truly amazing. It
even
allows thread A to allocate and free in thread B even though they use
completely different allocation algorithm. That's pretty darn cool. Some
may
think its even impossible; its not!
BTW, I am thinking that this algorithm might be able to be used to
transform
other single-threaded things besides allocators... Humm, need to think
some
more on that.


|