Question about bitmap code

Mitchell Blank Jr mitch-jemalloc-discuss at
Thu Nov 8 02:17:26 PST 2012

On Mon, Nov 5, 2012 at 10:05 AM, Jason Evans <jasone at> wrote:

> You're probably right that for 8-entry bitmaps, the multi-level bitmap
> code is overkill.  However, there are combinations of heap profiling
> settings that can cause the bitmap to contain thousands of items.

OK... I haven't looked at the heap profiling code in depth yet.  I assume
you mean that it causes bin_info->nregs to be substantially higher?

The reason I'm interested in simplifying the bitmap code is that I think it
would be beneficial to squeeze more of the per-bin data into a cacheline.
On the malloc side this might not matter as much since if you're doing a
lot of allocations the arena-run you're hitting most will be cache hot.
However, the harder case is the cascading free: some complicated object
hierarchy gets released and thousands of free()s happen, all on objects
that have been alive for millions of cycles.  There, the bin-data access is
going to be an L2 miss, probably dominating other costs.  So keeping the
bitmap small, so that as much of it as possible lives in the same cache
line as arena_run_t, is beneficial.

The other cacheline concern I have is aliasing.  Again, think about the
cascading free(): thousands of frees coming from dozens of different
arena runs but in essentially random order.  The problem is that the
arena_run_t is always on a page boundary, so they will heavily alias each
other at all cache levels.  (Is this true of arena_chunk_t as well?  I'm
still working my way around that code.)

It might be worth moving the header to a different place in the page, i.e.
instead of having it appear at
   ptr &~PAGE_MASK,
use something like:
   (ptr &~ PAGE_MASK) | ((ptr >> (LG_PAGE-6)) & (PAGE_MASK &~ 63))

Of course this makes computing the address of each element in the page a
little more complicated (since now some appear before the header and some
after it) but I think it could be worth it.

Or does none of this matter because the tcache insulates this from these
effects enough?  That's another area of the code I've barely poked at.

Anyway, sorry for all of the rambling -- I don't have a patch or anything,
this is just me thinking aloud while trying to understand the jemalloc 3 code
better.  Hopefully I'm not sounding too idiotic.
