On Mon, Nov 5, 2012 at 10:05 AM, Jason Evans <<a href="mailto:jasone@canonware.com" target="_blank">jasone@canonware.com</a>> wrote:
<div class="gmail_quote"><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex"> You're probably right that for 8-entry bitmaps, the multi-level bitmap code is overkill. However, there are combinations of heap profiling settings that can cause the bitmap to contain thousands of items. </blockquote></div>

OK... I haven't looked at the heap profiling code in depth yet. I assume you mean that it causes bin_info->nregs to be substantially higher?

The reason I'm interested in simplifying the bitmap code is that I think it would be beneficial to squeeze more of the per-bin data into a single cache line. On the malloc side this might not matter as much, since if you're doing a lot of allocations the arena run you're hitting most will be cache hot. The harder case is the cascading free: some complicated object hierarchy gets released and thousands of free() calls happen, all on objects that have been alive for millions of cycles. There, the bin-data access is going to be an L2 miss, probably dominating other costs. So keeping the bitmap small, so that as much of it as possible lives in the same cache line as arena_run_t, is beneficial.

The other cache line concern I have is aliasing. Again, think about the cascading free(): thousands of frees coming from dozens of different arena_runs, in essentially random order. The problem is that the arena_run_t is always on a page boundary, so the headers will heavily alias each other at all cache levels. (Is this true of arena_chunk_t as well? I'm still working my way around that code.) It might be worth moving the header to a different place in the page, i.e.
instead of having it appear at ptr & ~PAGE_MASK, use something like:

    (ptr & ~PAGE_MASK) | ((ptr >> (LG_PAGE-6)) & (PAGE_MASK & ~63))

Of course this makes computing the address of each element in the page a little more complicated (since now some appear before the header and some after it), but I think it could be worth it.

Or does none of this matter because the tcache insulates the arena code from these effects well enough? That's another area of the code I've barely poked at.

Anyway, sorry for all of the rambling. I don't have a patch or anything; this is just me thinking aloud while trying to understand the jemalloc 3 code better. Hopefully I'm not sounding too idiotic.

-Mitch
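
For reference, the header-placement expression above can be sketched as a small standalone C fragment. This is only an illustration under stated assumptions, not jemalloc code: it assumes 4 KiB pages (LG_PAGE = 12), 64-byte cache lines (which is what the constants 6 and 63 in the expression imply), and a hypothetical cache with 64 sets. The function names header_at_page_base, header_staggered, and cache_set are made up for the example.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed values: 4 KiB pages, 64-byte cache lines, 64 cache sets. */
#define LG_PAGE        12
#define PAGE_MASK      ((uintptr_t)((1U << LG_PAGE) - 1))
#define LG_CACHELINE   6
#define CACHELINE_MASK ((uintptr_t)((1U << LG_CACHELINE) - 1))
#define LG_SETS        6   /* hypothetical cache geometry */

/* Layout described in the mail: header sits at the page base. */
uintptr_t header_at_page_base(uintptr_t ptr)
{
    return ptr & ~PAGE_MASK;
}

/* Proposed layout: the low bits of the page number pick one of the
 * cache-line-aligned slots inside the page. */
uintptr_t header_staggered(uintptr_t ptr)
{
    uintptr_t base = ptr & ~PAGE_MASK;
    uintptr_t off = (ptr >> (LG_PAGE - LG_CACHELINE))
                    & (PAGE_MASK & ~CACHELINE_MASK);
    uintptr_t hdr = base | off;

    assert((hdr & ~PAGE_MASK) == base);   /* still inside the same page */
    assert((hdr & CACHELINE_MASK) == 0);  /* cache-line aligned */
    return hdr;
}

/* Set index of an address in the assumed cache. */
unsigned cache_set(uintptr_t addr)
{
    return (unsigned)((addr >> LG_CACHELINE) & ((1U << LG_SETS) - 1));
}
```

With these assumptions, every page-aligned header maps to set 0, while the staggered headers for pages 3 and 5 land in sets 3 and 5. The pattern repeats every 64 pages, so it spreads the headers out rather than eliminating aliasing entirely.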