arenas.extend + thread.arena confusion

Wed Oct 1 17:45:10 PDT 2014

On Oct 1, 2014, at 7:37 PM, Jason Evans <jasone at canonware.com> wrote:

> On Sep 30, 2014, at 11:08 AM, D'Alessandro, Luke K <ldalessa at indiana.edu> wrote:
>> I have an application where I want every thread to have two arenas. One is used for default allocations and has access to the cache, and the other is used for private allocations through mallocx().
>> 
>> I do this by doing an arenas.extend + thread.arena in each thread. The problem that I have is that jemalloc seems to reuse arena ids in this context. Essentially I get a trace that looks something like:
>> 
>> t1: t2 = pthread_create()
>> t1: new1 = arenas.extend
>> t1: old1 = thread.arena(new1)
>> t2: new2 = arenas.extend
>> t2: old2 = thread.arena(new2)
>> 
>> : old1 == old2 == 0
>> 
>> Is this behavior expected? Shouldn’t jemalloc use a fresh arena for each new thread?
> 
> When jemalloc assigns an arena to a thread, it finds the set of default (non-"extend") arenas that have the fewest assigned threads, and assigns the lowest-numbered arena in that set to the thread.  If you were to remove the "thread.arena" assignment from your test program (which is consistent with your stated purpose), you would end up with your threads assigned to arenas 0 and 1, and you would additionally have two arenas that jemalloc uses only if you specify MALLOCX_ARENA() to one of the *allocx() functions.  As it is, the test program is racy; you could end up with the created thread initially assigned to arena 1, if a context switch happened at the right time.
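
For reference, the assignment policy described above can be sketched as a toy model in C. This is not jemalloc source: the function names and the per-arena thread-count array are hypothetical, and the assumption that switching away via thread.arena lowers the old arena's count is inferred from the trace and the race described above.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model only, NOT jemalloc internals. nthreads[i] is the number of
 * threads currently assigned to default arena i (hypothetical bookkeeping). */

/* Return the lowest-numbered default arena with the fewest assigned threads. */
size_t pick_default_arena(const unsigned *nthreads, size_t narenas)
{
    assert(nthreads != NULL && narenas > 0);
    size_t best = 0;
    for (size_t i = 1; i < narenas; i++) {
        /* Strict '<' keeps the lowest index when counts are tied. */
        if (nthreads[i] < nthreads[best])
            best = i;
    }
    return best;
}

/* Assign a new thread to a default arena and record the assignment. */
size_t assign_thread(unsigned *nthreads, size_t narenas)
{
    size_t ind = pick_default_arena(nthreads, narenas);
    nthreads[ind]++;
    return ind;
}

/* Assumed effect of thread.arena(new): the thread leaves its old default
 * arena, so that arena's count drops by one. */
void leave_default_arena(unsigned *nthreads, size_t ind)
{
    assert(nthreads[ind] > 0);
    nthreads[ind]--;
}
```

In this model, if t1 is assigned arena 0 and then leaves it before t2 makes its first allocation, t2 is also handed arena 0, which matches old1 == old2 == 0. If t2 is assigned before t1 leaves, t2 gets arena 1 instead, which is the race mentioned above.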

I can’t use the default arenas because I’m using a custom chunk allocator and I couldn't figure out a way to get jemalloc to release all of the chunks associated with an arena after setting the chunk allocator—I don’t even know what’s been allocated prior to that so I’m not sure it’s even feasible. I’d have to shift them to a different arena or something, and that seems like a lot of work.

I need fast concurrent access to the arenas managing my network-registered address space, so I’m extending the arena space, binding the chunk allocator, and then swapping to it. I wanted to use the existing arena for slower access to mostly-private, system mmap()ed data, and I don’t want these arenas shared since I’m bypassing the cache when using them.

I am working around my issue by using two extended arenas for each thread, one for the private allocations and one for the shared registered memory regions, and just leaving the “primordial” arenas to rot, which seems fine since arenas have such low overhead.

As you might imagine, I’m running into an issue where dallocx with MALLOCX_ARENA() set is caching allocations when I don’t want them to be cached. I have to embed and distribute a slightly modified jemalloc where I comment out the optimistic caching in dallocx and rallocx. A different BYPASS_CACHE flag would be useful here, but it’s not an urgent issue.

Thanks for your help. It’s a huge help to us.

Luke