jemalloc 3 performance vs. mozjemalloc
jasone at canonware.com
Tue Feb 3 16:22:35 PST 2015
On Feb 3, 2015, at 4:00 PM, Mike Hommey <mh at glandium.org> wrote:
> On Wed, Feb 04, 2015 at 07:51:17AM +0900, Mike Hommey wrote:
>> - The average number of mutex locks per alloc/dealloc is close to 1 with
>> mozjemalloc (1.001), but 1.13 with jemalloc 3 (same testcase as above).
>> Fortunately, contention is likely lower (I measured it to be lower, but
>> the instrumentation had so much overhead that it may have skewed the
>> results), but pthread_mutex_lock/unlock are not free as far as
>> instruction count is concerned.
> Forgot to mention, this is with tcache disabled. Tcache does make
> instruction count significantly lower and does much less mutex locking,
> but at the cost of more memory overhead. We'll investigate the
> tradeoffs, but we're not ready for that yet.
Oh! mozjemalloc only has one mutex per arena, whereas jemalloc 1+ has per-bin mutexes as well. On the fast path, a small allocation/deallocation needs only the bin mutex, but if a page run has to be allocated/deallocated, additional locking occurs. In the absence of tcache this increase in locking makes sense, though 1.13 locks per operation is a bit higher than I'd normally expect.
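
To make the counting concrete, here is a toy model of the two locking schemes. It is only a sketch: it counts mutex acquisitions and allocates nothing, the function names and the regions_per_run parameter are made up for illustration (they are not jemalloc identifiers), and it assumes exactly one extra acquisition each time a bin needs a fresh page run. The real allocator may take more locks than that on the slow path, and deallocation is not modeled at all.

```python
# Toy model: counts mutex acquisitions only; no memory is allocated.
# All names here are hypothetical and do not correspond to jemalloc symbols.

def locks_single_arena_mutex(num_allocs: int) -> int:
    """One mutex per arena: every small allocation takes it exactly once."""
    return num_allocs


def locks_per_bin_mutex(num_allocs: int, regions_per_run: int) -> int:
    """Per-bin mutex on every allocation, plus one extra acquisition
    (assumed to be the arena mutex) whenever the bin has no free region
    and a new page run has to be allocated."""
    locks = 0
    free_regions = 0
    for _ in range(num_allocs):
        locks += 1  # bin mutex, taken on every allocation
        if free_regions == 0:
            locks += 1  # extra lock to obtain a new page run (assumption)
            free_regions = regions_per_run
        free_regions -= 1
    return locks


def average_locks(total_locks: int, num_ops: int) -> float:
    """Average number of mutex acquisitions per operation."""
    return total_locks / num_ops


if __name__ == "__main__":
    n = 1000
    print(average_locks(locks_single_arena_mutex(n), n))
    print(average_locks(locks_per_bin_mutex(n, 8), n))
```

Under this model the per-bin scheme averages 1 + 1/regions_per_run locks per allocation when nothing is freed, so the excess over 1 is just the fraction of operations that have to touch a page run. The value 8 above is arbitrary and is not meant to reproduce the measured 1.13.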