jemalloc 3 performance vs. mozjemalloc

Mike Hommey mh at glandium.org
Tue Feb 3 16:00:11 PST 2015


On Wed, Feb 04, 2015 at 07:51:17AM +0900, Mike Hommey wrote:
> Hi,
> 
> I've been tracking a startup time regression in Firefox for Android when
> we tried to switch from mozjemalloc (memory refresher: it's derived from
> jemalloc 0.9) to mostly current jemalloc dev.
> 
> It turned out to be https://github.com/jemalloc/jemalloc/pull/192 but in
> the process I found a few interesting things that I think are worth
> mentioning:
> 
> - Several changesets between 3.6 and current dev made the number of
>   instructions as reported by perf stat on GNU/Linux x86-64 increase
>   significantly, on a ~200k alloc/dealloc testcase that does nothing
>   else[1]:
>   - 5460aa6f6676c7f253bfcb75c028dfd38cae8aaf made the count go from
>     69M to 76M.
>   - 6ef80d68f092caf3b3802a73b8d716057b41864c from 76M to 81.5M
>   - 4dcf04bfc03b9e9eb50015a8fc8735de28c23090 from 81.5M to 85M
>   - 155bfa7da18cab0d21d87aa2dce4554166836f5d from 85M to 88M
>   I didn't investigate further because it was a red herring as far as
>   the regression I was tracking was concerned.
> 
> - The average number of mutex locks per alloc/dealloc is close to 1 with
>   mozjemalloc (1.001), but 1.13 with jemalloc 3 (same testcase as above).
>   Fortunately, contention is likely lower (I measured it to be lower, but
>   the instrumentation had so much overhead that it may have skewed the
>   results), but pthread_mutex_lock/unlock are not free as far as
>   instruction count is concerned.

I forgot to mention that this is with tcache disabled. Enabling tcache
makes the instruction count significantly lower and results in much less
mutex locking, but at the cost of more memory overhead. We'll investigate
the tradeoffs, but we're not ready for that yet.
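
As a rough sanity check on the figures quoted above, here is a minimal
Python sketch that puts them on a per-operation basis. It assumes the
"~200k alloc/dealloc" testcase performs about 200,000 operations (the
exact count is not given), so every derived number is an estimate, not a
measurement. The variable names are illustrative only.

```python
# Assumption: "~200k" is taken as exactly 200,000 operations.
OPS = 200_000

# Cumulative instruction counts from perf stat, in millions, as quoted.
# The first entry is the count before the first listed changeset.
steps = [
    ("before 5460aa6f", 69.0),
    ("5460aa6f", 76.0),
    ("6ef80d68", 81.5),
    ("4dcf04bf", 85.0),
    ("155bfa7d", 88.0),
]

# Instructions added by each changeset, in millions.
deltas = [
    (name, after - before)
    for (_, before), (name, after) in zip(steps, steps[1:])
]

first = steps[0][1]
last = steps[-1][1]

# Overall growth across the four changesets, as a percentage.
total_increase_pct = (last - first) / first * 100

# Estimated instructions per operation before and after.
per_op_before = first * 1e6 / OPS
per_op_after = last * 1e6 / OPS

# Extra mutex lock acquisitions over the whole run, from the quoted
# averages of 1.001 (mozjemalloc) and 1.13 (jemalloc 3) per operation.
extra_locks = round((1.13 - 1.001) * OPS)

for name, delta in deltas:
    print(f"{name}: +{delta}M instructions")
print(f"total increase: {total_increase_pct:.1f}%")
print(f"per operation: {per_op_before:.0f} -> {per_op_after:.0f}")
print(f"extra mutex locks: {extra_locks}")
```

Under that assumption, the count grows from roughly 345 to 440
instructions per operation (about 27.5% more), and jemalloc 3 takes on
the order of 25,800 more mutex locks than mozjemalloc over the run.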

Mike


More information about the jemalloc-discuss mailing list