jemalloc 3 performance vs. mozjemalloc
Mike Hommey
mh at glandium.org
Tue Feb 3 14:51:17 PST 2015
Hi,
I've been tracking a startup time regression in Firefox for Android that
showed up when we tried to switch from mozjemalloc (as a refresher: it's
derived from jemalloc 0.9) to mostly current jemalloc dev.
The cause turned out to be https://github.com/jemalloc/jemalloc/pull/192
but in the process I found a few interesting things that I think are
worth mentioning:
- Several changesets between 3.6 and current dev made the number of
instructions as reported by perf stat on GNU/Linux x86-64 increase
significantly, on a ~200k alloc/dealloc testcase that does nothing
else[1]:
  - 5460aa6f6676c7f253bfcb75c028dfd38cae8aaf made the count go from
    69M to 76M.
  - 6ef80d68f092caf3b3802a73b8d716057b41864c from 76M to 81.5M
  - 4dcf04bfc03b9e9eb50015a8fc8735de28c23090 from 81.5M to 85M
  - 155bfa7da18cab0d21d87aa2dce4554166836f5d from 85M to 88M
I didn't investigate further because it was a red herring as far as
the regression I was tracking was concerned.
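
To put the instruction counts above in perspective, here is a rough
back-of-the-envelope sketch. It assumes that "~200k" means 200,000
operations in total and that the whole increase is attributable to the
allocator paths; neither is stated in the message, and the perf stat
totals also include process startup, so the per-operation figure is only
an approximation. The names OPS, counts_m and per_step_deltas are made up
for this illustration.

```python
# Instruction counts (in millions) quoted above, in chronological order:
# the 3.6 baseline followed by the count after each of the four changesets.
counts_m = [69.0, 76.0, 81.5, 85.0, 88.0]
commits = ["5460aa6f", "6ef80d68", "4dcf04bf", "155bfa7d"]

# ASSUMPTION: "~200k alloc/dealloc" is read as 200,000 operations in total.
OPS = 200_000


def per_step_deltas(counts):
    """Return the increase between each consecutive pair of counts."""
    return [after - before for before, after in zip(counts, counts[1:])]


deltas = per_step_deltas(counts_m)
total_delta = counts_m[-1] - counts_m[0]      # millions of instructions
pct = 100 * total_delta / counts_m[0]         # relative increase
extra_per_op = total_delta * 1e6 / OPS        # instructions per operation

for commit, delta in zip(commits, deltas):
    print(f"{commit}: +{delta:.1f}M instructions")
print(f"total: +{total_delta:.1f}M ({pct:.1f}%), "
      f"roughly {extra_per_op:.0f} extra instructions per operation")
```

Under those assumptions the four changesets add up to 19M instructions,
a bit under 28% over the 69M baseline.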
- The average number of mutex locks per alloc/dealloc is close to 1 with
mozjemalloc (1.001), but 1.13 with jemalloc 3 (same testcase as above).
Fortunately, contention is likely lower (I measured it to be lower, but
the instrumentation had so much overhead that it may have skewed the
results). Still, pthread_mutex_lock/unlock are not free as far as
instruction count is concerned.
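
As a rough illustration of what the lock averages above amount to, the
sketch below converts them into absolute counts. It again assumes that
"~200k" means 200,000 operations in total and that every lock is paired
with exactly one unlock; both are assumptions made for this example, and
the names OPS and total_locks are hypothetical.

```python
# Average mutex locks per alloc/dealloc, as quoted above.
LOCKS_PER_OP_MOZJEMALLOC = 1.001
LOCKS_PER_OP_JEMALLOC3 = 1.13

# ASSUMPTION: "~200k alloc/dealloc" is read as 200,000 operations in total.
OPS = 200_000


def total_locks(ops, locks_per_op):
    """Estimate the total number of lock acquisitions, rounded to an integer."""
    return round(ops * locks_per_op)


moz_locks = total_locks(OPS, LOCKS_PER_OP_MOZJEMALLOC)
je3_locks = total_locks(OPS, LOCKS_PER_OP_JEMALLOC3)
extra_locks = je3_locks - moz_locks

# ASSUMPTION: each lock is matched by one unlock, so the number of extra
# pthread_mutex_lock/unlock calls is twice the number of extra locks.
extra_pthread_calls = 2 * extra_locks

relative_increase = 100 * extra_locks / moz_locks

print(f"mozjemalloc: {moz_locks} locks, jemalloc 3: {je3_locks} locks")
print(f"extra locks: {extra_locks} (+{relative_increase:.1f}%), "
      f"extra lock/unlock calls: {extra_pthread_calls}")
```

This says nothing about contention; it only estimates how many more
uncontended lock/unlock calls are executed over the testcase.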
Cheers,
Mike
1. That testcase is derived from a dump of the allocations happening
during a Firefox for Android startup.