jemalloc 3 performance vs. mozjemalloc

Mike Hommey mh at glandium.org
Tue Feb 3 14:51:17 PST 2015


Hi,

I've been tracking down a startup time regression in Firefox for Android
that appeared when we tried to switch from mozjemalloc (as a refresher,
it's derived from jemalloc 0.9) to a mostly current jemalloc dev.

It turned out to be https://github.com/jemalloc/jemalloc/pull/192, but in
the process I found a few interesting things that I thought were worth
mentioning:

- Several changesets between 3.6 and current dev significantly increased
  the number of instructions reported by perf stat on GNU/Linux x86-64,
  on a ~200k alloc/dealloc testcase that does nothing else[1]:
  - 5460aa6f6676c7f253bfcb75c028dfd38cae8aaf: from 69M to 76M
  - 6ef80d68f092caf3b3802a73b8d716057b41864c: from 76M to 81.5M
  - 4dcf04bfc03b9e9eb50015a8fc8735de28c23090: from 81.5M to 85M
  - 155bfa7da18cab0d21d87aa2dce4554166836f5d: from 85M to 88M
  I didn't investigate further because, as far as the regression I was
  tracking was concerned, it was a red herring.

- The average number of mutex locks per alloc/dealloc is close to 1 with
  mozjemalloc (1.001), but 1.13 with jemalloc 3 (same testcase as above).
  Fortunately, contention is likely lower with jemalloc 3 (I measured it
  to be lower, but the instrumentation had so much overhead that it may
  have skewed the results). Still, pthread_mutex_lock/unlock calls are
  not free as far as instruction count is concerned.
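Counting lock acquisitions is much cheaper than measuring contention: a
relaxed atomic increment per lock site, one per alloc/dealloc entry, then
a division. A minimal sketch, assuming hypothetical wrapper names
(counted_mutex_lock, count_op, locks_per_op) patched by hand into the
allocator's lock sites and entry points; it is not necessarily how the
numbers above were gathered.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

/* Hypothetical counters, not part of jemalloc or mozjemalloc. */
static atomic_ulong lock_acquisitions;
static atomic_ulong alloc_ops;

/* Use in place of pthread_mutex_lock() at every lock site of the allocator
   being measured. A relaxed increment adds far less overhead than timing
   each acquisition to detect contention. */
int counted_mutex_lock(pthread_mutex_t *m)
{
    atomic_fetch_add_explicit(&lock_acquisitions, 1, memory_order_relaxed);
    return pthread_mutex_lock(m);
}

/* Call once at the entry of every malloc() and free(). */
void count_op(void)
{
    atomic_fetch_add_explicit(&alloc_ops, 1, memory_order_relaxed);
}

/* Average number of lock acquisitions per alloc/dealloc. */
double locks_per_op(void)
{
    unsigned long ops = atomic_load(&alloc_ops);
    assert(ops > 0);
    return (double)atomic_load(&lock_acquisitions) / (double)ops;
}
```

With this scheme, a ratio of 1.001 means almost every operation takes
exactly one lock, while 1.13 means roughly one operation in eight takes
an extra one.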

Cheers,

Mike

1. That testcase is derived from a dump of the allocations happening
during a Firefox for Android startup.
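A testcase of this kind replays a recorded sequence of allocations and
deallocations and does nothing else, so an instruction count from perf
stat is dominated by the allocator. A minimal sketch of such a replayer,
assuming a hypothetical in-memory trace format (enum op_kind, struct
event, replay()); the actual dump format used for Firefox is not
described here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical trace format: each event either allocates `size` bytes
   into `slot`, or frees the pointer currently held in `slot`. */
enum op_kind { OP_ALLOC, OP_FREE };

struct event {
    enum op_kind kind;
    size_t slot;  /* index into the table of live pointers */
    size_t size;  /* bytes to allocate; ignored for OP_FREE */
};

/* Replays `n` events through whatever malloc/free the binary resolves to
   (glibc, jemalloc, mozjemalloc, ...), doing no other work. Assumes a
   well-formed trace (no OP_ALLOC into an occupied slot). Returns the
   number of malloc/free calls made, or -1 on a bad slot or failed malloc. */
long replay(const struct event *trace, size_t n, void **slots, size_t nslots)
{
    assert(trace != NULL || n == 0);
    long calls = 0;
    for (size_t i = 0; i < n; i++) {
        const struct event *e = &trace[i];
        if (e->slot >= nslots)
            return -1;
        if (e->kind == OP_ALLOC) {
            void *p = malloc(e->size);
            if (p == NULL && e->size != 0)
                return -1;
            if (p != NULL && e->size > 0)
                *(volatile char *)p = 0;  /* touch the block, as a real caller would */
            slots[e->slot] = p;
        } else {
            free(slots[e->slot]);
            slots[e->slot] = NULL;
        }
        calls++;
    }
    return calls;
}
```

Wrapped in a small driver that loads the trace and calls replay(), the
binary can be run under `perf stat -e instructions`, once per allocator
(linked in, or injected with LD_PRELOAD), to compare instruction counts
between allocator versions.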

