jemalloc 3 performance vs. mozjemalloc

Wed Feb 18 01:34:26 PST 2015

On Wed, Feb 04, 2015 at 11:54:32AM +0900, Mike Hommey wrote:
> On Wed, Feb 04, 2015 at 07:51:17AM +0900, Mike Hommey wrote:
> > Hi,
> > 
> > I've been tracking a startup time regression in Firefox for Android when
> > we tried to switch from mozjemalloc (memory refresher: it's derived from
> > jemalloc 0.9) to mostly current jemalloc dev.
> > 
> > It turned out to be https://github.com/jemalloc/jemalloc/pull/192
> 
> *sigh* and sadly, this doesn't fix it all :(

So, it /might/ be related to the size classes. I don't have all the
results yet, but it looks like I'm getting good results with #192,
--with-lg-quantum=4, --with-lg-tiny-min=2, and replacing size2index,
index2size and s2u so that jemalloc3 uses the same size classes as
mozjemalloc (IOW, a very bastardized jemalloc3).
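
To make the two configure flags concrete, here is a rough Python sketch
of how lg-tiny-min and lg-quantum shape the smallest size classes. The
helper name smallest_classes and the lg_group=2 default are assumptions
for illustration, not jemalloc code; it only models the tiny classes and
the first quantum-spaced group.

```python
def smallest_classes(lg_tiny_min, lg_quantum, lg_group=2):
    """Size classes (bytes) up to the end of the first quantum-spaced group.

    Hypothetical helper: a simplified model, not jemalloc's actual code.
    """
    # Tiny classes: powers of two from 2**lg_tiny_min up to, but not
    # including, the quantum.
    sizes = [1 << lg for lg in range(lg_tiny_min, lg_quantum)]
    # First group: multiples of the quantum (2**lg_group classes).
    quantum = 1 << lg_quantum
    sizes += [quantum * n for n in range(1, (1 << lg_group) + 1)]
    return sizes


print(smallest_classes(lg_tiny_min=2, lg_quantum=4))  # [4, 8, 16, 32, 48, 64]
print(smallest_classes(lg_tiny_min=3, lg_quantum=4))  # [8, 16, 32, 48, 64]
```

Under this model, lowering lg-tiny-min from 3 to 2 only adds a 4-byte
class below the existing ones.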

If that turns out to be true, I'll dig deeper into which particular size
class changes are making a difference.

In the meantime, I spotted something "interesting" in size_classes.sh:
it generates size classes for large allocations up to nearly the entire
size of the address space. Apart from being completely unrealistic on
64-bit and kind of crazy on 32-bit, that is useless, because those
classes are never used: jemalloc switches to huge allocations for
size >= chunksize (simplifying a bit), and AFAICT, huge allocations
don't rely on size2index/index2size.

So in practice, with 4MB chunks, only 68 of the 108 classes on 32-bit
systems are ever used, and only 68 of the 236 classes on 64-bit systems.
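
The counts above can be reproduced with a rough Python model of the
generation loop. This is a sketch, not a port of size_classes.sh: the
function name, the lg_tiny_min=3 and lg_group=2 defaults, and the
"usable means size < chunk size" test are all assumptions (the real
cutoff also accounts for chunk header overhead, hence "simplifying a
bit").

```python
def size_classes(lg_ptr_bytes, lg_quantum=4, lg_tiny_min=3, lg_group=2):
    """Return all generated size classes in bytes, in increasing order.

    Hypothetical model; parameter defaults are assumptions.
    """
    ptr_bits = 8 << lg_ptr_bytes          # 32 or 64
    group = 1 << lg_group                 # classes per doubling
    sizes = []

    # Tiny classes: powers of two below the quantum.
    lg_grp = lg_delta = lg_tiny_min
    ntiny = 0
    while lg_grp < lg_quantum:
        sizes.append(1 << lg_grp)
        lg_delta = lg_grp
        lg_grp += 1
        ntiny += 1

    # First non-tiny group (the first class is split between base and delta).
    ndelta = 0
    if ntiny > 0:
        lg_grp -= 1
        ndelta = 1
        sizes.append((1 << lg_grp) + (ndelta << lg_delta))
        lg_grp += 1
        lg_delta += 1
    while ndelta < group:
        sizes.append((1 << lg_grp) + (ndelta << lg_delta))
        ndelta += 1

    # Remaining groups, all the way up to the top of the address space.
    lg_grp += lg_group
    while lg_grp < ptr_bits:
        # The last group stops one class short so sizes fit in a pointer.
        limit = group - 1 if lg_grp == ptr_bits - 1 else group
        for ndelta in range(1, limit + 1):
            sizes.append((1 << lg_grp) + (ndelta << lg_delta))
        lg_grp += 1
        lg_delta += 1
    return sizes


CHUNK_SIZE = 4 << 20  # 4MB chunks, as in the text

for lg_ptr_bytes in (2, 3):  # 32-bit, then 64-bit
    classes = size_classes(lg_ptr_bytes)
    used = [s for s in classes if s < CHUNK_SIZE]
    print(len(classes), len(used))
# prints "108 68" then "236 68"
```

Everything past the chunk size in the generated table is dead weight
under this model, which is what the 68-of-108 and 68-of-236 figures
express.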

Sure, there is no limit to lg_chunk, but there probably should be.

Mike