Memory usage regression
mh+jemalloc at glandium.org
Fri Oct 26 02:45:32 PDT 2012
(FYI, the message I'm quoting here is still in the list moderation queue)
On Thu, Oct 25, 2012 at 08:42:11AM +0200, Mike Hommey wrote:
> In the 3 jemalloc cases, narenas = 1 (like mozjemalloc). The chunk size
> for mozjemalloc is 1MB, which is why I tested with that chunk size as
> well, and also tried without tcache. There's still a large difference
> between jemalloc 3 and mozjemalloc with similar config.
It turns out 1MB vs. 4MB chunks don't make a great deal of difference in
RSS; they do, however, change how VmData progresses. The bumps at the
beginning of each iteration are bigger with 1MB chunks.
Note: the horizontal axis is not time, it's the alloc operation number;
basically the line number in the alloc log.
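For what it's worth, the VmRSS/VmData numbers in these graphs can be
sampled straight from the kernel on Linux. A minimal sketch (the field
names come from /proc/<pid>/status; the function name and return shape
are just for illustration, not part of the measurement harness used
here):

```python
# Read VmRSS and VmData (in kB) from /proc/<pid>/status on Linux.
# Field names are the kernel's; everything else is illustrative.

def read_vm_stats(pid="self"):
    """Return {'VmRSS': kB, 'VmData': kB} for the given pid."""
    stats = {}
    with open(f"/proc/{pid}/status") as f:
        for line in f:
            if line.startswith(("VmRSS:", "VmData:")):
                key, value = line.split(":")
                # Values look like "  123456 kB"
                stats[key] = int(value.strip().split()[0])
    return stats

if __name__ == "__main__":
    print(read_vm_stats())
```

Sampling this after each alloc operation (or every Nth one) is enough to
reproduce the kind of per-operation curves shown in the graphs.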
> I did try hacking jemalloc 3 to use the same size classes but didn't get
> much different results, although retrospectively, I think I was only
> looking at the VmData numbers, then. I'll respin, looking at VmRSS.
Interestingly, jemalloc 3 with the mozjemalloc size classes uses more
RSS than jemalloc 3 (using 1MB chunks in both cases).
> > This looks pretty bad. The only legitimate potential explanation I
> > can think of is that jemalloc now partitions dirty and clean pages
> > (and jemalloc 3 is much less aggressive than mozjemalloc about
> > purging), so it's possible to have to allocate a new chunk for a large
> > object, even though there would be enough room in an existing chunk if
> > clean and dirty available runs were coalesced. This increases run
> > fragmentation in general, but it tends to dramatically reduce the
> > number of pages that are dirtied. I'd like to see the output of
> > malloc_stats_print() for two adjacent points along the x axis, like
> > "After iteration 5" and its predecessor. I'd also be curious if the
> > VM size increases continue after many grow/shrink cycles (if so, it
> > might be due to an outright bug in jemalloc).
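The scenario described in the quote (a new chunk being allocated even
though coalescing clean and dirty runs would have made room) can be
illustrated with a toy model. The chunk/run bookkeeping below is
invented for illustration and is far simpler than jemalloc's real arena
logic:

```python
# Toy model: a chunk's free space is a list of (size, state) runs,
# where state is "clean" or "dirty". If the allocator refuses to
# coalesce across the clean/dirty boundary, a request can fail inside
# an existing chunk even though the total free space would fit it,
# forcing a brand-new chunk (and hence VmData growth).

def fits_without_coalescing(runs, request):
    # Only a single run may satisfy the request.
    return any(size >= request for size, _state in runs)

def fits_with_coalescing(runs, request):
    # Adjacent free runs may be merged regardless of clean/dirty state.
    return sum(size for size, _state in runs) >= request

# One chunk with a 3-page clean run next to a 3-page dirty run.
chunk = [(3, "clean"), (3, "dirty")]
request = 5  # pages

print(fits_without_coalescing(chunk, request))  # False -> new chunk needed
print(fits_with_coalescing(chunk, request))     # True  -> would have fit
```

The trade-off named in the quote falls out of the same model: keeping
dirty runs separate means fewer pages get re-dirtied, at the cost of
this kind of run fragmentation.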
Some more data:
This zooms in on the big bump at the beginning of iteration 2. Looking
at the corresponding allocation log, the bump corresponds to > 1MB
allocations with memalign, but turning them into mallocs doesn't change
the result, so it's not a memalign problem.
Looking more globally at the data, there is /some/ correlation with >
1MB allocations, but occasionally 128KB allocations trigger the same
behaviour, as do 64KB ones. One interesting fact is that only a limited
subset of these big allocations triggers this; the vast majority don't.
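Picking those large allocations out of the alloc log is a one-liner-ish
filter. A sketch, assuming a hypothetical "op size" per-line log format
(not necessarily the actual format of the log used for these graphs):

```python
# Hypothetical alloc-log filter: report the operation numbers (i.e.
# line numbers, matching the graphs' x axis) of allocations at or
# above a threshold. The "op size" line format is an assumption.

import io

def large_allocs(log, threshold=1 << 20):
    hits = []
    for opnum, line in enumerate(log, start=1):
        op, _, size = line.partition(" ")
        if op in ("malloc", "memalign") and int(size) >= threshold:
            hits.append((opnum, op, int(size)))
    return hits

sample = io.StringIO("malloc 4096\nmemalign 2097152\nfree 0\nmalloc 131072\n")
print(large_allocs(sample, threshold=1 << 20))
# [(2, 'memalign', 2097152)]
```

Cross-referencing the reported operation numbers with the VmData bumps
is how the (partial) correlation above was established.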
For reference, the unzoomed graph looks like this: