Memory usage regression

Jason Evans jasone at
Tue Oct 30 12:49:27 PDT 2012

On Oct 30, 2012, at 9:03 AM, Mike Hommey wrote:
> On Tue, Oct 30, 2012 at 04:36:58PM +0100, Mike Hommey wrote:
>> On Tue, Oct 30, 2012 at 04:35:02PM +0100, Mike Hommey wrote:
>>> On Fri, Oct 26, 2012 at 06:10:13PM +0200, Mike Hommey wrote:
>>>>>>> For reference, the unzoomed graph looks like this:
>>>>>> I rediscovered --enable-munmap, and tried again with that, thinking it
>>>>>> could be related, and it did change something, but it's still growing:
>>>>> Needless to say, the increases I was observing closely on the zoomed
>>>>> graph without a matching decrease were entirely due to munmap. Now I need
>>>>> to find the remainder...
>>>> I tested each size class independently, and none would cause the VM leak
>>>> alone. Combining small and large classes does, but large + huge or small +
>>>> huge don't.
>>> Some more data: all non-unmapped chunks *are* used to some extent. The
>>> following is a dump of the number of requested and usable bytes in each
>>> chunk; that's 18M spread across 600M... that sounds like a really bad
>>> case of fragmentation.
>> BTW, it does seem to grow forever: I went up to 1.3GB with more
>> iterations before stopping.
> So, what seems to be happening is that, because of that fragmentation,
> jemalloc has to allocate and use new chunks to satisfy big allocations.
> When these big allocations are freed, the new chunk tends to be reused
> more often than the older, partially free chunks, adding to the
> fragmentation, thus requiring yet more new chunks for big allocations.
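The runaway growth described above can be reproduced with a toy model (entirely hypothetical names and page counts, not jemalloc code): each freed large run is split by a long-lived small allocation, so no existing chunk can ever satisfy the next large request, and every round maps a fresh chunk that is never reclaimed:

```c
#include <assert.h>
#include <stddef.h>

#define CHUNK_PAGES 256   /* pages per chunk (illustrative) */
#define BIG         128   /* large request, in pages (illustrative) */
#define MAX_CHUNKS  1024

/* Toy model: each chunk tracks only its largest contiguous free run. */
typedef struct { size_t max_run; } chunk_t;

/* Run `iters` rounds of: allocate a large object, pin one page of the
 * same chunk with a long-lived small allocation, then free the large
 * object.  The pin splits the freed run, so the chunk can never again
 * satisfy a BIG-page request, and every round forces a new chunk. */
size_t
simulate(size_t iters)
{
    chunk_t chunks[MAX_CHUNKS];
    size_t nchunks = 0;

    for (size_t i = 0; i < iters; i++) {
        /* Look for an existing chunk that can hold the large request. */
        size_t c = nchunks;
        for (size_t j = 0; j < nchunks; j++) {
            if (chunks[j].max_run >= BIG) { c = j; break; }
        }
        if (c == nchunks) {               /* none fits: map a new chunk */
            assert(nchunks < MAX_CHUNKS);
            chunks[nchunks++].max_run = CHUNK_PAGES;
        }

        chunks[c].max_run -= BIG;         /* large allocation */
        chunks[c].max_run -= 1;           /* long-lived small pin */
        /* Free the large run: the pin sits inside it, so the pages come
         * back as two runs of at most BIG/2 each -- too small for BIG. */
        if (chunks[c].max_run < BIG / 2)
            chunks[c].max_run = BIG / 2;
    }
    return nchunks;   /* chunks mapped and never reclaimed */
}
```

In this model the mapped-chunk count grows linearly with the number of rounds, mirroring the unbounded VM growth seen in the graphs.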

The preference for allocating dirty runs was a solution to excessive dirty page purging.  However, the purging policy (as of jemalloc 3.0.0) is round-robin, justified only as a strategy for allowing dirty pages to accumulate in chunks before going to the considerable effort (including arena mutex operations) of scanning a chunk for dirty pages.  In retrospect I'm thinking maybe this was a bad choice, and that we should go back to scanning downward through memory to purge dirty pages.  The danger is that the linear scan of each chunk will cause a measurable performance degradation if high chunks routinely contain many runs, only a few of which are unused dirty runs.  I think that problem can be solved with slightly more sophisticated hysteresis though.
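The downward-scanning policy with hysteresis might look roughly like the following sketch (hypothetical, heavily simplified structures; jemalloc's real arena and chunk bookkeeping, run trees, and locking are far more involved):

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified chunk model for illustration only. */
typedef struct {
    size_t ndirty;  /* dirty (freed but unpurged) pages in this chunk */
} chunk_t;

/* Purge dirty pages by scanning chunks from the highest address
 * downward, rather than round-robin.  chunks[] is assumed ordered by
 * ascending base address.  Purging stops once the arena-wide dirty
 * count drops to ndirty_target; returns the number of pages purged. */
size_t
purge_downward(chunk_t *chunks, size_t nchunks, size_t *ndirty_total,
    size_t ndirty_target, size_t hysteresis)
{
    size_t purged = 0;

    for (size_t i = nchunks; i-- > 0 && *ndirty_total > ndirty_target; ) {
        /* Hysteresis: skip chunks holding only a few dirty pages, so a
         * full linear scan of the chunk isn't paid to purge almost
         * nothing -- the degradation worried about above. */
        if (chunks[i].ndirty < hysteresis)
            continue;
        purged += chunks[i].ndirty;
        *ndirty_total -= chunks[i].ndirty;
        chunks[i].ndirty = 0;   /* real code: madvise() the dirty runs */
    }
    return purged;
}
```

Because high-address chunks are purged first, reuse concentrates in low memory and empty high chunks can eventually be unmapped, which is the property the round-robin policy gave up.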

I'll work on a diff for you to test, and see how it affects Firefox.  I'll do some testing with Facebook server loads too (quite different behavior from Firefox).  If this causes a major reduction in virtual memory usage for both workloads, it's probably the right thing to do, even speed-wise.


More information about the jemalloc-discuss mailing list