jemalloc tuning help

Nikhil Bhatia nbhatia at vmware.com
Mon Dec 9 17:03:09 PST 2013


Thanks Jason & Robert for your analysis & time. 

From these results it's fairly evident that I have work ahead: either reduce 
the long-lived allocations, or use different arenas for long- and short-lived 
objects via the new ALLOCM_ARENA API in jemalloc. I'll update the list once 
I have figured out either or both of these. 

Best Regards, 
Nikhil 

----- Original Message -----

From: "Robert Mowry" <Robert.Mowry at netapp.com> 
To: "Jason Evans" <jasone at canonware.com>, "Nikhil Bhatia" <nbhatia at vmware.com> 
Cc: jemalloc-discuss at canonware.com 
Sent: Thursday, November 14, 2013 6:15:46 PM 
Subject: Re: jemalloc tuning help 

I just want to reiterate what Jason has said below. I recently spent several months trying to reduce the amount of memory used by one of our applications. We were seeing efficiency ratings for the heap in the 50-60% range (in terms of VM use vs outstanding buffers used by the app). 

In our case it was relatively easy to segregate one of the largest offenders (a periodic thread that consumes large amounts of heap and then frees it when finished). This resulted in a very large efficiency gain (now closer to 90%). If you are able to segregate long-lived allocations, I don't think it matters how many transient arenas you have configured, because over time they'll empty themselves. 
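
To make the reasoning above concrete, here is a toy model in C. It is only a sketch and not how jemalloc is implemented: the names (toy_arena, pinned_after_spike), the 16-slot pages and the bump-style placement are all assumptions made for illustration. It simulates one allocation spike in which every Nth object is long-lived, frees all the short-lived objects, and counts how many pages still hold at least one live object and therefore cannot be given back.

```c
#include <string.h>

#define SLOTS_PER_PAGE 16   /* hypothetical: objects per page */
#define MAX_PAGES      128  /* hypothetical: capacity of one toy arena */

/* Toy arena: tracks only how many live objects sit on each page. */
typedef struct {
    int live[MAX_PAGES];  /* live objects per page */
    int next_slot;        /* slots handed out so far (never reused) */
} toy_arena;

static void arena_init(toy_arena *a) { memset(a, 0, sizeof *a); }

/* Place an object in the next slot; return the page it landed on. */
static int arena_alloc(toy_arena *a) {
    int page = a->next_slot / SLOTS_PER_PAGE;
    a->next_slot++;
    a->live[page]++;
    return page;
}

static void arena_free(toy_arena *a, int page) { a->live[page]--; }

/* A page is "pinned" while at least one live object remains on it. */
static int arena_pinned_pages(const toy_arena *a) {
    int n = 0;
    for (int i = 0; i < MAX_PAGES; i++)
        if (a->live[i] > 0) n++;
    return n;
}

/*
 * Allocate n_objects, of which every long_every-th is long-lived, then
 * free all short-lived ones. If segregate is nonzero, long-lived objects
 * go to their own arena. Returns the number of pages still pinned, or -1
 * if the arguments do not fit the toy model.
 */
int pinned_after_spike(int n_objects, int long_every, int segregate) {
    toy_arena a_short, a_long;
    int short_pages[MAX_PAGES * SLOTS_PER_PAGE];
    int n_short = 0;

    if (long_every <= 0 || n_objects > MAX_PAGES * SLOTS_PER_PAGE)
        return -1;

    arena_init(&a_short);
    arena_init(&a_long);
    toy_arena *long_target = segregate ? &a_long : &a_short;

    for (int i = 0; i < n_objects; i++) {
        if (i % long_every == 0)
            arena_alloc(long_target);                        /* never freed */
        else
            short_pages[n_short++] = arena_alloc(&a_short);  /* freed below */
    }
    for (int i = 0; i < n_short; i++)
        arena_free(&a_short, short_pages[i]);

    return arena_pinned_pages(&a_short) + arena_pinned_pages(&a_long);
}
```

In this model, pinned_after_spike(1024, 16, 0) leaves 64 pages pinned by 64 surviving objects (one per page), while pinned_after_spike(1024, 16, 1) leaves only 4 pages, all of them full. The transient arena ends up completely empty, which is the effect described above. A real allocator reuses freed slots and has many size classes, so actual numbers will be less extreme.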

Another use for arenas that we are interested in trying, but haven't yet explored, is fault isolation. Again, this will depend a bit on your application, but one idea is to assign a problem thread or module its own arena in order to pinpoint the source of memory corruption issues. In reduced-memory environments, tools like valgrind aren't always an option, so something much lighter weight, like thread-specific arenas, seems likely to be more viable. 

We are using a fairly old version of jemalloc. I'm happy to see that the newer version has official support for this type of segregation. In the version we are using we also had to modify the code that detects when there's contention for a specific arena and allows threads to use alternate arenas. We needed complete isolation of the one arena to see the efficiency gains noted above. 

I also want to apologize to Jason. He's clearly spent a great deal of time optimizing the performance of jemalloc. Those of us operating in limited memory environments start off by disabling much of his hard work :) 

From: "Jason Evans" <jasone at canonware.com> 
Date: Thursday, November 14, 2013 8:20 PM 
To: "Nikhil Bhatia" <nbhatia at vmware.com> 
Cc: jemalloc-discuss at canonware.com 
Subject: Re: jemalloc tuning help 



You can potentially mitigate the problem by reducing the number of arenas (this only helps if per-thread memory usage spikes are uncorrelated). Another possibility is to segregate short- and long-lived objects into different arenas, but this requires that you have reliable (and ideally stable) knowledge of object lifetimes. In practice, segregation is usually very difficult to maintain. If you choose to go this direction, take a look at the "arenas.extend" mallctl (for creating an arena that contains long-lived objects), and the ALLOCM_ARENA(a) macro argument to the [r]allocm() functions. 


