<html><body><div style="font-family: times new roman, new york, times, serif; font-size: 12pt; color: #000000"><div>Thanks Jason & Robert for your analysis & time. </div><div><br></div><div>From these results it's fairly evident that I have work ahead to reduce the </div><div>long-lived allocations or use different arenas for long- and short-lived objects</div><div>using the new ALLOCM_ARENA API in jemalloc. I shall update once </div><div>I figure out one or both of these. </div><div><br></div><div>Best Regards,</div><div>Nikhil</div><div><br></div><hr id="zwchr"><div style="color:#000;font-weight:normal;font-style:normal;text-decoration:none;font-family:Helvetica,Arial,sans-serif;font-size:12pt;" data-mce-style="color: #000; font-weight: normal; font-style: normal; text-decoration: none; font-family: Helvetica,Arial,sans-serif; font-size: 12pt;"><b>From: </b>"Robert Mowry" <Robert.Mowry@netapp.com><br><b>To: </b>"Jason Evans" <jasone@canonware.com>, "Nikhil Bhatia" <nbhatia@vmware.com><br><b>Cc: </b>jemalloc-discuss@canonware.com<br><b>Sent: </b>Thursday, November 14, 2013 6:15:46 PM<br><b>Subject: </b>Re: jemalloc tuning help<br><div><br></div><div>I just want to reiterate what Jason has said below. I recently spent several months trying to reduce the amount of memory used by one of our applications. We were seeing efficiency ratings for the heap in the 50-60% range (in terms of VM use vs. outstanding buffers used by the app).</div><div><br></div><div>In our case it was relatively easy to segregate one of the largest offenders (a periodic thread that consumes large amounts of heap and then frees it when finished). This resulted in a very large efficiency gain (now closer to 90%). If you are able to segregate long-lived allocations, I don't think it matters how many transient arenas you have configured, because over time they'll empty themselves.</div><div><br></div><div>Another use for arenas that we are interested in trying but haven't yet explored is fault isolation. 
Again, this will depend a bit on your application, but one idea is to assign a problem thread or module its own arena in order to pinpoint the source of memory corruption issues. In reduced-memory environments, tools like Valgrind aren't always an option, so something much lighter-weight, such as thread-specific arenas, seems more viable.</div><div><br></div><div>We are using a fairly old version of jemalloc. I'm happy to see that the newer version has official support for this type of segregation. In the version we are using, we also had to modify the code that detects contention for a specific arena and allows threads to fall back to alternate arenas. We needed complete isolation of the one arena to see the efficiency gains noted above.</div><div><br></div><div>I also want to apologize to Jason. He's clearly spent a great deal of time optimizing the performance of jemalloc. Those of us operating in limited-memory environments start off by disabling much of his hard work :)</div><div><br></div><div style="font-family:Calibri; font-size:11pt; text-align:left; color:black; BORDER-BOTTOM: medium none; BORDER-LEFT: medium none; PADDING-BOTTOM: 0in; PADDING-LEFT: 0in; PADDING-RIGHT: 0in; BORDER-TOP: #b5c4df 1pt solid; BORDER-RIGHT: medium none; PADDING-TOP: 3pt" data-mce-style="font-family: Calibri; font-size: 11pt; text-align: left; color: black; border-bottom: medium none; border-left: medium none; padding-bottom: 0in; padding-left: 0in; padding-right: 0in; border-top: #b5c4df 1pt solid; border-right: medium none; padding-top: 3pt;"><span style="font-weight:bold" data-mce-style="font-weight: bold;">From: </span>Jason Evans <<a href="mailto:jasone@canonware.com" target="_blank" data-mce-href="mailto:jasone@canonware.com">jasone@canonware.com</a>><br> <span style="font-weight:bold" data-mce-style="font-weight: bold;">Date: </span>Thursday, November 14, 2013 8:20 PM<br> <span style="font-weight:bold" data-mce-style="font-weight: bold;">To: </span>Nikhil Bhatia 
<<a href="mailto:nbhatia@vmware.com" target="_blank" data-mce-href="mailto:nbhatia@vmware.com">nbhatia@vmware.com</a>><br> <span style="font-weight:bold" data-mce-style="font-weight: bold;">Cc: </span>"<a href="mailto:jemalloc-discuss@canonware.com" target="_blank" data-mce-href="mailto:jemalloc-discuss@canonware.com">jemalloc-discuss@canonware.com</a>" <<a href="mailto:jemalloc-discuss@canonware.com" target="_blank" data-mce-href="mailto:jemalloc-discuss@canonware.com">jemalloc-discuss@canonware.com</a>><br> <span style="font-weight:bold" data-mce-style="font-weight: bold;">Subject: </span>Re: jemalloc tuning help<br></div><div><br></div><blockquote id="MAC_OUTLOOK_ATTRIBUTION_BLOCKQUOTE" style="BORDER-LEFT: #b5c4df 5 solid; PADDING:0 0 0 5; MARGIN:0 0 0 5;" data-mce-style="border-left: #b5c4df 5 solid; padding: 0 0 0 5; margin: 0 0 0 5;">You can potentially mitigate the problem by reducing the number of arenas (only helps if per thread memory usage spikes are uncorrelated). Another possibility is to segregate short- and long-lived objects into different arenas, but this requires that you have reliable (and ideally stable) knowledge of object lifetimes. In practice, segregation is usually very difficult to maintain. If you choose to go this direction, take a look at the "arenas.extend" mallctl (for creating an arena that contains long-lived objects), and the ALLOCM_ARENA(a) macro argument to the [r]allocm() functions.</blockquote></div><div><br></div></div></body></html>
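Editor's note: Jason's suggestion above (create a dedicated arena with the "arenas.extend" mallctl, then direct long-lived allocations into it with ALLOCM_ARENA) can be sketched roughly as follows. This is a minimal sketch assuming jemalloc 3.x with the experimental allocm()/dallocm() API enabled; the 4 KiB size and the error handling are illustrative, not from the thread.

```c
#include <stdio.h>
#include <string.h>
#include <jemalloc/jemalloc.h>

int main(void) {
    /* Create a fresh arena for long-lived objects; "arenas.extend"
       returns the index of the newly created arena. */
    unsigned arena_ind;
    size_t sz = sizeof(arena_ind);
    if (mallctl("arenas.extend", &arena_ind, &sz, NULL, 0) != 0) {
        fprintf(stderr, "arenas.extend failed\n");
        return 1;
    }

    /* Allocate a long-lived object out of that arena via the
       experimental allocm() API and the ALLOCM_ARENA(a) flag. */
    void *longlived;
    if (allocm(&longlived, NULL, 4096, ALLOCM_ARENA(arena_ind))
        != ALLOCM_SUCCESS) {
        fprintf(stderr, "allocm failed\n");
        return 1;
    }
    memset(longlived, 0, 4096);

    /* Short-lived allocations keep using the default per-thread
       arenas, so transient arenas can drain over time. */
    void *shortlived = malloc(64);
    free(shortlived);

    dallocm(longlived, 0);
    return 0;
}
```

Note that in later jemalloc versions the experimental allocm() family was replaced by mallocx() with MALLOCX_ARENA(a), so the same segregation scheme carries forward under different names.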