Fragmentation with jemalloc

Mon Apr 22 22:34:36 PDT 2013

On Apr 22, 2013, at 9:18 PM, vandana shah <shah.vandana at gmail.com> wrote:
> On Mon, Apr 22, 2013 at 11:49 PM, Jason Evans <jasone at canonware.com> wrote:
> On Apr 21, 2013, at 10:01 PM, vandana shah wrote:
> > I have been trying to use jemalloc for my application and observed that the rss of the process keeps on increasing.
> >
> > I ran the application with valgrind to confirm that there are no memory leaks.
> >
> > To investigate more, I collected jemalloc stats after running the test for few days and here is the summary for a run with narenas:1, tcache:false, lg_chunk:24
> >
> >  Arenas: 1
> >  Pointer size: 8
> >  Quantum size: 16
> >  Page size: 4096
> >  Min active:dirty page ratio per arena: 8:1
> >  Maximum thread-cached size class: 32768
> >  Chunk size: 16777216 (2^24)
> >  Allocated: 24364176040, active: 24578334720, mapped: 66739765248
> >  Current active ceiling: 24578621440
> >  chunks: nchunks   highchunks    curchunks
> >             3989         3978         3978
> >  huge: nmalloc      ndalloc    allocated
> >              3            2    117440512
> >
> >  arenas[0]:
> >  assigned threads: 17
> >  dss allocation precedence: disabled
> >  dirty pages: 5971898:64886 active:dirty, 354265 sweeps, 18261119 madvises, 1180858954 purged
> >
> > While in this state, the RSS of the process was at 54GB.
> >
> > Questions:
> > 1) The difference between RSS and jemalloc active is huge (more than 30GB). In my test, the difference was quite less in the beginning (say 4 GB) and it went on increasing with time. That seems too high to account for jemalloc data structures, overhead etc. What else gets accounted in process RSS - active?
> 
> jemalloc is reporting very low page-level external fragmentation for your app: 1.0 - allocated/active == 1.0 - 24364176040/24578334720 == 0.87%.  However, virtual memory fragmentation is quite high: 1.0 - active/mapped == 63.2%.
> 
> > 2) The allocations are fairly random, sized between 8 bytes and 2MB. Are there any known issues of fragmentation for particular allocation sizes?
> 
> If your application were to commonly allocate slightly more than one chunk, then internal fragmentation would be quite high, but at little actual cost to physical memory.  However, you are using 16 MiB chunks, and the stats say that there's only a single huge (112-MiB) allocation.
> 
> > 3) Is there a way to tune the allocations and reduce the difference?
> 
> I can't think of a way this could happen short of a bug in jemalloc.  Can you send me a complete statistics, and provide the following?
> 
> - jemalloc version
> - operating system
> - compile-time jemalloc configuration flags
> - run-time jemalloc option flags
> - brief description of what application does
> 
> Hopefully that will narrow down the possible explanations.
> 
> Thanks,
> Jason
> 
> Jemalloc version: 3.2.0
> Operating system: Linux 2.6.32-220.7.1.el6.x86_64
> Compile-time jemalloc configuration flags:
> autogen            : 0
> experimental       : 1
> cc-silence         : 0
> debug              : 0
> stats              : 1
> prof               : 0
> prof-libunwind     : 0
> prof-libgcc        : 0
> prof-gcc           : 0
> tcache             : 1
> fill               : 1
> utrace             : 0
> valgrind           : 0
> xmalloc            : 0
> mremap             : 0
> munmap             : 0
> dss                : 0
> lazy_lock          : 0
> tls                : 1
> 
> Run-time jemalloc configuration flags:
> MALLOC_CONF=narenas:1,tcache:false,lg_chunk:24
> 
> Application description:
> This is a server that caches and serves data from sqlite database. The database size can be multiple of the cache size.
> The data is paged in and out as necessary to keep the process RSS under control. The server is written in C++.
> All data and metadata is dynamically allocated, so allocator is used quite extensively.
> In the test, server starts with a healthy data/RSS ratio (say 0.84). This ratio reduces with time as RSS keeps growing
> whereas server starts to page out data to keep RSS under control. In the test the ratio came down to 0.42.

Okay, I've taken a close look at this, and I see no direct evidence of a bug in jemalloc.  The difference between active and mapped memory is due to page run fragmentation within the chunks, but the total fragmentation-induced overhead attributable to chunk metadata and unused dirty pages appears to be 200-300 MiB.  The only way I can see for the statistics to be self-consistent, yet have such a high RSS is if the madvise() call within pages_purge() is failing.  You should be able to eliminate this possibility by looking at strace output.

Are you certain that you are looking at RES (resident set size, aka RSS) rather than VIRT (virtual size, aka VSIZE or VSZ)?  Assuming that your application doesn't do a bunch of mmap()ing outside jemalloc, I would expect VIRT to be pretty close to jemalloc's 'mapped' statistic, and RES to be pretty close to jemalloc's 'active' statistic.

Thanks,
Jason

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://jemalloc.net/mailman/jemalloc-discuss/attachments/20130422/8c4edfb6/attachment.html>