Fragmentation with jemalloc
vandana shah
shah.vandana at gmail.com
Mon Apr 22 23:36:41 PDT 2013
I am quite certain I am looking at RES and not VIRT. In the tests, VIRT
remains close to jemalloc's 'mapped' statistic, but the resident set size is
far higher than the 'active' value reported by jemalloc.
I will check if madvise fails in the tests and get back.
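
For reference, here is a minimal sketch of the kind of check I have in mind,
assuming (as I understand the Linux code path) that pages_purge() boils down
to madvise(..., MADV_DONTNEED); the helper name and logging below are just
illustrative:

    /* Illustrative only: purge a page range the way jemalloc's pages_purge()
     * is expected to on Linux, and report whether madvise() fails. */
    #include <errno.h>
    #include <stdio.h>
    #include <string.h>
    #include <sys/mman.h>

    static int purge_and_check(void *addr, size_t len)
    {
        if (madvise(addr, len, MADV_DONTNEED) != 0) {
            /* A failure here would leave dirty pages resident and inflate RSS. */
            fprintf(stderr, "madvise(%p, %zu, MADV_DONTNEED) failed: %s\n",
                    addr, len, strerror(errno));
            return -1;
        }
        return 0;
    }
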
Thanks,
Vandana
On Tue, Apr 23, 2013 at 11:04 AM, Jason Evans <jasone at canonware.com> wrote:
> On Apr 22, 2013, at 9:18 PM, vandana shah <shah.vandana at gmail.com> wrote:
>
> On Mon, Apr 22, 2013 at 11:49 PM, Jason Evans <jasone at canonware.com>wrote:
>
>> On Apr 21, 2013, at 10:01 PM, vandana shah wrote:
>> > I have been trying to use jemalloc for my application and observed that
>> the RSS of the process keeps increasing.
>> >
>> > I ran the application with valgrind to confirm that there are no memory
>> leaks.
>> >
>> > To investigate more, I collected jemalloc stats after running the test
>> for a few days, and here is the summary for a run with narenas:1,
>> tcache:false, lg_chunk:24:
>> >
>> > Arenas: 1
>> > Pointer size: 8
>> > Quantum size: 16
>> > Page size: 4096
>> > Min active:dirty page ratio per arena: 8:1
>> > Maximum thread-cached size class: 32768
>> > Chunk size: 16777216 (2^24)
>> > Allocated: 24364176040, active: 24578334720, mapped: 66739765248
>> > Current active ceiling: 24578621440
>> > chunks:   nchunks  highchunks  curchunks
>> >              3989        3978       3978
>> > huge:     nmalloc     ndalloc  allocated
>> >                 3           2  117440512
>> >
>> > arenas[0]:
>> > assigned threads: 17
>> > dss allocation precedence: disabled
>> > dirty pages: 5971898:64886 active:dirty, 354265 sweeps, 18261119
>> madvises, 1180858954 purged
>> >
>> > While in this state, the RSS of the process was at 54GB.
>> >
>> > Questions:
>> > 1) The difference between RSS and jemalloc active is huge (more than
>> 30 GB). In my test, the difference was much smaller at the beginning (around
>> 4 GB) and it kept growing with time. That seems too high to be explained by
>> jemalloc data structures, overhead, etc. What else gets counted in process
>> RSS beyond 'active'?
>>
>> jemalloc is reporting very low page-level external fragmentation for your
>> app: 1.0 - allocated/active == 1.0 - 24364176040/24578334720 == 0.87%.
>> However, virtual memory fragmentation is quite high: 1.0 - active/mapped
>> == 63.2%.
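>>
>> As a rough illustration of how those ratios can be computed at run time
>> (just a sketch, assuming a stats-enabled build and the unprefixed mallctl
>> symbol):
>>
>>     #include <jemalloc/jemalloc.h>
>>     #include <stdint.h>
>>     #include <stdio.h>
>>
>>     static void print_fragmentation(void)
>>     {
>>         uint64_t epoch = 1;
>>         size_t sz = sizeof(epoch);
>>         size_t allocated, active, mapped;
>>
>>         /* Refresh the cached statistics before reading them. */
>>         mallctl("epoch", &epoch, &sz, &epoch, sz);
>>
>>         sz = sizeof(size_t);
>>         mallctl("stats.allocated", &allocated, &sz, NULL, 0);
>>         mallctl("stats.active", &active, &sz, NULL, 0);
>>         mallctl("stats.mapped", &mapped, &sz, NULL, 0);
>>
>>         printf("page-level fragmentation:     %.2f%%\n",
>>                100.0 * (1.0 - (double)allocated / (double)active));
>>         printf("virtual memory fragmentation: %.2f%%\n",
>>                100.0 * (1.0 - (double)active / (double)mapped));
>>     }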
>>
>> > 2) The allocations are fairly random, sized between 8 bytes and 2MB.
>> Are there any known issues of fragmentation for particular allocation sizes?
>>
>> If your application were to commonly allocate slightly more than one
>> chunk, then internal fragmentation would be quite high, but at little
>> actual cost to physical memory. However, you are using 16 MiB chunks, and
>> the stats say that there's only a single huge (112-MiB) allocation.
>>
>> > 3) Is there a way to tune the allocations and reduce the difference?
>>
>> I can't think of a way this could happen short of a bug in jemalloc. Can
>> you send me the complete statistics, and provide the following?
>>
>> - jemalloc version
>> - operating system
>> - compile-time jemalloc configuration flags
>> - run-time jemalloc option flags
>> - brief description of what the application does
>>
>> Hopefully that will narrow down the possible explanations.
>>
>> Thanks,
>> Jason
>
>
> Jemalloc version: 3.2.0
> Operating system: Linux 2.6.32-220.7.1.el6.x86_64
> Compile-time jemalloc configuration flags:
> autogen : 0
> experimental : 1
> cc-silence : 0
> debug : 0
> stats : 1
> prof : 0
> prof-libunwind : 0
> prof-libgcc : 0
> prof-gcc : 0
> tcache : 1
> fill : 1
> utrace : 0
> valgrind : 0
> xmalloc : 0
> mremap : 0
> munmap : 0
> dss : 0
> lazy_lock : 0
> tls : 1
>
> Run-time jemalloc configuration flags:
> MALLOC_CONF=narenas:1,tcache:false,lg_chunk:24
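>
> As an aside (just a sketch): the same options could also be baked into the
> binary via jemalloc's malloc_conf string instead of the environment
> variable, e.g.:
>
>     /* Read by jemalloc at initialization, equivalent to MALLOC_CONF. */
>     const char *malloc_conf = "narenas:1,tcache:false,lg_chunk:24";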
>
> Application description:
> This is a server that caches and serves data from an SQLite database. The
> database size can be a multiple of the cache size.
> The data is paged in and out as necessary to keep the process RSS under
> control. The server is written in C++.
> All data and metadata are dynamically allocated, so the allocator is used
> quite extensively.
> In the test, the server starts with a healthy data/RSS ratio (say 0.84).
> This ratio falls over time as RSS keeps growing, even though the server
> pages out data to keep RSS under control. In this test the ratio came down
> to 0.42.
>
>
> Okay, I've taken a close look at this, and I see no direct evidence of a
> bug in jemalloc. The difference between active and mapped memory is due to
> page run fragmentation within the chunks, but the total
> fragmentation-induced overhead attributable to chunk metadata and unused
> dirty pages appears to be 200-300 MiB. The only way I can see for the
> statistics to be self-consistent, yet have such a high RSS is if the
> madvise() call within pages_purge() is failing. You should be able to
> eliminate this possibility by looking at strace output.
>
> Are you certain that you are looking at RES (resident set size, aka RSS)
> rather than VIRT (virtual size, aka VSIZE or VSZ)? Assuming that your
> application doesn't do a bunch of mmap()ing outside jemalloc, I would
> expect VIRT to be pretty close to jemalloc's 'mapped' statistic, and RES to
> be pretty close to jemalloc's 'active' statistic.
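>
> As a quick way to compare the two from inside the process (a Linux-specific
> sketch only; it assumes the unprefixed mallctl symbol and /proc/self/statm):
>
>     #include <jemalloc/jemalloc.h>
>     #include <stdint.h>
>     #include <stdio.h>
>     #include <unistd.h>
>
>     static void compare_rss_to_jemalloc(void)
>     {
>         uint64_t epoch = 1;
>         size_t sz = sizeof(epoch);
>         size_t active, mapped, vsz_pages, rss_pages;
>         long page = sysconf(_SC_PAGESIZE);
>         FILE *f = fopen("/proc/self/statm", "r");
>
>         if (f == NULL)
>             return;
>         if (fscanf(f, "%zu %zu", &vsz_pages, &rss_pages) != 2) {
>             fclose(f);
>             return;
>         }
>         fclose(f);
>
>         mallctl("epoch", &epoch, &sz, &epoch, sz);
>         sz = sizeof(size_t);
>         mallctl("stats.active", &active, &sz, NULL, 0);
>         mallctl("stats.mapped", &mapped, &sz, NULL, 0);
>
>         /* VIRT should track 'mapped', and RES should track 'active'. */
>         printf("VIRT %zu MiB vs mapped %zu MiB\n",
>                vsz_pages * (size_t)page >> 20, mapped >> 20);
>         printf("RES  %zu MiB vs active %zu MiB\n",
>                rss_pages * (size_t)page >> 20, active >> 20);
>     }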
>
> Thanks,
> Jason
>
>