Fragmentation with jemalloc
vandana shah
shah.vandana at gmail.com
Mon Apr 22 23:36:41 PDT 2013
I am quite certain I am looking at RES and not VIRT. In the tests, VIRT
remains close to jemalloc's 'mapped' statistic, but the resident set size is
far higher than the 'active' value reported by jemalloc.
I will check if madvise fails in the tests and get back.
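
For reference, here is a minimal sketch of the kind of check I have in mind,
assuming (as I understand the Linux code path) that pages_purge() boils down
to madvise(..., MADV_DONTNEED); the helper name and logging below are just
illustrative:

    /* Illustrative only: purge a page range the way jemalloc's pages_purge()
     * is expected to on Linux, and report whether madvise() fails. */
    #include <errno.h>
    #include <stdio.h>
    #include <string.h>
    #include <sys/mman.h>

    static int purge_and_check(void *addr, size_t len)
    {
        if (madvise(addr, len, MADV_DONTNEED) != 0) {
            /* A failure here would leave dirty pages resident and inflate RSS. */
            fprintf(stderr, "madvise(%p, %zu, MADV_DONTNEED) failed: %s\n",
                    addr, len, strerror(errno));
            return -1;
        }
        return 0;
    }
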
Thanks,
Vandana
On Tue, Apr 23, 2013 at 11:04 AM, Jason Evans <jasone at canonware.com> wrote:
> On Apr 22, 2013, at 9:18 PM, vandana shah <shah.vandana at gmail.com> wrote:
>
> On Mon, Apr 22, 2013 at 11:49 PM, Jason Evans <jasone at canonware.com>wrote:
>
>> On Apr 21, 2013, at 10:01 PM, vandana shah wrote:
>> > I have been trying to use jemalloc for my application and observed that
>> the RSS of the process keeps increasing.
>> >
>> > I ran the application with valgrind to confirm that there are no memory
>> leaks.
>> >
>> > To investigate more, I collected jemalloc stats after running the test
>> for a few days, and here is the summary for a run with narenas:1,
>> tcache:false, lg_chunk:24:
>> >
>> > Arenas: 1
>> > Pointer size: 8
>> > Quantum size: 16
>> > Page size: 4096
>> > Min active:dirty page ratio per arena: 8:1
>> > Maximum thread-cached size class: 32768
>> > Chunk size: 16777216 (2^24)
>> > Allocated: 24364176040, active: 24578334720, mapped: 66739765248
>> > Current active ceiling: 24578621440
>> > chunks:   nchunks  highchunks  curchunks
>> >              3989        3978       3978
>> > huge:     nmalloc     ndalloc  allocated
>> >                 3           2  117440512
>> >
>> > arenas[0]:
>> > assigned threads: 17
>> > dss allocation precedence: disabled
>> > dirty pages: 5971898:64886 active:dirty, 354265 sweeps, 18261119
>> madvises, 1180858954 purged
>> >
>> > While in this state, the RSS of the process was at 54GB.
>> >
>> > Questions:
>> > 1) The difference between RSS and jemalloc active is huge (more than
>> 30 GB). In my test, the difference was much smaller at the beginning (around
>> 4 GB) and it kept growing with time. That seems too high to be explained by
>> jemalloc data structures, overhead, etc. What else gets counted in process
>> RSS beyond 'active'?
>>
>> jemalloc is reporting very low page-level external fragmentation for your
>> app: 1.0 - allocated/active == 1.0 - 24364176040/24578334720 == 0.87%.
>> However, virtual memory fragmentation is quite high: 1.0 - active/mapped
>> == 63.2%.
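>>
>> As a rough illustration of how those ratios can be computed at run time
>> (just a sketch, assuming a stats-enabled build and the unprefixed mallctl
>> symbol):
>>
>>     #include <jemalloc/jemalloc.h>
>>     #include <stdint.h>
>>     #include <stdio.h>
>>
>>     static void print_fragmentation(void)
>>     {
>>         uint64_t epoch = 1;
>>         size_t sz = sizeof(epoch);
>>         size_t allocated, active, mapped;
>>
>>         /* Refresh the cached statistics before reading them. */
>>         mallctl("epoch", &epoch, &sz, &epoch, sz);
>>
>>         sz = sizeof(size_t);
>>         mallctl("stats.allocated", &allocated, &sz, NULL, 0);
>>         mallctl("stats.active", &active, &sz, NULL, 0);
>>         mallctl("stats.mapped", &mapped, &sz, NULL, 0);
>>
>>         printf("page-level fragmentation:     %.2f%%\n",
>>                100.0 * (1.0 - (double)allocated / (double)active));
>>         printf("virtual memory fragmentation: %.2f%%\n",
>>                100.0 * (1.0 - (double)active / (double)mapped));
>>     }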
>>
>> > 2) The allocations are fairly random, sized between 8 bytes and 2MB.
>> Are there any known issues of fragmentation for particular allocation sizes?
>>
>> If your application were to commonly allocate slightly more than one
>> chunk, then internal fragmentation would be quite high, but at little
>> actual cost to physical memory. However, you are using 16 MiB chunks, and
>> the stats say that there's only a single huge (112-MiB) allocation.
>>
>> > 3) Is there a way to tune the allocations and reduce the difference?
>>
>> I can't think of a way this could happen short of a bug in jemalloc. Can
>> you send me the complete statistics, and provide the following?
>>
>> - jemalloc version
>> - operating system
>> - compile-time jemalloc configuration flags
>> - run-time jemalloc option flags
>> - brief description of what the application does
>>
>> Hopefully that will narrow down the possible explanations.
>>
>> Thanks,
>> Jason
>
>
> Jemalloc version: 3.2.0
> Operating system: Linux 2.6.32-220.7.1.el6.x86_64
> Compile-time jemalloc configuration flags:
> autogen : 0
> experimental : 1
> cc-silence : 0
> debug : 0
> stats : 1
> prof : 0
> prof-libunwind : 0
> prof-libgcc : 0
> prof-gcc : 0
> tcache : 1
> fill : 1
> utrace : 0
> valgrind : 0
> xmalloc : 0
> mremap : 0
> munmap : 0
> dss : 0
> lazy_lock : 0
> tls : 1
>
> Run-time jemalloc configuration flags:
> MALLOC_CONF=narenas:1,tcache:false,lg_chunk:24
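>
> As an aside (just a sketch): the same options could also be baked into the
> binary via jemalloc's malloc_conf string instead of the environment
> variable, e.g.:
>
>     /* Read by jemalloc at initialization, equivalent to MALLOC_CONF. */
>     const char *malloc_conf = "narenas:1,tcache:false,lg_chunk:24";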
>
> Application description:
> This is a server that caches and serves data from an SQLite database. The
> database size can be a multiple of the cache size.
> The data is paged in and out as necessary to keep the process RSS under
> control. The server is written in C++.
> All data and metadata are dynamically allocated, so the allocator is used
> quite extensively.
> In the test, the server starts with a healthy data/RSS ratio (say 0.84).
> This ratio falls over time as RSS keeps growing, even though the server
> pages out data to keep RSS under control. In this test the ratio came down
> to 0.42.
>
>
> Okay, I've taken a close look at this, and I see no direct evidence of a
> bug in jemalloc. The difference between active and mapped memory is due to
> page run fragmentation within the chunks, but the total
> fragmentation-induced overhead attributable to chunk metadata and unused
> dirty pages appears to be 200-300 MiB. The only way I can see for the
> statistics to be self-consistent, yet have such a high RSS is if the
> madvise() call within pages_purge() is failing. You should be able to
> eliminate this possibility by looking at strace output.
>
> Are you certain that you are looking at RES (resident set size, aka RSS)
> rather than VIRT (virtual size, aka VSIZE or VSZ)? Assuming that your
> application doesn't do a bunch of mmap()ing outside jemalloc, I would
> expect VIRT to be pretty close to jemalloc's 'mapped' statistic, and RES to
> be pretty close to jemalloc's 'active' statistic.
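>
> As a quick way to compare the two from inside the process (a Linux-specific
> sketch only; it assumes the unprefixed mallctl symbol and /proc/self/statm):
>
>     #include <jemalloc/jemalloc.h>
>     #include <stdint.h>
>     #include <stdio.h>
>     #include <unistd.h>
>
>     static void compare_rss_to_jemalloc(void)
>     {
>         uint64_t epoch = 1;
>         size_t sz = sizeof(epoch);
>         size_t active, mapped, vsz_pages, rss_pages;
>         long page = sysconf(_SC_PAGESIZE);
>         FILE *f = fopen("/proc/self/statm", "r");
>
>         if (f == NULL)
>             return;
>         if (fscanf(f, "%zu %zu", &vsz_pages, &rss_pages) != 2) {
>             fclose(f);
>             return;
>         }
>         fclose(f);
>
>         mallctl("epoch", &epoch, &sz, &epoch, sz);
>         sz = sizeof(size_t);
>         mallctl("stats.active", &active, &sz, NULL, 0);
>         mallctl("stats.mapped", &mapped, &sz, NULL, 0);
>
>         /* VIRT should track 'mapped', and RES should track 'active'. */
>         printf("VIRT %zu MiB vs mapped %zu MiB\n",
>                vsz_pages * (size_t)page >> 20, mapped >> 20);
>         printf("RES  %zu MiB vs active %zu MiB\n",
>                rss_pages * (size_t)page >> 20, active >> 20);
>     }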
>
> Thanks,
> Jason
>
>