keeping memory usage at certain limit
jasone at canonware.com
Thu May 1 21:37:22 PDT 2014
On May 1, 2014, at 1:36 AM, Antony Dovgal <antony.dovgal at gmail.com> wrote:
> On 05/01/2014 06:43 AM, Jason Evans wrote:
>> Use "thread.tcache.flush" to flush thread caches; "arena.<i>.purge" merely uses madvise(2)
>> to inform the kernel that it can recycle dirty pages that contain unused data.
> According to the docs "thread.tcache.flush" only flushes the cache of the calling thread and
> I have a lot of threads running in thread pools, which are created at the start and never destroyed.
> Or did you mean to call it periodically from every thread?
Your application can benefit from calling the “thread.tcache.flush” mallctl from a thread that is about to go “idle” (i.e. stops using the allocator for a while), but there’s little benefit otherwise, because there’s an incremental flushing mechanism built in that is driven by continued allocation activity. One straightforward way to implement flushing for idle threads in thread pools is to have idle threads wake up after a few seconds of inactivity and flush before going back to sleep.
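A rough sketch of the idle-flush idea for a pool worker, assuming a pthread-based queue. The `work_queue` struct, `take_job_or_flush`, and `stub_mallctl` are hypothetical names for illustration. The control function is passed in as a pointer with the same signature as jemalloc's mallctl(), so the sketch compiles without jemalloc; a real program would pass mallctl from <jemalloc/jemalloc.h>.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>
#include <time.h>

/* Same signature as jemalloc's mallctl(); pass the real one in production. */
typedef int (*mallctl_fn)(const char *name, void *oldp, size_t *oldlenp,
                          void *newp, size_t newlen);

/* Hypothetical work queue state, for illustration only. */
struct work_queue {
    pthread_mutex_t lock;
    pthread_cond_t  nonempty;
    int             pending;    /* number of queued jobs */
};

/*
 * Wait up to idle_secs for a job. Returns true if a job was claimed.
 * If the wait ends without a job, flush the calling thread's cache and
 * return false; the worker loop then simply calls this again.
 */
static bool take_job_or_flush(struct work_queue *q, int idle_secs,
                              mallctl_fn ctl)
{
    struct timespec deadline;
    clock_gettime(CLOCK_REALTIME, &deadline);
    deadline.tv_sec += idle_secs;

    pthread_mutex_lock(&q->lock);
    int rc = 0;
    while (q->pending == 0 && rc == 0)
        rc = pthread_cond_timedwait(&q->nonempty, &q->lock, &deadline);
    bool got_job = q->pending > 0;
    if (got_job)
        q->pending--;
    pthread_mutex_unlock(&q->lock);

    if (!got_job)   /* idle: hand cached objects back to the arena */
        ctl("thread.tcache.flush", NULL, NULL, NULL, 0);
    return got_job;
}

/* Stand-in for mallctl() so the sketch can run without jemalloc linked. */
static int stub_calls;
static const char *stub_last_name;
static int stub_mallctl(const char *name, void *oldp, size_t *oldlenp,
                        void *newp, size_t newlen)
{
    (void)oldp; (void)oldlenp; (void)newp; (void)newlen;
    stub_calls++;
    stub_last_name = name;
    return 0;
}
```

A worker would call take_job_or_flush(&q, 5, mallctl) in a loop and run a job whenever it returns true. Written this way, an idle thread flushes again at every timeout. That should be cheap once the cache is already empty, but it is a simplification.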
>> There are two statistics jemalloc tracks that directly allow you to measure external fragmentation: "stats.allocated" and "stats.active".
> Right, I've tried using both of them.
> Do I understand it correctly that stats.active decreases only when an entire page is freed?
“stats.active” decreases when an entire page run is freed. It precisely tracks what actually matters in terms of physical memory exhaustion.
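One possible way to turn the two statistics into a single number is sketched below. The ratio used here (the share of active bytes not backing live allocations) is an assumption made for illustration, not a definition taken from jemalloc, and `external_fragmentation` is a hypothetical helper name.

```c
#include <stddef.h>

/*
 * Fraction of "stats.active" bytes that are not accounted for by
 * "stats.allocated". Returns 0.0 when there is nothing active or the
 * inputs are inconsistent (allocated >= active).
 */
static double external_fragmentation(size_t allocated, size_t active)
{
    if (active == 0 || allocated >= active)
        return 0.0;
    return (double)(active - allocated) / (double)active;
}
```

For instance, 750 bytes allocated out of 1000 active gives 0.25 under this definition.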
> So far, using Salvatore's method and code I can see about 3% difference between RSS and allocated memory
> when using jemalloc and ~9% difference when using Hoard.
> But I expect these values to change since the processes haven't started removing outdated records yet.
> I also have a control process without jemalloc (i.e. using plain libc malloc()) using the same code to compute fragmentation
> and it shows about 20% difference (and it's growing).
> What baffles me most is that stats.allocated keeps returning the same value, but RSS constantly grows.
This is probably because you aren’t calling the “epoch” mallctl to refresh mallctl’s cached statistics.
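A minimal sketch of refreshing the statistics before reading them, following the pattern shown in the jemalloc manual (write to "epoch", then read the "stats.*" values). `heap_stats`, `read_heap_stats`, and `stub_mallctl` are hypothetical names. The control function is passed in as a pointer with mallctl()'s signature, so the sketch compiles without jemalloc; a real program would pass mallctl from <jemalloc/jemalloc.h>.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Same signature as jemalloc's mallctl(); pass the real one in production. */
typedef int (*mallctl_fn)(const char *name, void *oldp, size_t *oldlenp,
                          void *newp, size_t newlen);

struct heap_stats {
    size_t allocated;   /* "stats.allocated" */
    size_t active;      /* "stats.active" */
};

/* Refresh the cached statistics, then read both values. Returns 0 on success. */
static int read_heap_stats(mallctl_fn ctl, struct heap_stats *out)
{
    uint64_t epoch = 1;
    size_t len = sizeof(epoch);
    int rc = ctl("epoch", &epoch, &len, &epoch, len);   /* refresh snapshot */
    if (rc != 0)
        return rc;

    len = sizeof(out->allocated);
    rc = ctl("stats.allocated", &out->allocated, &len, NULL, 0);
    if (rc != 0)
        return rc;

    len = sizeof(out->active);
    return ctl("stats.active", &out->active, &len, NULL, 0);
}

/* Stand-in for mallctl() so the sketch can run without jemalloc linked. */
static const char *stub_first_call;
static int stub_mallctl(const char *name, void *oldp, size_t *oldlenp,
                        void *newp, size_t newlen)
{
    (void)oldlenp; (void)newp; (void)newlen;
    if (stub_first_call == NULL)
        stub_first_call = name;
    if (strcmp(name, "stats.allocated") == 0)
        *(size_t *)oldp = 1000;
    else if (strcmp(name, "stats.active") == 0)
        *(size_t *)oldp = 1200;
    return 0;
}
```

Calling read_heap_stats(mallctl, &s) each time the numbers are sampled should keep "stats.allocated" from appearing frozen.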
> Could it be because of the number of threads I use?
If your application occasionally recurses deeply, you may be incrementally increasing the total amount of memory dedicated to thread execution stacks. That could account for several gigabytes of memory usage, but probably isn’t the only issue.
> Say, I free memory in one thread and try to allocate in another one, but the second thread
> doesn't have it cached and has to do the actual allocation?
Within limits, this can bloat memory usage. However, IIRC thread caches average ~2.5 MiB per thread under the worst conditions (all threads are purely deallocating a broad mix of allocation sizes), so the thread caches probably account for less than 1 GiB in your application.
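To make the bound concrete, here is the arithmetic as a small sketch. The 2.5 MiB figure is the rough worst-case average quoted above; the thread count of 400 in the note below is a made-up example, not the poster's actual number, and `tcache_worst_case_total` is a hypothetical helper name.

```c
#include <stdint.h>

/* 2.5 MiB, the approximate worst-case average per thread cache. */
#define TCACHE_WORST_CASE_BYTES (5u * 1024u * 1024u / 2u)

/* Upper estimate of memory held in thread caches for nthreads threads. */
static uint64_t tcache_worst_case_total(uint64_t nthreads)
{
    return nthreads * TCACHE_WORST_CASE_BYTES;
}
```

With 400 threads this gives 1,048,576,000 bytes (1000 MiB), just under 1 GiB, so an estimate of "less than 1 GiB" holds for pools of up to roughly 400 threads.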