Regression: jemalloc >= 4.0 uses munmap() even when configured with --disable-munmap
daniel at rethinkdb.com
Fri Apr 22 22:56:22 PDT 2016
Thanks Jason, that's very helpful. I'll see if changing the `lg_chunk`
parameter changes anything.
In the meantime I found out that one likely reason RethinkDB generates so
many discontiguous VM mappings is our use of `mprotect`. We use `mprotect`
to install "guard pages" in heap-allocated coroutine stacks, of which there
can be quite a few under some workloads. Each guard page has different
permissions from the memory around it, so the kernel has to split the
surrounding mapping into separate VM map entries.
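To illustrate the effect, here is a minimal sketch, assuming Linux and a readable `/proc/self/maps`. The helper names (`vmas_overlapping`, `guard_page_demo`) are made up for this example, and a single anonymous `mmap` region stands in for heap-allocated stacks; this is not RethinkDB's actual coroutine code.

```python
import ctypes
import mmap
import os

PAGE = mmap.PAGESIZE
PROT_NONE = 0  # not exported by Python's mmap module; the value is 0 on Linux


def vmas_overlapping(start, end):
    """Count /proc/self/maps entries that overlap the range [start, end)."""
    count = 0
    with open("/proc/self/maps") as maps:
        for line in maps:
            lo, hi = (int(x, 16) for x in line.split()[0].split("-"))
            if lo < end and start < hi:
                count += 1
    return count


def guard_page_demo(n_stacks=4, pages_per_stack=4):
    """Return (entries_before, entries_after) for one region holding n_stacks."""
    libc = ctypes.CDLL(None, use_errno=True)
    libc.mprotect.argtypes = [ctypes.c_void_p, ctypes.c_size_t, ctypes.c_int]
    libc.mprotect.restype = ctypes.c_int

    stack_size = pages_per_stack * PAGE
    length = n_stacks * stack_size
    # One contiguous read/write region, standing in for a run of stacks.
    region = mmap.mmap(-1, length,
                       flags=mmap.MAP_PRIVATE | mmap.MAP_ANONYMOUS,
                       prot=mmap.PROT_READ | mmap.PROT_WRITE)
    view = ctypes.c_char.from_buffer(region)
    base = ctypes.addressof(view)
    try:
        before = vmas_overlapping(base, base + length)
        for i in range(n_stacks):
            # Install a guard page at the low end of each stack.
            if libc.mprotect(base + i * stack_size, PAGE, PROT_NONE) != 0:
                err = ctypes.get_errno()
                raise OSError(err, os.strerror(err))
        after = vmas_overlapping(base, base + length)
    finally:
        del view          # release the buffer export so the region can close
        region.close()
    return before, after


if __name__ == "__main__":
    before, after = guard_page_demo()
    print(f"map entries covering the region: {before} before, {after} after")
```

With the default arguments, the region should be covered by one map entry before the `mprotect` calls and by eight afterwards (a guard entry plus a usable entry per stack), so the entry count grows linearly with the number of stacks.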
I now believe that this isn't really a jemalloc issue per se. At the very
least there are other factors involved.
We'll look into this more on our side, but I consider this a false alarm.
Thanks for taking the time to explain things here!
On Fri, Apr 22, 2016 at 10:41 PM, Jason Evans <jasone at canonware.com> wrote:
> On Apr 22, 2016, at 10:22 PM, Daniel Mewes <daniel at rethinkdb.com> wrote:
> > The reason for the failing `munmap` appears to be that we hit the
> > kernel's `max_map_count` limit.
> > I can reproduce the issue very quickly by reducing the limit through
> > `echo 16000 > /proc/sys/vm/max_map_count`, and it disappears in our tests
> > when increasing it to something like
> > `echo 131060 > /proc/sys/vm/max_map_count`. The default value is 65530 I believe.
> > We used to see this behavior in jemalloc 2.x, but didn't see it in 3.x
> > anymore. It now re-appeared somewhere between 3.6 and 4.1.
> Version 4 switched to per-arena management of huge allocations, and along
> with that completely independent trees of cached chunks. For many
> workloads this means increased virtual memory usage, since cached chunks
> can't migrate among arenas. I have plans to reduce the impact somewhat by
> decreasing the number of arenas by 4X, but the independence of arenas'
> mappings has numerous advantages that I plan to leverage more over time.
> > Do you think the allocator should handle reaching the map_count limit
> > and somehow deal with it gracefully (if that's even possible)? Or should we
> > just advise our users to raise the kernel limit, or alternatively try to
> > change RethinkDB's allocation patterns to avoid hitting it?
> I'm surprised you're hitting this, because the normal mode of operation is
> for jemalloc's chunk allocation to get almost all contiguous mappings,
> which means very few distinct kernel VM map entries. Is it possible that
> RethinkDB is routinely calling mmap() and interspersing mappings that are
> not a multiple of the chunk size? One would hope that the kernel could
> densely pack such small mappings in the existing gaps between jemalloc's
> chunks, but unfortunately Linux uses fragile heuristics to find available
> virtual memory (the exact problem that --disable-munmap works around).
> To your question about making jemalloc gracefully deal with munmap()
> failure, it seems likely that mmap() is in imminent danger of failing under
> these conditions, so there's not much that can be done. In fact, jemalloc
> only aborts if the abort option is set to true (the default for debug
> builds), so the error message jemalloc is printing probably doesn't
> directly correspond to a crash.
> As a workaround, you could substantially increase the chunk size (e.g.
> MALLOC_CONF=lg_chunk:30), but better would be to diagnose and address
> whatever is causing the terrible VM map fragmentation.
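To see how close a process is to the `max_map_count` limit discussed above, something like the following sketch may help. It assumes Linux with `/proc` mounted; `map_count_headroom` is a hypothetical helper name, and counting lines in `/proc/<pid>/maps` only approximates the kernel's internal per-process map count.

```python
def map_count_headroom(pid="self"):
    """Return (current_mappings, limit) for a process on Linux."""
    # One line per VM map entry visible to user space.
    with open(f"/proc/{pid}/maps") as maps:
        current = sum(1 for _ in maps)
    # System-wide per-process limit (the sysctl vm.max_map_count).
    with open("/proc/sys/vm/max_map_count") as f:
        limit = int(f.read())
    return current, limit


if __name__ == "__main__":
    current, limit = map_count_headroom()
    print(f"{current} of {limit} mappings in use "
          f"({100 * current / limit:.1f}%)")
```

Passing the PID of a running server instead of `"self"` would show that process's usage, provided the caller has permission to read its `maps` file.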