Regression: jemalloc >= 4.0 use munmap() even when configured with --disable-munmap

Fri Apr 22 22:56:22 PDT 2016

Thanks Jason, that's very helpful. I'll see if changing the `lg_chunk`
parameter changes anything.

In the meantime I found out that one likely reason for why RethinkDB
generates so many discontiguous VM mappings is because of our use of
`mprotect`. We use `mprotect` to install "guard pages" in heap-allocated
coroutine stacks, of which there can be quite a few under some workloads.

I now believe that this isn't really a jemalloc issue per se. At the very
least there are other factors involved.
We'll look into this more on our side, but I consider this a false alarm
for now.
Thanks for taking your time for explaining things here!

- Daniel

On Fri, Apr 22, 2016 at 10:41 PM, Jason Evans <jasone at canonware.com> wrote:

> On Apr 22, 2016, at 10:22 PM, Daniel Mewes <daniel at rethinkdb.com> wrote:
> > The reason for the failing `munmap` appears to be that we hit the
> kernel's `max_map_count` limit.
> >
> > I can reproduce the issue very quickly by reducing the limit through
> `echo 16000 > /proc/sys/vm/max_map_count`, and it disappears in our tests
> when increasing it to something like `echo 131060 >
> /proc/sys/vm/max_map_count`. The default value is 65530 I believe.
> >
> > We used to see this behavior in jemalloc 2.x, but didn't see it in 3.x
> anymore. It now re-appeared somewhere between 3.6 and 4.1.
>
> Version 4 switched to per arena management of huge allocations, and along
> with that completely independent trees of cached chunks.  For many
> workloads this means increased virtual memory usage, since cached chunks
> can't migrate among arenas.  I have plans to reduce the impact somewhat by
> decreasing the number of arenas by 4X, but the independence of arenas'
> mappings has numerous advantages that I plan to leverage more over time.
>
> > Do you think the allocator should handle reaching the map_count limit
> and somehow deal with it gracefully (if that's even possible)? Or should we
> just advise our users to raise the kernel limit, or alternatively try to
> change RethinkDB's allocation patterns to avoid hitting it?
>
> I'm surprised you're hitting this, because the normal mode of operation is
> for jemalloc's chunk allocation to get almost all contiguous mappings,
> which means very few distinct kernel VM map entries.  Is it possible that
> RethinkDB is routinely calling mmap() and interspersing mappings that are
> not a multiple of the chunk size?  One would hope that the kernel could
> densely pack such small mappings in the existing gaps between jemalloc's
> chunks, but unfortunately Linux uses fragile heuristics to find available
> virtual memory (the exact problem that --disable-munmap works around).
>
> To your question about making jemalloc gracefully deal with munmap()
> failure, it seems likely that mmap() is in imminent danger of failing under
> these conditions, so there's not much that can be done.  In fact, jemalloc
> only aborts if the abort option is set to true (the default for debug
> builds), so the error message jemalloc is printing probably doesn't
> directly correspond to a crash.
>
> As a workaround, you could substantially increase the chunk size (e.g.
> MALLOC_CONF=lg_chunk:30), but better would be to diagnose and address
> whatever is causing the terrible VM map fragmentation.
>
> Thanks,
> Jason
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://jemalloc.net/mailman/jemalloc-discuss/attachments/20160422/ec01d00f/attachment-0001.html>