<div dir="ltr"><div><div><div><div>Hi Jason,<br><br></div>thank you for your reply.<br><br>The reason for the failing `munmap` appears to be that we hit the kernel's `max_map_count` limit.<br><br></div>I can reproduce the issue very quickly by lowering the limit with <code>echo 16000 > /proc/sys/vm/max_map_count</code>, and it disappears in our tests when raising it to something like <code>echo 131060 > /proc/sys/vm/max_map_count</code>. I believe the default value is 65530.<br><br></div><div>We used to see this behavior with jemalloc 2.x, but no longer saw it with 3.x. It reappeared somewhere between 3.6 and 4.1.<br><br></div><div>It looks like I checked the wrong place when I compared against the jemalloc 3.6 code earlier today; I can now see that the same code was indeed there, just in a different file (`chunk_mmap.c`). Thanks for clarifying this.<br><br></div><div>So the difference between 3.6 and 4.1 must be caused by something else, and we might just have been lucky that the particular behavior of jemalloc 3 didn't trigger the issue for our workload.<br></div><div><br></div><div>Do you think the allocator should handle reaching the `max_map_count` limit and somehow deal with it gracefully (if that's even possible)? 
Or should we just advise our users to raise the kernel limit, or alternatively try to change RethinkDB's allocation patterns to avoid hitting it?<br></div><div><br></div>I can try to come up with a small test case to specifically reproduce this issue later.<br><br></div>- Daniel<br><br><div><div><div><br></div></div></div></div><div class="gmail_extra"><br><div class="gmail_quote">On Fri, Apr 22, 2016 at 9:24 PM, Jason Evans <span dir="ltr"><<a href="mailto:jasone@canonware.com" target="_blank">jasone@canonware.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><span class="">On Apr 22, 2016, at 4:38 PM, Daniel Mewes <<a href="mailto:daniel@rethinkdb.com">daniel@rethinkdb.com</a>> wrote:<br>
> In jemalloc 3.0, this patch added the `--disable-munmap` option and disabled the use of `munmap` on Linux by default: <a href="https://github.com/jemalloc/jemalloc/commit/59ae2766af88bad07ac721c4ee427b171e897bcb" rel="noreferrer" target="_blank">https://github.com/jemalloc/jemalloc/commit/59ae2766af88bad07ac721c4ee427b171e897bcb</a><br>
><br>
> It looks like jemalloc starting with version 4.0 makes use of `munmap` even when `--disable-munmap` is specified. From what I can tell, `chunk_mmap.c` honors the `config_munmap` flag, but the function `pages_unmap` in `pages.c` ignores it (this code appears to be new in jemalloc 4?).<br>
><br>
> We are using jemalloc for RethinkDB and would like to upgrade to version 4.1 because we think that it fixes some bugs that our users have run into.<br>
> However it causes a regression for <a href="https://github.com/rethinkdb/rethinkdb/issues/3516" rel="noreferrer" target="_blank">https://github.com/rethinkdb/rethinkdb/issues/3516</a> :<br>
> "<jemalloc>: Error in munmap(): Cannot allocate memory"<br>
<br>
</span>pages_unmap() is used to trim mappings so that what remains is chunk-aligned, regardless of whether --disable-munmap is specified. jemalloc 3.x has similar code that calls munmap(). I don't see anything in what you're describing that is particular to jemalloc 4.x. Are you able to determine anything else about the failure? It's extremely unusual for munmap() to fail (I've not seen it happen since ~2005 during initial development), so I'm guessing there's a memory corruption issue of some sort, whether due to a bug in jemalloc or RethinkDB.<br>
<br>
Thanks,<br>
Jason</blockquote></div><br></div>
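<div><br>P.S. A minimal sketch for checking how close a process is to the `max_map_count` limit discussed above, assuming a Linux `/proc` filesystem. It counts the lines of `/proc/[pid]/maps` (one line per mapping) and compares the result with `/proc/sys/vm/max_map_count`. The function names are made up for illustration and are not part of jemalloc or RethinkDB. As far as I understand it, `munmap` can fail with ENOMEM near the limit because unmapping the middle of a mapping splits it into two, which raises the count by one.<br><br></div>
<pre>
```python
import os

MAX_MAP_COUNT_PATH = "/proc/sys/vm/max_map_count"


def count_mappings(maps_text):
    """Count mappings in the text of a /proc/[pid]/maps file (one per non-empty line)."""
    return sum(1 for line in maps_text.splitlines() if line.strip())


def map_headroom(limit, used):
    """Number of mappings left before the limit is reached (never negative)."""
    return max(limit - used, 0)


def report(pid="self"):
    """Return (used, limit, headroom) for a process, or None if /proc is unavailable."""
    maps_path = "/proc/{}/maps".format(pid)
    if not (os.path.exists(maps_path) and os.path.exists(MAX_MAP_COUNT_PATH)):
        return None
    with open(maps_path) as f:
        used = count_mappings(f.read())
    with open(MAX_MAP_COUNT_PATH) as f:
        limit = int(f.read().strip())
    return used, limit, map_headroom(limit, used)


if __name__ == "__main__":
    result = report()
    if result is None:
        print("no /proc filesystem here; nothing to report")
    else:
        print("mappings used=%d limit=%d headroom=%d" % result)
```
</pre>
<div>Calling `report()` with the pid of a running server process (reading another user's process may need elevated privileges) while the workload runs should show whether the mapping count is climbing toward the limit before the `munmap` error appears.<br></div>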