jemalloc-3.6.0 erroneously recycles already-allocated memory
jasone at canonware.com
Mon Jan 19 14:46:48 PST 2015
On Jan 19, 2015, at 1:11 PM, Kurt Wampler <Kurt.Wampler at asml.com> wrote:
> We have an x86_64 Linux C++ application which installs a "NewHandler" that
> attempts to cope with an out-of-memory situation [malloc() returns NULL
> pointer] in two ways: (1) It performs some amount of garbage collection,
> and if this fails to free enough memory, (2) it attempts to raise the soft
> virtual memory ceiling with a call to setrlimit(RLIMIT_AS,<new_limit>).
> Expected Behavior:
> When jemalloc's malloc() is called from libstdc++'s "new" operator, but
> mmap() returns a NULL pointer to jemalloc, indicating an out-of-memory
> condition, jemalloc's malloc() is expected to return a NULL pointer to
> its caller, which will in turn trigger our predefined "NewHandler".
> Observed Behavior:
> We found that jemalloc's malloc() does not immediately return a NULL pointer
> after the first failed mmap(). Instead, it returns a series of pointers
> that it had already given to the application, and only returns a NULL pointer
> after the second mmap() fails. Reassigning already-in-use chunks of memory is
> of course deadly, and our application eventually segfaults.
> As evidence of this behavior, I'm including an strace logging the two mmap()
> calls, the malloc() return values before and after the first failed mmap(),
> and the subsequent NULL return from malloc() after the second failed mmap(),
> finally triggering the invocation of the "NewHandler". Note that address
> 0x2aaade7ffa60 is handed out twice, without ever being freed. The same
> is true for addresses 0x2aaade7ffab0, 0x2aaade7ffb00, 0x2aaade7ffb50,
> 0x2aaade7ffba0, 0x2aaade7ffbf0, 0x2aaade7ffc40, 0x2aaade7ffc90, and
> 0x2aaade7ffce0 -- all get handed out twice(!)
> I'm also including a partial call stack showing the calls from operator
> new() on down, taken at the moment where the first mmap() fails.
> Single-stepping in gdb from that point onward, I find that the NULL returned
> by mmap() is handed up approximately 10 levels before things go awry. The
> code seems to re-check in several other places for available memory, but
> without finding anything it can dole out. When it has bubbled up to the
> function je_tcache_alloc_small_hard(), this function calls tcache_alloc_easy().
> In tcache_alloc_easy(), tbin->ncached is 9, and tbin->avail[8..0] contains
> the 9 addresses mentioned above. It seems to be erroneously handing them
> out again from there.
> This test case can be reproduced at will within a few minutes of run time.
> We have not yet attempted to devise a fix; it took several days of
> investigation to reach this degree of understanding of the problem.
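
For readers unfamiliar with the setup described in the report, here is a minimal sketch of a two-stage new-handler: try garbage collection first, then raise the soft RLIMIT_AS ceiling. This is not the reporter's code. The names collect_garbage(), next_soft_limit(), oom_new_handler() and the 256 MiB step kStep are hypothetical, chosen only for illustration.

```cpp
#include <sys/resource.h>
#include <new>

// Hypothetical step by which the soft limit is raised (assumption: 256 MiB).
constexpr rlim_t kStep = static_cast<rlim_t>(256) << 20;

// Hypothetical application hook; returns true if it released some memory.
bool collect_garbage() { return false; }

// Compute the next soft limit: grow by `step`, never exceeding `hard`.
rlim_t next_soft_limit(rlim_t soft, rlim_t hard, rlim_t step) {
    if (soft == RLIM_INFINITY) return soft;   // already unlimited
    rlim_t want = soft + step;
    if (want < soft) want = RLIM_INFINITY;    // unsigned wrap-around
    if (hard != RLIM_INFINITY && (want == RLIM_INFINITY || want > hard))
        want = hard;                          // clamp to the hard limit
    return want;
}

// Called by operator new when malloc() returns NULL. Returning from the
// handler makes operator new retry the allocation.
void oom_new_handler() {
    if (collect_garbage()) return;            // stage 1: free memory, retry

    struct rlimit lim;
    if (getrlimit(RLIMIT_AS, &lim) == 0) {    // stage 2: raise the soft limit
        rlim_t raised = next_soft_limit(lim.rlim_cur, lim.rlim_max, kStep);
        if (raised != lim.rlim_cur) {
            lim.rlim_cur = raised;
            if (setrlimit(RLIMIT_AS, &lim) == 0) return;
        }
    }
    // Nothing left to try: with no handler installed, the next failed
    // allocation makes operator new throw std::bad_alloc.
    std::set_new_handler(nullptr);
}
```

The handler would be installed once at startup with std::set_new_handler(oom_new_handler). It only helps if malloc() reports exhaustion by returning NULL, which is exactly the behavior the report found to be broken.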
This sounds like the regression fixed by this commit:
Please let me know its effect on your application.
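
One possible way to check the effect is to record every pointer the allocator returns and flag any address that comes back a second time without an intervening free, which is the symptom in the report. The following is a minimal sketch, not part of jemalloc. LiveSet and its methods are hypothetical names, and it is meant to be called from a test harness around allocation calls rather than from inside malloc() itself, since the set allocates memory of its own.

```cpp
#include <cstdint>
#include <unordered_set>

// Tracks addresses that are currently allocated (hypothetical helper).
class LiveSet {
public:
    // Record a pointer returned by the allocator.
    // Returns false if the address is already live, i.e. it was handed
    // out twice without being freed in between.
    bool on_alloc(const void* p) {
        if (p == nullptr) return true;   // failed allocation, nothing to track
        return live_.insert(reinterpret_cast<std::uintptr_t>(p)).second;
    }

    // Record a pointer being freed.
    // Returns false if the address was not live (double free or unknown).
    bool on_free(const void* p) {
        if (p == nullptr) return true;   // free(NULL) is a no-op
        return live_.erase(reinterpret_cast<std::uintptr_t>(p)) == 1;
    }

private:
    std::unordered_set<std::uintptr_t> live_;
};
```

A harness would call on_alloc() after each allocation and on_free() before each deallocation, and abort as soon as on_alloc() returns false. With the regression present, that would be expected to trip shortly after the first failed mmap().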