jemalloc-3.6.0 erroneously recycles already-allocated memory

Mon Jan 19 14:46:48 PST 2015

On Jan 19, 2015, at 1:11 PM, Kurt Wampler <Kurt.Wampler at asml.com> wrote:
> Context:
> 
> We have an x86_64 Linux C++ application which installs a "NewHandler" that
> attempts to cope with an out-of-memory situation [malloc() returns NULL
> pointer] in two ways: (1) It performs some amount of garbage collection,
> and if this fails to free enough memory, (2) it attempts to raise the soft
> virtual memory ceiling with a call to setrlimit(RLIMIT_AS,<new_limit>).
> 
> Expected Behavior:
> 
> When jemalloc's malloc() is called from libstdc++'s "new" operator, but
> mmap() fails (returning MAP_FAILED to jemalloc), indicating an
> out-of-memory condition, jemalloc's malloc() is expected to return a NULL
> pointer to its caller, which will in turn trigger our predefined
> "NewHandler".
> 
> Observed Behavior:
> 
> We found that jemalloc's malloc() does not immediately return a NULL pointer
> after the first failed mmap().  Instead, it returns a series of pointers
> that it had already given to the application, and only returns a NULL pointer
> after the second mmap() fails.  Reassigning already-in-use chunks of memory is
> of course deadly, and our application eventually segfaults.
> 
> As evidence of this behavior, I'm including an strace logging the two mmap()
> calls, the malloc() return values before and after the first failed mmap(),
> and the subsequent NULL return from malloc() after the second failed mmap(),
> finally triggering the invocation of the "NewHandler".  Note that address
> 0x2aaade7ffa60 is handed out twice, without ever being freed.  The same
> is true for addresses 0x2aaade7ffab0, 0x2aaade7ffb00, 0x2aaade7ffb50,
> 0x2aaade7ffba0, 0x2aaade7ffbf0, 0x2aaade7ffc40, 0x2aaade7ffc90, and
> 0x2aaade7ffce0 -- all get handed out twice(!)
> 
> I'm also including a partial call stack showing the calls from operator
> new() on down, taken at the moment where the first mmap() fails.
> 
> Single-stepping in gdb from that point onward, I find that the mmap()
> failure is handed up approximately 10 levels before things go awry.  The
> code seems to re-check in several other places for available memory, but
> without finding anything it can dole out.  When it has bubbled up to the
> function je_tcache_alloc_small_hard(), this function calls tcache_alloc_easy().
> In tcache_alloc_easy(), tbin->ncached is 9, and tbin->avail[8..0] contains
> the 9 addresses mentioned above.  It seems to be erroneously handing them
> out again from there.
> 
> This test case can be reproduced at will within a few minutes of run time.
> 
> We have not yet attempted to devise a fix; it took several days of
> investigation to reach this degree of understanding of the problem.

This sounds like the regression fixed by this commit:

	https://github.com/jemalloc/jemalloc/commit/f11a6776c78a09059f8418b718c996a065b33fca

Please let me know its effect on your application.

Thanks,
Jason
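
For readers unfamiliar with the setup described under "Context" in the quoted
message, here is a minimal sketch of a new-handler that raises the soft
RLIMIT_AS ceiling. It is not the reporter's actual code: the names
next_soft_limit, raise_address_space_limit, out_of_memory_handler and
kLimitStep, as well as the 256 MiB step size, are hypothetical, and the
garbage-collection step is omitted. The comparison against the hard limit
assumes Linux, where RLIM_INFINITY is the largest rlim_t value.

```cpp
#include <sys/resource.h>
#include <new>

// Hypothetical step size: raise the soft limit by 256 MiB per attempt.
constexpr rlim_t kLimitStep = 256UL * 1024 * 1024;

// Hypothetical helper: compute the next soft limit, clamped to the hard
// limit. Returns `soft` unchanged when there is no headroom left.
// Assumes RLIM_INFINITY is the maximum rlim_t value (true on Linux).
rlim_t next_soft_limit(rlim_t soft, rlim_t hard, rlim_t step) {
    if (soft == RLIM_INFINITY || soft >= hard) return soft;
    rlim_t room = hard - soft;
    return soft + (step < room ? step : room);
}

// Try to raise the soft RLIMIT_AS ceiling; true only if it actually moved.
bool raise_address_space_limit(rlim_t step) {
    struct rlimit lim;
    if (getrlimit(RLIMIT_AS, &lim) != 0) return false;
    rlim_t next = next_soft_limit(lim.rlim_cur, lim.rlim_max, step);
    if (next == lim.rlim_cur) return false;  // already at the hard limit
    lim.rlim_cur = next;
    return setrlimit(RLIMIT_AS, &lim) == 0;
}

// Called by operator new each time the underlying malloc() returns NULL.
void out_of_memory_handler() {
    // Step (1), application-specific garbage collection, would go here.
    if (!raise_address_space_limit(kLimitStep)) {
        // Nothing more can be done: uninstall the handler so that the
        // retry inside operator new throws std::bad_alloc.
        std::set_new_handler(nullptr);
    }
}
```

The handler would be installed once at startup with
std::set_new_handler(out_of_memory_handler). A scheme like this relies on
malloc() returning NULL as soon as the address-space limit is hit, which is
why the behavior reported above is fatal to it.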