Crash in arenas_cleanup on linux x86-64

Mike Hommey mh+jemalloc at glandium.org
Thu Mar 29 08:34:50 PDT 2012


On Wed, Mar 28, 2012 at 04:52:23PM -0700, Jason Evans wrote:
> 
> On Mar 28, 2012, at 4:30 PM, Jason Evans wrote:
> 
> > On Mar 28, 2012, at 12:42 PM, Mike Hommey wrote:
> >> I'm getting crashes in Firefox in some cases (only one test suite,
> >> actually), and on Linux x86-64 only (not Linux x86, not Android
> >> ARM, and not OSX x86 or x86-64).  They are a NULL deref in
> >> arenas_cleanup, in which the arena variable seems to be NULL.  This
> >> happens with current dev branch. I had a hunch that I tested, and
> >> it turns out commit cd9a134 is broken too and 154829d is not, which
> >> makes cd9a134 the culprit.  I haven't looked why, though.
> > 
> > It looks to me like the tsd cleanup handler can be called even if
> > the thread never initialized the tsd for that thread.  I think the
> > crash you are seeing would happen if a thread never allocated a
> > small or large object.  I'll work on a fix tonight.
> 
> Actually, after further scrutiny, I don't see how this can happen
> unless TLS (__thread variable memory) is cleared before pthreads TSD
> destructors are called.  That seems an unlikely explanation though;
> any other ideas?

I haven't reproduced in a simpler testcase, but I could reproduce the
failing circumstances in a local Firefox build, with a debugger, and
here is roughly what happens:

#0  arenas_tsd_set (val=0x7f0cc08cda78) at include/jemalloc/internal/jemalloc_internal.h:443
#1  0x00000000004035b5 in choose_arena_hard () at memory/jemalloc/src/jemalloc.c:157
#2  0x0000000000402b1f in choose_arena () at include/jemalloc/internal/jemalloc_internal.h:557
#3  0x0000000000408807 in tcache_get () at memory/jemalloc/include/jemalloc/internal/tcache.h:226
#4  0x000000000041036d in arena_dalloc (arena=0x7f0cc6400180, chunk=0x7f0cc6000000, ptr=0x7f0cc60bde00)
    at memory/jemalloc/include/jemalloc/internal/arena.h:593
#5  0x0000000000402f75 in idalloc (ptr=0x7f0cc60bde00) at include/jemalloc/internal/jemalloc_internal.h:685
#6  0x00000000004069c1 in free (ptr=0x7f0cc60bde00) at memory/jemalloc/src/jemalloc.c:1129
#7  0x00007f0cd1aceaa7 in _dl_deallocate_tls () from /lib64/ld-linux-x86-64.so.2
#8  0x00007f0cd18a892d in __free_stacks (limit=41943040) at allocatestack.c:278
#9  0x00007f0cd18a8a39 in queue_stack (stack=<optimized out>) at allocatestack.c:306
#10 __deallocate_stack (pd=0x64a3c0) at allocatestack.c:758

The arenas_tls value is set when a thread that probably did no
allocation terminated. Libc doesn't call arenas_cleanup for the
corresponding key, probably because it already did call key
destructors for that thread by then.

Then, another thread gets the memory region where arenas_tls was,
and writes something else there. First another pointer, and after
that a NULL value.

At some point, another thread ends and libc does call arenas_cleanup
for the arenas_tls of the previously terminated thread, and this
is where we segfault. We're actually lucky that something else wrote
a NULL so that we can notice this issue. The fact that the previous
tls implementation was using direct pointers was hiding it even more.

Mike



More information about the jemalloc-discuss mailing list