[Q] Strange hang in jemalloc
Jason Evans
jasone at canonware.com
Wed Oct 23 23:24:44 PDT 2013
On Oct 23, 2013, at 8:11 PM, Taehwan Weon <taehwan.weon at gmail.com> wrote:
> I am using jemalloc-3.4.0 on CentOS 6.3.
> When my SEGV signal handler tried to dump call stacks, a hang occurred, as shown below.
> I don't know why libc's fork called jemalloc_prefork even though I didn't set LD_PRELOAD.
>
> #0 0x000000351d60d654 in __lll_lock_wait () from /lib64/libpthread.so.0
> #1 0x000000351d608f4a in _L_lock_1034 () from /lib64/libpthread.so.0
> #2 0x000000351d608e0c in pthread_mutex_lock () from /lib64/libpthread.so.0
> #3 0x00002b59a8055d6d in malloc_mutex_lock (mutex=0x2b59a84a4320) at include/jemalloc/internal/mutex.h:77
> #4 malloc_mutex_prefork (mutex=0x2b59a84a4320) at src/mutex.c:109
> #5 0x00002b59a8044c32 in arena_prefork (arena=0x2b59a84a3d40) at src/arena.c:2344
> #6 0x00002b59a803f555 in jemalloc_prefork () at src/jemalloc.c:1760
> #7 0x000000351ca9a2a6 in fork () from /lib64/libc.so.6
> #8 0x000000351ca6200d in _IO_proc_open@@GLIBC_2.2.5 () from /lib64/libc.so.6
> #9 0x000000351ca62269 in popen@@GLIBC_2.2.5 () from /lib64/libc.so.6
> #10 0x00002b59a71bc1f9 in backtrace_lineinfo (number=1, address=<value optimized out>, symbol=0x2b61f4000918 "/usr/lib64/libnc.so.2 [0x2b59a71bc3b1]") at cfs_apix.c:363
> #11 0x00002b59a71bc3ff in nc_dump_stack (sig=<value optimized out>) at cfs_apix.c:423
> #12 <signal handler called>
> #13 0x00002b59a8047332 in arena_dalloc_bin_locked (arena=0x2b59a84a3d40, chunk=0x2b61ef000000, ptr=<value optimized out>, mapelm=<value optimized out>) at src/arena.c:1717
> #14 0x00002b59a805fba4 in tcache_bin_flush_small (tbin=0x2b61ef107128, binind=8, rem=9, tcache=0x2b61ef107000) at src/tcache.c:127
> #15 0x00002b59a805fcd4 in tcache_event_hard (tcache=0x80) at src/tcache.c:39
> #16 0x00002b59a80428d9 in tcache_event (ptr=0x2b61efd26dc0) at include/jemalloc/internal/tcache.h:271
> #17 tcache_dalloc_small (ptr=0x2b61efd26dc0) at include/jemalloc/internal/tcache.h:408
> #18 arena_dalloc (ptr=0x2b61efd26dc0) at include/jemalloc/internal/arena.h:1003
> #19 idallocx (ptr=0x2b61efd26dc0) at include/jemalloc/internal/jemalloc_internal.h:913
> #20 iqallocx (ptr=0x2b61efd26dc0) at include/jemalloc/internal/jemalloc_internal.h:932
> #21 iqalloc (ptr=0x2b61efd26dc0) at include/jemalloc/internal/jemalloc_internal.h:939
> #22 jefree (ptr=0x2b61efd26dc0) at src/jemalloc.c:1272
> #23 0x00002b59a71c958b in __nc_free (p=0x2b61efd26dd0, file=<value optimized out>, lno=<value optimized out>) at util.c:1916
> #24 0x00002b59a71cb975 in tlcq_dequeue (q=0x2b59a9ee3210, msec=<value optimized out>) at tlc_queue.c:215
> #25 0x00002b59a71c154b in tp_worker (d=<value optimized out>) at threadpool.c:116
> #26 0x000000351d60683d in start_thread () from /lib64/libpthread.so.0
> #27 0x000000351cad503d in clone () from /lib64/libc.so.6
>
> Any hint will be highly appreciated.
jemalloc calls pthread_atfork(3) in order to install functions that get called just before and after fork(2). In this case your application is raising a signal while deep inside jemalloc (very likely due to memory corruption), with locks already acquired. Deadlock obviously results: the fork() inside popen(3) runs jemalloc's prefork handler, which tries to reacquire a mutex the interrupted thread already holds.

To my surprise, fork() is actually listed in the signal(7) manual page as an async-signal-safe function, though popen(3) isn't, and popen() would probably allocate memory if it got past the hang during fork(). If you were to call fork() directly, then you'd be hitting a peculiar failure condition and we could make a case for jemalloc's behavior being questionable; but as your signal handler is written, it is simply unreliable, because it calls something outside the list of async-signal-safe functions.
Jason
More information about the jemalloc-discuss mailing list