hint on debugging what seems to be a deadlock
Ricardo Nabinger Sanchez
rnsanchez at wait4.org
Thu Aug 9 08:36:01 PDT 2012
Hello,
We are running jemalloc-3.0.0 on a busy server (glibc-2.15-r2 on Gentoo,
kernel 3.2.25), and our application frequently hits the backtrace
pasted below. We'd appreciate tips on where to start looking for problems.
(gdb) bt
#0 0x00007f1e2b3ee304 in __lll_lock_wait () from /lib/libpthread.so.0
#1 0x00007f1e2b3e9789 in _L_lock_534 () from /lib/libpthread.so.0
#2 0x00007f1e2b3e959e in pthread_mutex_lock () from /lib/libpthread.so.0
#3 0x00007f1e2b606e4d in malloc_mutex_lock (arena=0x7f1e2ac7f8c0, chunk=0x7df49fc00000, run=0x7df49ff8b000, bin=0x7f1e2ac7fa48)
at include/jemalloc/internal/mutex.h:77
#4 arena_dalloc_bin_run (arena=0x7f1e2ac7f8c0, chunk=0x7df49fc00000, run=0x7df49ff8b000, bin=0x7f1e2ac7fa48) at src/arena.c:1520
#5 0x00007f1e2b60782a in arena_dalloc_bin_locked (arena=0x7f1e2ac7f8c0, chunk=0x7df49fc00000, ptr=<value optimized out>,
mapelm=<value optimized out>) at src/arena.c:1593
#6 0x00007f1e2b61fa57 in tcache_bin_flush_small (tbin=0x7df48dc06048, binind=1, rem=35, tcache=0x7df48dc06000) at src/tcache.c:128
#7 0x00007f1e2b61fdc5 in tcache_event_hard (tcache=0x7df48dc06000) at src/tcache.c:39
#8 0x00007f1e2b600f18 in tcache_event (ptr=<value optimized out>) at include/jemalloc/internal/tcache.h:271
#9 tcache_dalloc_large (ptr=<value optimized out>) at include/jemalloc/internal/tcache.h:435
#10 arena_dalloc (ptr=<value optimized out>) at include/jemalloc/internal/arena.h:966
#11 idalloc (ptr=<value optimized out>) at include/jemalloc/internal/jemalloc_internal.h:840
#12 iqalloc (ptr=<value optimized out>) at include/jemalloc/internal/jemalloc_internal.h:852
#13 free (ptr=<value optimized out>) at src/jemalloc.c:1212
When this happens, most of our threads get stuck on what seems to be
a deadlock among certain threads:
(gdb) i thr
28 Thread 0x7f1e2c30c700 (LWP 30862) 0x00007f1e2b1256a9 in syscall () from /lib/libc.so.6
27 Thread 0x7df457fff700 (LWP 30870) 0x00007f1e2b3ecd3b in pthread_once () from /lib/libpthread.so.0
26 Thread 0x7df4577fe700 (LWP 30871) 0x00007f1e2b0f64dd in nanosleep () from /lib/libc.so.6
25 Thread 0x7df456ffd700 (LWP 30872) 0x00007f1e2b3ef03d in nanosleep () from /lib/libpthread.so.0
24 Thread 0x7df4567fc700 (LWP 30873) 0x00007f1e2b3eeafd in accept () from /lib/libpthread.so.0
23 Thread 0x7df455ffb700 (LWP 30874) 0x00007f1e2b1256a9 in syscall () from /lib/libc.so.6
22 Thread 0x7df455f7a700 (LWP 30875) 0x00007f1e2b1256a9 in syscall () from /lib/libc.so.6
21 Thread 0x7df455ef9700 (LWP 30876) 0x00007f1e2b1256a9 in syscall () from /lib/libc.so.6
20 Thread 0x7df455e78700 (LWP 30877) 0x00007f1e2b1256a9 in syscall () from /lib/libc.so.6
19 Thread 0x7df455df7700 (LWP 30878) 0x00007f1e2b1256a9 in syscall () from /lib/libc.so.6
18 Thread 0x7df455d76700 (LWP 30879) 0x00007f1e2b3ee304 in __lll_lock_wait () from /lib/libpthread.so.0
17 Thread 0x7df455cf5700 (LWP 30880) 0x00007f1e2b3ecd3b in pthread_once () from /lib/libpthread.so.0
16 Thread 0x7df455c74700 (LWP 30881) 0x00007f1e2b1256a9 in syscall () from /lib/libc.so.6
15 Thread 0x7df455bf3700 (LWP 30882) 0x00007f1e2b3ecd3b in pthread_once () from /lib/libpthread.so.0
14 Thread 0x7df455b72700 (LWP 30883) 0x00007f1e2b3ecd3b in pthread_once () from /lib/libpthread.so.0
13 Thread 0x7df455af1700 (LWP 30884) 0x00007f1e2b3ecd3b in pthread_once () from /lib/libpthread.so.0
12 Thread 0x7df455a70700 (LWP 30885) 0x00007f1e2b3ecd3b in pthread_once () from /lib/libpthread.so.0
11 Thread 0x7df4559ef700 (LWP 30886) 0x00007f1e2b3ecd3b in pthread_once () from /lib/libpthread.so.0
10 Thread 0x7df45596e700 (LWP 30887) 0x00007f1e2b3ee304 in __lll_lock_wait () from /lib/libpthread.so.0
9 Thread 0x7df4558ed700 (LWP 30888) 0x00007f1e2b3ecd3b in pthread_once () from /lib/libpthread.so.0
8 Thread 0x7df45586c700 (LWP 30889) 0x00007f1e2b3ecd3b in pthread_once () from /lib/libpthread.so.0
7 Thread 0x7df4557eb700 (LWP 30890) 0x00007f1e2b3ecd3b in pthread_once () from /lib/libpthread.so.0
6 Thread 0x7df45576a700 (LWP 30891) 0x00007f1e2b3ee304 in __lll_lock_wait () from /lib/libpthread.so.0
5 Thread 0x7df4556e9700 (LWP 30892) 0x00007f1e2b1256a9 in syscall () from /lib/libc.so.6
4 Thread 0x7df455668700 (LWP 30893) 0x00007f1e2b3ecd3b in pthread_once () from /lib/libpthread.so.0
3 Thread 0x7df4555e7700 (LWP 30894) 0x00007f1e2b3ecd3b in pthread_once () from /lib/libpthread.so.0
2 Thread 0x7df455566700 (LWP 30895) 0x00007f1e2b3ef03d in nanosleep () from /lib/libpthread.so.0
* 1 Thread 0x7f1e2c30d740 (LWP 30861) 0x00007f1e2b1256a9 in syscall () from /lib/libc.so.6
All threads in pthread_once() or __lll_lock_wait() are stuck and unresponsive
to anything, so we have to send a -KILL to the application. We have
*no* reason to suspect jemalloc itself, but we cannot confirm this using other
libraries because they simply could not handle the load the last time we tried.
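
To see at a glance how many threads are parked in each function, the `i thr` listing can be tallied with a short script. The following is only a minimal sketch, assuming the gdb output has been saved as text; the function name `tally_blocking_functions`, the regular expression, and the `SAMPLE` excerpt are illustrative and not part of gdb or jemalloc.

```python
import re
from collections import Counter

# Matches the "in <function> (" part of a gdb `info threads` line.
FRAME_RE = re.compile(r"\bin\s+([A-Za-z_][A-Za-z0-9_]*)\s*\(")


def tally_blocking_functions(info_threads_text):
    """Count threads per innermost function in gdb `info threads` output."""
    counts = Counter()
    for line in info_threads_text.splitlines():
        if "Thread 0x" not in line:
            continue  # skip wrapped continuation lines such as "from /lib/..."
        match = FRAME_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


# Hypothetical input: a four-line excerpt copied from the listing above.
SAMPLE = """\
  18 Thread 0x7df455d76700 (LWP 30879) 0x00007f1e2b3ee304 in __lll_lock_wait () from /lib/libpthread.so.0
  17 Thread 0x7df455cf5700 (LWP 30880) 0x00007f1e2b3ecd3b in pthread_once () from /lib/libpthread.so.0
  16 Thread 0x7df455c74700 (LWP 30881) 0x00007f1e2b1256a9 in syscall () from /lib/libc.so.6
  15 Thread 0x7df455bf3700 (LWP 30882) 0x00007f1e2b3ecd3b in pthread_once () from /lib/libpthread.so.0
"""

if __name__ == "__main__":
    for func, n in tally_blocking_functions(SAMPLE).most_common():
        print(f"{n:3d}  {func}")
```

Counting the full 28-thread listing above by hand gives 12 threads in pthread_once(), 3 in __lll_lock_wait(), 9 in syscall(), 3 in nanosleep() and 1 in accept(). A tally like this only shows where threads are parked; it does not by itself identify which thread holds the contended lock.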
Our application itself no longer uses pthread locks/mutexes, so the pressure
on them is much lower now.
Perhaps I will be able to provide more info on this, if I can get my SSH
connection to the server back up.
Thank you for your attention.
Regards
--
Ricardo Nabinger Sanchez http://rnsanchez.wait4.org/
"Left to themselves, things tend to go from bad to worse."