<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
</head>
<body style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;" class="">
<br class="">
<div>
<blockquote type="cite" class="">
<div class="">On 30 Aug 2016, at 22:43, Paul Smith <<a href="mailto:paul@mad-scientist.net" class="">paul@mad-scientist.net</a>> wrote:</div>
<br class="Apple-interchange-newline">
<div class="">
<div class="">Hi all. I wonder if anyone has any thoughts for me about a situation I<br class="">
have.<br class="">
<br class="">
I'm working on GNU/Linux.<br class="">
<br class="">
I'm compiling jemalloc as a static library (with -fPIC) then I link it<br class="">
into my own shared library (.so). I use -fvisibility=hidden so that<br class="">
the jemalloc symbols are not visible outside the shared library (e.g.,<br class="">
when I use "nm" on my .so, all the jemalloc symbols are marked "t" not<br class="">
"T").<br class="">
<br class="">
It works all the time for my testing and most of the time for my users.<br class="">
However, in some situations I've had users report that their process is<br class="">
hanging and when I get a stacktrace, the hang is happening inside<br class="">
pthread_mutex_unlock called from within jemalloc tls stuff. Note that<br class="">
my library is not being linked directly, it's being dlopen()'d, so the<br class="">
process is running for a bit before my library is loaded. To be<br class="">
precise, it's being loaded inside an openjdk 1.8 JVM and invoked from<br class="">
Java using JNI.<br class="">
<br class="">
Here's a sample stacktrace:<br class="">
<br class="">
#0 0x0000003793a0a8a9 in pthread_mutex_unlock () from ./lib64/libpthread.so.0<br class="">
#1 0x00000037932110d2 in tls_get_addr_tail () from ./lib64/ld-linux-x86-64.so.2<br class="">
#2 0x0000003793211500 in __tls_get_addr () from ./lib64/ld-linux-x86-64.so.2<br class="">
#3 0x00007f0181a7ab7f in tcache_enabled_get () at jemalloc/include/jemalloc/internal/tcache.h:172<br class="">
#4 tcache_get (create=true) at jemalloc/include/jemalloc/internal/tcache.h:238<br class="">
#5 arena_malloc (arena=0x0, zero=false, try_tcache=true, size=96) at jemalloc/include/jemalloc/internal/arena.h:873<br class="">
#6 imallocx (try_tcache=true, arena=0x0, size=96) at jemalloc/include/jemalloc/internal/jemalloc_internal.h:767<br class="">
#7 imalloc (size=96) at jemalloc/include/jemalloc/internal/jemalloc_internal.h:776<br class="">
#8 prof_tdata_init () at jemalloc/src/prof.c:1244<br class="">
#9 0x00007f0181a5f7dd in prof_tdata_get () at jemalloc/include/jemalloc/internal/prof.h:317<br class="">
#10 malloc (size=&lt;optimized out&gt;) at jemalloc/src/jemalloc.c:850<br class="">
#11 0x00007f018185fbb1 in operator new [] (size=19) at core/Allocator.h:86<br class="">
#12 String::allocate (this=this@entry=0x7f0189c93e40, length=length@entry=6) at core/StringClass.cpp:158<br class="">
...<br class="">
<br class="">
(I should come clean and mention this is an older version of jemalloc:<br class="">
3.1 I believe--if that's likely to be the issue I can look into<br class="">
updating).<br class="">
</div>
</div>
</blockquote>
<div><br class="">
</div>
<span style="font-family: Menlo-Regular; font-size: 11px;" class="">3.1 is pretty old now. A quick scan through the ChangeLog does suggest there’s been a few changes related to TLS initialisation / bootstrap. I’d be tempted to at least upgrade to the most
recent 3.x, and maybe even 4.1.x and see if the problem goes away.</span></div>
<div><font face="Menlo-Regular" class=""><span style="font-size: 11px;" class=""><br class="">
</span></font>
<blockquote type="cite" class="">
<div class="">
<div class=""><br class="">
The hang seems to happen very close to when this library starts, and<br class="">
it's clearly in a fundamental area.<br class="">
<br class="">
What I was wondering was whether it might be possible that some static<br class="">
memory inside jemalloc was not getting initialized in the right order<br class="">
when the shared library is loaded. Perhaps there's some other static<br class="">
variable with a constructor which allocates memory, and if they are<br class="">
invoked in the wrong order then jemalloc's structures are not set up<br class="">
properly or something.<br class="">
<br class="">
Does this seem like it might be plausible? If so is there anything<br class="">
that can be done (other than sweeping all my code to remove any<br class="">
allocation done during a static constructor)? It's OK if this is a<br class="">
GCC-only solution, such as using __attribute__((init_priority())) or<br class="">
something...<br class="">
<br class="">
It would be much simpler if I could reproduce the problem myself, then<br class="">
I could just experiment, but so far no luck.<br class="">
_______________________________________________<br class="">
jemalloc-discuss mailing list<br class="">
<a href="mailto:jemalloc-discuss@canonware.com" class="">jemalloc-discuss@canonware.com</a><br class="">
http://www.canonware.com/mailman/listinfo/jemalloc-discuss<br class="">
</div>
</div>
</blockquote>
</div>
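<div class="">For reference, the __attribute__((init_priority())) idea raised in the quoted question could be sketched roughly as below. This is only an illustration of how GCC orders static constructors inside one linked image; whether initialisation order has anything to do with the hang is not established. All names (demo, AllocatorState, EarlyClient and so on) are hypothetical stand-ins and are not taken from jemalloc.</div>
<pre>
```cpp
// Hypothetical sketch: none of these names come from jemalloc.
namespace demo {

// Plain globals with constant initialisers are set up before any
// static constructor runs, so they are always safe to read.
bool allocator_ready = false;
int construction_counter = 0;

// Stand-in for allocator bookkeeping that must be constructed first.
struct AllocatorState {
    int order;
    AllocatorState() : order(++construction_counter) { allocator_ready = true; }
};

// Stand-in for a static object whose constructor relies on the allocator.
struct EarlyClient {
    int order;
    bool saw_ready_allocator;
    EarlyClient()
        : order(++construction_counter), saw_ready_allocator(allocator_ready) {}
};

// Defined first, so without priorities it would be constructed first.
// A larger init_priority number means later construction (GCC extension).
EarlyClient client __attribute__((init_priority(300)));

// Defined second, but the smaller number makes it construct earlier.
AllocatorState allocator_state __attribute__((init_priority(200)));

}  // namespace demo
```
</pre>
<div class="">With GCC, valid priorities run from 101 to 65535 and objects with lower numbers are constructed first, so here allocator_state is built before client even though it is defined later. The attribute only orders objects within the same executable or shared library; it says nothing about when the dynamic loader sets up TLS for a dlopen()'d library.</div>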
<br class="">
</body>
</html>