jemalloc initialization in a shared library

Thu Sep 8 09:28:48 PDT 2016

On Aug 30, 2016, at 2:43 PM, Paul Smith <paul at mad-scientist.net> wrote:
> [...]
> 
> I'm compiling jemalloc as a static library (with -fPIC) then I link it
> into my own shared library (.so).  I use -fvisibility=hidden so that
> the jemalloc symbols are not visible outside the shared library (e.g.,
> when I use "nm" on my .so, all the jemalloc symbols are marked "t" not
> "T").
> 
> It works all the time for my testing and most of the time for my users.
> However, in some situations I've had users report that their process is
> hanging and when I get a stacktrace, the hang is happening inside
> pthread_mutex_unlock called from within jemalloc tls stuff.  Note that
> my library is not being linked directly, it's being dlopen()'d, so the
> process is running for a bit before my library is loaded.  To be
> precise, it's being loaded inside an openjdk 1.8 JVM and invoked from
> Java using JNI.

This may be a separate issue from the TLS initialization issue you're hitting, but linking a malloc implementation into a dlopen()ed library is exceedingly difficult to make work correctly, because it's very difficult to avoid mixed allocator use, e.g. allocating with one implementation's malloc() and then mistakenly releasing the memory with the other implementation's free().  You can work around this by using mangled names for one implementation, and being very careful to match calls correctly.
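A minimal sketch of the "mangled names" approach. It assumes jemalloc is built with its `--with-jemalloc-prefix` configure option (e.g. `--with-jemalloc-prefix=je_`, which renames `malloc()` to `je_malloc()`, `free()` to `je_free()`, and so on). To keep the sketch self-contained, the hypothetical `lib_malloc()`/`lib_free()` wrappers forward to the system allocator; in a real build they would call the prefixed jemalloc functions. The names `lib_strdup()` and `lib_str_free()` are likewise hypothetical. The point is the discipline: memory handed across the library boundary is released through a function the library exports, never through the caller's `free()`.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical private allocator entry points. In a real build these
 * would call prefixed jemalloc functions (e.g. je_malloc/je_free); here
 * they forward to the system allocator so the sketch is self-contained. */
static void *lib_malloc(size_t n) { return malloc(n); }
static void lib_free(void *p) { free(p); }

/* Hypothetical public API. Memory returned by lib_strdup() comes from the
 * library's private allocator, so callers must release it with
 * lib_str_free(), never with the process-wide free(). */
char *lib_strdup(const char *s) {
    size_t n = strlen(s) + 1;
    char *copy = lib_malloc(n);
    if (copy != NULL)
        memcpy(copy, s, n);
    return copy;
}

/* Matching release function: same allocator family as lib_strdup(). */
void lib_str_free(char *s) { lib_free(s); }
```

Calling plain `free(s)` on the result would compile without complaint, which is exactly the kind of mismatch that is easy to overlook, so every ownership transfer across the boundary needs auditing.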

> Here's a sample stacktrace:
> 
> #0  0x0000003793a0a8a9 in pthread_mutex_unlock () from ./lib64/libpthread.so.0
> #1  0x00000037932110d2 in tls_get_addr_tail () from ./lib64/ld-linux-x86-64.so.2
> #2  0x0000003793211500 in __tls_get_addr () from ./lib64/ld-linux-x86-64.so.2
> #3  0x00007f0181a7ab7f in tcache_enabled_get () at jemalloc/include/jemalloc/internal/tcache.h:172
> #4  tcache_get (create=true) at jemalloc/include/jemalloc/internal/tcache.h:238
> #5  arena_malloc (arena=0x0, zero=false, try_tcache=true, size=96) at jemalloc/include/jemalloc/internal/arena.h:873
> #6  imallocx (try_tcache=true, arena=0x0, size=96) at jemalloc/include/jemalloc/internal/jemalloc_internal.h:767
> #7  imalloc (size=96) at jemalloc/include/jemalloc/internal/jemalloc_internal.h:776
> #8  prof_tdata_init () at jemalloc/src/prof.c:1244
> #9  0x00007f0181a5f7dd in prof_tdata_get () at jemalloc/include/jemalloc/internal/prof.h:317
> #10 malloc (size=<optimized out>) at jemalloc/src/jemalloc.c:850
> #11 0x00007f018185fbb1 in operator new [] (size=19) at core/Allocator.h:86
> #12 String::allocate (this=this@entry=0x7f0189c93e40, length=length@entry=6) at core/StringClass.cpp:158
>   ...

This is probably related to attempts at reentrant allocation inside the glibc TLS machinery.  For the most part we avoid this by bootstrapping prior to accessing TLS, but perhaps that's not happening early enough as a side effect of dlopen().
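The "bootstrap before touching TLS" idea can be illustrated with a generic reentrancy guard. This is a hypothetical, single-threaded sketch, not jemalloc's code: `sketch_malloc()` finishes global initialization before its first thread-local access, and any allocation that re-enters it during initialization is served from a static buffer. A real allocator would also need locking, and a way to recognize bootstrap pointers when they are later freed.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical illustration only; NOT jemalloc's implementation. */
static bool boot_done;        /* global initialization finished */
static bool boot_in_progress; /* set while global_init() runs */
static _Alignas(16) unsigned char boot_buf[1024];
static size_t boot_used;

/* Stands in for per-thread allocator state such as a thread cache. */
static _Thread_local size_t tl_alloc_count;

/* Bump allocator for requests that arrive during bootstrap.
 * Memory handed out here is never returned to free(). */
static void *boot_alloc(size_t n) {
    n = (n + 15) & ~(size_t)15; /* keep 16-byte granularity */
    if (n > sizeof boot_buf - boot_used)
        return NULL;
    void *p = boot_buf + boot_used;
    boot_used += n;
    return p;
}

void *sketch_malloc(size_t n);

/* Global setup that itself needs memory, so it re-enters sketch_malloc(). */
static void global_init(void) {
    void *table = sketch_malloc(64);
    (void)table;
}

void *sketch_malloc(size_t n) {
    if (!boot_done) {
        if (boot_in_progress)
            return boot_alloc(n); /* reentrant call during bootstrap */
        boot_in_progress = true;
        global_init();
        boot_in_progress = false;
        boot_done = true;
    }
    tl_alloc_count++; /* first thread-local access happens only after bootstrap */
    return malloc(n);
}
```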

> (I should come clean and mention this is an older version of jemalloc:
> 3.1 I believe--if that's likely to be the issue I can look into
> updating).

A huge amount has changed in the TLS-related code since 3.1, so it's hard for me to recall the exact quirks relative to the current release.  Trying a newer version is certainly worthwhile.

> [...]
> 
> Does this seem like it might be plausible?  If so is there anything
> that can be done (other than sweeping all my code to remove any
> allocation done during a static constructor)?  It's OK if this is a
> GCC-only solution, such as using __attribute__((init_priority())) or
> something...

The init_priority attribute could help, but note that there's no simple way to guarantee that some other linked code isn't also using the maximum priority, thus resulting in arbitrary initialization order.
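`init_priority` applies to C++ namespace-scope objects; its C counterpart is the optional priority argument of the GCC (and Clang) `constructor` attribute. A minimal sketch, with hypothetical function names, showing that a smaller priority number runs first. Priorities 0 through 100 are reserved for the implementation, so 101 is the earliest value available to user code.

```c
/* Records the order in which the constructors below run. */
static int init_order[2];
static int init_count;

/* Smaller priority numbers run earlier. Hypothetical: this is where an
 * allocator's bootstrap would go. */
__attribute__((constructor(101)))
static void init_allocator_first(void) {
    init_order[init_count++] = 101;
}

/* Hypothetical: other static initialization that may allocate. */
__attribute__((constructor(200)))
static void init_everything_else(void) {
    init_order[init_count++] = 200;
}
```

Nothing prevents other code linked into the same module from also claiming priority 101, in which case the relative order among those constructors is unspecified. Priorities also only order constructors within a single module (executable or shared object); they do not affect the order in which separately loaded modules are initialized.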

> It would be much simpler if I could reproduce the problem myself, then
> I could just experiment, but so far no luck.

You may be able to work around this by making jemalloc_constructor() visible and calling it directly, i.e. by looking it up via dlsym() and calling it immediately after dlopen().  However, your comments make it sound as though this is happening before dlopen() returns.
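A minimal sketch of the dlopen()/dlsym() workaround. It assumes the library has been modified so that `jemalloc_constructor()` is exported (non-static, default visibility), since it is not exported by default; the helper name `load_and_init()` is hypothetical.

```c
#include <dlfcn.h>
#include <stdio.h>

typedef void (*init_fn)(void);

/* Hypothetical helper: dlopen() a library, then look up and call an
 * exported initialization function right away. Returns 0 on success and
 * stores the handle in *handle_out; returns -1 on failure. */
int load_and_init(const char *path, const char *init_name, void **handle_out) {
    void *handle = dlopen(path, RTLD_NOW | RTLD_LOCAL);
    if (handle == NULL) {
        fprintf(stderr, "dlopen: %s\n", dlerror());
        return -1;
    }

    init_fn init;
    /* POSIX-sanctioned way to convert dlsym()'s void * to a function pointer. */
    *(void **)(&init) = dlsym(handle, init_name);
    if (init == NULL) {
        fprintf(stderr, "dlsym: %s\n", dlerror());
        dlclose(handle);
        return -1;
    }

    init(); /* run the allocator's initialization as soon as dlopen() returns */
    *handle_out = handle;
    return 0;
}

/* Usage (hypothetical library path):
 *   void *h;
 *   if (load_and_init("libmylib.so", "jemalloc_constructor", &h) == 0) { ... }
 */
```

`RTLD_NOW` resolves all symbols up front, so a missing dependency fails inside dlopen() rather than at some later call, and `RTLD_LOCAL` keeps the library's symbols out of the global namespace, consistent with the hidden-visibility build.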

Jason