jemalloc initialization in a shared library
Paul Smith
paul at mad-scientist.net
Thu Sep 8 09:59:41 PDT 2016
On Thu, 2016-09-08 at 09:28 -0700, Jason Evans wrote:
> On Aug 30, 2016, at 2:43 PM, Paul Smith <paul at mad-scientist.net>
> wrote:
> >
> > [...]
> >
> > I'm compiling jemalloc as a static library (with -fPIC) then I link it
> > into my own shared library (.so). I use -fvisibility=hidden so that
> > the jemalloc symbols are not visible outside the shared library (e.g.,
> > when I use "nm" on my .so, all the jemalloc symbols are marked "t" not
> > "T").
> >
> > It works all the time for my testing and most of the time for my users.
> > However, in some situations I've had users report that their process is
> > hanging and when I get a stacktrace, the hang is happening inside
> > pthread_mutex_unlock called from within jemalloc tls stuff. Note that
> > my library is not being linked directly, it's being dlopen()'d, so the
> > process is running for a bit before my library is loaded. To be
> > precise, it's being loaded inside an openjdk 1.8 JVM and invoked from
> > Java using JNI.
>
> This may be a separate issue from the TLS initialization issue you're
> hitting, but linking a malloc implementation into a dlopen()ed
> library is exceedingly difficult to make work correctly, because it's
> very difficult to avoid mixed allocator use, e.g. calling malloc() of
> one implementation and erroneously calling free() of the other. You
> can work around this by using mangled names for one implementation,
> and being very careful to match calls correctly.
Yes; however we also run on Windows where there are similar issues
crossing DLL boundaries even using the default allocator: our code is
careful to never free memory given to us by other libraries and no one
else will free our memory.
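
One way to make the "never free someone else's memory" rule checkable is to tag each block the library hands out. The following is a minimal sketch of that idea, not something jemalloc provides: the names mylib_malloc, mylib_free and MYLIB_MAGIC are hypothetical, and plain malloc()/free() stand in for the library's private (hidden or mangled) allocator entry points. It is a debugging aid only, since inspecting the bytes in front of a truly foreign pointer is not guaranteed to be safe.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical tag value; any constant unlikely to occur by chance works. */
#define MYLIB_MAGIC UINT64_C(0x6a656d616c6c6f63)

/* Header stored in front of every block. The union keeps the pointer
 * returned to the caller suitably aligned for any object type. */
typedef union {
    uint64_t    magic;
    max_align_t align;
} mylib_header;

/* Allocate size bytes and tag the block as belonging to this library.
 * malloc() here is a stand-in for the library's private allocator. */
void *mylib_malloc(size_t size)
{
    if (size > SIZE_MAX - sizeof(mylib_header))
        return NULL;                      /* size + header would overflow */
    mylib_header *h = malloc(sizeof(*h) + size);
    if (h == NULL)
        return NULL;
    h->magic = MYLIB_MAGIC;
    return h + 1;                         /* caller sees memory after the header */
}

/* Free a block obtained from mylib_malloc().
 * Returns 0 on success, -1 if the block does not carry our tag
 * (foreign pointer or double free); in that case nothing is freed. */
int mylib_free(void *ptr)
{
    if (ptr == NULL)
        return 0;
    mylib_header *h = (mylib_header *)ptr - 1;
    if (h->magic != MYLIB_MAGIC)
        return -1;
    h->magic = 0;                         /* makes a second free of this block fail */
    free(h);
    return 0;
}
```

A caller would pair every mylib_malloc() with mylib_free() and treat a -1 return as a bug report rather than something to recover from.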
We have also compiled jemalloc with -fvisibility=hidden so I don't
think that any other code besides ours will be able to invoke our
jemalloc functions.
I get that this is a fraught area and I'm not 100% sure that our
safeguards are sufficient. However in all our internal testing things
seem to work OK...
> A huge amount has changed in the TLS-related code since 3.1, so it's
> hard for me to recall the exact quirks relative to the current
> release. Trying a newer version is certainly worthwhile.
I will work on this. It's not trivial because we've made some changes:
particularly an enhancement to allow us to dump profile stats to a
memory buffer rather than a file, so that we can send them back over
the network to a central admin service.
We have meant to contribute this back although I suspect you would not
be happy with the implementation as it is. As part of this port I'll
try to clean it up at least a bit and send along a patch, just for
informational purposes if nothing else.
> > Does this seem like it might be plausible? If so is there anything
> > that can be done (other than sweeping all my code to remove any
> > allocation done during a static constructor)? It's OK if this is a
> > GCC-only solution, such as using __attribute__((init_priority()))
> > or something...
>
> The init_priority attribute could help, but note that there's no
> simple way to guarantee that some other linked code isn't also using
> the maximum priority, thus resulting in arbitrary initialization
> order.
Yes, I understand. However looking through all the code for static
content that allocates memory is daunting (as is ensuring more such
content doesn't crop up in the future) so I don't prefer this option,
if it can be avoided.
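
For reference, init_priority itself applies to C++ objects; in C code built with GCC the counterpart is __attribute__((constructor(N))), which shares the same priority numbering (lower numbers run first, 0 to 100 are reserved for the implementation, and constructors with no priority run last). The sketch below only illustrates that ordering inside one program or library; the function names are hypothetical, and, as noted above, nothing stops other linked code from also using priority 101.

```c
#include <assert.h>
#include <stdio.h>

/* Records the order in which the constructors below actually ran. */
static int init_log[3];
static int init_count;

/* Lowest priority number available to user code: runs first.
 * In the scenario discussed here, allocator setup would go in this slot. */
__attribute__((constructor(101)))
static void allocator_init(void)
{
    init_log[init_count++] = 101;
}

/* A prioritized constructor with a larger number runs after 101. */
__attribute__((constructor(500)))
static void early_client(void)
{
    init_log[init_count++] = 500;
}

/* No explicit priority: runs after all prioritized constructors. */
__attribute__((constructor))
static void default_client(void)
{
    init_log[init_count++] = 65535;
}

/* Print the recorded order and check it matches the priorities. */
void report_init_order(void)
{
    for (int i = 0; i < init_count; i++)
        printf("constructor %d ran with priority %d\n", i, init_log[i]);
    assert(init_count == 3);
    assert(init_log[0] == 101 && init_log[1] == 500 && init_log[2] == 65535);
}
```

Calling report_init_order() from main() or from any function invoked after the library is loaded shows the order in which the three constructors ran.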
> > It would be much simpler if I could reproduce the problem myself,
> > then I could just experiment, but so far no luck.
>
> You may be able to work around this by making jemalloc_constructor()
> visible and calling it directly, i.e. look it up via dlsym() and call
> it immediately after dlopen(). However, your comments make it sound
> as though this is happening before dlopen() returns.
Well, the hang definitely happens later, not during dlopen(): it's one
of the earliest operations but it's part of user code after the library
has been loaded. Of course, if the problem really is initialization of
memory then it could be that the corruption etc. happens during the
dlopen() call and the hang is just a symptom.
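
The dlsym() workaround suggested above could look roughly like the sketch below. It assumes the initialization function has been made visible (exported) from the shared library and takes no arguments; load_and_init and the library path are hypothetical names used for illustration. On older glibc versions the program needs to be linked with -ldl.

```c
#include <dlfcn.h>
#include <stdio.h>

/* Load a shared library and immediately call one of its exported
 * void(void) functions, e.g. an allocator constructor that has been
 * made visible. Returns the dlopen() handle, or NULL on failure. */
void *load_and_init(const char *path, const char *init_symbol)
{
    void *handle = dlopen(path, RTLD_NOW | RTLD_LOCAL);
    if (handle == NULL) {
        const char *err = dlerror();
        fprintf(stderr, "dlopen failed: %s\n", err ? err : "unknown error");
        return NULL;
    }

    dlerror();  /* clear any stale error before the lookup */
    void (*init)(void) = (void (*)(void))dlsym(handle, init_symbol);
    if (init == NULL) {
        const char *err = dlerror();
        fprintf(stderr, "dlsym(%s) failed: %s\n", init_symbol,
                err ? err : "symbol resolved to NULL");
        dlclose(handle);
        return NULL;
    }

    init();     /* run the initializer before any other call into the library */
    return handle;
}
```

A call would then look like load_and_init("libmylib.so", "jemalloc_constructor"), where the library name is a placeholder. If the problem occurs inside static constructors that run during dlopen() itself, this call comes too late to help.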