RFC: TCMalloc-style new/delete hooks

Tue Oct 14 10:55:01 PDT 2014

On Oct 14, 2014, at 9:13 AM, David Rigby <daver at couchbase.com> wrote:
> We are currently using TCMalloc as our memory allocator; however, the significantly better fragmentation characteristics and deterministic lowest-available address selection of jemalloc mean we want to switch to jemalloc in the near future.
> 
> One (the only?) sticking point, however, is the lack of a direct equivalent to TCMalloc’s new/delete hooks, which allow an application to register callbacks that are invoked when memory is allocated/freed by the application.
> 
> We use this feature to essentially perform sub-heap memory tracking, to determine how much memory different buckets (think tables/databases) are using. To be more specific, as a worker thread is assigned to a particular bucket the bucket ID is stored in TLS, and then when a new/delete callback is invoked we look up the thread’s current bucket from TLS and increment/decrement the total used as appropriate.
> 
> To allow us to work with jemalloc, I’ve implemented[1] equivalent functionality in jemalloc.
> 
> I did consider making use of the arena functionality in jemalloc for this, but I was concerned about the potential increase in fragmentation with many arenas, which is exactly one of the reasons why we want to move away from TCMalloc (I’m proposing setting narenas=1 when we deploy).
> 
> How would you (Jason?) feel about merging this patch, or something conceptually similar into upstream? 
> 
> [1]: https://github.com/daverigby/jemalloc/commit/bbf3877d785417f03671bd1aed94723d750937d5
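
For readers following the thread, the per-bucket accounting described in the quoted message can be sketched roughly as below. This is only an illustration, not the code from the linked patch: MAX_BUCKETS, set_current_bucket, on_new, on_delete and bucket_usage are hypothetical names, and the sketch assumes the delete callback is handed the size of the block being freed (how that size is obtained is left to the caller).

```c
#include <stddef.h>
#include <stdatomic.h>

#define MAX_BUCKETS 16              /* hypothetical fixed bucket limit */

/* One running byte total per bucket, shared by all threads. */
static _Atomic long bucket_bytes[MAX_BUCKETS];

/* Bucket the calling thread is currently working for (kept in TLS). */
static _Thread_local int current_bucket = 0;

/* Called when a worker thread is assigned to a bucket. */
void set_current_bucket(int id) { current_bucket = id; }

/* Hypothetical "new" callback: charge the allocation to the
 * calling thread's current bucket. */
void on_new(const void *ptr, size_t size)
{
    (void)ptr;
    atomic_fetch_add(&bucket_bytes[current_bucket], (long)size);
}

/* Hypothetical "delete" callback: credit the current bucket.
 * Assumes the caller supplies the size of the freed block. */
void on_delete(const void *ptr, size_t size)
{
    (void)ptr;
    atomic_fetch_sub(&bucket_bytes[current_bucket], (long)size);
}

/* Current total charged to a bucket. */
long bucket_usage(int id) { return atomic_load(&bucket_bytes[id]); }
```

Note that a block freed while the thread is assigned to a different bucket than the one it was allocated under would be credited to the wrong total; the sketch does not attempt to handle that case.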

I have some concerns about this functionality that have kept me from adding it so far:

- It adds yet another branch to the fast path, whereas if you create your own wrappers and mangle jemalloc's API, it imposes no cost on applications which don't need hooks.
- It's really tricky (and requires a messy API) to support hooks that get called for all allocations from the beginning of program execution.  I don't know of a way to pull this off short of exposing weak function pointer symbols that can be overridden during static linking or dynamic loading.
- It can result in really surprising "impossible" behavior if the compiler makes assumptions about globally visible side effects, as does gcc.  In order to make hooks generally safe, the application must be compiled with -fno-builtin-malloc -fno-builtin-calloc -fno-builtin-realloc -fno-builtin-free.  Other compilers potentially have similar issues, possibly without escape hatches.  I worry that hooks add a documentation burden on jemalloc, and that people will repeatedly fail to take note of this requirement, leaving them with the impression that jemalloc is somehow flakey.
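
A minimal sketch of the wrapper approach mentioned in the first point, under stated assumptions: jemalloc is built with a symbol prefix (for example via its --with-jemalloc-prefix configure option) so that the application can define its own entry points on top of it. BASE_MALLOC, BASE_FREE, app_malloc, app_free, app_set_hooks and the counting hooks are hypothetical names for illustration; the macros default to the libc functions only so that the sketch compiles without jemalloc.

```c
#include <stddef.h>
#include <stdlib.h>

/* Assumption: with a prefixed jemalloc build these would be set to
 * je_malloc / je_free. They default to libc so the sketch stands alone. */
#ifndef BASE_MALLOC
#define BASE_MALLOC malloc
#define BASE_FREE   free
#endif

typedef void (*alloc_hook_fn)(const void *ptr, size_t size);
typedef void (*free_hook_fn)(const void *ptr);

static alloc_hook_fn alloc_hook;    /* NULL means no hook installed */
static free_hook_fn  free_hook;

/* Install (or clear, with NULL) the hooks. Not thread-safe; a real
 * application would need to synchronize registration. */
void app_set_hooks(alloc_hook_fn a, free_hook_fn f)
{
    alloc_hook = a;
    free_hook = f;
}

/* Application-level wrapper: the hook branch lives here, not inside
 * the allocator, so programs that skip the wrapper pay nothing. */
void *app_malloc(size_t size)
{
    void *p = BASE_MALLOC(size);
    if (p != NULL && alloc_hook != NULL)
        alloc_hook(p, size);
    return p;
}

void app_free(void *p)
{
    if (p != NULL && free_hook != NULL)
        free_hook(p);               /* run the hook before releasing */
    BASE_FREE(p);
}

/* Example hooks: count live allocations made through the wrappers. */
static size_t live_allocs;
static void count_alloc(const void *p, size_t n) { (void)p; (void)n; live_allocs++; }
static void count_free(const void *p) { (void)p; live_allocs--; }
```

An application would call app_set_hooks(count_alloc, count_free) once at startup and route its allocations through app_malloc/app_free. Only allocations made through the wrappers are seen by the hooks, so this does not by itself address the second point about observing every allocation from the start of program execution.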

Jason