Memory allocation/release hooks
D'Alessandro, Luke K
ldalessa at indiana.edu
Thu Oct 15 09:03:43 PDT 2015
> On Oct 15, 2015, at 11:45 AM, Shamis, Pavel <shamisp at ornl.gov> wrote:
> Dear Jemalloc Community,
> We are developers of the UCX project, and as part of that effort
> we are looking for a malloc library that supports hooks for chunk allocation and deallocation and can be used for the following:
> (a) Allocation of memory that can be shared transparently between processes on the same node. For this purpose we would like to mmap memory with MAP_SHARED. This is very useful for implementing Remote Memory Access (RMA) operations in MPI-3 one-sided and OpenSHMEM communication libraries. It allows a remote process to map user-allocated memory and perform RMA operations through memcpy().
I’m not sure about this, but I expect that you just need to install a set of custom chunk hooks to manage it. The chunk_hooks_t interface is described in the [jemalloc manual](http://www.canonware.com/download/jemalloc/jemalloc-latest/doc/jemalloc.html).
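A minimal sketch of what a MAP_SHARED chunk allocation hook could look like. The parameter lists follow the `chunk_alloc_t` and `chunk_dalloc_t` shapes from the jemalloc 4.x manual, but the function names are hypothetical, nothing here links against jemalloc, and installing the hooks is not shown. It also assumes that `size` and `alignment` are multiples of the page size. Note that `MAP_SHARED | MAP_ANONYMOUS` memory is only visible to forked descendants; sharing with unrelated processes on the node would need a named object (for example a file or POSIX shared memory), which this sketch does not attempt.

```c
#define _DEFAULT_SOURCE /* exposes MAP_ANONYMOUS on glibc under strict -std= modes */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical hook: returns `size` bytes of shared memory whose base is a
 * multiple of `alignment`, or NULL on failure. */
void *shared_chunk_alloc(void *new_addr, size_t size, size_t alignment,
                         bool *zero, bool *commit, unsigned arena_ind)
{
    (void)arena_ind;
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    if (new_addr != NULL || size > SIZE_MAX - alignment)
        return NULL; /* sketch: placement requests are simply refused */

    /* Over-map by one alignment unit, then trim the unaligned head and tail. */
    size_t span = size + alignment;
    char *raw = mmap(NULL, span, PROT_READ | PROT_WRITE,
                     MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (raw == MAP_FAILED)
        return NULL;
    uintptr_t base = ((uintptr_t)raw + alignment - 1) & ~((uintptr_t)alignment - 1);
    size_t lead = (size_t)(base - (uintptr_t)raw);
    size_t trail = span - lead - size;
    if (lead != 0)
        munmap(raw, lead);
    if (trail != 0)
        munmap((char *)base + size, trail);

    *zero = true;   /* fresh anonymous mappings are zero-filled */
    *commit = true;
    return (void *)base;
}

/* Hypothetical hook: false means the chunk was released successfully. */
bool shared_chunk_dalloc(void *chunk, size_t size, bool committed,
                         unsigned arena_ind)
{
    (void)committed;
    (void)arena_ind;
    return munmap(chunk, size) != 0;
}

/* Usage: a forked child writes through the mapping and the parent reads the
 * value back. Returns the byte the parent observed, or -1 on error. */
int shared_chunk_demo(void)
{
    size_t chunk = (size_t)2 << 20; /* 2 MiB, an illustrative chunk size */
    bool zero = false, commit = false;
    volatile unsigned char *p =
        shared_chunk_alloc(NULL, chunk, chunk, &zero, &commit, 0);
    if (p == NULL)
        return -1;
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        p[0] = 42;
        _exit(0);
    }
    if (waitpid(pid, NULL, 0) < 0)
        return -1;
    int seen = p[0];
    shared_chunk_dalloc((void *)p, chunk, true, 0);
    return seen;
}
```

With a private mapping the parent would not see the child's write; the shared mapping is what makes the store visible across the two processes.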
> (b) Implementation of memory deallocation hooks for RDMA hardware (InfiniBand, RoCE, iWARP, etc.). As an optimization we implement a lazy memory deregistration (memory unpinning) policy, and we use the hook to notify the communication library about memory release events. On such an event, we clean up our registration cache and deregister (unpin) the memory on the hardware.
We have been using jemalloc for some time to manage, among other things, registered memory regions in HPX-5 (https://hpx.crest.iu.edu/) for Verbs and uGNI. If you already have a mechanism which manages keys, then you can simply install a set of chunk hooks that can perform the registration/deregistration as necessary. We have found this to work quite well for our purposes.
[Here are our hooks](https://gitlab.crest.iu.edu/extreme/hpx/blob/v1.3.0/libhpx/network/pwc/jemalloc_registered.c). There is a bit of abstraction in there, but it’s basically straightforward. We only deal with chunk allocation and deallocation since we can’t really do anything interesting on commit/decommit due to the network registration (and we’re normally using hugetlbfs anyway).
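The general shape of registering on chunk allocation and deregistering on chunk deallocation can be sketched as follows. This is not the HPX-5 code linked above. The names `net_register`, `net_deregister`, and `live_registrations` are hypothetical stand-ins for whatever key-management layer already exists (a Verbs backend might call `ibv_reg_mr()` and `ibv_dereg_mr()` there), and the stubs only count calls. The hook parameter lists follow the jemalloc 4.x manual; installing the pair (the manual documents an `arena.<i>.chunk_hooks` mallctl for that) is not shown, and `size` and `alignment` are assumed to be page multiples.

```c
#define _DEFAULT_SOURCE /* exposes MAP_ANONYMOUS on glibc under strict -std= modes */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>

/* Hypothetical network layer: these stubs only track how many regions are
 * currently pinned. A real backend would talk to the NIC here. */
int live_registrations = 0;

bool net_register(void *addr, size_t len)
{
    (void)addr; (void)len;
    live_registrations++;
    return true;
}

void net_deregister(void *addr, size_t len)
{
    (void)addr; (void)len;
    live_registrations--;
}

/* Map `size` bytes at a multiple of `alignment` (a power of two) by
 * over-mapping and trimming the unaligned head and tail. */
static void *map_aligned(size_t size, size_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    if (size > SIZE_MAX - alignment)
        return NULL;
    size_t span = size + alignment;
    char *raw = mmap(NULL, span, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (raw == MAP_FAILED)
        return NULL;
    uintptr_t base = ((uintptr_t)raw + alignment - 1) & ~((uintptr_t)alignment - 1);
    size_t lead = (size_t)(base - (uintptr_t)raw);
    if (lead != 0)
        munmap(raw, lead);
    munmap((char *)base + size, span - lead - size); /* tail is never empty */
    return (void *)base;
}

/* Allocation hook: map the chunk, then pin it before handing it out. */
void *registered_chunk_alloc(void *new_addr, size_t size, size_t alignment,
                             bool *zero, bool *commit, unsigned arena_ind)
{
    (void)arena_ind;
    if (new_addr != NULL)
        return NULL; /* sketch: placement requests are refused */
    void *chunk = map_aligned(size, alignment);
    if (chunk == NULL)
        return NULL;
    if (!net_register(chunk, size)) {
        munmap(chunk, size);
        return NULL;
    }
    *zero = true;   /* fresh anonymous mappings are zero-filled */
    *commit = true;
    return chunk;
}

/* Deallocation hook: unpin first, then unmap. Returns false on success. */
bool registered_chunk_dalloc(void *chunk, size_t size, bool committed,
                             unsigned arena_ind)
{
    (void)committed;
    (void)arena_ind;
    net_deregister(chunk, size);
    if (munmap(chunk, size) != 0) {
        /* The chunk stays mapped and may be reused, so pin it again. */
        (void)net_register(chunk, size);
        return true;
    }
    return false;
}
```

Because the deallocation hook is the only place a chunk leaves the allocator, it is also a natural point to invalidate an entry in a lazy registration cache. The stubs are not thread-safe; a real backend would need its own locking, since hooks can run on any thread.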
To actually use the arenas that manage registered memory, each pthread calls [this](https://gitlab.crest.iu.edu/extreme/hpx/blob/v1.3.0/libhpx/memory/jemalloc.c#L41) at startup, and registered allocation explicitly uses the caches created there. You need to be careful to ensure that jemalloc correctly keeps memory spaces disjoint by explicitly managing caches.
We also have a global heap that is implemented in a similar fashion, except that we’re implementing mmap() there to hand out chunk-sized pieces of a much larger segment of memory that we registered up front.
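One way to picture the global heap approach is a bump allocator over a single large mapping, sketched below. This is an assumption about the general technique, not the HPX-5 implementation: the names `segment_init`, `segment_chunk_alloc`, and `segment_chunk_dalloc` are hypothetical, the registration of the segment with the network is omitted, and the segment state is global because the jemalloc 4.x hook signatures carry no user context beyond the arena index.

```c
#define _DEFAULT_SOURCE /* exposes MAP_ANONYMOUS on glibc under strict -std= modes */
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>

static char *seg_base;          /* start of the pre-registered segment */
static size_t seg_size;         /* its length in bytes */
static _Atomic size_t seg_next; /* offset of the first byte not yet handed out */

/* Map the segment once at startup. A real runtime would register the whole
 * range with the NIC here; that call is omitted in this sketch. */
bool segment_init(size_t bytes)
{
    void *p = mmap(NULL, bytes, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return false;
    seg_base = p;
    seg_size = bytes;
    atomic_store(&seg_next, 0);
    return true;
}

/* Allocation hook: carve an aligned piece off the front of the unused part
 * of the segment. The CAS loop lets several threads allocate concurrently. */
void *segment_chunk_alloc(void *new_addr, size_t size, size_t alignment,
                          bool *zero, bool *commit, unsigned arena_ind)
{
    (void)arena_ind;
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    if (new_addr != NULL)
        return NULL; /* sketch: placement requests are refused */

    size_t old = atomic_load(&seg_next);
    for (;;) {
        uintptr_t cur = (uintptr_t)seg_base + old;
        uintptr_t aligned = (cur + alignment - 1) & ~((uintptr_t)alignment - 1);
        size_t start = (size_t)(aligned - (uintptr_t)seg_base);
        if (start > seg_size || size > seg_size - start)
            return NULL; /* segment exhausted */
        if (atomic_compare_exchange_weak(&seg_next, &old, start + size)) {
            *zero = true;   /* pages were never handed out before */
            *commit = true;
            return seg_base + start;
        }
        /* CAS failure reloaded `old`; retry with the new offset. */
    }
}

/* Deallocation hook: returning true opts out of deallocation, so the chunk
 * stays mapped (and registered) and the allocator keeps it for reuse. */
bool segment_chunk_dalloc(void *chunk, size_t size, bool committed,
                          unsigned arena_ind)
{
    (void)chunk; (void)size; (void)committed; (void)arena_ind;
    return true;
}
```

The trade-off in this design is that the segment never shrinks: memory is registered once, and freed chunks are retained by the allocator instead of being returned to the operating system.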
Obviously this won’t be exactly what you need, but it should serve as an example of chunk hook replacement for RDMA memory and can almost certainly be used as a basis for what you want to do. You may be able to simply decorate jemalloc’s existing chunk allocator with the registration calls that you need, rather than replacing its implementation entirely like we do (we customize mmap() to get huge pages from hugetlbfs when available, which adds to the complexity here).
Hope it helps.
> Based on these requirements, we would like to understand the best approach for integrating this functionality within jemalloc.
> Pasha & Yossi
>  OpenUCX: https://github.com/openucx/ucx or www.openucx.org
>  MPI SPEC: http://www.mpi-forum.org/docs/mpi-3.0/mpi30-report.pdf
>  OpenSHMEM SPEC: http://bongo.cs.uh.edu/site/sites/default/site_files/openshmem-specification-1.2.pdf
> jemalloc-discuss mailing list
> jemalloc-discuss at canonware.com