network registered memory and pages_purge()

Thu Jul 17 13:27:36 PDT 2014

On Jul 17, 2014, at 3:52 PM, Jason Evans <jasone at canonware.com> wrote:

> On Jul 17, 2014, at 11:13 AM, D'Alessandro, Luke K <ldalessa at indiana.edu> wrote:
> 
>> I don’t really want to to use jemalloc exclusively for managing the pinned memory—in fact, it gets used to back all of the normal malloc/free calls in the code. We use allocx() dallocx() with the pinned arena directly for network managed memory. Will the MALLOC_CONF cause us problems with the rest of the our runtime? I could link something else first, to deal with those routines if this is a bad thing to disable in general (-lc?).
> 
> The MALLOC_CONF setting will impact the application as a whole, so until there's a mechanism in jemalloc for controlling purging on a per arena basis, you're going to potentially suffer increased physical memory usage, depending on application behavior, because jemalloc won't be discarding the dirty pages in all the other arenas.  I just created an issue on github to make sure this use case isn't forgotten in jemalloc 4:
> 
> 	https://github.com/jemalloc/jemalloc/issues/93

Ok, great. I can live with memory pressure for now.

>> It would be nice to have allocx() dallocx() take a “cache” reference instead of an arena reference, with the cache configured with arenas to handle cache misses or something to deal with multithreaded-accesses. Other than that we really like using the library and as long as our network memory doesn’t move between cores frequently, this works well.
> 
> Are you suggesting a system in which each cache is uniquely identified, or one in which every thread potentially has indexable caches [0, 1, 2, ...]?  I've given some thought to something similar to the latter: each thread invisibly has a cache layered on top of any arena which is explicitly used for allocation.  I'm still in the idea phase on this, so I'm really interested to hear any insights you have.

I haven’t thought it through very hard. :-)

The problem I’m trying to deal with is that we have arenas associated with different memories—call them address spaces for now. I want to be able to allocate and free from address spaces independently (with the extended interface), and have caching work. There may be more than one arena per address space, in fact, I’d guess that each thread would have at least one arena for each address space. Having to bypass the cache right now is bad for us because we start to thrash the allocator with allocx/dallocx from different places, so I’d like to be able to cache based on address space.

I think that corresponds to your indexable cache?

Luke