network registered memory and pages_purge()

Tue Oct 14 18:02:57 PDT 2014

On Oct 14, 2014, at 6:15 PM, Jason Evans <jasone at canonware.com> wrote:

> On Jul 17, 2014, at 1:27 PM, D'Alessandro, Luke K <ldalessa at indiana.edu> wrote:
>> On Jul 17, 2014, at 3:52 PM, Jason Evans <jasone at canonware.com> wrote:
>>> On Jul 17, 2014, at 11:13 AM, D'Alessandro, Luke K <ldalessa at indiana.edu> wrote:
>>>> It would be nice to have mallocx() and dallocx() take a “cache” reference instead of an arena reference, with the cache configured with arenas to handle cache misses, or something to deal with multithreaded accesses. Other than that, we really like using the library, and as long as our network memory doesn’t move between cores frequently, this works well.
>>> 
>>> Are you suggesting a system in which each cache is uniquely identified, or one in which every thread potentially has indexable caches [0, 1, 2, ...]?  I've given some thought to something similar to the latter: each thread invisibly has a cache layered on top of any arena which is explicitly used for allocation.  I'm still in the idea phase on this, so I'm really interested to hear any insights you have.
>> 
>> I haven’t thought it through very hard. :-)
>> 
>> The problem I’m trying to deal with is that we have arenas associated with different memories (call them address spaces for now). I want to be able to allocate and free from address spaces independently (with the extended interface) and have caching work. There may be more than one arena per address space; in fact, I’d guess that each thread would have at least one arena for each address space. Having to bypass the cache right now is bad for us because we start to thrash the allocator with mallocx()/dallocx() calls from different places, so I’d like to be able to cache based on address space.
>> 
>> I think that corresponds to your indexable cache?
> 
> It has taken some time, but I think I understand the pain points and use cases surrounding thread caches well enough now to have a solid plan of action.  Tracking issue:
> 
> 	https://github.com/jemalloc/jemalloc/issues/145

Thanks Jason,

This seems to be a good representation of the issue. Creating a thread cache and explicitly associating with it an arena that has a custom chunk allocator would solve our problem. I would then be able to manage our network address space as an independent entity by giving each thread a network arena and cache, and then using them explicitly for network allocations, without sacrificing scalable performance for normal allocations, or vice versa.

This would eliminate a lot of the hacks I currently have in place.

:-)

Luke