Memory layout refinements (was Re: should there be another jemalloc release)
jasone at canonware.com
Tue Jan 20 22:15:40 PST 2015
On Jan 20, 2015, at 4:38 PM, Dave Barrett <multisample at gmail.com> wrote:
> > Regarding performance, I'm quite happy with the dev version of jemalloc. It is significantly faster, and its memory layout refinements tend to decrease fragmentation.
> Please forgive me if this was covered before; I only check in from time to time. Is there a specific commit (or commits), or a link describing the changes behind the comment above, specifically the memory layout refinements?
Here are the main changes that affect memory layout. I'm leaving out commit IDs because these features each required numerous commits.
- Purge unused dirty pages in LRU order instead of address order (contributed by Qinfan Wu). This tends to decrease run fragmentation, which means fewer arena chunks, and therefore fewer pages dedicated to arena chunk header metadata.
- Normalize size class spacing across the full range of size classes such that for each doubling in size, there are four equidistant size classes. Prior to this change, one of jemalloc's worst practical weaknesses was that a 4097-byte request was rounded up to 8 KiB, but now it's rounded up to only 5 KiB. Another benefit is that iterative reallocation has much better worst-case performance with regard to copying (far fewer large and huge size classes to traverse), though in practice in-place growing reallocation tended to avoid worst-case performance for large size classes. Also, with fewer run sizes, arena chunks don't tend to fragment as badly.
- Move small run metadata into the arena chunk header. This allows much smaller runs to achieve low (actually zero) waste; all size classes evenly divide relatively small run sizes. In addition to reducing the number of run sizes, this reduces the number of small regions per run, which reduces the impact of a single long-lived allocation keeping an entire run alive.
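The arithmetic behind the last two items can be sketched roughly as follows. This is an illustration in Python, not code from jemalloc: the function names are hypothetical, a 4 KiB page is assumed, and the special handling real allocators apply to the very smallest sizes (a minimum quantum) is ignored.

```python
from math import gcd

PAGE = 4096  # assumed page size in bytes


def round_up_size(size, classes_per_doubling=4):
    """Round size up to the nearest class when every doubling (2^k, 2^(k+1)]
    contains classes_per_doubling equally spaced size classes."""
    if size < 1:
        raise ValueError("size must be positive")
    upper = 1 << (size - 1).bit_length()   # smallest power of two >= size
    lower = upper // 2                     # start of the doubling that holds size
    delta = lower // classes_per_doubling  # spacing between classes in it
    if delta == 0:
        # Doubling too narrow to subdivide; a real allocator would apply
        # a minimum spacing here instead.
        return size
    return -(-size // delta) * delta       # ceiling to a multiple of delta


def classes_between(start, end):
    """Count the size classes a buffer passes through while growing one
    class at a time from start until it reaches at least end bytes."""
    size = round_up_size(start)
    steps = 0
    while size < end:
        size = round_up_size(size + 1)
        steps += 1
    return steps


def zero_waste_run(region_size, page=PAGE):
    """Smallest page-multiple run that region_size divides evenly, assuming
    the run itself holds no header. Returns (run_bytes, regions_per_run)."""
    run_bytes = region_size * page // gcd(region_size, page)  # lcm
    return run_bytes, run_bytes // region_size


if __name__ == "__main__":
    print(round_up_size(4097))                     # 5120, i.e. 5 KiB
    print(classes_between(4096, 4 * 1024 * 1024))  # 40
    print(zero_waste_run(80))                      # (20480, 256)
```

Under this spacing the worst case is a request one byte past a power of two, which is rounded up by just under 25% of the request, so internal waste stays below 20% of the allocation (versus nearly 50% for 4097 bytes in an 8 KiB class). Growing from 4 KiB to 4 MiB passes through 40 classes; if classes were instead spaced one 4 KiB page apart (an assumption made here only for comparison), the same growth would pass through 1023.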
These changes are particularly satisfying because they tend toward generalization/simplification; although they were nontrivial to implement, they feel like elegant improvements.