jemalloc Suitable for embedded environments

Mayank Kumar (mayankum) mayankum at cisco.com
Thu May 7 16:01:04 PDT 2015


Hi Jason and Konstantin

Thanks for your replies. I will investigate along the lines you suggested. A few more questions and comments:

--What specifically causes the code size bloat?
--It is comforting to hear that jemalloc is already part of FreeBSD. Which version of jemalloc ships in current FreeBSD releases? Also, does the FreeBSD distribution of jemalloc include all the enhancements done for Facebook, or is it a stripped-down version?

-mayank


-----Original Message-----
From: Jason Evans [mailto:jasone at canonware.com] 
Sent: Thursday, May 07, 2015 8:06 AM
To: Mayank Kumar (mayankum); Konstantin Tokarev
Cc: jemalloc-discuss at canonware.com
Subject: Re: jemalloc Suitable for embedded environments

On May 7, 2015, at 2:41 AM, Konstantin Tokarev <annulen at yandex.ru> wrote:
> 07.05.2015, 04:51, "Mayank Kumar (mayankum)" <mayankum at cisco.com>:
>> 
>> I recently started experimenting with jemalloc and found that it controls fragmentation phenomenally well. A process that was running out of its allocated quota of 2 GB of virtual memory now uses only 1000 MB of virtual memory and works without any crashes. I am trying to incorporate this library, but it seems people have some apprehensions about it because of its association with big data applications (enhanced by Facebook).
>> 
>> Can someone answer the following questions while I do some code reading:
>> 
>> 1.       Does jemalloc trade memory for speed, and is it optimized for big data applications?

jemalloc remains the allocator built into FreeBSD's libc, and memory utilization has actually improved for such applications over the past five years, because I've taken care to solve performance issues in ways that work at both large and small scale.  In my opinion the worst problem with jemalloc for very small embedded environments is code size, which translates to a larger instruction cache footprint.  Some features can be compiled out, but even so the resulting binary size is larger than that of most other allocators.

> By default jemalloc uses 4 MiB chunks, which can be too much for low-memory systems. You might want to tune this with the lg_chunk option (note that decreasing the chunk size may affect performance).

The dev version now uses 256 KiB chunks, so this shouldn't be much of an issue starting with jemalloc 4.
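
The chunk size is given as a base-2 logarithm, so the two sizes mentioned above correspond to lg_chunk values of 22 (4 MiB) and 18 (256 KiB). A small sketch of that arithmetic; chunk_bytes is a made-up helper for illustration, not a jemalloc function:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not a jemalloc API): chunk size in bytes for a
 * given lg_chunk value, i.e. 2^lg_chunk. */
size_t chunk_bytes(unsigned lg_chunk)
{
    /* Guard against shifting past the width of size_t. */
    assert(lg_chunk < sizeof(size_t) * 8);
    return (size_t)1 << lg_chunk;
}
```

For example, chunk_bytes(22) is 4194304 bytes (4 MiB) and chunk_bytes(18) is 262144 bytes (256 KiB).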

>> 2.       Are there scenarios where, if many threads are competing for malloc, it will dynamically create new arenas to reduce thread contention, and are there parameters to control/tune them?

The maximum number of arenas that can be automatically created is fixed at startup time, and can be tuned:

	http://www.canonware.com/download/jemalloc/jemalloc-latest/doc/jemalloc.html#opt.narenas
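
One way to set this option is to compile it into the application through the malloc_conf global, which jemalloc consults during initialization. A minimal sketch, assuming an unprefixed jemalloc build (prefixed builds rename the symbol); the value 2 is only an illustration, and alloc_smoke_test is a hypothetical helper. If jemalloc is not linked in, the variable is just an unused global and the code still runs on the system allocator:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Read by jemalloc at startup when it is the process allocator.
 * "narenas:2" is an illustrative value, not a recommendation. */
const char *malloc_conf = "narenas:2";

/* Hypothetical helper: do one allocation so the allocator gets
 * initialized with the options above. Returns 0 on success. */
int alloc_smoke_test(void)
{
    void *p = malloc(64);
    if (p == NULL)
        return -1;
    memset(p, 0, 64);   /* touch the memory so it is really used */
    free(p);
    return 0;
}
```

The same option string can instead be passed in the MALLOC_CONF environment variable, which avoids rebuilding the application.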

>> 3.       Are there any other tunable parameters I should look at to ensure jemalloc doesn’t uncontrollably allocate memory in stress scenarios to optimize performance at the cost of memory? In my environment, I would expect jemalloc to reduce performance rather than allocate more memory/arenas/pools to improve performance.

You may want to turn off thread caches:

	http://www.canonware.com/download/jemalloc/jemalloc-latest/doc/jemalloc.html#opt.tcache

More aggressive dirty page purging may reduce physical memory usage, depending on application:

	http://www.canonware.com/download/jemalloc/jemalloc-latest/doc/jemalloc.html#opt.lg_dirty_mult
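
These options can be combined into one comma-separated option string, for example for the MALLOC_CONF environment variable. A minimal sketch of building such a string; format_malloc_conf is a hypothetical helper, not part of jemalloc, and the values in the note below are illustrative rather than recommendations:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical helper (not a jemalloc API): write an option string of
 * the form "narenas:N,tcache:true|false,lg_dirty_mult:M" into buf.
 * Returns the string length, or -1 if buf is too small. */
int format_malloc_conf(char *buf, size_t len, unsigned narenas,
                       bool tcache, int lg_dirty_mult)
{
    assert(buf != NULL);
    int n = snprintf(buf, len, "narenas:%u,tcache:%s,lg_dirty_mult:%d",
                     narenas, tcache ? "true" : "false", lg_dirty_mult);
    /* snprintf reports the length it wanted; treat truncation as failure. */
    if (n < 0 || (size_t)n >= len)
        return -1;
    return n;
}
```

Calling format_malloc_conf(buf, sizeof buf, 1, false, 8) produces narenas:1,tcache:false,lg_dirty_mult:8. A larger lg_dirty_mult requests a higher ratio of active to dirty pages, which means more aggressive purging. The string has to be in the environment before the process starts, since the options are read when the allocator initializes.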

Jason

