Jason Evans jasone at canonware.com
Wed Aug 6 09:12:15 PDT 2014


On Aug 6, 2014, at 8:55 AM, Guillaume Holley <gholley at cebitec.uni-bielefeld.de> wrote:
> On 06/08/2014 01:10, Jason Evans wrote:
>> On Aug 5, 2014, at 10:35 AM, gholley at CeBiTec.Uni-Bielefeld.DE wrote:
>>> I’m currently working on a data structure allowing the storage of a
>>> dynamic set of short DNA sequences plus annotations.
>>> Here are a few details: the data structure is written in C, tests are
>>> currently run on 64-bit Ubuntu 14.04, everything is single-threaded, and
>>> Valgrind indicates that the program which manipulates the data structure
>>> has no memory leaks.
>>> 
>>> I’ve started to use jemalloc in an attempt to reduce memory
>>> fragmentation (using one arena, disabling the thread cache and
>>> using a high ratio of dirty pages). On small data sets (30 million
>>> insertions), results are very good compared to glibc: tuned jemalloc
>>> uses about 150 MB less.
>>>
>>> Now I’ve started tests with much bigger data sets (3 to 10 billion
>>> insertions), and I realized that jemalloc uses more memory than glibc.
>>> I generated a data set of 200 million entries and tried to insert it
>>> into the data structure; when memory usage reached 1 GB, I stopped the
>>> program and recorded the number of entries inserted.
>>> With jemalloc, regardless of the tuning parameters (1 or 4 arenas,
>>> tcache enabled or not, lg_dirty = 3, 8 or 16, lg_chunk = 14, 22 or
>>> 30), the number of entries inserted varies between 120 million and 172
>>> million. With the standard glibc, by contrast, I’m able to insert 187
>>> million entries.
>>> And on billions of entries, glibc (unfortunately I don’t have precise
>>> numbers) uses a few gigabytes less than jemalloc.
>>>
>>> So I would like to know whether there is an explanation for this, and
>>> whether I can do something to make jemalloc at least as efficient as
>>> glibc on my tests. Maybe I’m not using jemalloc correctly?
>> There are a few possible issues, mainly related to fragmentation, but I can't make many specific guesses because I don't know what the allocation/deallocation patterns are in your application.  It sounds like your application just does a bunch of allocation, with very little interspersed deallocation, in which case I'm surprised by your results unless you happen to be allocating lots of objects that are barely larger than the nearest size class boundaries (e.g. 17 bytes).  Have you taken a close look at the output of malloc_stats_print()?
>> 
>> Jason
> 
> Hi and thank you for the help.
> 
> As you said, my application does a lot of allocations with very few deallocations, but allocated memory is very often reallocated. However, my application never requests more than 600KB in a single allocation, so there are no huge objects.
> I had a look at the malloc_stats_print() output (included at the end of this message). For bins of size 64, it seems I make a huge number of allocations/reallocations for a small amount of allocated memory. Maybe this generates a lot of fragmentation, but I don't know to what extent. Do you think my problem could be related to this, and if so, can I do something in jemalloc to solve it?
> 
> Here are the jemalloc statistics for 200 million insertions, with jemalloc tuned using the following parameters: "narenas:1,tcache:false,lg_dirty_mult:8"
> 
> ___ Begin jemalloc statistics ___
> [...]
> Allocated: 1196728936, active: 1212567552, mapped: 1287651328

The overall external fragmentation, 1 - (allocated/active), is 1.3%, which is very low.
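The arithmetic is easy to reproduce. Below is a minimal C sketch that applies 1 - (allocated/active) to the totals from the statistics output; the function name `external_fragmentation` is hypothetical and not part of jemalloc. In a running program the same two totals are exposed through jemalloc's `stats.allocated` and `stats.active` mallctl entries, but here they are simply copied from the output.

```c
#include <assert.h>

/* External fragmentation as 1 - (allocated/active): the fraction of
 * active (page-backed) memory that is not occupied by live allocations.
 * Hypothetical helper, not a jemalloc API. */
double external_fragmentation(unsigned long long allocated,
                              unsigned long long active)
{
    assert(active > 0 && allocated <= active);
    return 1.0 - (double)allocated / (double)active;
}
```

With the totals above, `external_fragmentation(1196728936ULL, 1212567552ULL)` returns about 0.0131, i.e. the 1.3% quoted.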

> Current active ceiling: 16416505856
> chunks: nchunks   highchunks    curchunks
>            307          307          307
> huge: nmalloc      ndalloc    allocated
>            0            0            0
> 
> arenas[0]:
> assigned threads: 1
> dss allocation precedence: disabled
> dirty pages: 296037:66 active:dirty, 25210 sweeps, 90046 madvises, 627169 purged
>            allocated      nmalloc      ndalloc    nrequests
> small:     1116037736    372536523    370598419    372536523
> large:       80691200       223900       220617       223900
> total:     1196728936    372760423    370819036    372760423
> active:    1212567552
> mapped:    1283457024
> bins:     bin  size regs pgs    allocated      nmalloc ndalloc    nrequests       nfills     nflushes      newruns reruns      curruns
> [...]
>           21  1280   51  16    600357120       641800 172771       641800            0            0         9221 184785         9197

The only aspect of jemalloc that might be causing you problems is size class rounding.  IIRC glibc's malloc spaces its small size classes closer together.  If your application allocates lots of 1025-byte objects, each one is rounded up to the 1280-byte size class, leaving 255 of every 1280 bytes unused; that could cost you nearly 20% in terms of memory usage for those allocations (~120 MB of the ~600 MB held in bin 21 in this case).
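To see how much rounding can cost per request, here is a simplified C model. It assumes the jemalloc 3.x small size classes on x86-64 with 4 KiB pages, which match the bin numbering in the statistics above (bin 21 is 1280 bytes): 8; 16 to 128 in steps of 16; then classes spaced a quarter of a power of two apart, up to 3584. For comparison it assumes the usual 64-bit glibc rule of adding an 8-byte size header and rounding up to a multiple of 16, with a 32-byte minimum. The function names are hypothetical, and the model ignores jemalloc's per-run metadata and glibc's own heap fragmentation.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed model of jemalloc 3.x small size classes (16-byte quantum,
 * 4 KiB pages): 8; 16..128 step 16; then p/4 spacing within (p, 2p]. */
size_t jemalloc3_small_class(size_t size)
{
    assert(size >= 1 && size <= 3584);
    if (size <= 8)
        return 8;
    if (size <= 128)
        return (size + 15) & ~(size_t)15;

    /* Find the power of two p such that p < size <= 2p; classes in
     * (p, 2p] are p/4 apart (e.g. 1280, 1536, 1792, 2048 for p = 1024). */
    size_t p = 128;
    while (p * 2 < size)
        p *= 2;
    size_t spacing = p / 4;
    return (size + spacing - 1) / spacing * spacing;
}

/* Assumed model of 64-bit glibc malloc: the chunk holds the request plus
 * an 8-byte size header, rounded up to 16 bytes, with a 32-byte minimum. */
size_t glibc64_chunk_size(size_t size)
{
    size_t chunk = (size + 8 + 15) & ~(size_t)15;
    return chunk < 32 ? 32 : chunk;
}

/* Fraction of the memory footprint that holds no user data. */
double waste_fraction(size_t request, size_t footprint)
{
    assert(footprint > 0 && footprint >= request);
    return (double)(footprint - request) / (double)footprint;
}
```

For a 1025-byte request the model gives a 1280-byte jemalloc region (`waste_fraction(1025, 1280)` is about 0.199) versus a 1040-byte glibc chunk (about 0.014), which is consistent with the "nearly 20%" estimate. The same model puts Jason's 17-byte example in the 32-byte class.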

Jason


More information about the jemalloc-discuss mailing list