No subject

Tue Aug 5 10:35:22 PDT 2014

Hello,

I’m currently working on a data structure allowing the storage of a
dynamic set of short DNA sequences plus annotations.
Here are few details : the data structure is written in C, tests are
currently run on Ubuntu 14.04 64 bits, everything is single threaded and
Valgrind indicates that the program which manipulates the data structure
has no memory leaks.

I’ve started to use Jemalloc in an attempt to reduce the fragmentation of
the memory (by using one arena, disabling the thread caching system and
using a high ratio of dirty pages). On small data sets (30 millions
insertions), results are very good in comparison of Glibc: about 150MB
less by using tuned Jemalloc.

Now, I’ve started tests with much bigger data sets (3 to 10 billions
insertions) and I realized that Jemalloc is using more memory than Glibc.
I have generated a data set of 200 millions entries which I tried to
insert in the data structure and when the memory used reached 1GB, I
stopped the program and reported the number of entries inserted.
When using Jemalloc, doesn’t matter the tuning parameters (1 or 4 arenas,
tcache activated or not, lg_dirty = 3 or 8 or 16, lg_chunk = 14 or 22 or
30), the number of entries inserted varies between 120 millions to 172
millions. Or by using the standard Glibc, I’m able to insert 187 millions
of entries.
And on billions of entries, Glibc (I don’t have precise numbers
unfortunately) uses few Gigabytes less than Jemalloc.

So I would like to know if there is an explanation for this and if I can
do something to make Jemalloc at least as efficient as Glibc is on my
tests ? Maybe I’m not using Jemalloc correctly ?

Thank you a lot for your help and your time.

Have a nice day.

Guillaume Holley