<html>
<head>
<style><!--
.hmmessage P
{
margin:0px;
padding:0px
}
body.hmmessage
{
font-size: 12pt;
font-family:Calibri
}
--></style></head>
<body class='hmmessage'><div dir='ltr'>
<div><div dir="ltr">Hi,<br> I am using jemalloc 3.1.0, compiled with gcc 4.1.2 on CentOS release 5.6 for a compute intensive multithreaded application where bulk of allocations falls in 8-byte, 16-byte (stats below). We are seeing very high runtime in tcache_alloc_small_hard(), arena_tcache_fill_small() and arena_bin_malloc_hard(). The 8 and 16 bytes allocations happen in very large number. We probably want the allocations to come form tcache_alloc_easy() to not hit the locks and take less runtime. The allocations are mostly temporary in nature -- especially 8-byte ones -- so some number of 8-byte allocations are done, then they are freed and then alloc/free process repeated. <br><br>The runs use 8 threads (so 48 arenas and all other jemalloc settings are default) and are on an Intel Xeon server with 12-cores (no hyperthreading and no other user so no contention with other user; the peak memory in application is ~6.5GB and server has 128GB memory so no memory shortage/swapping etc.); the jemalloc output at the end of run is given below. As we can see, pretty much 8 and 16 byte bins are overloaded. <br><br>A run with TCACHE_NSLOTS_SMALL_MAX=10000, LG_TCACHE_MAXCLASS_DEFAULT=10, changing small bins to hold only upto 224 bytes (NBINS=15), and hardcoding <br>tcache_bin_info[i].ncached_max = TCACHE_NSLOTS_SMALL_MAX in tcache.c file improves runtime by about 16% (for overall application runtime) but increases peak memory from 6.5GB to 6.7GB. We want the runtime to at least improve by another 10% without preferably any increase in peak memory (so even 6.5GB to 6.7GB is not desirable). <br>Any suggestions on what changes to jemalloc settings to try?<br>thanks a lot!,<br>--Rajat<br><br>___ Begin jemalloc statistics ___<br>Version: 3.1.0-0-g3b1f3aca54fede23299cde9034f7b909c3d290d7<br>Assertions disabled<br>Run-time option settings:<br> opt.abort: false<br> opt.lg_chunk: 22<br> opt.dss: "secondary"<br> opt.narenas: 48<br> opt.lg_dirty_mult: 5<br> opt.stats_print: false<br> opt.junk: false<br> opt.quarantine: 0<br> opt.redzone: false<br> opt.zero: false<br> opt.tcache: true<br> opt.lg_tcache_max: 15<br>CPUs: 12<br>Arenas: 48<br>Pointer size: 8<br>Quantum size: 16<br>Page size: 4096<br>Min active:dirty page ratio per arena: 32:1<br>Maximum thread-cached size class: 32768<br>Chunk size: 4194304 (2^22)<br>Allocated: 1209815024, active: 1295314944, mapped: 1832910848<br>Current active ceiling: 1321205760<br>chunks: nchunks highchunks curchunks<br> 446 443 437<br>huge: nmalloc ndalloc allocated<br> 0 0 0<br><br>Merged arenas stats:<br>assigned threads: 1<br>dss allocation precedence: N/A<br>dirty pages: 316239:9435 active:dirty, 1133 sweeps, 43514 madvises, 198766 purged<br> allocated nmalloc ndalloc nrequests<br>small: 1048219632 14499926892 14459296188 16587774427<br>large: 161595392 1658 1282 2659<br>total: 1209815024 14499928550 14459297470 16587777086<br>active: 1295314944<br>mapped: 1828716544<br>bins: bin size regs pgs allocated nmalloc ndalloc nrequests nfills nflushes newruns reruns curruns<br> 0 8 501 1 17659680 7173034900 7170827440 8020202160 71750873 71709910 282770 17873119 4463<br> 1 16 252 1 399942864 7263860606 7238864177 8275464931 72760253 72395028 138537 1394609632 105755<br> 2 32 126 1 340674496 34841421 24195343 229801652 504480 257943 128506 1157771 89699<br> 3 48 84 1 78136368 2304192 676351 11567784 101000 18024 22085 18898 19409<br> 4 64 63 1 1573312 3402405 3377822 5487283 217389 67666 33203 121936 454<br> 5 80 50 1 17949200 1591440 1367075 2726511 177324 40895 23869 56202 4552<br> 6 96 84 2 25039488 7866030 7605202 19022100 150183 96444 40254 127493 3556<br> 7 112 72 2 25960928 4878810 4647016 6646579 277346 80409 39762 81691 3309<br> 8 128 63 2 4877696 345189 307082 887445 91917 17364 2600 18459 606<br> 9 160 51 2 3152000 470435 450735 712418 81161 19755 6687 13692 388<br> 10 192 63 3 7931712 1356246 1314935 2595825 81608 31104 18297 14202 792<br> 11 224 72 4 583072 1666870 1664267 4112733 94788 84418 18487 14044 37<br> 12 256 63 4 4441088 214118 196770 531195 50982 23215 2773 3298 332<br> 13 320 63 5 161600 1507044 1506539 3623214 112592 45548 16571 22779 9<br> 14 384 63 6 63520128 1115114 949697 1845693 61670 33546 15219 8725 3117<br> 15 448 63 7 56578368 922862 796571 908379 69788 43446 13528 7001 2006<br> ...<cut>...<br>large: size pages nmalloc ndalloc nrequests curruns<br> 4096 1 413 413 956 0<br> 8192 2 378 378 821 0<br> 12288 3 13 13 27 0<br> 16384 4 2 2 3 0<br> 20480 5 3 3 3 0<br> ...<cut>...<br><br> </div></div> </div></body>
</html>