calling a function via PLT and jemalloc realloc changes function first argument (XMM0)
david.abdurachmanov at gmail.com
Thu Nov 14 07:49:24 PST 2013
> If I understand correctly, then the simplest correct fix for this problem is to modify ld.so such that it preserves all caller-saved registers when calling out to functions like realloc(3). In my opinion, ld.so probably shouldn't be using the normal malloc at all (instead use a directly embedded minimal malloc implementation), because there are lots of mind-boggling ways bootstrapping can fail, but that's a more involved change.
I think you cannot count on any fixes on ld.so side. Have you tried running glibc test with jemalloc?
> Can you disable the gcc optimization (-fno-tree-vectorize) when building jemalloc? You won't hit actual floating point code in jemalloc unless you enable heap profiling, so that should prevent XMM usage in all the relevant jemalloc code. If that works okay, I'll need to get a better understanding of when to automatically configure the gcc flags that way when building jemalloc.
I know that --disable-stats at least allows my computation workflows to run fine, yet it doesn't mean that I am confident about jemalloc.
If SSE/AVX cannot be used in malloc implementation, simple idea is to disable it:
gcc -mno-sse -fvisibility=hidden -fPIC -DPIC -c -D_GNU_SOURCE -D_REENTRANT -Iinclude -Iinclude -o src/jemalloc.pic.o src/jemalloc.c
In file included from include/jemalloc/internal/jemalloc_internal.h:1037:0,
include/jemalloc/internal/prof.h: In function 'prof_sample_threshold_update':
include/jemalloc/internal/prof.h:349:40: error: SSE register return with SSE disabled
prof_tdata->threshold = (uint64_t)(log(u) /
Disabling SSE does not allow jemalloc to be compiled, even with --disable-stats.
349 prof_tdata->threshold = (uint64_t)(log(u) /
350 log(1.0 - (1.0 / (double)((uint64_t)1U << opt_lg_prof_sample))))
351 + (uint64_t)1U;
This requires SSE register.
From Microsoft x86_64 calling conventions, same applies for Linux:
"All floating point operations are done using the 16 XMM registers."
Same with argument passing, all floats/doubles goes through XMM registers.
So these pieces of code _requires_ XMM registers. We just need to make sure those pieces are never in ld.so call path.
- By default jemalloc should be compiled without SSE/AXV. This is also means that such options as "stats" by default is off. Special note must be added about possible problems.
- Make sure that jemalloc passes the same or a similar test as in glibc on multiple platforms.
> What about setting LD_BIND_NOW=1 in the environment so that all the register-corrupting badness happens prior to application execution (when it doesn't matter)?
It probably depends on application. It would increase a start time of application, which is not always preferred.
My current thoughts would be that jemalloc shouldn't try to go around it, but try to do similar QA as glibc, get rid of SSE/AVX registers in functions possibly used by ld.so.
More information about the jemalloc-discuss