<html><head></head><body style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space; "><span class="Apple-style-span" style="font-family: monospace; "><div><span class="Apple-style-span" style="font-family: monospace; ">Hi,</span></div><div><span class="Apple-style-span" style="font-family: monospace; "><br></span></div><div><span class="Apple-style-span" style="font-family: monospace; ">I am having problems with jemalloc 3.4.1 (currently we use 2.2.2 in production). I found that with jemalloc 3.4.1 function first argument will be changed if first argument is passed by XMM0 register. Compiled with GCC 4.8.1 (tested also with 4.8.2). No problems on Scientific Linux 6 (RHEL6-based), but it fails on Scientific Linux 5 (RHEL5-based). All of this is because _dl_lookup_symbol_x calls _realloc_ in Scientific Linux 5.</span></div><div><span class="Apple-style-span" style="font-family: monospace; "><br></span></div><div><span class="Apple-style-span" style="font-family: monospace; ">This probably makes jemalloc 3.4.1 and the whole 3.X.Y series not recommended for RHEL5 and RHEL5-based distributions.</span></div><div><span class="Apple-style-span" style="font-family: monospace; "><br></span></div><div><span class="Apple-style-span" style="font-family: monospace; ">Original email below.</span></div><div><span class="Apple-style-span" style="font-family: monospace; "><br></span></div><div><span class="Apple-style-span" style="font-family: monospace; ">- - - - - - -</span></div><div><span class="Apple-style-span" style="font-family: monospace; "><br></span></div>My initial investigations were done on slc6_amd64_gcc481 and the release is available for slc5_amd64_gcc481.</span><span class="Apple-style-span" style="font-family: monospace; "><br></span><span class="Apple-style-span" style="font-family: monospace; "><br></span><span class="Apple-style-span" style="font-family: monospace; ">Most of the workflows will fail on this [slc5_amd64_gcc481] architecture, while on slc6_amd64_gcc481 all workflows pass.</span><span class="Apple-style-span" style="font-family: monospace; "><br></span><span class="Apple-style-span" style="font-family: monospace; "><br></span><span class="Apple-style-span" style="font-family: monospace; ">If you are interested into the cause and calling conventions continue reading.</span><span class="Apple-style-span" style="font-family: monospace; "><br></span><span class="Apple-style-span" style="font-family: monospace; "><br></span><span class="Apple-style-span" style="font-family: monospace; ">Most workflows fails with:</span><span class="Apple-style-span" style="font-family: monospace; "><br></span><span class="Apple-style-span" style="font-family: monospace; "><br></span><span class="Apple-style-span" style="font-family: monospace; ">----- Begin Fatal Exception 08-Nov-2013 14:19:25 CET-----------------------</span><span class="Apple-style-span" style="font-family: monospace; "><br></span><span class="Apple-style-span" style="font-family: monospace; ">An exception of category 'InvalidIntervalError' occurred while</span><span class="Apple-style-span" style="font-family: monospace; "><br></span><span class="Apple-style-span" style="font-family: monospace; ">  [0] Processing run: 208307 lumi: 1 event: 643482</span><span class="Apple-style-span" style="font-family: monospace; "><br></span><span class="Apple-style-span" style="font-family: monospace; ">  [1] Running path 'reconstruction_step'</span><span class="Apple-style-span" style="font-family: monospace; "><br></span><span class="Apple-style-span" style="font-family: monospace; ">  [2] Calling event method for module TrackIPProducer/'impactParameterTagInfos'</span><span class="Apple-style-span" style="font-family: monospace; "><br></span><span class="Apple-style-span" style="font-family: monospace; ">Exception Message:</span><span class="Apple-style-span" style="font-family: monospace; "><br></span><span class="Apple-style-span" style="font-family: monospace; ">Upper boundary below lower boundary in histogram integral.</span><span class="Apple-style-span" style="font-family: monospace; "><br></span><span class="Apple-style-span" style="font-family: monospace; ">----- End Fatal Exception -------------------------------------------------</span><span class="Apple-style-span" style="font-family: monospace; "><br></span><span class="Apple-style-span" style="font-family: monospace; "><br></span><span class="Apple-style-span" style="font-family: monospace; ">Code triggering exception (CondFormats/PhysicsToolsObjects/interface/Histogram.icc):</span><span class="Apple-style-span" style="font-family: monospace; "><br></span><span class="Apple-style-span" style="font-family: monospace; "><br></span><span class="Apple-style-span" style="font-family: monospace; ">244 template<typename Value_t, typename Axis_t></span><span class="Apple-style-span" style="font-family: monospace; "><br></span><span class="Apple-style-span" style="font-family: monospace; ">245 Value_t Histogram<Value_t, Axis_t>::integral(Axis_t hBound, Axis_t lBound,</span><span class="Apple-style-span" style="font-family: monospace; "><br></span><span class="Apple-style-span" style="font-family: monospace; ">246                                              int mode) const</span><span class="Apple-style-span" style="font-family: monospace; "><br></span><span class="Apple-style-span" style="font-family: monospace; ">247 {</span><span class="Apple-style-span" style="font-family: monospace; "><br></span><span class="Apple-style-span" style="font-family: monospace; ">248         if (hBound < lBound)</span><span class="Apple-style-span" style="font-family: monospace; "><br></span><span class="Apple-style-span" style="font-family: monospace; ">249                 throw cms::Exception("InvalidIntervalError")</span><span class="Apple-style-span" style="font-family: monospace; "><br></span><span class="Apple-style-span" style="font-family: monospace; ">250                         << "Upper boundary below lower boundary in "</span><span class="Apple-style-span" style="font-family: monospace; "><br></span><span class="Apple-style-span" style="font-family: monospace; ">251                         << "histogram integral." << std::endl;</span><span class="Apple-style-span" style="font-family: monospace; "><br></span><span class="Apple-style-span" style="font-family: monospace; "><br></span><span class="Apple-style-span" style="font-family: monospace; ">The problem by example (description below):</span><span class="Apple-style-span" style="font-family: monospace; "><br></span><span class="Apple-style-span" style="font-family: monospace; "><br></span><span class="Apple-style-span" style="font-family: monospace; ">Dump of assembler code for function PhysicsTools::Calibration::Histogram<float, float>::normalizedIntegral(float, float, int) const:</span><span class="Apple-style-span" style="font-family: monospace; "><br></span><span class="Apple-style-span" style="font-family: monospace; ">  0x00002aaabc67ceb0 <+0>:     push   %rbx</span><span class="Apple-style-span" style="font-family: monospace; "><br></span><span class="Apple-style-span" style="font-family: monospace; ">  0x00002aaabc67ceb1 <+1>:     mov    %rdi,%rbx</span><span class="Apple-style-span" style="font-family: monospace; "><br></span><span class="Apple-style-span" style="font-family: monospace; ">  0x00002aaabc67ceb4 <+4>:     sub    $0x10,%rsp</span><span class="Apple-style-span" style="font-family: monospace; "><br></span><span class="Apple-style-span" style="font-family: monospace; ">  0x00002aaabc67ceb8 <+8>:     callq  0x2aaabc6331e0 <_ZNK12PhysicsTools11Calibration9HistogramIffE8integralEffi@plt></span><span class="Apple-style-span" style="font-family: monospace; "><br></span><span class="Apple-style-span" style="font-family: monospace; ">  0x00002aaabc67cebd <+13>:    mov    %rbx,%rdi</span><span class="Apple-style-span" style="font-family: monospace; "><br></span><span class="Apple-style-span" style="font-family: monospace; ">  0x00002aaabc67cec0 <+16>:    movss  %xmm0,0xc(%rsp)</span><span class="Apple-style-span" style="font-family: monospace; "><br></span><span class="Apple-style-span" style="font-family: monospace; ">  0x00002aaabc67cec6 <+22>:    callq  0x2aaabc632c80 <_ZNK12PhysicsTools11Calibration9HistogramIffE13normalizationEv@plt></span><span class="Apple-style-span" style="font-family: monospace; "><br></span><span class="Apple-style-span" style="font-family: monospace; ">  0x00002aaabc67cecb <+27>:    movss  0xc(%rsp),%xmm1</span><span class="Apple-style-span" style="font-family: monospace; "><br></span><span class="Apple-style-span" style="font-family: monospace; ">  0x00002aaabc67ced1 <+33>:    add    $0x10,%rsp</span><span class="Apple-style-span" style="font-family: monospace; "><br></span><span class="Apple-style-span" style="font-family: monospace; ">  0x00002aaabc67ced5 <+37>:    divss  %xmm0,%xmm1</span><span class="Apple-style-span" style="font-family: monospace; "><br></span><span class="Apple-style-span" style="font-family: monospace; ">  0x00002aaabc67ced9 <+41>:    pop    %rbx</span><span class="Apple-style-span" style="font-family: monospace; "><br></span><span class="Apple-style-span" style="font-family: monospace; ">  0x00002aaabc67ceda <+42>:    movaps %xmm1,%xmm0</span><span class="Apple-style-span" style="font-family: monospace; "><br></span><span class="Apple-style-span" style="font-family: monospace; ">  0x00002aaabc67cedd <+45>:    retq   </span><span class="Apple-style-span" style="font-family: monospace; "><br></span><span class="Apple-style-span" style="font-family: monospace; ">End of assembler dump.</span><span class="Apple-style-span" style="font-family: monospace; "><br></span><span class="Apple-style-span" style="font-family: monospace; ">this = 0x2aab170a9ff0</span><span class="Apple-style-span" style="font-family: monospace; "><br></span><span class="Apple-style-span" style="font-family: monospace; ">hBound = 57.6329994</span><span class="Apple-style-span" style="font-family: monospace; "><br></span><span class="Apple-style-span" style="font-family: monospace; ">lBound = 0</span><span class="Apple-style-span" style="font-family: monospace; "><br></span><span class="Apple-style-span" style="font-family: monospace; ">mode = 1</span><span class="Apple-style-span" style="font-family: monospace; "><br></span><span class="Apple-style-span" style="font-family: monospace; "><br></span><span class="Apple-style-span" style="font-family: monospace; ">Breakpoint 1, PhysicsTools::Calibration::Histogram<float, float>::integral (this=0x2aab170a9ff0, hBound=-2.23135843e-10, lBound=0, mode=1)</span><span class="Apple-style-span" style="font-family: monospace; "><br></span><span class="Apple-style-span" style="font-family: monospace; ">   at /build/davidlt/CMSSW_7_0_0_pre8_jemalloc341/src/CondFormats/PhysicsToolsObjects/interface/Histogram.icc:245</span><span class="Apple-style-span" style="font-family: monospace; "><br></span><span class="Apple-style-span" style="font-family: monospace; ">245     Value_t Histogram<Value_t, Axis_t>::integral(Axis_t hBound, Axis_t lBound,</span><span class="Apple-style-span" style="font-family: monospace; "><br></span><span class="Apple-style-span" style="font-family: monospace; ">1: x/i $pc</span><span class="Apple-style-span" style="font-family: monospace; "><br></span><span class="Apple-style-span" style="font-family: monospace; ">=> 0x2aaabc67cbdc <PhysicsTools::Calibration::Histogram<float, float>::integral(float, float, int) const>:      push   %r14</span><span class="Apple-style-span" style="font-family: monospace; "><br></span><span class="Apple-style-span" style="font-family: monospace; ">this = 0x2aab170a9ff0</span><span class="Apple-style-span" style="font-family: monospace; "><br></span><span class="Apple-style-span" style="font-family: monospace; ">hBound = -2.23135843e-10</span><span class="Apple-style-span" style="font-family: monospace; "><br></span><span class="Apple-style-span" style="font-family: monospace; ">lBound = 0</span><span class="Apple-style-span" style="font-family: monospace; "><br></span><span class="Apple-style-span" style="font-family: monospace; ">mode = 1</span><span class="Apple-style-span" style="font-family: monospace; "><br></span><span class="Apple-style-span" style="font-family: monospace; "><br></span><span class="Apple-style-span" style="font-family: monospace; ">KA-BOOM!</span><span class="Apple-style-span" style="font-family: monospace; "> </span><span class="Apple-style-span" style="font-family: monospace; "><br></span><span class="Apple-style-span" style="font-family: monospace; "><br></span><span class="Apple-style-span" style="font-family: monospace; ">_normalizedIntegral_ calls _integral_ with IDENTICAL arguments, yet once we reach _integral_ body our _hBound_ is changed to a different value.</span><span class="Apple-style-span" style="font-family: monospace; "><br></span><span class="Apple-style-span" style="font-family: monospace; "><br></span><span class="Apple-style-span" style="font-family: monospace; ">We call _integral_ via PLT and we try to resolve the symbol (/lib64/ld-linux-x86-64.so.2). Between these two functions while we are resolving the symbol the value is modified.</span><span class="Apple-style-span" style="font-family: monospace; "><br></span><span class="Apple-style-span" style="font-family: monospace; "><br></span><span class="Apple-style-span" style="font-family: monospace; ">That happens in _dl_lookup_symbol_x (/lib64/ld-linux-x86-64.so.2) as on SLC5 is calls _realloc_, and on SLC6 library calls _malloc_. This is the reason why in works fine under SLC6, the change in dynamic linker/loader.</span><span class="Apple-style-span" style="font-family: monospace; "><br></span><span class="Apple-style-span" style="font-family: monospace; "><br></span><span class="Apple-style-span" style="font-family: monospace; ">_hBound_ is stored in $xmm0.v4_float[0]. It happens to be that in _realloc_ (jemalloc) for this (src/jemalloc.c):</span><span class="Apple-style-span" style="font-family: monospace; "><br></span><span class="Apple-style-span" style="font-family: monospace; "><br></span><span class="Apple-style-span" style="font-family: monospace; ">1244     ta->allocated += usize;</span><div><span class="Apple-style-span" style="font-family: monospace; "><br></span><span class="Apple-style-span" style="font-family: monospace; ">1244 line compiler will generate SSE based code (using $xmm0).</span><span class="Apple-style-span" style="font-family: monospace; "><br></span><span class="Apple-style-span" style="font-family: monospace; "><br></span><span class="Apple-style-span" style="font-family: monospace; ">  0x00002aaaad381666 <+630>:   mov    %r12,0x28(%rsp)</span><span class="Apple-style-span" style="font-family: monospace; "><br></span><span class="Apple-style-span" style="font-family: monospace; ">  0x00002aaaad38166b <+635>:   movq   0x28(%rsp),%xmm0</span><span class="Apple-style-span" style="font-family: monospace; "><br></span><span class="Apple-style-span" style="font-family: monospace; ">  0x00002aaaad381671 <+641>:   movhps 0x20(%rsp),%xmm0</span><span class="Apple-style-span" style="font-family: monospace; "><br></span><span class="Apple-style-span" style="font-family: monospace; ">  0x00002aaaad381676 <+646>:   paddq  (%rax),%xmm0</span><span class="Apple-style-span" style="font-family: monospace; "><br></span><span class="Apple-style-span" style="font-family: monospace; ">  0x00002aaaad38167a <+650>:   movdqa %xmm0,(%rax)</span><span class="Apple-style-span" style="font-family: monospace; "><br></span><span class="Apple-style-span" style="font-family: monospace; ">  0x00002aaaad38167e <+654>:   add    $0x38,%rsp</span><span class="Apple-style-span" style="font-family: monospace; "> </span><span class="Apple-style-span" style="font-family: monospace; "><br></span><span class="Apple-style-span" style="font-family: monospace; "><br></span><span class="Apple-style-span" style="font-family: monospace; ">Just a few instructions which modify _hBound_ value.</span></div><div><span class="Apple-style-span" style="font-family: monospace; "><br></span></div><div><div><font class="Apple-style-span" face="monospace">Old value = 57.6329994</font></div><div><font class="Apple-style-span" face="monospace">New value = 6.72623263e-44</font></div><div><font class="Apple-style-span" face="monospace">0x00002aaaad381671 in realloc (ptr=<optimized out>, size=<optimized out>) at src/jemalloc.c:1244</font></div><div><font class="Apple-style-span" face="monospace">1244<span class="Apple-tab-span" style="white-space:pre">       </span>src/jemalloc.c: No such file or directory.</font></div><div><font class="Apple-style-span" face="monospace">1: x/i $pc</font></div><div><font class="Apple-style-span" face="monospace">=> 0x2aaaad381671 <realloc+641>:<span class="Apple-tab-span" style="white-space:pre">   </span>movhps 0x20(%rsp),%xmm0</font></div><div><font class="Apple-style-span" face="monospace">Continuing.</font></div><div><font class="Apple-style-span" face="monospace">Watchpoint 7: $xmm0.v4_float[0]</font></div><div><font class="Apple-style-span" face="monospace"><br></font></div><div><font class="Apple-style-span" face="monospace">Old value = 6.72623263e-44</font></div><div><font class="Apple-style-span" face="monospace">New value = -2.22548424e-10</font></div><div><font class="Apple-style-span" face="monospace">0x00002aaaad38167a in realloc (ptr=<optimized out>, size=<optimized out>) at src/jemalloc.c:1244</font></div><div><font class="Apple-style-span" face="monospace">1244<span class="Apple-tab-span" style="white-space:pre">  </span>in src/jemalloc.c</font></div><div><font class="Apple-style-span" face="monospace">1: x/i $pc</font></div><div><font class="Apple-style-span" face="monospace">=> 0x2aaaad38167a <realloc+650>:<span class="Apple-tab-span" style="white-space:pre">    </span>movdqa %xmm0,(%rax)</font></div><div><font class="Apple-style-span" face="monospace">Continuing.</font></div><span class="Apple-style-span" style="font-family: monospace; "><br></span><span class="Apple-style-span" style="font-family: monospace; ">If you look into "Calling conventions for different C++ compilers and operating systems". (I assume should be fine for C also, as they are compatible).</span><span class="Apple-style-span" style="font-family: monospace; "><br></span><span class="Apple-style-span" style="font-family: monospace; "><br></span><span class="Apple-style-span" style="font-family: monospace; ">64-bit Linux. Callee-saved registers: RBX, RBP, R12-R15. All fine in jemallo _realloc_:</span><span class="Apple-style-span" style="font-family: monospace; "><br></span><span class="Apple-style-span" style="font-family: monospace; "><br></span><span class="Apple-style-span" style="font-family: monospace; ">Dump of assembler code for function realloc:</span><span class="Apple-style-span" style="font-family: monospace; "><br></span><span class="Apple-style-span" style="font-family: monospace; ">  0x00002aaaad3803f0 <+0>:     push   %r15</span><span class="Apple-style-span" style="font-family: monospace; "><br></span><span class="Apple-style-span" style="font-family: monospace; ">  0x00002aaaad3803f2 <+2>:     push   %r14</span><span class="Apple-style-span" style="font-family: monospace; "><br></span><span class="Apple-style-span" style="font-family: monospace; ">  0x00002aaaad3803f4 <+4>:     push   %r13</span><span class="Apple-style-span" style="font-family: monospace; "><br></span><span class="Apple-style-span" style="font-family: monospace; ">  0x00002aaaad3803f6 <+6>:     push   %r12</span><span class="Apple-style-span" style="font-family: monospace; "><br></span><span class="Apple-style-span" style="font-family: monospace; ">  0x00002aaaad3803f8 <+8>:     push   %rbp</span><span class="Apple-style-span" style="font-family: monospace; "><br></span><span class="Apple-style-span" style="font-family: monospace; ">  0x00002aaaad3803f9 <+9>:     mov    %rsi,%rbp</span><span class="Apple-style-span" style="font-family: monospace; "><br></span><span class="Apple-style-span" style="font-family: monospace; ">  0x00002aaaad3803fc <+12>:    push   %rbx</span><span class="Apple-style-span" style="font-family: monospace; "><br></span><span class="Apple-style-span" style="font-family: monospace; "><br></span><span class="Apple-style-span" style="font-family: monospace; ">But all other registers are scratch registers.</span><span class="Apple-style-span" style="font-family: monospace; "><br></span><span class="Apple-style-span" style="font-family: monospace; "><br></span><span class="Apple-style-span" style="font-family: monospace; ">Also looking into "System V Application Binary Interface AMD64 Architecture Processor Supplement" (October 7, 2013) [3.2.1 section]</span><span class="Apple-style-span" style="font-family: monospace; "><br></span><span class="Apple-style-span" style="font-family: monospace; "><br></span><span class="Apple-style-span" style="font-family: monospace; ">Registers %rbp, %rbx and %r12 through %r15 "belong" to the calling function and the called function is required to preserve their values. In other words, a called function must preserve these registers' values for its caller. Remaining registers "belong" to the called function. If a calling function wants to preserve such a register value across a function call, it must save the value in its local stack frame.</span><span class="Apple-style-span" style="font-family: monospace; "><br></span><span class="Apple-style-span" style="font-family: monospace; "><br></span><span class="Apple-style-span" style="font-family: monospace; ">Simply put, according to this /lib64/ld-linux-x86-64.so.2 dynamic linker/loader (_dl_lookup_symbol_x) before calling _realloc_ had to take the action to protect xmm0 register value.</span><span class="Apple-style-span" style="font-family: monospace; "><br></span><span class="Apple-style-span" style="font-family: monospace; "><br></span><span class="Apple-style-span" style="font-family: monospace; ">You cannot compile jemalloc without SSE:</span><span class="Apple-style-span" style="font-family: monospace; "><br></span><span class="Apple-style-span" style="font-family: monospace; "><br></span><span class="Apple-style-span" style="font-family: monospace; ">include/jemalloc/internal/prof.h:349:40: error: SSE register return with SSE disabled</span><span class="Apple-style-span" style="font-family: monospace; "><br></span><span class="Apple-style-span" style="font-family: monospace; "><br></span><span class="Apple-style-span" style="font-family: monospace; ">If we cannot jemalloc from using SSE registers, how can we go around the problem?</span><span class="Apple-style-span" style="font-family: monospace; "><br></span><span class="Apple-style-span" style="font-family: monospace; "><br></span><span class="Apple-style-span" style="font-family: monospace; ">1240   if (config_stats && ret != NULL) {</span><span class="Apple-style-span" style="font-family: monospace; "><br></span><span class="Apple-style-span" style="font-family: monospace; ">1241     thread_allocated_t *ta;</span><span class="Apple-style-span" style="font-family: monospace; "><br></span><span class="Apple-style-span" style="font-family: monospace; ">1242     assert(usize == isalloc(ret, config_prof));</span><span class="Apple-style-span" style="font-family: monospace; "><br></span><span class="Apple-style-span" style="font-family: monospace; ">1243     ta = thread_allocated_tsd_get();</span><span class="Apple-style-span" style="font-family: monospace; "><br></span><span class="Apple-style-span" style="font-family: monospace; ">1244     ta->allocated += usize;</span><span class="Apple-style-span" style="font-family: monospace; "><br></span><span class="Apple-style-span" style="font-family: monospace; ">1245     ta->deallocated += old_size;</span><span class="Apple-style-span" style="font-family: monospace; "><br></span><span class="Apple-style-span" style="font-family: monospace; ">1246   }</span><span class="Apple-style-span" style="font-family: monospace; "><br></span><span class="Apple-style-span" style="font-family: monospace; "><br></span><span class="Apple-style-span" style="font-family: monospace; ">In _realloc_ 1244 line is wrapped around if with config_stats. Compiling jemalloc with --disable-stats options disables statistic collection, should also slightly increase performance.</span><span class="Apple-style-span" style="font-family: monospace; "><br></span><span class="Apple-style-span" style="font-family: monospace; "><br></span><span class="Apple-style-span" style="font-family: monospace; ">It's a bit worrisome that arguments can change in between function calls.</span><font class="Apple-style-span" face="monospace"><br></font><span class="Apple-style-span" style="font-family: monospace; "><br></span><span class="Apple-style-span" style="font-family: monospace; ">david</span></div></body></html>