<div dir="ltr">We added --enable-fill and junk:true to the mix and ran headlong into a wall of libnuma init code. Looks like glibc malloc followed by jemalloc free. But even backporting the glibc hooks code from 3.x didn't help.<div> <br></div><div style>We'll try 3.4 in the morning, see where that takes us. </div><div style><br></div><div style>Thanks for the pointers so far. :)</div><div style><br></div><div style>Cheers,</div><div style><br></div> <div style> Doc</div></div><div class="gmail_extra"><br><br><div class="gmail_quote">On Mon, Jun 3, 2013 at 9:11 PM, Jason Evans <span dir="ltr"><<a href="mailto:jasone@canonware.com" target="_blank">jasone@canonware.com</a>></span> wrote:<br> <blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div style="word-wrap:break-word"><div><div class="h5"><div><div>On Jun 3, 2013, at 8:19 PM, Rogier 'DocWilco' Mulhuijzen <<a href="mailto:rogier+jemalloc@fastly.com" target="_blank">rogier+jemalloc@fastly.com</a>> wrote:</div> <blockquote type="cite"><div dir="ltr"><div>We're currently using jemalloc 2.2.5 statically linked into a private fork of Varnish with a very high rate of malloc/calloc/free, and we're seeing segfaults on a somewhat frequent basis. (one a day on a group of 6 hosts.)</div> <div><br></div><div>We had the same segfaults with 2.2.3, and upgrading to 2.2.5 seems not to have helped.</div><div><br></div><div>(Also, we tried upgrading to 3.3.1 and things just got worse, tried enabling debugging which made it even more worse. Under time pressure, we dropped down to 2.2.5)</div> <div><br></div><div>I should mention that I backported the mmap strategy from 3.3.1 into 2.2.5, to prevent VM fragmentation, which was causing us to run into vm.max_map_count.</div><div><br></div><div> So, to the meat of the problem! (We saw these in both 2.2.3 without the mmap strategy backported, and 2.2.5 with mmap strategy backported.) </div><div><br></div><div>Unfortunately, we don't have core files (we're running with 153G resident, and 4075G virtual process size on one of the hosts that I'm looking at right now) so the internal Varnish (libgcc based) backtrace is all we have:</div> <div><br></div><div><div style="font-size:13px"><font face="courier new, monospace"><b>0x483894</b>: <span style="background-color:rgb(255,255,204)">arena_tcache_fill_small</span>+1a4<br></font></div> <div style="font-size:13px"><font face="courier new, monospace">0x4916b9: tcache_alloc_small_hard+19</font></div><div style="font-size:13px"><font face="courier new, monospace">0x4841bf: arena_malloc+1bf</font></div><div style="font-size:13px"> <font face="courier new, monospace">0x47b498: calloc+218</font></div><div><br></div><div>Looking that up:</div><div><br></div><div><div><div><font face="courier new, monospace"># addr2line -e /usr/sbin/varnishd -i 0x483894</font></div> <div><font face="courier new, monospace">/varnish-cache/lib/libjemalloc/include/jemalloc/internal/bitmap.h:101</font></div><div><font face="courier new, monospace">/varnish-cache/lib/libjemalloc/include/jemalloc/internal/bitmap.h:140</font></div> <div><font face="courier new, monospace">/varnish-cache/lib/libjemalloc/src/arena.c:264</font></div><div><font face="courier new, monospace">/varnish-cache/lib/libjemalloc/src/arena.c:1395</font></div></div><div><br></div> <div>Which looks like:</div><div><br></div><div><div><font face="courier new, monospace"><span style="white-space:pre-wrap">97 </span>goff = bit >> LG_BITMAP_GROUP_NBITS;</font></div><div><font face="courier new, monospace"><span style="white-space:pre-wrap">98 </span>gp = &bitmap[goff];</font></div> <div><font face="courier new, monospace"><span style="white-space:pre-wrap">99 </span>g = *gp;</font></div><div><font face="courier new, monospace"><span style="white-space:pre-wrap">100 </span>assert(g & (1LU << (bit & BITMAP_GROUP_NBITS_MASK)));</font></div> <div><font face="courier new, monospace"><span style="white-space:pre-wrap"><b>101</b> </span>g ^= 1LU << (bit & BITMAP_GROUP_NBITS_MASK);</font></div><div><font face="courier new, monospace"><span style="white-space:pre-wrap">102 </span>*gp = g;</font></div> <div><br></div><div>Which makes no sense at first, since there's no deref being done there, but a disassembly (thanks Devon) shows:</div><div><br></div></div></div><div><div><font face="courier new, monospace"> 483883: 48 c1 ef 06 shr $0x6,%rdi</font></div> <div><font face="courier new, monospace"> 483887: 83 e1 3f and $0x3f,%ecx</font></div><div><font face="courier new, monospace"> 48388a: 4c 8d 04 fa lea (%rdx,%rdi,8),%r8</font></div> <div><font face="courier new, monospace"> 48388e: 49 d3 e1 shl %cl,%r9</font></div><div><font face="courier new, monospace"> 483891: 4c 89 c9 mov %r9,%rcx</font></div><div> <font face="courier new, monospace"> <b>483894</b>: 49 33 08 xor (%r8),%rcx</font></div><div><br></div><div>The optimizer got rid of g and just does the xor straight on *gp. So gp is an illegal address. According to our segfault handler, it's NULL.</div> <div><br></div><div>For gp to be NULL, both bitmap and goff need to be NULL. And bitmap being NULL is somewhat impossible due to:</div><div><br></div><div><div><font face="courier new, monospace"><span style="white-space:pre-wrap"> </span>if ((run = bin->runcur) != NULL && run->nfree > 0)</font></div> <div><font face="courier new, monospace"><span style="white-space:pre-wrap"> </span>ptr = arena_run_reg_alloc(run, &arena_bin_info[binind]);</font></div><div><br></div></div><div>bitmap is an offset to run, so both the offset and the run need to be NULL (or perfectly matched to cancel eachother out, but also unlikely.)</div> <div><br></div><div>bin->runcur and bin->bitmap_offset both being NULL seems _very_ unlikely.</div><div><br></div><div>And that's about as far as we've gotten.</div><div><br></div> <div>Help?</div></div></div></div></blockquote><br></div></div></div><div>This sort of crash can happen as a distant result of a double-free. For example:</div><div><br></div><div>1) free(p)</div><div>2) free(p) corrupts run counters, causing the run to be deallocated.</div> <div>3) free(q) causes the deallocated run to be placed back in service, but with definite corruption.</div><div>4) malloc(…) tries to allocate from run, but run metadata are in a bad state.</div><br><div>My suggestion is to enable assertions (something like: CFLAGS=-O3 ./configure --enable-debug --disable-tcache), disable tcache (which can keep double free bugs from exercising the assertions), and look for the source of corruption. I'll be surprised if the problem deviates substantially from the above, but if it does, then my next bet will be a buffer overflow corrupting run metadata.</div> <span class="HOEnZb"><font color="#888888"><div><br></div><div>Jason</div></font></span></div></blockquote></div><br></div>