jemalloc coring in je_bitmap_set
jasone at canonware.com
Tue Aug 18 12:44:58 PDT 2015
On Aug 18, 2015, at 11:53 AM, Paul Marquess <Paul.Marquess at owmobility.com> wrote:
>> From: Jason Evans [mailto:jasone at canonware.com]
>> On Aug 18, 2015, at 8:49 AM, Paul Marquess <Paul.Marquess at owmobility.com> wrote:
>>>> From: Jason Evans [mailto:jasone at canonware.com]
>>>> On Aug 18, 2015, at 5:14 AM, Paul Marquess <Paul.Marquess at owmobility.com> wrote:
>>>>> I see a reference to a fix for arena_tcache_fill_small and corruption in the 4.0 ChangeLog. Any chance it could be the root cause for this issue?
>>>> It's possible, but the failure mode for that bug depends on failing to map memory (i.e. extreme memory pressure).
>>> Do you mean a failure in the call to mmap? I assume that isn't necessarily catastrophic (otherwise you would assert straight away).
>> Yes, mmap() and sbrk() failure. It should simply result in malloc() returning NULL, but the arena_tcache_fill_small bug you mentioned caused corruption that would later cause crashes.
> I guess we need to wrap jemalloc's malloc and get it to assert when it gets a NULL. Perhaps also get a dump of jemalloc's state: would the stats interface in jemalloc still be operational if we are OOM? The alternative is to get the stats from the core. I see there are a couple of core file postmortem scripts for jemalloc knocking about, but none seem to support 3.6.
You might be able to strace and audit the mmap() failures, but an easier solution would be to add an abort() in the known bad code path within arena_tcache_fill_small() so that you know if you've hit the failure mode.
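The wrapper idea raised above (fail loudly as soon as malloc() returns NULL) could be sketched roughly as follows. This is only an illustration in plain C, not code from jemalloc: checked_malloc, set_oom_hook and format_oom_message are hypothetical names invented for this sketch. The hook is where a state dump (for example via jemalloc's stats interface) could be attempted, with the caveat from the thread that it is an open question whether that still works under OOM.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical hook type: called with the failed request size before abort(). */
typedef void (*oom_hook_fn)(size_t size);

static oom_hook_fn oom_hook = NULL;

/* Install (or clear, by passing NULL) the hook run on allocation failure. */
void set_oom_hook(oom_hook_fn fn)
{
    oom_hook = fn;
}

/* Format the diagnostic into a caller-supplied buffer; returns snprintf's result. */
int format_oom_message(char *buf, size_t cap, size_t size, int err)
{
    assert(buf != NULL && cap > 0);
    return snprintf(buf, cap, "checked_malloc: malloc(%zu) failed: %s\n",
                    size, strerror(err));
}

/* Wrapper around malloc() that aborts instead of returning NULL. */
void *checked_malloc(size_t size)
{
    void *p = malloc(size);
    if (p == NULL && size != 0) {
        int err = errno;
        char msg[128]; /* stack buffer: tries to avoid allocating while OOM */
        int n = format_oom_message(msg, sizeof msg, size, err);
        if (n > 0) {
            size_t len = (size_t)n < sizeof msg ? (size_t)n : sizeof msg - 1;
            ssize_t ignored = write(STDERR_FILENO, msg, len);
            (void)ignored;
        }
        if (oom_hook != NULL)
            oom_hook(size); /* e.g. attempt a stats dump here (assumption) */
        abort();            /* leave a core at the first failure */
    }
    return p;
}
```

Callers would use checked_malloc() in place of malloc(), so the core file is produced at the first failed allocation rather than at a later crash caused by earlier corruption.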
> Something else has occurred to me: we had a problem with THP and uninterruptible sleep (~30 seconds) very recently that was fixed by tuning the swappiness parameter. When researching that I spotted a number of threads that suggested that the combination of THP and jemalloc can result in memory growth. This post is an example: https://www.digitalocean.com/company/blog/transparent-huge-pages-and-alternative-memory-allocators/ . I know it's too much of a stretch to suggest that this is the root cause of the OOM, but if it does cause memory growth it won't help.
> Do you have any feeling whether it is safe to have jemalloc and THP at the same time?
I've had pretty poor experience with the mixture even within the past month. The problem is that at some point (under a day of intermittent benchmarking in all the cases I observed) the kernel gets into a fragmented memory state that it cannot recover from without a reboot, and the only obvious indications are decreased performance and increased page faults.
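When investigating a THP interaction like the one discussed above, a first step is often to confirm which THP mode the kernel is actually running in. The sketch below is an illustration only: it assumes a Linux system exposing /sys/kernel/mm/transparent_hugepage/enabled, whose contents look like "always [madvise] never" with the active mode in brackets. The function names thp_active_mode and thp_read_mode are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Extract the bracketed token from a line such as "always [madvise] never".
 * Returns 0 on success, -1 if no bracketed token is found or it does not fit.
 */
int thp_active_mode(const char *line, char *out, size_t cap)
{
    assert(line != NULL && out != NULL && cap > 0);
    const char *open = strchr(line, '[');
    if (open == NULL)
        return -1;
    const char *close = strchr(open + 1, ']');
    if (close == NULL)
        return -1;
    size_t len = (size_t)(close - open - 1);
    if (len >= cap)
        return -1;
    memcpy(out, open + 1, len);
    out[len] = '\0';
    return 0;
}

/* Read the sysfs file (path assumed, Linux only) and report the active mode. */
int thp_read_mode(char *out, size_t cap)
{
    char line[128];
    FILE *f = fopen("/sys/kernel/mm/transparent_hugepage/enabled", "r");
    if (f == NULL)
        return -1; /* THP not built in, or not Linux */
    char *ok = fgets(line, sizeof line, f);
    fclose(f);
    if (ok == NULL)
        return -1;
    return thp_active_mode(line, out, cap);
}
```

Logging the result of thp_read_mode() at process start would at least record whether THP was set to "always" on the machines that later show the slowdown.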