Transparent Huge Pages

Jakob Blomer jakob.blomer at cern.ch
Tue Feb 21 03:14:46 PST 2012


You're right, the MADV_DONTNEED trick does work for the test program.
If I understand correctly, it works like this: the first page fault
grabs a 2M page, and marking all but the first of its 4k subpages with
MADV_DONTNEED then lets the kernel split the huge page into 4k pages
and release the rest of the real memory.
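
For reference, the trick as I understand it could be sketched roughly as
follows.  The function name is made up for illustration, the mapping is
assumed to come straight from an anonymous mmap() (so it is page-aligned),
and whether the kernel really gives the memory back is exactly the open
question:

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE 1   /* expose madvise() under strict -std= modes */
#endif
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper: touch the first byte of an anonymous mapping, then
 * advise the kernel that everything after the first base page is unneeded.
 * Returns 0 on success, -1 with errno set on failure. */
int touch_then_discard_tail(char *mapping, size_t size)
{
    long page = sysconf(_SC_PAGESIZE);
    assert(page > 0 && size > (size_t)page);

    mapping[0] = '\0';  /* page fault; with THP this may commit a whole 2M page */

    /* mapping is page-aligned, so mapping + page is a valid madvise() address. */
    return madvise(mapping + page, size - (size_t)page, MADV_DONTNEED);
}
```

Calling this on the 4M mapping from the test program in place of the plain
mapping[0] = '\0' line is what I tried.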

The problem here is that after some time khugepaged kicks in and merges
all the small pages back together into a large page.  That might be
avoidable with another call to madvise() with MADV_NOHUGEPAGE (not in
the RHEL 6.2 kernel, as far as I can see).  But even then, it looks a
bit shaky to me.  Is there a guarantee that the kernel honors
MADV_DONTNEED advice?
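
A minimal sketch of that extra madvise() call, assuming kernel headers
that may or may not define MADV_NOHUGEPAGE (hence the #ifdef).  The helper
name is hypothetical, and I have not verified that this keeps khugepaged
away in practice:

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE 1   /* expose madvise() under strict -std= modes */
#endif
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/mman.h>

/* Hypothetical helper: ask the kernel not to back [addr, addr + len) with
 * transparent huge pages.  addr must be page-aligned.
 * Returns 0 if the hint was accepted, -1 with errno set otherwise. */
int disable_thp_for_range(void *addr, size_t len)
{
    assert(addr != NULL && len > 0);
#ifdef MADV_NOHUGEPAGE
    return madvise(addr, len, MADV_NOHUGEPAGE);
#else
    /* Headers too old to know the flag: report "not supported". */
    errno = ENOSYS;
    return -1;
#endif
}
```

The call would have to happen right after mmap() and before the first
touch; a kernel built without THP support is expected to reject the flag
with EINVAL, so the return value should be checked.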

For the moment I will check whether, in practice, it just turns out to
be a constant overhead of ~15M.  If so, I'll live with it...
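
One way to watch the overhead from inside the process is to read the
resident page count from /proc/self/statm.  This is only a sketch: the
function name is invented, and it assumes a Linux /proc filesystem is
mounted:

```c
#include <assert.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical helper: return the current resident set size in bytes,
 * or -1 if /proc/self/statm cannot be read or parsed. */
long current_rss_bytes(void)
{
    long total_pages, resident_pages;
    long page = sysconf(_SC_PAGESIZE);
    assert(page > 0);

    FILE *f = fopen("/proc/self/statm", "r");
    if (f == NULL)
        return -1;

    /* The first field is the total program size, the second the resident
     * set size; both are counted in pages. */
    int fields = fscanf(f, "%ld %ld", &total_pages, &resident_pages);
    fclose(f);
    if (fields != 2)
        return -1;

    return resident_pages * page;
}
```

Logging this value at startup and again after some run time, once with THP
enabled and once without, should show whether the difference stays constant.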

Cheers,
Jakob

On 2/20/12 11:08 PM, Justin Lebar wrote:
> Okay, this behavior is not entirely ridiculous, but at least Firefox's
> fork of jemalloc will need to change to work well with this.
>
> What happens if you MADV_DONTNEED all but the first 4k after you touch
> the first byte?  What about if you MADV_DONTNEED the whole thing
> before you touch any part?
>
> On Mon, Feb 20, 2012 at 7:55 PM, Jakob Blomer <jakob.blomer at cern.ch> wrote:
>> After thinking a bit more about it, I don't think it's a bug but this is
>> just the way transparent huge pages work.  For properly aligned memory, the
>> kernel takes a 2M page.  This just means 2M of real memory are gone, and I
>> think not even splitting afterwards can change that.
>>
>> The following program requires 300-400k RSS without transparent huge pages,
>> but >2M with THP.
>>
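>> #include <stdint.h>
>> #include <stdio.h>
>> #include <sys/mman.h>
>> #include <unistd.h>
>>
>> int main(void) {
>>   const size_t size = 4 * 1024 * 1024;
>>   const uintptr_t two_mb = 2 * 1024 * 1024;
>>   /* Hint a 2M-aligned address so the region qualifies for a huge page. */
>>   char *mapping = mmap((void *)0x42000000, size, PROT_READ | PROT_WRITE,
>>                        MAP_ANONYMOUS | MAP_PRIVATE, -1, 0);
>>   if (mapping == MAP_FAILED) {
>>     perror("mmap");
>>     return 1;
>>   }
>>   mapping[0] = '\0';  /* touch the first byte to trigger a page fault */
>>   /* The last value is the offset from 2M alignment; 0 means aligned. */
>>   printf("Region of size %zu mapped at %p, aligned at 2M: %lu\n",
>>          size, (void *)mapping, (unsigned long)((uintptr_t)mapping % two_mb));
>>
>>   sleep(30);  /* leave time to inspect the RSS from outside */
>>   return 0;
>> }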
>>
>> Cheers,
>> Jakob
>>
>>
>> On 2/20/12 5:10 PM, Justin Lebar wrote:
>>>
>>> Hm, upon further consideration...
>>>
>>> If you mmap a huge page (say, 1MB), then MADV_DONTNEED a few 4-KB
>>> chunks inside, transparent huge pages should break up the huge page so
>>> it can decommit the parts I asked it to decommit.  If it doesn't, that
>>> sounds like a kernel bug to me!
>>>
>>> Similarly, if I mmap 1MB, get a huge page, and then touch only a few
>>> bytes in the middle, the kernel shouldn't commit a huge page.
>>>
>>> If huge pages are behaving how I expect, I don't see why they would
>>> cause your application to use more memory.
>>>
>>> Just to check, you're measuring RSS, not vsize, right?
>>>
>>> On Mon, Feb 20, 2012 at 4:59 PM, Justin Lebar <justin.lebar at gmail.com> wrote:
>>>>>
>>>>> jemalloc seems to be prone to transparent huge pages
>>>>> (https://lwn.net/Articles/423584), presumably due to its use of mmap().
>>>>> In my case (fuse module), the initial memory consumption jumped from
>>>>> ~12M to ~27M.  The use of --enable-dss helps a little, bringing the
>>>>> consumption down to ~19M.
>>>>
>>>>
>>>> Ouch!
>>>>
>>>>> Has anyone else experienced similar behavior?  Is there an easy way of
>>>>> avoiding transparent huge pages for jemalloc'ed memory?  The only
>>>>> workaround that comes to my mind is a malloc wrapper that runs
>>>>> madvise(..., MADV_NOHUGEPAGE) on every newly allocated chunk.
>>>>
>>>>
>>>> You'd probably want to do this only on the 1MB chunks jemalloc
>>>> allocates for small and tiny allocations.  For huge allocations (more
>>>> than 1MB), it's likely the user will touch the whole thing, so huge
>>>> pages could be a benefit.
>>>>
>>>>>
>>>>> Cheers,
>>>>> Jakob
>>>>>
>>>>> _______________________________________________
>>>>> jemalloc-discuss mailing list
>>>>> jemalloc-discuss at canonware.com
>>>>> http://www.canonware.com/mailman/listinfo/jemalloc-discuss



