Diagnosing an out-of-memory issue
Matthew Fleming
mdf at purestorage.com
Sun Jun 26 13:00:59 PDT 2016
I'm not sure which details will be relevant so I may be including too much
info below. I'm using jemalloc with custom hooks to manage about 54GB of
virtual space under Linux on x86_64. The hooks manage the address space in
2MB chunks so I can use HUGETLB for the mappings. Slightly simplified, the
hooks do the following (roughly as expected, I think):
alloc: mmap size bytes, aligned appropriately
dalloc: munmap the space (not really, I recycle the memory internally,
but it's logically the same)
commit: return false
decommit: return true
purge: return true
split: return false
merge: return false
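
As a rough illustration of what those return values mean, the sketch below assumes the jemalloc 4.x chunk hook convention, in which the boolean hooks return false on success and true to signal failure or opt-out. The table and the `describe` helper are hypothetical names made up for this example; the wording paraphrases the chunk_hooks documentation rather than quoting it.

```python
# Hypothetical lookup table: hook name -> {return value: meaning}.
# Assumes the jemalloc 4.x convention (false = success, true = failure/opt-out).
HOOK_RESULTS = {
    "commit":   {False: "success, memory counts as committed", True: "failure"},
    "decommit": {False: "success, memory decommitted", True: "opt-out, memory stays committed"},
    "purge":    {False: "success, pages purged", True: "opt-out, pages are not purged"},
    "split":    {False: "success, chunk may be split", True: "opt-out, chunk stays whole"},
    "merge":    {False: "success, chunks may be merged", True: "opt-out, chunks stay separate"},
}

# Return values as listed in the post above.
MY_HOOKS = {"commit": False, "decommit": True, "purge": True,
            "split": False, "merge": False}


def describe(hooks):
    """Map each hook's return value to its meaning under the assumed convention."""
    return {name: HOOK_RESULTS[name][ret] for name, ret in hooks.items()}


for name, meaning in describe(MY_HOOKS).items():
    print(f"{name}: {meaning}")
```

Read this way, the hooks keep memory committed at all times and never give pages back through decommit or purge, while still allowing jemalloc to split and merge chunks.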
We're experiencing some out-of-memory issues, mostly due to a known runaway
allocation site that we're working to fix. However, while debugging this
I'm seeing some numbers for jemalloc usage that leave me concerned. At the
time of the OOM, I can see that we indeed have 54GB of virtual space used
(we're using rlimit to set a 54GB limit for the process).
However, at the time we cross the 54GB virtual threshold,
je_malloc_stats_print reports the following numbers, which seem low to me:
Allocated: 38825889776, active: 47743795200, metadata: 1304838720,
resident: 50520985600, mapped: 56247713792
Current active ceiling: 47758442496
The "mapped" number is about what I expect, though it's 170MB less than my
internal tracking for how much memory I've mmap'd on behalf of jemalloc.
The "allocated" number is about 217MB less than my internal stats for
allocated memory; this may be mostly explained by the sampled nature of the
internal tracking.
Where I'm wondering if I'm mis-using jemalloc is in the allocated vs active
vs mapped numbers. Allocated/active implies 0.813 utilization; is this
expected? Active/mapped further gives a 0.848 utilization; is this
expected? It seems somewhere between unfortunate and buggy that jemalloc
calls my alloc hook for more virtual/physical space when only 69% of the
total mapped space is used (0.813 * 0.848 = 0.69). This turns my 54GB vmem
limit into something like a 37GB limit on actual allocations, a loss of
17GB!
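
The ratios quoted above can be reproduced directly from the three byte counts in the stats summary. The following is a minimal sketch; the variable names are made up for illustration, and the 54GB limit is used only as a scale factor, so the result is in whatever unit the limit is expressed in.

```python
# Byte counts copied from the je_malloc_stats_print summary above.
ALLOCATED = 38825889776
ACTIVE = 47743795200
MAPPED = 56247713792

# The process-wide virtual memory limit from the post (unit-agnostic here).
LIMIT_GB = 54

alloc_over_active = ALLOCATED / ACTIVE      # application bytes per active byte
active_over_mapped = ACTIVE / MAPPED        # active bytes per mapped byte
alloc_over_mapped = ALLOCATED / MAPPED      # product of the two ratios above

# How much of the limit is usable for allocations at this utilization.
effective_limit_gb = LIMIT_GB * alloc_over_mapped
lost_gb = LIMIT_GB - effective_limit_gb

print(f"allocated/active: {alloc_over_active:.3f}")
print(f"active/mapped:    {active_over_mapped:.3f}")
print(f"allocated/mapped: {alloc_over_mapped:.3f}")
print(f"effective limit:  {effective_limit_gb:.1f} of {LIMIT_GB}")
print(f"lost to overhead: {lost_gb:.1f}")
```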
One thing I think I did wrong, and am now fixing, is that I had set
opt.lg_tcache_max to 21; based on the actual use of the system I don't
think I need a per-thread cache for anything over 16kB. I have no
visibility (even slightly laggy) into how much memory is held in the
tcache, though. This would be a nice addition to the available stats, even
if the number isn't completely accurate.
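
For the tcache change, the option takes the base-2 logarithm of the largest size to cache, so a 16kB cutoff corresponds to a value of 14. The sketch below works that out; the helper name `lg_for_size` is hypothetical, and the `name:value` string assumes the usual jemalloc option syntax. How that string reaches jemalloc (environment variable, compiled-in configuration, and any symbol prefix) depends on the build and is not shown.

```python
def lg_for_size(max_cached_bytes: int) -> int:
    """Return log2 of a power-of-two size, for use as an lg_* option value."""
    if max_cached_bytes <= 0 or max_cached_bytes & (max_cached_bytes - 1):
        raise ValueError("expected a positive power of two")
    return max_cached_bytes.bit_length() - 1


current_lg = 21                         # value shown in the stats dump
proposed_lg = lg_for_size(16 * 1024)    # cache nothing larger than 16kB

# Option string in name:value form (assumed syntax, see the note above).
malloc_conf = f"lg_tcache_max:{proposed_lg}"

print(f"current cutoff:  {2 ** current_lg} bytes")
print(f"proposed cutoff: {2 ** proposed_lg} bytes")
print(malloc_conf)
```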
Given the above and the below output from je_malloc_stats_print, is there
something I should configure differently to increase the utilization?
Thanks,
matthew
___ Begin jemalloc statistics ___
Version: 4.1.0-0-gdf900dbfaf4835d3efc06d771535f3e781544913
Assertions disabled
config.malloc_conf: ""
Run-time option settings:
opt.abort: false
opt.lg_chunk: 21
opt.dss: "secondary"
opt.narenas: 21
opt.purge: "decay"
opt.decay_time: 10 (arenas.decay_time: 10)
opt.stats_print: false
opt.junk: "false"
opt.quarantine: 0
opt.redzone: false
opt.zero: false
opt.tcache: true
opt.lg_tcache_max: 21
CPUs: 12
Arenas: 48
Pointer size: 8
Quantum size: 16
Page size: 4096
Unused dirty page decay time: 10
Maximum thread-cached size class: 1835008
Chunk size: 2097152 (2^21)
Allocated: 38825889776, active: 47743795200, metadata: 1304838720,
resident: 50520985600, mapped: 56247713792
Current active ceiling: 47758442496
Merged arenas stats:
assigned threads: 11
dss allocation precedence: N/A
decay time: N/A
purging: dirty: 359649, sweeps: 18735557, madvises: 64538355, purged: 915821832
allocated nmalloc ndalloc nrequests
small: 1618612208 26450158003 26440185546 148302974020
large: 36131438592 441168860 439050125 1180219164
huge: 1075838976 54888130 54887967 54888130
total: 38825889776 26946214993 26934123638 149538081314
active: 47743795200
mapped: 56233033728
metadata: mapped: 1292746752, allocated: 793984
bins: size ind allocated nmalloc ndalloc nrequests curregs curruns regs pgs util nfills nflushes newruns reruns
8 0 4192 349342 348818 41984555 524 11 512 1 0.093 347692 347682 15 0
16 1 1805168 43122017 43009194 201623700 112823 547 256 1 0.805 8143530 6417247 905 31912645
32 2 3945472 1842947572 1842824276 18921652196 123296 1670 128 1 0.576 40800047 22703370 4755 134712298
48 3 88656480 2001520582 1999673572 15560123972 1847010 7541 256 3 0.956 61107051 25005586 14729 446762237
64 4 56346816 936219506 935339087 5919950953 880419 15190 64 1 0.905 133815037 21059415 138786 222734703
80 5 86376160 1402859775 1401780073 6023020031 1079702 4880 256 5 0.864 45967794 20401097 85696 166631904
96 6 331611264 2042094072 2038639788 10953875367 3454284 27965 128 3 0.965 40994516 24013014 40509 288532462
112 7 136533600 1253096836 1251877786 7849997010 1219050 5020 256 7 0.948 23127251 17518027 10065 56736373
128 8 28319360 2430633468 2430412223 9470625182 221245 7119 32 1 0.971 87741552 76841006 1678615 637627316
160 9 21233920 3233662185 3233529473 17894546291 132712 1286 128 5 0.806 52806079 35356743 20838 368359381
192 10 64915968 1209644070 1209305966 4695311003 338104 5856 64 3 0.902 40455748 23101690 118479 137558943
224 11 1229312 272846888 272841400 7012427279 5488 97 128 7 0.442 59281528 11699583 42724 6366157
256 12 44845824 540042217 539867038 4205630140 175179 11070 16 1 0.989 38709263 34813124 5431286 103861682
320 13 4193280 354635837 354622733 8541328111 13104 539 64 5 0.379 67427289 12833951 1946 62194021
384 14 30578688 830982635 830903003 3808003373 79632 3345 32 3 0.743 41546452 29267868 79509 271336916
448 15 44839872 528279209 528179120 2270660939 100089 1639 64 7 0.954 32588339 14463657 185137 64071098
512 16 445440 209085658 209084788 2527518820 870 191 8 1 0.569 34641562 24354278 386421 121284418
640 17 1509120 189654651 189652293 2529781103 2358 180 32 5 0.409 53300183 12946814 49319 37855722
768 18 817152 210276500 210275436 3257956662 1064 102 16 3 0.651 35559459 18299827 352671 61638834
896 19 5743360 3790511276 3790504866 7581256328 6410 463 32 7 0.432 128296275 119172450 107917 722034569
1024 20 9027584 415024766 415015950 1368606679 8816 2264 4 1 0.973 54423863 43598539 14338600 279004617
1280 21 3084800 334930678 334928268 2149980741 2410 278 16 5 0.541 35416539 24664973 16634 124638965
1536 22 107687424 33395873 33325764 143703270 70109 9289 8 3 0.943 14551529 10066925 22560 30812888
1792 23 36802304 84087854 84067317 217549352 20537 1533 16 7 0.837 12925762 11132031 212546 19066936
2048 24 8978432 74733317 74728933 132169103 4384 2269 2 1 0.966 32978464 12327652 28115125 21184016
2560 25 14789120 626371065 626365288 2296577585 5777 1108 8 5 0.651 70421894 63681746 260083 501109733
3072 26 10238976 87645674 87642341 164219425 3333 907 4 3 0.918 35166344 13919648 12387988 25106936
3584 27 14504448 140711376 140707329 145743368 4047 637 8 7 0.794 57560488 18782918 7780878 27391074
4096 28 1024000 9058231 9057981 8589353 250 250 1 1 1 4689969 4712652 9058231 0
5120 29 55188480 104117284 104106505 129556193 10779 3174 4 5 0.849 27152021 16324563 6920719 54731978
6144 30 55007232 30050999 30042046 30253925 8953 4774 2 3 0.937 12228886 7225550 9103712 13147192
7168 31 127353856 84675496 84657729 85715811 17767 7999 4 7 0.555 37265985 12369079 1683955 43293648
8192 32 93347840 91124841 91113446 89087490 11395 11395 1 2 1 32622573 12729483 91124841 0
10240 33 45895680 47916142 47911660 469251054 4482 2303 2 5 0.973 22100898 11166697 13230071 24133515
12288 34 30437376 29315410 29312933 27345931 2477 2477 1 3 1 10001686 6828625 29315410 0
14336 35 51294208 934534701 934531123 1577351725 3578 1866 2 7 0.958 125545462 93872858 167354355 709325811
large: size ind allocated nmalloc ndalloc nrequests curruns
16384 36 34473050112 219503281 217399213 853588850 2104068
20480 37 1966080 5286476 5286380 8286948 96
24576 38 983040 3069401 3069361 3894568 40
28672 39 1318912 2215034 2214988 2677460 46
32768 40 261586944 115345719 115337736 116374883 7983
40960 41 32194560 10100309 10099523 18344991 786
49152 42 3047424 2590612 2590550 3339092 62
57344 43 15310848 8760214 8759947 12070456 267
65536 44 3801088 3080967 3080909 3960506 58
81920 45 6717440 1998825 1998743 2963024 82
98304 46 2555904 872938 872912 1024995 26
114688 47 314245120 25977780 25975040 27400439 2740
131072 48 7208960 319016 318961 333657 55
163840 49 207585280 11961322 11960055 80123089 1267
196608 50 5111808 384416 384390 399129 26
229376 51 2752512 315906 315894 326293 12
262144 52 4718592 158872 158854 164259 18
327680 53 44564480 657448 657312 682735 136
393216 54 7864320 300621 300601 311973 20
458752 55 9633792 197798 197777 208182 21
524288 56 5242880 224315 224305 237273 10
655360 57 486277120 27600223 27599481 43253891 742
786432 58 7864320 100402 100392 101401 10
917504 59 8257536 9420 9411 10343 9
1048576 60 45088768 18217 18174 18225 43
1310720 61 26214400 9154 9134 12066 20
1572864 62 135266304 109479 109393 109741 86
1835008 63 11010048 695 689 695 6
huge: size ind allocated nmalloc ndalloc nrequests curhchunks
2097152 64 71303168 22983 22949 22983 34
2621440 65 99614720 54861945 54861907 54861945 38
3145728 66 6291456 104 102 104 2
3670016 67 7340032 34 32 34 2
4194304 68 0 10 10 10 0
5242880 69 0 2904 2904 2904 0
6291456 70 6291456 7 6 7 1
7340032 71 44040192 13 7 13 6
8388608 72 50331648 8 2 8 6
10485760 73 754974720 111 39 111 72
12582912 74 0 2 2 2 0
14680064 75 14680064 4 3 4 1
16777216 76 0 1 1 1 0
20971520 77 20971520 2 1 2 1
25165824 78 0 1 1 1 0
29360128 79 0 1 1 1 0
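
To make the dump easier to cross-check, here is a small sketch that parses the summary line and compares it against the per-class totals from the merged arenas stats. The `parse_summary` helper is hypothetical; all numbers are copied from the output above, and the only derived figures are the sums and the shares of the allocated total.

```python
import re

# Summary line from the statistics above, joined onto one line.
SUMMARY = ("Allocated: 38825889776, active: 47743795200, metadata: 1304838720, "
           "resident: 50520985600, mapped: 56247713792")

# "allocated" column of the merged arenas stats, per size category.
BY_CATEGORY = {"small": 1618612208, "large": 36131438592, "huge": 1075838976}

# Allocated bytes in the 16384-byte large size class (first row of "large:").
LARGE_16K_ALLOCATED = 34473050112


def parse_summary(line):
    """Turn 'name: value, name: value' into a dict of lowercase name -> int."""
    return {name.lower(): int(value)
            for name, value in re.findall(r"(\w+): (\d+)", line)}


stats = parse_summary(SUMMARY)

# The three categories should add up to the reported allocated total.
category_sum = sum(BY_CATEGORY.values())
print("categories match total:", category_sum == stats["allocated"])

# Share of allocated bytes per category, plus the single largest size class.
for name, nbytes in BY_CATEGORY.items():
    print(f"{name}: {nbytes / stats['allocated']:.1%}")
share_16k = LARGE_16K_ALLOCATED / stats["allocated"]
print(f"16384-byte class: {share_16k:.1%}")
```

On these numbers the three categories sum exactly to the reported total, and the 16384-byte large class alone accounts for most of the allocated bytes, so it is the natural place to start looking.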