cfallin commented on issue #3819:
@alexcrichton I've been playing a bit with different modules to see if the above heuristics work, and I'm becoming increasingly worried. For example, I have a wizened SpiderMonkey wasm with an `init_size` of 7.8MB and a `data_size` of 4.7MB; with some more playing with allocation ordering and such, I'm sure I could push that under the 50%-density limit. Wasm modules that are wizened from garbage-collected languages are especially vulnerable, because the memory layout tends to be less regular than just "keep appending data", but really any process that snapshots could result in lower-density images, because the right malloc/free pattern can create fragmentation in the heap.
My fundamental worry is, still, the performance cliff: if the user gets unlucky, they fall into a much lower-performance mode. It feels like we're optimizing for the wrong thing here: an abstract notion of sparse memory usage and leanness, but at the cost of real-world scenarios.
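For concreteness, here is a rough sketch of the kind of density check such a heuristic implies (hypothetical names and threshold handling, not the actual Wasmtime code), plugged with the numbers from the module above:

```rust
/// Rough sketch of a density-based heuristic (hypothetical; not the actual
/// Wasmtime implementation). `init_size` is the extent of initialized memory
/// (i.e. how large a memory image would need to be), and `data_size` is the
/// total number of bytes actually written by data segments.
fn use_memfd_image(init_size: u64, data_size: u64) -> bool {
    // Require the image to be at least 50% "dense": at least half of the
    // bytes in the image must come from real data segments.
    data_size.saturating_mul(2) >= init_size
}

fn main() {
    // The wizened SpiderMonkey module mentioned above: 4.7MB of data spread
    // over a 7.8MB initialization range is roughly 60% dense, so it passes
    // the 50% check here -- but, as noted above, a bit more fragmentation
    // could push it under the limit and off the fast path.
    let init_size = 7_800_000u64;
    let data_size = 4_700_000u64;
    println!("memfd image? {}", use_memfd_image(init_size, data_size));
}
```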
To give another example: if I write a program with a native toolchain and embed data in it, and that data happens to have lots of zeroes, `ld` is happy to produce a `.data` or `.rodata` section that is huge, proportional to my program's initial memory footprint. It's optimizing for mmap-ability of the data.
I think we should do the same thing: fundamentally, we're compiling to a representation that is "close to the metal", and part of that closeness is that it has a one-to-one image of what will go into memory. Making that a behavior that one has to fit the right heuristics to get just feels wrong somehow; like JavaScript engines all over again.
So, I'd like to argue that we revert this change and go back to a static limit of some sort, as I had proposed above (EDIT: actually in the discussion in #3815). We could perhaps make it user-configurable (I'd happily agree that baked-in arbitrary limits are bad), but I want to make sure the cliff is harder to hit than "oops, did too much wizening and now the heap is a bit sparse" :-)
Thoughts?
cfallin commented on issue #3819:
Ah, and one more benefit of a static-limit-based approach is that we can take advantage of the "image and leftovers" aspect of our initialization data structure: we can build a dense image, ready to mmap, for all memory up to the last data segment that falls below our bound, and then do eager initialization only for data beyond that.
This means that e.g. if we have relatively dense heap down in the 0..N MB range, and then a random byte up at 1GiB, we don't reject the whole thing and do eager init of N megabytes; instead we continue to do memfd as normal and then just do eager init of the one random high byte.
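A minimal sketch of that split, with hypothetical types and names (an illustration of the idea, not the actual Wasmtime data structures): segments that fall at or below a static bound are flattened into the dense, mmap-able image, and anything above the bound becomes a "leftover" that is eagerly initialized at instantiation time.

```rust
/// Hypothetical representation of a data segment: (offset, bytes).
type Segment = (u64, Vec<u8>);

/// Sketch of the "image and leftovers" split (not the actual Wasmtime code):
/// segments whose end falls at or below `static_image_bound` are flattened
/// into a dense image suitable for mmap; everything above the bound is kept
/// aside for eager initialization at instantiation time.
fn split_image_and_leftovers(
    segments: &[Segment],
    static_image_bound: u64,
) -> (Vec<u8>, Vec<Segment>) {
    let mut image = Vec::new();
    let mut leftovers = Vec::new();
    for (offset, bytes) in segments {
        let end = offset + bytes.len() as u64;
        if end <= static_image_bound {
            // Grow the dense image to cover this segment and copy it in;
            // gaps between segments are left zero-filled.
            if image.len() < end as usize {
                image.resize(end as usize, 0);
            }
            image[*offset as usize..end as usize].copy_from_slice(bytes);
        } else {
            // e.g. the "one random byte up at 1GiB": eagerly initialized,
            // without forcing the whole module off the memfd path.
            leftovers.push((*offset, bytes.clone()));
        }
    }
    (image, leftovers)
}

fn main() {
    // Dense data near the bottom of memory plus one stray byte at 1 GiB:
    // the stray byte becomes a leftover; the image stays small and dense.
    let segments = vec![
        (0u64, vec![1u8; 4 << 20]), // 4 MiB of dense data at offset 0
        (1 << 30, vec![0xff]),      // one byte at 1 GiB
    ];
    let (image, leftovers) = split_image_and_leftovers(&segments, 16 << 20);
    println!("image: {} bytes, leftovers: {}", image.len(), leftovers.len());
}
```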
I'll go ahead and create an issue for this rather than braindumping on a closed PR, sorry :-)