alexcrichton opened issue #3815:
Some local fuzzing found today that this module:
(module (memory 65536) (data (i32.const 0) "a") (data (i32.const 1_000_000_000) "b") )
produces an artifact of nearly 1GB, almost entirely consisting of the heap image. Additionally, compile-time memory usage is about 1GB because we create the in-memory image at compile time.
I think that during static memory initialization we'll want to be more clever about creating the heap image. For example the paged memory initialization is sparse in the sense that it only keeps track of initialized pages, and for this form of initialization we probably also want to do something sparse like that and determine at the end, based on nonzero ranges, whether the image is one-memfd-mmap compatible.
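To make the dense-versus-sparse difference concrete, here is a minimal sketch. The `Segment` type and both function names are hypothetical and are not Wasmtime's actual data structures; it only assumes 64 KiB wasm pages and absolute segment offsets.

```rust
use std::collections::BTreeMap;

/// Wasm page size in bytes (64 KiB).
const WASM_PAGE_SIZE: usize = 65536;

/// Hypothetical data segment: absolute byte offset into linear memory plus bytes.
pub struct Segment<'a> {
    pub offset: usize,
    pub data: &'a [u8],
}

/// Length of a dense image that covers everything from offset 0 up to the
/// end of the highest segment.
pub fn dense_image_len(segments: &[Segment]) -> usize {
    segments
        .iter()
        .map(|s| s.offset + s.data.len())
        .max()
        .unwrap_or(0)
}

/// Sparse form: only pages actually touched by a segment are allocated.
pub fn sparse_pages(segments: &[Segment]) -> BTreeMap<usize, Vec<u8>> {
    let mut pages: BTreeMap<usize, Vec<u8>> = BTreeMap::new();
    for seg in segments {
        for (i, byte) in seg.data.iter().enumerate() {
            let addr = seg.offset + i;
            let page = pages
                .entry(addr / WASM_PAGE_SIZE)
                .or_insert_with(|| vec![0u8; WASM_PAGE_SIZE]);
            page[addr % WASM_PAGE_SIZE] = *byte;
        }
    }
    pages
}
```

For the module above, the dense length is 1,000,000,001 bytes while the sparse map holds just two 64 KiB pages.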
cfallin commented on issue #3815:
Interesting, so there is an explicit limit of exactly 1GiB on the memfd image size; the fuzzer figured out how to turn the knob all the way up, I guess.
We could definitely reduce that limit, especially since we support an "image and leftovers" initialization technique now -- a CoW image for "most" of the heap down in a reasonable range (16MiB? 128MiB?) and explicit eager instantiation for whatever bytes are way up high.
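A rough sketch of what an "image and leftovers" split could look like, purely for illustration: the `Segment` type, the function name, and the `image_limit` parameter are all assumptions, not the existing Wasmtime code.

```rust
/// Hypothetical data segment: absolute byte offset plus owned bytes.
#[derive(Debug, Clone, PartialEq)]
pub struct Segment {
    pub offset: usize,
    pub data: Vec<u8>,
}

/// Splits segments into those that end at or below `image_limit` (candidates
/// for the copy-on-write image) and those above it (to be copied eagerly at
/// instantiation time).
pub fn split_image_and_leftovers(
    segments: Vec<Segment>,
    image_limit: usize,
) -> (Vec<Segment>, Vec<Segment>) {
    segments
        .into_iter()
        .partition(|s| s.offset + s.data.len() <= image_limit)
}
```

With a 16MiB limit, the fuzzer-found module would put "a" in the image and leave "b" as a one-byte eager copy.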
We could also think about segmented/sparse approaches where we have multiple pieces, each of which has to be mmap'd, but I'm much more hesitant to go there as it makes the mapping path significantly more complex and adds mmap syscalls and increases the size of the in-kernel address space tree, making pagefaults marginally more expensive.
I'm leaning toward just lowering the limit given the above, but: thoughts?
alexcrichton commented on issue #3815:
I feel like the most robust and simple fix for now might be to, when switching to a static init, do paged init first and if that succeeds only transform to a static initialization scheme if the initialized pages fit within a smaller range than 1GB.
Ideally though the limit on the memfd image should be somewhat proportional to the size of the data segments in the original module. We naturally shouldn't blow up 2 bytes (above) to 1GB but at the same time if you provide a 1GB data section we should presumably still allow that because memfd is still going to be a ton faster than eagerly copying a gigabyte of data.
I definitely think we should avoid adding complication to the current initialization scheme. For example I don't think we should support multiple mmaps nor one mmap plus extra initialization. I don't think we have any modules in the wild which actually need that, so all we need to do is get the one-image technique working well while also handling esoteric edge cases in a way that avoids stressing the system.
So given all that, I would propose:
- Delegate to try_paged_init from try_static_init. Only proceed if paged initialization took hold.
- Discard all entirely-zero pages
- Calculate the min/max initialized page offsets and, if the range they span is less than 2x (?) the size of the original data segments, write all those pages to an in-memory vector as an initialization image.
I think that would solve the above module (we'd create two pages and then realize their limits are way too big: 1GB vs 2 bytes). It would also handle LLVM-produced Rust modules where the heap starts with a megabyte of zeros for the stack, we'd never actually create those zeros and we'd only create the one heap image. Additionally we'd handle modules that already have very large data segments by avoiding making them too much larger but still speeding up their initialization. I believe this would enable us to remove the size limit as well because paged initialization doesn't currently have a size limit and the memory usage is all proportional to the number of data sections in the original wasm module.
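The proposed steps can be sketched roughly as follows. This is only an illustration under assumptions: the function name, the `BTreeMap` page representation, and the exact 2x factor are hypothetical, and every page is assumed to be exactly one 64 KiB wasm page long.

```rust
use std::collections::BTreeMap;

/// Wasm page size in bytes (64 KiB).
const PAGE: usize = 65536;

/// Given paged-init output (page index -> page bytes) and the total size of
/// the original data segments, returns `(byte offset, image)` when the
/// nonzero pages span less than 2x `data_len`, and `None` otherwise.
pub fn build_image(
    pages: &BTreeMap<usize, Vec<u8>>,
    data_len: usize,
) -> Option<(usize, Vec<u8>)> {
    // Discard pages that are entirely zero.
    let nonzero: Vec<(usize, &Vec<u8>)> = pages
        .iter()
        .filter(|(_, p)| p.iter().any(|&b| b != 0))
        .map(|(i, p)| (*i, p))
        .collect();

    // Min/max nonzero page index (BTreeMap iteration is sorted by key).
    let first = nonzero.first()?.0;
    let last = nonzero.last()?.0;
    let span = (last - first + 1) * PAGE;

    // Bail out before allocating if the image would be too sparse.
    if span >= 2 * data_len {
        return None;
    }

    // Write the surviving pages into one contiguous image.
    let mut image = vec![0u8; span];
    for (idx, page) in nonzero {
        let start = (idx - first) * PAGE;
        image[start..start + PAGE].copy_from_slice(page);
    }
    Some((first * PAGE, image))
}
```

For the two-byte module above this returns `None` without ever allocating the ~1GB span, since the check runs before the image vector is created.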
cfallin commented on issue #3815:
One thing to note re: above
> where the heap starts with a megabyte of zeros for the stack, we'd never actually create those zeros and we'd only create the one heap image.
We actually already do this; we trim the zeroes off the front and back of the image, and use anonymous zero memory when mapping. So "real-world" use-cases will I think almost always be OK -- it's just zeroes "in the middle" (with initialized content on both sides) that cause bloat.
Re: the rest, I'm not opposed to heuristics like this, but I would like to inject a little caution about inventing rules that create performance cliffs. Otherwise, e.g., if layout heuristics in some toolchain change, or someone adds or removes static global data, or whatever, they could potentially find inexplicable speedups or slowdowns. I'd rather just have a maximum image size, based on what we think is reasonable (that's also a cliff but an easier-to-understand one). Seems to go better with the Wasm ethos of explicitness and lack of complex heuristics/cliffs as well. But I'm not completely tied to this, just an (admittedly somewhat strong) preference.
alexcrichton commented on issue #3815:
We only trim zeros at the end, however. The entire image is built in-memory and then before finalizing and serializing to disk (or memfd) it's trimmed. This means that today we have a 1MB overhead for Rust-based modules which have a leading 1MB stack.
I agree about performance cliffs, but no matter how you slice it it's always a performance cliff. If we set the limit to 128MB then as soon as someone has a 129MB heap image their instantiation becomes much slower. I think our goal is to try to make sure the cliff is hit as rarely as possible, and using the initial heap size as a gauge instead of a static constant I think will be less surprising in the long run.
cfallin commented on issue #3815:
> We only trim zeros at the end, however
Ah, yes, sorry, you're right here; I had forgotten that we start the mmap from the first nonzero page, but we don't shift the data down to remove the front matter that's never mapped.
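As an illustration of the distinction, here is a small sketch that computes page-aligned bounds of the nonzero region of a dense image. The function name is hypothetical and this is not the existing trimming code; slicing `image[start..end]` with these bounds would drop the leading zeros as well as the trailing ones.

```rust
/// Wasm page size in bytes (64 KiB).
const PAGE: usize = 65536;

/// Returns `(start, end)` byte bounds of the nonzero part of `image`:
/// `start` is rounded down to a page boundary, `end` is rounded up to a page
/// boundary and clamped to the image length. Returns `None` if the image is
/// entirely zero.
pub fn trim_bounds(image: &[u8]) -> Option<(usize, usize)> {
    let first = image.iter().position(|&b| b != 0)?;
    let last = image.iter().rposition(|&b| b != 0)?;
    let start = first / PAGE * PAGE;
    let end = ((last / PAGE + 1) * PAGE).min(image.len());
    Some((start, end))
}
```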
So I think my main concern was with very small modules, such that crossing below a 1/2-full threshold results in a sudden degradation. E.g., I have a 128K memory image; I previously had 65K of static data; I remove a few constant arrays, now I'm at 63K, and suddenly my instantiations are a lot slower.
Perhaps we could take some hybrid approach? Images in a "small" category (1MiB or less?) always get to use memfd; that takes care of the above. Then beyond a certain size, we start caring about density, and check for some ratio of nonzero to total pages, as you suggest. Does that seem reasonable?
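A minimal sketch of that hybrid rule, with assumed numbers: the 1MiB "small" cutoff comes from the suggestion above, while the one-half density ratio and all names here are placeholders rather than what the eventual PR implements.

```rust
/// Wasm page size in bytes (64 KiB).
const PAGE: usize = 65536;

/// Images at or below this size always use a memfd image (assumed 1 MiB).
const SMALL_IMAGE_LIMIT: usize = 1 << 20;

/// Decides whether to use a memfd image given the image length in bytes and
/// the number of pages in it that contain any nonzero byte.
pub fn use_memfd_image(image_len: usize, nonzero_pages: usize) -> bool {
    // Small images never fall off a density cliff.
    if image_len <= SMALL_IMAGE_LIMIT {
        return true;
    }
    // Larger images must be at least half nonzero pages (assumed ratio).
    let total_pages = (image_len + PAGE - 1) / PAGE;
    nonzero_pages * 2 >= total_pages
}
```

Under these assumptions the 128K image from the example above stays on memfd regardless of how much static data it holds, while the ~1GB image with two nonzero pages is rejected.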
alexcrichton commented on issue #3815:
Seems reasonable to me, I've pushed that as https://github.com/bytecodealliance/wasmtime/pull/3819
alexcrichton closed issue #3815:
Last updated: Jan 24 2025 at 00:11 UTC