wasmtime / PR #3766 Consolidate methods of memory initial... · git-wasmtime

Stream: git-wasmtime

Topic: wasmtime / PR #3766 Consolidate methods of memory initial...

Wasmtime GitHub notifications bot (Feb 03 2022 at 21:07):

alexcrichton opened PR #3766 from better-memory-init to main:

This commit consolidates the few locations that we have which are
performing memory initialization. Namely the uffd logic for creating
paged memory as well as the memfd logic for creating a memory image now
share an implementation to avoid duplicating bounds-checks or other
validation conditions. The main purpose of this commit is to fix a
fuzz-bug where a multiplication overflowed. The overflow itself was
benign but it seemed better to fix the overflow in only one place
instead of multiple.

The overflow in question is specifically when an initializer is checked
to be statically out-of-bounds and multiplies a memory's minimum size by
the wasm page size, returning the result as a u64. For
memory64-memories of size 1 << 48 this multiplication will overflow.
This was actually a preexisting bug with the try_paged_init function
which was copied for memfd, but cropped up here since memfd is used more
often than paged initialization. The fix here is to skip validation of
the end index if the size of memory is 1 << 64 since if the end
index can be represented as a u64 then it's in-bounds. This is
somewhat of an esoteric case, though, since a memory of minimum size 1 << 64 can't ever exist (we can't even ask the os for that much memory,
and even if we could it would fail).

Wasmtime GitHub notifications bot (Feb 03 2022 at 21:08):

alexcrichton updated PR #3766 from better-memory-init to main.

Wasmtime GitHub notifications bot (Feb 03 2022 at 21:46):

alexcrichton updated PR #3766 from better-memory-init to main.

Wasmtime GitHub notifications bot (Feb 03 2022 at 21:47):

alexcrichton updated PR #3766 from better-memory-init to main.

Wasmtime GitHub notifications bot (Feb 03 2022 at 22:32):

alexcrichton requested cfallin for a review on PR #3766.

Wasmtime GitHub notifications bot (Feb 03 2022 at 23:36):

cfallin submitted PR review.

Wasmtime GitHub notifications bot (Feb 03 2022 at 23:36):

cfallin submitted PR review.

Wasmtime GitHub notifications bot (Feb 03 2022 at 23:36):

cfallin created PR review comment:

get_cur_size_pages or get_cur_size_in_pages maybe? get_cur_page_size is otherwise ambiguous to me ("sure, the current page size is 64KiB").

Wasmtime GitHub notifications bot (Feb 03 2022 at 23:36):

cfallin created PR review comment:

Comments with arg-names (the /* get_cur_page_size = */ ... idiom) might be useful here to make clearer what each closure is used for. (Or at least that's a style I like, but subjective, feel free to ignore!)

Wasmtime GitHub notifications bot (Feb 03 2022 at 23:36):

cfallin created PR review comment:

if !self.module.memory_initialization.is_segmented() { return; } maybe, with suitable helper on the enum?

Wasmtime GitHub notifications bot (Feb 03 2022 at 23:36):

cfallin created PR review comment:

Could we have doc-comments here to describe what each return-code means?

Wasmtime GitHub notifications bot (Feb 03 2022 at 23:36):

cfallin created PR review comment:

Could we avoid the conditional-on-overflow logic by testing for (end / WASM_PAGE_SIZE) > get_cur_page_size(memory_index) instead, i.e., doing the comparison in the Wasm-page-count domain?

(Or I guess, the above but rounding up, so ((end + WASM_PAGE_SIZE - 1) / WASM_PAGE_SIZE), maybe with that wrapped in a helper...)

Wasmtime GitHub notifications bot (Feb 03 2022 at 23:36):

cfallin created PR review comment:

I wonder if there is a way to refactor this so that we write the segments directly to the memfd -- the advantage of that scheme is that we don't pay RSS cost for zero-filled holes in the middle of the image. Here we will write the whole image into the memfd, including the zeroes, so the kernel (which afaik doesn't check written pages for "all zeroes, can maintain as a hole") needs to keep around all the pages.

Wasmtime GitHub notifications bot (Feb 03 2022 at 23:36):

cfallin created PR review comment:

Against that, though, is the fact that for a lot of images one big pwrite to the memfd is probably faster than a lot of little ones. Maybe worth benchmarking the memfd creation (by continually loading and throwing away a Module and either instantiating once, or forcing memfd creation eagerly) for some module with a big heap?

And also, this approach is much cleaner and easier to maintain, and IIRC the zero-filled holes in the memfd will be filled in by fresh zero-pages the first time someone CoWs each one, so perhaps it's reasonable to just frontload that work once we commit to having a memfd image. Will leave this up to you...

Wasmtime GitHub notifications bot (Feb 03 2022 at 23:36):

cfallin created PR review comment:

interesting idiom here -- any reason not to do [offset..(offset+data.len())]? This is kind of elegant but it took me a second to parse (my brain wanted to see it as a 2D array somehow at first).

Wasmtime GitHub notifications bot (Feb 04 2022 at 15:41):

alexcrichton created PR review comment:

Ah yeah my hope here was to avoid having to deal with rounding logic and being off-by-one by accident. I think this would work if end were rounded up but that's still tricky checked arithmetic to do as well.

Wasmtime GitHub notifications bot (Feb 04 2022 at 15:41):

alexcrichton submitted PR review.

Wasmtime GitHub notifications bot (Feb 04 2022 at 15:42):

alexcrichton submitted PR review.

Wasmtime GitHub notifications bot (Feb 04 2022 at 15:42):

alexcrichton created PR review comment:

I've done this in a few other places in Wasmtime as well but the main benefits in my opinion are:

Slightly more ergonomic (don't have to repeat the base twice)

Guaranteed no overflow (no additions which could accidentally overflow)

Although I agree that if it's the first time seeing this it can be a bit odd.

Wasmtime GitHub notifications bot (Feb 04 2022 at 15:45):

alexcrichton submitted PR review.

Wasmtime GitHub notifications bot (Feb 04 2022 at 15:45):

alexcrichton created PR review comment:

Ah yeah I forgot to mention this in the PR description, my bad, but I changed this for the main reason if theoretically future-proofing at some point. Eventually implementing https://github.com/bytecodealliance/wasmtime/issues/3758 will entail having doing this logic, all the time, at compile time and storing the .data section appropriately. In light of that I figured it'd be good to avoid the memfd dependence in building this image so it can be moved to wasmtime-environ or something like that in the future.

Otherwise with https://github.com/bytecodealliance/wasmtime/issues/3758 we'd ideally only need to create the memfd, do a single write, and then start mmap-ing from there.

In terms of the rss cost for this I'm not personally too too worried because we effectively already do this for the paged initialization transition where we get most of the data segments resident in our own rss. If this becomes an issue though we can tweak to improve various metrics though.

Wasmtime GitHub notifications bot (Feb 04 2022 at 15:49):

alexcrichton submitted PR review.

Wasmtime GitHub notifications bot (Feb 04 2022 at 15:49):

alexcrichton created PR review comment:

Oh I meant to come back to this but I ended up not using anything but Ok for testing, so I went ahead and switched this to a bool.

Wasmtime GitHub notifications bot (Feb 04 2022 at 15:50):

alexcrichton updated PR #3766 from better-memory-init to main.

Wasmtime GitHub notifications bot (Feb 04 2022 at 16:04):

alexcrichton created PR review comment:

I'm not opposed to such an idiom but personally I feel it's more motivation for a different API (it's a pain otherwise to ensure all call-sites have comments like this), so I brought back the InitMemory enum as a way to encapsulate the first two arguments here so there's only one closure-as-argument.

Wasmtime GitHub notifications bot (Feb 04 2022 at 16:04):

alexcrichton submitted PR review.

Wasmtime GitHub notifications bot (Feb 04 2022 at 16:04):

alexcrichton updated PR #3766 from better-memory-init to main.

Wasmtime GitHub notifications bot (Feb 04 2022 at 18:38):

cfallin submitted PR review.

Wasmtime GitHub notifications bot (Feb 04 2022 at 18:38):

cfallin created PR review comment:

Wow, nice, the enum-of-named-args is fantastic. Thanks!

Wasmtime GitHub notifications bot (Feb 04 2022 at 18:40):

cfallin submitted PR review.

Wasmtime GitHub notifications bot (Feb 04 2022 at 18:40):

cfallin created PR review comment:

Ah, yes, that's a great point; I had forgotten that paged initialization already effectively does this. I think this is definitely reasonable overhead given that.

Wasmtime GitHub notifications bot (Feb 04 2022 at 18:42):

cfallin submitted PR review.

Wasmtime GitHub notifications bot (Feb 04 2022 at 18:42):

cfallin created PR review comment:

I'll take this as a "learned a nice Rust idiom from Alex" moment then :-)

Wasmtime GitHub notifications bot (Feb 04 2022 at 18:43):

cfallin submitted PR review.

Wasmtime GitHub notifications bot (Feb 04 2022 at 19:17):

alexcrichton merged PR #3766.

Last updated: Apr 17 2025 at 02:30 UTC