Stream: git-wasmtime

Topic: wasmtime / PR #3787 Use mmap'd `*.cwasm` as a source for ...


view this post on Zulip Wasmtime GitHub notifications bot (Feb 10 2022 at 04:58):

alexcrichton opened PR #3787 from reuse-cwasm-mmap to main:

This commit addresses #3758 and makes it possible to avoid memfd_create when loading a module from a precompiled binary stored on disk. In this situation we already mmap the file from disk, and we can use the same technique that memfd uses where a copy-on-write mapping is made whenever a module is instantiated. This measn that all Unix platforms, not only Linux, can benefit from copy-on-write so long as the module comes from a precompiled module on disk.

The first commit here is refactoring to enable this functionality on Linux. After the first commit we avoid creation of a memfd and instead map the raw underlying *.cwasm into memories. This involved moving the creation of the memory image to compile-time of Module rather than construction-time of Module, as well as aligning the data section to ensure it shows up at a page-offset boundary in the file (which is required by mmap). The second commit then enables this support to work on macOS which involved some #[cfg] followed by tweaking the madvise logic to instead blow away the mapping (no reuse on systems without madvise(DONTNEED) as there's no portable way to reset the CoW overlay)

I tried for a bit to get this working on Windows, but while I could get things to compile I don't believe the same technique we're using here for Unix works on Windows. Windows appears to reject mapping a file onto a pre-existing region allocated with VirtualAlloc, meaning that all attemps to map the file into memory have failed so far for me. This StackOverflow question seems to suggest that this may simply not work on Windows unless we use undocumented APIs. In any case the major benefit of this PR is avoiding extra file descriptors on Unix for modules created from files on disk, so while having Windows support would be nice it's not necessarily required.

view this post on Zulip Wasmtime GitHub notifications bot (Feb 10 2022 at 06:43):

cfallin submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Feb 10 2022 at 06:43):

cfallin submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Feb 10 2022 at 06:43):

cfallin created PR review comment:

Any particular reason this uses region while the other mmap-related bits use rustix? (The awkward MprotectFlags maybe? Or stemming from an attempt to get this working on Windows too?)

We actually have an interesting diversity of wrappers around mprotect in the codebase -- region::protect seems more common based on a cursory grep just now, while rustix::io::mprotect is used in the fiber library, here in memfd, and to set up a signal stack. So this is probably a larger question of "which syscall wrapper do we prefer". But IMHO at least within one unit of abstraction (the MemFdSlot) we should probably stick to one or else comment why we choose either one at each callsite...

view this post on Zulip Wasmtime GitHub notifications bot (Feb 10 2022 at 06:43):

cfallin created PR review comment:

It might be a little clearer to put the (u32, Range<u32>) in a struct here (though I see it's mirroring the enum arm above)...

view this post on Zulip Wasmtime GitHub notifications bot (Feb 10 2022 at 06:43):

cfallin created PR review comment:

Could we refactor this to hold an Arc<File> instead, to avoid the slightly subtle manual-mutual-exclusion note here?

view this post on Zulip Wasmtime GitHub notifications bot (Feb 10 2022 at 15:35):

alexcrichton updated PR #3787 from reuse-cwasm-mmap to main.

view this post on Zulip Wasmtime GitHub notifications bot (Feb 10 2022 at 15:37):

alexcrichton updated PR #3787 from reuse-cwasm-mmap to main.

view this post on Zulip Wasmtime GitHub notifications bot (Feb 10 2022 at 15:46):

alexcrichton updated PR #3787 from reuse-cwasm-mmap to main.

view this post on Zulip Wasmtime GitHub notifications bot (Feb 10 2022 at 16:05):

alexcrichton updated PR #3787 from reuse-cwasm-mmap to main.

view this post on Zulip Wasmtime GitHub notifications bot (Feb 10 2022 at 16:06):

alexcrichton updated PR #3787 from reuse-cwasm-mmap to main.

view this post on Zulip Wasmtime GitHub notifications bot (Feb 10 2022 at 16:08):

alexcrichton submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Feb 10 2022 at 16:08):

alexcrichton created PR review comment:

Ah yeah this was "future proofing" to prepare for a Windows implementation, although that didn't pan out and may not ever pan out.

I dunno what to do about how we call mprotect. I personally think where we can we should use cross-platform crates, but I wouldn't necessarily go so far as to say we should actively change platform-specific areas to use cross-platform abstractions.

I'm happy to revert this back if you'd prefer.

view this post on Zulip Wasmtime GitHub notifications bot (Feb 10 2022 at 16:13):

alexcrichton updated PR #3787 from reuse-cwasm-mmap to main.

view this post on Zulip Wasmtime GitHub notifications bot (Feb 10 2022 at 18:28):

cfallin submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Feb 10 2022 at 18:28):

cfallin submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Feb 10 2022 at 18:28):

cfallin created PR review comment:

images seems to be created up to num_defined_memories below, but it's a PrimaryMap<MemoryIndex, _>; could we either use DefinedMemoryIndex or fill it up to the total memory count?

This seems a bit different than #3782 as we are type-safe wrt the index, but would just lead to an index-out-of-bounds if there is an imported memory I think...

view this post on Zulip Wasmtime GitHub notifications bot (Feb 10 2022 at 18:28):

cfallin created PR review comment:

Interesting, was this a bug in the original code? I see the insertion above contents.entry(page_index) so this is correct now, but seemed to be using a page index as a byte offset previously. Or was the first tuple element used as a page index elsewhere?

view this post on Zulip Wasmtime GitHub notifications bot (Feb 10 2022 at 18:28):

cfallin created PR review comment:

Maybe add a note here why this still works with mmap? In other words why is it OK that we don't align subsequent segments? (Is it that when using mmap-memory mode, all segments are multiple-of-page-sized?)

view this post on Zulip Wasmtime GitHub notifications bot (Feb 10 2022 at 18:28):

cfallin created PR review comment:

here also the index variable defined_memory seems to imply that images only contains DefinedMemoryIndexs; should be consistent with what we do above. map contains an entry for every memory, defined or imported, so this loop is correct if images is over all memories as well; just a naming issue I think.

view this post on Zulip Wasmtime GitHub notifications bot (Feb 10 2022 at 18:28):

cfallin created PR review comment:

Update comment here? "check if Linux" -> "check if Unix family" and something about how we can partially support memfd-like operation on other Unices too...

view this post on Zulip Wasmtime GitHub notifications bot (Feb 10 2022 at 18:28):

cfallin created PR review comment:

OK, yeah, that makes sense. There's a broader story here about how we are maybe wanting to use rustix for most system-interfacing things (@sunfishcode comment?) but that's definitely in tension with reusing existing crates that do what we want. Here given the use of rustix in surrounding code I think I'd tie-break toward using rustix but will defer to what @sunfishcode prefers.

view this post on Zulip Wasmtime GitHub notifications bot (Feb 10 2022 at 20:04):

alexcrichton submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Feb 10 2022 at 20:04):

alexcrichton created PR review comment:

I looked at a few other places but I think this was just a typo because everywhere else named and used this as a page index. I was a bit worried myself though and had to do a few double-takes as I updated this!

view this post on Zulip Wasmtime GitHub notifications bot (Feb 10 2022 at 20:08):

alexcrichton submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Feb 10 2022 at 20:08):

alexcrichton created PR review comment:

Ah I think this is actually trying to be a bit too clever. Since all the segments are already aligned in their length there's actually no need to pass in a conditional alignment and this can always pass in the desired alignment with an assert that everything ends up adjacent to one another.

view this post on Zulip Wasmtime GitHub notifications bot (Feb 10 2022 at 20:13):

alexcrichton updated PR #3787 from reuse-cwasm-mmap to main.

view this post on Zulip Wasmtime GitHub notifications bot (Feb 10 2022 at 20:14):

alexcrichton submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Feb 10 2022 at 20:14):

alexcrichton created PR review comment:

I'll just switch this back to rustix, it's pretty minor and not really worth debating to me.

view this post on Zulip Wasmtime GitHub notifications bot (Feb 10 2022 at 20:17):

alexcrichton submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Feb 10 2022 at 20:17):

alexcrichton created PR review comment:

I ended up using MemoryIndex here instead of DefinedMemoryIndex because the interface to init_memory works on MemoryIndex (as an initializer can be for any memory) and otherwise translating between the two would require extra callbacks in the InitMemory::Runtime case.

Otherwise though none of the specialized initialization techniques work with imported memories anyway so I don't think anything is lost with using MemoryIndex since in the cases the optimization is applicable the two are equal. I'll double-check these areas though and make sure they're all prepared to use MemoryIndex.

view this post on Zulip Wasmtime GitHub notifications bot (Feb 10 2022 at 20:18):

alexcrichton submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Feb 10 2022 at 20:18):

alexcrichton created PR review comment:

Ah yeah this is a historical name, good catch though and definitely needed a rename.

view this post on Zulip Wasmtime GitHub notifications bot (Feb 10 2022 at 20:19):

alexcrichton updated PR #3787 from reuse-cwasm-mmap to main.

view this post on Zulip Wasmtime GitHub notifications bot (Feb 10 2022 at 20:39):

alexcrichton updated PR #3787 from reuse-cwasm-mmap to main.

view this post on Zulip Wasmtime GitHub notifications bot (Feb 10 2022 at 21:40):

alexcrichton merged PR #3787.


Last updated: Oct 23 2024 at 20:03 UTC