Stream: git-wasmtime

Topic: wasmtime / issue #8212 Make WebAssembly `ref.func` instru...


Wasmtime GitHub notifications bot (Mar 21 2024 at 19:19):

jameysharp added the wasmtime label to Issue #8212.

Wasmtime GitHub notifications bot (Mar 21 2024 at 19:19):

jameysharp added the performance label to Issue #8212.

Wasmtime GitHub notifications bot (Mar 21 2024 at 19:19):

jameysharp opened issue #8212:

Feature

I would like to remove the callee vmctx field from VMFuncRef. Anywhere that we currently have a pointer to a VMFuncRef, we would instead have a pair of that pointer plus a callee vmctx pointer.
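As a rough illustration of the representation change, here is a minimal Rust sketch. The type and field names (`FuncRefWithVmctx`, `SharedFuncRef`, `FatFuncRef`, `code_ptrs`, `type_id`) are hypothetical and are not Wasmtime's actual definitions; the sketch also assumes the five words are three code pointers, a word-sized type ID, and the callee vmctx.

```rust
use std::ffi::c_void;

/// Simplified stand-in for the current shape: the callee vmctx is stored
/// inline, so the struct cannot be filled in before instantiation.
/// (Hypothetical names; the three code pointers are an assumption.)
#[repr(C)]
pub struct FuncRefWithVmctx {
    pub code_ptrs: [*const c_void; 3],
    pub type_id: usize,
    pub vmctx: *mut c_void,
}

/// Proposed shape: only the fields that are constant once the module is loaded.
#[repr(C)]
pub struct SharedFuncRef {
    pub code_ptrs: [*const c_void; 3],
    pub type_id: usize,
}

/// Proposed fat pointer: a pointer to the shared part plus the callee vmctx.
#[repr(C)]
#[derive(Clone, Copy)]
pub struct FatFuncRef {
    pub func_ref: *const SharedFuncRef,
    pub vmctx: *mut c_void,
}

/// Size of `T` measured in pointer-sized words.
pub fn size_in_words<T>() -> usize {
    std::mem::size_of::<T>() / std::mem::size_of::<usize>()
}
```

Under these assumptions, `size_in_words` reports five words for the inline-vmctx struct, four for the shared struct, and two for the fat pointer.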

Benefit/Implementation

Currently any use of ref.func in Wasmtime is compiled to a libcall, including table initialization from an element segment. If the function in question is declared within the current module, then the libcall has to initialize a VMFuncRef structure within the vmctx and then return a pointer to that structure. (If it's an import, then we have a VMFuncRef pointer in the corresponding import, and can just return that.)

This is currently necessary because VMFuncRef includes the callee's vmctx pointer, which is not known until instantiation time.

But all of the other fields are constant for a particular function once the module is loaded: The type ID is determined by the engine based on what other modules were loaded previously, and the function pointers are relocated according to the load address of the module, but none of that changes afterward.

So removing the callee vmctx field means we can initialize all the VMFuncRef structures when the module is loaded. Then we can keep a single array of them attached to the module, and remove the space reserved for them in each vmctx. So although tables will need an additional word per element in each instance due to the fat-pointer representation, I think that's more than offset by removing five words per funcref from the vmctx for every instance.
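A back-of-the-envelope version of that accounting, as a Rust sketch. The function names are hypothetical, and the model is deliberately crude: it assumes every funcref currently costs five words in each vmctx, that the shared struct is four words stored once per module, and it ignores imports and alignment.

```rust
/// Words used today: each instance reserves 5 words per funcref in its
/// vmctx, and each table element is a single pointer.
pub fn words_current(instances: usize, funcrefs: usize, table_elems: usize) -> usize {
    instances * (5 * funcrefs + table_elems)
}

/// Words used under the proposal: 4 words per funcref stored once per
/// module, and each table element becomes a two-word fat pointer.
pub fn words_proposed(instances: usize, funcrefs: usize, table_elems: usize) -> usize {
    4 * funcrefs + instances * 2 * table_elems
}
```

For example, with 10 instances, 100 funcrefs, and a 100-element table, this model gives 6000 words today versus 2400 under the proposal. The saving shrinks as tables grow relative to the number of funcrefs.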

At that point, ref.func on a locally declared function just needs to compute the address of the element at a constant index in that module-global array of VMFuncRefs, and pair it with the current vmctx. So ref.func should compile to a base-pointer load and an add for locally declared functions (compared to two loads for imported functions). And that base-pointer load will be notrap and readonly, so it can be subject to GVN and LICM if ref.func is used multiple times or in a loop.
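In Rust terms, the computation for a locally declared function could look roughly like the sketch below. This models the address arithmetic only, not the generated code; `SharedFuncRef`, `FatFuncRef`, and `ref_func_local` are hypothetical names, and `base` stands for the already-loaded base pointer of the module-global array.

```rust
use std::ffi::c_void;

/// Hypothetical per-module funcref data (fields reduced to a type ID here).
pub struct SharedFuncRef {
    pub type_id: usize,
}

/// Hypothetical fat pointer: shared funcref pointer plus callee vmctx.
#[derive(Clone, Copy)]
pub struct FatFuncRef {
    pub func_ref: *const SharedFuncRef,
    pub vmctx: *mut c_void,
}

/// `ref.func` for a local function: one add on the array base pointer,
/// paired with the current vmctx. `func_index` is a compile-time constant
/// in the real setting; it is a plain parameter in this sketch.
pub fn ref_func_local(
    base: *const SharedFuncRef,
    func_index: usize,
    vmctx: *mut c_void,
) -> FatFuncRef {
    FatFuncRef {
        // Pointer arithmetic only; nothing is dereferenced here.
        func_ref: base.wrapping_add(func_index),
        vmctx,
    }
}
```

Because `base` does not change after the module is loaded, repeated uses of `ref_func_local` with the same base can share a single load of it.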

Similarly, initializing tables from element segments can be fast: after loading the VMFuncRef array base pointer once, the VMFuncRef address for each locally declared function can be computed by adding a compile-time constant offset. Maybe it's fast enough to remove the lazy-init optimization entirely, as in #8002.
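A minimal Rust sketch of eager table initialization under this scheme, assuming the element segment contains only locally declared function indices. All names (`SharedFuncRef`, `FatFuncRef`, `init_table`) are hypothetical, and the table is modeled as a plain slice rather than Wasmtime's actual table type.

```rust
use std::ffi::c_void;

/// Hypothetical per-module funcref data (fields reduced to a type ID here).
pub struct SharedFuncRef {
    pub type_id: usize,
}

/// Hypothetical fat pointer: shared funcref pointer plus callee vmctx.
#[derive(Clone, Copy)]
pub struct FatFuncRef {
    pub func_ref: *const SharedFuncRef,
    pub vmctx: *mut c_void,
}

/// Copy an element segment of local function indices into `table` at `dst`.
/// The base pointer is obtained once by the caller; each entry then costs
/// one add and one two-word store.
pub fn init_table(
    table: &mut [Option<FatFuncRef>],
    dst: usize,
    segment: &[usize],
    base: *const SharedFuncRef,
    vmctx: *mut c_void,
) -> Result<(), &'static str> {
    // Bounds-check the whole range up front so nothing is written on failure.
    let end = dst.checked_add(segment.len()).ok_or("table index overflow")?;
    let slots = table.get_mut(dst..end).ok_or("out of bounds table access")?;
    for (slot, &func_index) in slots.iter_mut().zip(segment) {
        *slot = Some(FatFuncRef {
            func_ref: base.wrapping_add(func_index),
            vmctx,
        });
    }
    Ok(())
}
```

Whether a loop like this is cheap enough to replace lazy initialization would need to be measured; the sketch only shows that no per-element libcall or per-instance VMFuncRef setup is involved.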

Alternatives

In #8195 I suggested an alternative representation for read-only funcref tables. I've since learned that the type IDs aren't known until the module is loaded, so that plan doesn't work as written, but the tables could still be quickly unpacked when the module is loaded.

The above proposal is more general than #8195: I believe this should work equally well in all WebAssembly modules and components, even when the table is writable, or an active element segment is applied to an imported table, or an element segment uses global.get. This proposal also doesn't require trampolines for imported functions like that one did. On the other hand, the read-only tables proposal might speed up module loading slightly relative to this plan.

cc: @fitzgen @alexcrichton @cfallin


Last updated: Nov 22 2024 at 17:03 UTC