cfallin opened PR #8509 from cfallin:call-indirect-caching to bytecodealliance:main:
In WebAssembly, an indirect call is somewhat slow, because of the indirection required by CFI (control-flow integrity) sandboxing. In particular, a "function pointer" in most source languages compiled to Wasm is represented by an index into a table of funcrefs. The `call_indirect` instruction then has to do the following steps to invoke a function pointer:
- Load the funcref table's base and length values from the vmctx.
- Bounds-check the invoked index against the actual table size; trap if out-of-bounds.
- Spectre mitigation (cmove) on that bounds-check.
- Load the `vmfuncref` from the table given base and index.
- For lazy table init, check if this is a non-initialized funcref pointer, and initialize the entry.
- Load the signature from the funcref struct and compare it against the `call_indirect`'s expected signature; trap if wrong.
- Load the actual code pointer for the callee's Wasm-ABI entry point.
- Load the callee vmctx (which may be different for a cross-module call).
- Put that vmctx in arg 0, our vmctx in arg 1, and invoke the loaded code pointer with an indirect call instruction.
Compare and contrast to the process involved in invoking a native function pointer:
- Invoke the code pointer with an indirect call instruction.
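For contrast, here is a rough Rust-flavored sketch of the checked sequence that a `call_indirect` performs. This is illustrative pseudocode only: the `FuncTable` and `VMFuncRef` layouts are hypothetical, and the Spectre cmove and lazy-init path are elided; it is not Wasmtime's actual runtime code.

```rust
#[repr(C)]
pub struct VMFuncRef {
    pub type_index: u32,      // signature id compared against the call site's expectation
    pub wasm_call: *const u8, // code pointer for the callee's Wasm-ABI entry point
    pub vmctx: *mut u8,       // callee vmctx (may differ for a cross-module call)
}

pub struct FuncTable {
    pub base: *mut *mut VMFuncRef, // table base loaded from the vmctx
    pub len: usize,                // table length loaded from the vmctx
}

/// Resolve a `call_indirect` the slow way: returns (code pointer, callee vmctx).
pub unsafe fn resolve_call_indirect(
    table: &FuncTable,
    index: usize,
    expected_sig: u32,
) -> (*const u8, *mut u8) {
    // Bounds-check the invoked index against the table size; trap if out of bounds.
    if index >= table.len {
        panic!("trap: table out of bounds");
    }
    // Load the funcref from the table given base and index.
    let funcref = *table.base.add(index);
    // (Lazy table init would check for and initialize a null entry here.)
    assert!(!funcref.is_null(), "lazy-init path elided in this sketch");
    // Compare the stored signature against the call site's expectation; trap if wrong.
    if (*funcref).type_index != expected_sig {
        panic!("trap: indirect call type mismatch");
    }
    // Hand back the code pointer and callee vmctx for the indirect call itself.
    ((*funcref).wasm_call, (*funcref).vmctx)
}
```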
This overhead buys us something -- it is part of the SFI sandbox boundary -- but it is very repetitive and unnecessary work in most cases when indirect function calls are performed repeatedly (such as within an inner loop).
This PR introduces the idea of caching: if we know that the result of all the above checks won't change, then if we use the same index as "the last time" (for some definition), we can skip straight to the "invoke the code pointer" step, with a cached code pointer from that last time.
Concretely, it introduces a two-word struct inlined into the vmctx for each `call_indirect` instruction in the module (up to a limit):
- The last invoked index;
- The code pointer that index corresponded to.
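A minimal sketch of such a per-call-site entry (field names are illustrative, not Wasmtime's actual definition):

```rust
#[repr(C)]
pub struct CallIndirectCacheEntry {
    /// The table index invoked the last time this `call_indirect` site ran.
    /// Index 0 doubles as the "no cached value" sentinel when the
    /// preconditions described below hold.
    pub index: usize,
    /// The raw code pointer that `index` resolved to on that last call.
    pub code_ptr: *const u8,
}
```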
When compiling the module, we check whether the table could possibly be mutated at a given index after it is read: are there any instructions like `table.set`, or is the whole table exported and thus writable from the outside? We also check whether index 0 is a non-null funcref. If neither of these things is true, then we know we can cache an index-to-code-pointer mapping, and we know we can use index 0 as a sentinel for "no cached value".

We then make use of the struct for each indirect call site and generate code to check if the index matches; if so, call the cached pointer; if not, load the vmfuncref, check the signature, check that the callee vmctx is the same as the caller's (intra-module case), and stash the code pointer and index away (fill the cache), then make the call.
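Putting these pieces together, the per-call-site logic looks roughly like the sketch below. It builds on the hypothetical types and the `resolve_call_indirect` helper from the sketches above and is illustrative pseudocode, not the CLIF that is actually emitted:

```rust
/// Sketch of what one cached `call_indirect` site does.
pub unsafe fn call_indirect_cached(
    cache: &mut CallIndirectCacheEntry,
    table: &FuncTable,
    index: usize,
    expected_sig: u32,
    caller_vmctx: *mut u8,
) -> (*const u8, *mut u8) {
    // Fast path: same index as last time (and not the index-0 sentinel) means
    // the cached code pointer is still valid. Only intra-module targets are
    // cached, so the callee vmctx is our own.
    if index != 0 && index == cache.index {
        return (cache.code_ptr, caller_vmctx);
    }
    // Slow path: do the full checks (bounds, lazy init, signature).
    let (code_ptr, callee_vmctx) = resolve_call_indirect(table, index, expected_sig);
    // Fill the cache only for the intra-module case, where the callee vmctx
    // is the same as the caller's.
    if callee_vmctx == caller_vmctx {
        cache.index = index;
        cache.code_ptr = code_ptr;
    }
    (code_ptr, callee_vmctx)
}
```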
On an in-development branch of SpiderMonkey-in-Wasm with ICs (using indirect calls), this is about a 20% speedup; I haven't yet measured on other benchmarks. This might cause an instantiation-time slowdown due to the larger vmctx (but we could use madvise to zero it if needed).
This feature is off by default right now.
cfallin requested fitzgen for a review on PR #8509.
cfallin requested wasmtime-core-reviewers for a review on PR #8509.
cfallin commented on PR #8509:
(I'll run some benchmarks to see what the compile-time impact is like, and how this benefits non-SpiderMonkey workloads; TBD.)
cfallin updated PR #8509.
cfallin commented on PR #8509:
Benchmarks:
- Sightglass measures the following for `spidermonkey.wasm`:

compilation :: cycles :: benchmarks/spidermonkey/benchmark.wasm
  Δ = 1456249767.40 ± 1315409020.02 (confidence = 99%)
  original.so is 1.01x to 1.17x faster than cache-indirect-calls.so!
  [17934023164 18070879867.60 18215783322] cache-indirect-calls.so
  [15377276916 16614630100.20 17840489432] original.so

execution :: cycles :: benchmarks/spidermonkey/benchmark.wasm
  Δ = 27069455.80 ± 22219096.46 (confidence = 99%)
  original.so is 1.00x to 1.05x faster than cache-indirect-calls.so!
  [1079208474 1095097604.00 1142114396] cache-indirect-calls.so
  [1035034082 1068028148.20 1094035960] original.so

instantiation :: cycles :: benchmarks/spidermonkey/benchmark.wasm
  No difference in performance.
  [834480 856949.40 902120] cache-indirect-calls.so
  [831744 860156.60 978994] original.so
so compilation time does take a little hit but my system is noisy; somewhere between 1% and 17%, the mean looks to be around 8.7%. Instantiation apparently unaffected. And execution time... slightly slower? (But again, noisy system.)
I tried running the next benchmark in the list (pulldown-cmark) and Sightglass crashed mysteriously, so on to trusty `hyperfine`: no deltas in runtime on spidermonkey or bz2, my two usual suspects.

I suspect we'll see mostly in-the-noise results for most benchmarks because indirect calls are somewhat rare in C-ish code; it will matter mostly when compiling dynamic languages, or something like Java with a lot of virtual calls. The speedup on weval'd SpiderMonkey is real though, so I'd like to at least have this in tree as an option :-)
github-actions[bot] commented on PR #8509:
Subscribe to Label Action
cc @saulecabrera
<details>
This issue or pull request has been labeled: "wasmtime:api", "wasmtime:config", "winch"

Thus the following users have been cc'd because of the following labels:
- saulecabrera: winch
To subscribe or unsubscribe from this label, edit the <code>.github/subscribe-to-label.json</code> configuration file.
Learn more.
</details>
github-actions[bot] commented on PR #8509:
Label Messager: wasmtime:config
It looks like you are changing Wasmtime's configuration options. Make sure to
complete this check list:

- [ ] If you added a new `Config` method, you wrote extensive documentation for it.

<details>
Our documentation should be of the following form:

```text
Short, simple summary sentence.

More details. These details can be multiple paragraphs. There should be
information about not just the method, but its parameters and results as
well.

Is this method fallible? If so, when can it return an error?

Can this method panic? If so, when does it panic?

Example

Optional example here.
```
</details>

- [ ] If you added a new `Config` method, or modified an existing one, you ensured that this configuration is exercised by the fuzz targets.

<details>
For example, if you expose a new strategy for allocating the next instance
slot inside the pooling allocator, you should ensure that at least one of our
fuzz targets exercises that new strategy.

Often, all that is required of you is to ensure that there is a knob for this
configuration option in [`wasmtime_fuzzing::Config`][fuzzing-config] (or one
of its nested `struct`s).

Rarely, this may require authoring a new fuzz target to specifically test this
configuration. See [our docs on fuzzing][fuzzing-docs] for more details.
</details>

- [ ] If you are enabling a configuration option by default, make sure that it has been fuzzed for at least two weeks before turning it on by default.

[fuzzing-config]: https://github.com/bytecodealliance/wasmtime/blob/ca0e8d0a1d8cefc0496dba2f77a670571d8fdcab/crates/fuzzing/src/generators.rs#L182-L194
[fuzzing-docs]: https://docs.wasmtime.dev/contributing-fuzzing.html
<details>
To modify this label's message, edit the <code>.github/label-messager/wasmtime-config.md</code> file.
To add new label messages or remove existing label messages, edit the
<code>.github/label-messager.json</code> configuration file.</details>
fitzgen commented on PR #8509:
FWIW, I did a 100-iteration sightglass run of the default suite and got these results:
instantiation :: cycles :: benchmarks/spidermonkey/benchmark.wasm
  No difference in performance.
  [472878 584750.13 1435913] cache.so
  [471975 647599.28 1762837] main.so

execution :: cycles :: benchmarks/pulldown-cmark/benchmark.wasm
  No difference in performance.
  [7422842 8744920.24 28196476] cache.so
  [7439782 9281270.19 32569639] main.so

instantiation :: cycles :: benchmarks/bz2/benchmark.wasm
  No difference in performance.
  [165456 245749.33 728069] cache.so
  [161764 251740.72 723740] main.so

execution :: cycles :: benchmarks/bz2/benchmark.wasm
  No difference in performance.
  [95484875 104221743.11 196171026] cache.so
  [95454637 105918836.28 208434699] main.so

compilation :: cycles :: benchmarks/pulldown-cmark/benchmark.wasm
  No difference in performance.
  [358282063 404458268.52 568579026] cache.so
  [367557812 399158971.64 551388853] main.so

compilation :: cycles :: benchmarks/bz2/benchmark.wasm
  No difference in performance.
  [194765165 225581880.12 437412507] cache.so
  [197703277 222917937.46 281571195] main.so

execution :: cycles :: benchmarks/spidermonkey/benchmark.wasm
  No difference in performance.
  [895587497 923280792.22 972838288] cache.so
  [891917856 927697965.55 1121115203] main.so

compilation :: cycles :: benchmarks/spidermonkey/benchmark.wasm
  No difference in performance.
  [8325711261 8502522129.94 8732981273] cache.so
  [8370183347 8530524550.49 9984684780] main.so

instantiation :: cycles :: benchmarks/pulldown-cmark/benchmark.wasm
  No difference in performance.
  [214742 289658.22 800685] cache.so
  [204642 290351.57 854625] main.so
I think it is safe to say that this doesn't have an effect on our regular suite, and I expect that the JS speedups you are reporting rely on weval'ing first.
fitzgen submitted PR review:
Overall looks great, but I have a few nitpicks, mostly related to improving things for future code readers. Thanks!
fitzgen created PR review comment:
Care to add a brief comment to this test file about what we are aiming to exercise here, so that future changes that happen to tweak this disassembly know what property to check is preserved in their changes?
fitzgen created PR review comment:
All the rest of our offset getters are using typed indices; can we define a `CallIndirectSiteIndex` (feel free to wordsmith) type for these? It just takes three lines in `crates/types/src/lib.rs`:
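Roughly something like the following, using the `entity_impl!` pattern of the existing index types in that crate (the exact derives and doc text here are a sketch, not a prescribed definition):

```rust
use cranelift_entity::entity_impl; // presumably already imported in this file

/// Index of a `call_indirect` call site within a module, used to address its
/// cache slot in the vmctx.
#[derive(Copy, Clone, PartialEq, Eq, Hash, PartialOrd, Ord, Debug)]
pub struct CallIndirectSiteIndex(u32);
entity_impl!(CallIndirectSiteIndex);
```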
fitzgen created PR review comment:
That's a lot of "index", perhaps we could reword this a little and rename the `index` parameter to `call_site`? I think this is a little more clear:

/// Return the offset from `self` of `self.call_indirect_caches[call_site].index`.
(And similar above)
fitzgen created PR review comment:
(ditto for all these tests)
fitzgen created PR review comment:
Could you add some comments here for future readers and possibly factor this scan out into a helper method to help prevent this function from getting even more gigantic?
fitzgen created PR review comment:
This could really use some more detail. As a user, why shouldn't I just enable this all the time? What are the trade offs? What is the default? What does it do? When does it apply? How does it affect memory footprint and pooling allocator integration? Etc...
The automatic github actions comment has a good template in the first check box: https://github.com/bytecodealliance/wasmtime/pull/8509#issuecomment-2084268706
fitzgen created PR review comment:
indireaching? This is a typo and not an intentional contraction, right?
fitzgen created PR review comment:
Ditto
fitzgen created PR review comment:
Could we factor this one out into smaller helpers and add comments describing the code and blocks we are emitting as well? I'm finding it really hard to read this giant block of clif-building code directly. See for example the overview comment and comments for each block we generate here:
fitzgen created PR review comment:
Speaking of that comment, the second item doesn't seem to be satisfied in this PR either: adding support for fuzzing this option. It should be just a couple-line change in these places:
cfallin updated PR #8509.
cfallin updated PR #8509.
cfallin submitted PR review.
cfallin created PR review comment:
Done!
cfallin submitted PR review.
cfallin created PR review comment:
Done!
cfallin submitted PR review.
cfallin created PR review comment:
Done!
cfallin submitted PR review.
cfallin created PR review comment:
Done!
cfallin submitted PR review.
cfallin created PR review comment:
For sure, added explanations here and below giving the reasons for expecting/not expecting the opt, and the key bits of IR we're asserting.
cfallin submitted PR review.
cfallin created PR review comment:
Updated!
cfallin submitted PR review.
cfallin created PR review comment:
Updated!
cfallin submitted PR review.
cfallin created PR review comment:
Huh, should be "indirect call caching"; somehow my fingers or a misplaced editor command swallowed the middle bits of that, sorry!
cfallin submitted PR review.
cfallin created PR review comment:
Added a similar comment and factored out a little bit of the logic (and reduced the CFG by one block while I was at it) -- hopefully clearer now!
cfallin updated PR #8509.
cfallin commented on PR #8509:
Updated, let me know what you think!
cfallin commented on PR #8509:
(Oh, and forgot to mention: thanks for the thorough benchmarks on this! I've gotta dig into why Sightglass is so finicky for me...)
cfallin updated PR #8509.
github-actions[bot] commented on PR #8509:
Subscribe to Label Action
cc @fitzgen
<details>
This issue or pull request has been labeled: "fuzzing", "wasmtime:ref-types"

Thus the following users have been cc'd because of the following labels:
- fitzgen: fuzzing, wasmtime:ref-types
To subscribe or unsubscribe from this label, edit the <code>.github/subscribe-to-label.json</code> configuration file.
Learn more.
</details>
fitzgen commented on PR #8509:
Oh also, I mentioned this in the Cranelift meeting, but wanted to put it down here for posterity too: it seems like the limit on the number of caches isn't actually implemented.
cfallin updated PR #8509.
cfallin commented on PR #8509:
Yep, I missed that entirely in this iteration of the patch -- sorry about that! Just updated. I ended up wiring the limit through to a new config option (with docs, CLI flag, fuzzing) so that I could easily test it without a huge test input.
cfallin updated PR #8509.
fitzgen submitted PR review:
:rocket:
fitzgen created PR review comment:
This is great, thank you. I really appreciate having this stuff factored out into bite-sized chunks.
cfallin updated PR #8509.
cfallin commented on PR #8509:
The merge queue did its job (nice!) and found a conflict with #8515 which merged after this PR's CI ran. I pushed a fix which is fairly small but @fitzgen maybe PTAL at least commit just to be sure?
cfallin edited a comment on PR #8509:
The merge queue did its job (nice!) and found a conflict with #8515 which merged after this PR's CI ran. I pushed a fix which is fairly small but @fitzgen maybe PTAL at least the last commit just to be sure?
cfallin commented on PR #8509:
(or @jameysharp feel free to review last commit too as it interacts with your change in #8515!)
fitzgen submitted PR review.
fitzgen created PR review comment:
Oh I just noticed this new index type is in between the definition of VMSharedTypeIndex and the impl block defining methods for it. Can we move it before or after so that those two are together?
fitzgen submitted PR review.
fitzgen created PR review comment:
Is this allowed to return false even when the value is actually zero? Jamey is adding support for the extended const proposal and it extends const expressions with addition, subtraction, and multiplication, so there will soon be many ways to evaluate to zero. Doesn’t mean we can’t land this PR now but we should be clear about the semantics of this method
cfallin updated PR #8509.
cfallin submitted PR review.
cfallin created PR review comment:
Clarified a bit -- renamed to
provably_nonzero_i32
(inverted from is-possibly-nonzero -- clearer this way IMHO) and commented more. It's always allowed to returnfalse
(can't prove anything), but iftrue
, must actually be a case we know is always nonzero (currently only an i32.const with nonzero constant).
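As a rough illustration of that contract, a minimal sketch over a simplified operator enum (the enum and signature here are assumptions for illustration, not the actual wasmparser/Wasmtime types):

```rust
/// Simplified stand-in for a const-expression's operators.
#[allow(dead_code)]
enum ConstOp {
    I32Const(i32),
    GlobalGet(u32),
    // extended-const additions (i32.add / i32.sub / i32.mul) would appear here too
}

/// `true` only when the expression provably evaluates to a nonzero i32;
/// `false` means "can't prove it", never "provably zero".
fn provably_nonzero_i32(ops: &[ConstOp]) -> bool {
    match ops {
        // The only case handled today: a single nonzero `i32.const`.
        [ConstOp::I32Const(x)] => *x != 0,
        // Globals, extended-const arithmetic, anything else: give up and return false.
        _ => false,
    }
}
```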
cfallin submitted PR review.
cfallin created PR review comment:
Ah, yeah, didn't notice that; moved down a bit.
cfallin has enabled auto merge for PR #8509.
jameysharp submitted PR review.
jameysharp created PR review comment:
I would suggest naming this with the opposite sense, like `is_definitely_nonzero`, which returns true for `I32Const(x) if x != 0`, and may also return true in any other circumstances where it can prove this, but otherwise returns false. Then, extended constants or any other strange things will safely return false, meaning that you can't be sure your optimization is valid, and you'll only apply the optimization when it's definitely okay.

Further, when we follow Nick's suggestion for making the `ConstExpr` evaluator try to run even before an `Instance` is available, it will be clear how to invoke that to implement this. If it fails because it needs inputs which aren't known until instantiation, or it succeeds but returns 0, then we should return that the expression is not "definitely nonzero".
jameysharp submitted PR review.
jameysharp created PR review comment:
Hah, raced; I'm glad we're on the same page.
cfallin merged PR #8509.
alexcrichton submitted PR review.
alexcrichton created PR review comment:
Could these be moved to `-O` options as "tuning" or "optimization" options? The `WasmOptions` group here is intended for semantic differences in the generated code, and these options shouldn't affect any generated code.

(thanks for adding cli flags though!)
cfallin created PR review comment:
Ah, yeah, for sure -- will do in a followup PR.
cfallin submitted PR review.