Stream: git-wasmtime

Topic: wasmtime / PR #8509 Wasmtime: add one-entry call-indirect...


view this post on Zulip Wasmtime GitHub notifications bot (Apr 29 2024 at 22:44):

cfallin opened PR #8509 from cfallin:call-indirect-caching to bytecodealliance:main:

In WebAssembly, an indirect call is somewhat slow, because of the indirection required by CFI (control-flow integrity) sandboxing. In particular, a "function pointer" in most source languages compiled to Wasm is represented by an index into a table of funcrefs. The call_indirect instruction then has to do the following steps to invoke a function pointer:

Compare and contrast to the process involved in invoking a native function pointer:

This overhead buys us something -- it is part of the SFI sandbox boundary -- but it is very repetitive and unnecessary work in most cases when indirect function calls are performed repeatedly (such as within an inner loop).

This PR introduces the idea of caching: if we know that the result of all the above checks won't change, then if we use the same index as "the last time" (for some definition), we can skip straight to the "invoke the code pointer" step, with a cached code pointer from that last time.

Concretely, it introduces a two-word struct inlined into the vmctx for each call_indirect instruction in the module (up to a limit):

When compiling the module, we check whether the table could possibly be mutable at a given index once read: any instructions like table.set, or the whole table exported thus writable from the outside. We also check whether index 0 is a non-null funcref. If neither of these things are true, then we know we can cache an index-to-code-pointer mapping, and we know we can use index 0 as a sentinel for "no cached value".

We then make use of the struct for each indirect call site and generate code to check if the index matches; if so, call cached pointer; if not, load the vmfuncref, check the signature, check that the callee vmctx is the same as caller (intra-module case), and stash the code pointer and index away (fill the cache), then make the call.

On an in-development branch of SpiderMonkey-in-Wasm with ICs (using indirect calls), this is about a 20% speedup; I haven't yet measured on other benchmarks. It is expected that this might be an instantiation-time slowdown due to a larger vmctx (but we could use madvise to zero if needed).

This feature is off by default right now.

<!--
Please make sure you include the following information:

Our development process is documented in the Wasmtime book:
https://docs.wasmtime.dev/contributing-development-process.html

Please ensure all communication follows the code of conduct:
https://github.com/bytecodealliance/wasmtime/blob/main/CODE_OF_CONDUCT.md
-->

view this post on Zulip Wasmtime GitHub notifications bot (Apr 29 2024 at 22:44):

cfallin requested fitzgen for a review on PR #8509.

view this post on Zulip Wasmtime GitHub notifications bot (Apr 29 2024 at 22:44):

cfallin requested wasmtime-core-reviewers for a review on PR #8509.

view this post on Zulip Wasmtime GitHub notifications bot (Apr 29 2024 at 22:47):

cfallin commented on PR #8509:

(I'll run some benchmarks to see what the compile-time impact is like, and how this benefits non-SpiderMonkey workloads; TBD.)

view this post on Zulip Wasmtime GitHub notifications bot (Apr 29 2024 at 22:56):

cfallin updated PR #8509.

view this post on Zulip Wasmtime GitHub notifications bot (Apr 29 2024 at 23:45):

cfallin commented on PR #8509:

Benchmarks:

compilation :: cycles :: benchmarks/spidermonkey/benchmark.wasm

  Δ = 1456249767.40 ± 1315409020.02 (confidence = 99%)

  original.so is 1.01x to 1.17x faster than cache-indirect-calls.so!

  [17934023164 18070879867.60 18215783322] cache-indirect-calls.so
  [15377276916 16614630100.20 17840489432] original.so

execution :: cycles :: benchmarks/spidermonkey/benchmark.wasm

  Δ = 27069455.80 ± 22219096.46 (confidence = 99%)

  original.so is 1.00x to 1.05x faster than cache-indirect-calls.so!

  [1079208474 1095097604.00 1142114396] cache-indirect-calls.so
  [1035034082 1068028148.20 1094035960] original.so

instantiation :: cycles :: benchmarks/spidermonkey/benchmark.wasm

  No difference in performance.

  [834480 856949.40 902120] cache-indirect-calls.so
  [831744 860156.60 978994] original.so

so compilation time does take a little hit but my system is noisy; somewhere between 1% and 17%, the mean looks to be around 8.7%. Instantiation apparently unaffected. And execution time... slightly slower? (But again, noisy system.)

I tried running the next benchmark in the list (pulldown-cmark) and Sightglass crashed mysteriously, so on to trusty hyperfine: no deltas in runtime on spidermonkey or bz2, my two usual suspects.

I suspect that mostly we'll see mostly in-the-noise results for most benchmarks because indirect calls are somewhat rare in C-ish code; it will matter mostly when compiling dynamic languages, or something like Java with a lot of virtual calls. The speedup on weval'd SpiderMonkey is real though so I'd like to at least have this in tree as an option :-)

view this post on Zulip Wasmtime GitHub notifications bot (Apr 30 2024 at 02:44):

github-actions[bot] commented on PR #8509:

Subscribe to Label Action

cc @saulecabrera

<details>
This issue or pull request has been labeled: "wasmtime:api", "wasmtime:config", "winch"

Thus the following users have been cc'd because of the following labels:

To subscribe or unsubscribe from this label, edit the <code>.github/subscribe-to-label.json</code> configuration file.

Learn more.
</details>

view this post on Zulip Wasmtime GitHub notifications bot (Apr 30 2024 at 02:44):

github-actions[bot] commented on PR #8509:

Label Messager: wasmtime:config

It looks like you are changing Wasmtime's configuration options. Make sure to
complete this check list:

[fuzzing-config]: https://github.com/bytecodealliance/wasmtime/blob/ca0e8d0a1d8cefc0496dba2f77a670571d8fdcab/crates/fuzzing/src/generators.rs#L182-L194
[fuzzing-docs]: https://docs.wasmtime.dev/contributing-fuzzing.html


<details>

To modify this label's message, edit the <code>.github/label-messager/wasmtime-config.md</code> file.

To add new label messages or remove existing label messages, edit the
<code>.github/label-messager.json</code> configuration file.

Learn more.

</details>

view this post on Zulip Wasmtime GitHub notifications bot (Apr 30 2024 at 15:26):

fitzgen commented on PR #8509:

FWIW, I did a 100-iteration sightglass run of the default suite and got these results:

instantiation :: cycles :: benchmarks/spidermonkey/benchmark.wasm

  No difference in performance.

  [472878 584750.13 1435913] cache.so
  [471975 647599.28 1762837] main.so

execution :: cycles :: benchmarks/pulldown-cmark/benchmark.wasm

  No difference in performance.

  [7422842 8744920.24 28196476] cache.so
  [7439782 9281270.19 32569639] main.so

instantiation :: cycles :: benchmarks/bz2/benchmark.wasm

  No difference in performance.

  [165456 245749.33 728069] cache.so
  [161764 251740.72 723740] main.so

execution :: cycles :: benchmarks/bz2/benchmark.wasm

  No difference in performance.

  [95484875 104221743.11 196171026] cache.so
  [95454637 105918836.28 208434699] main.so

compilation :: cycles :: benchmarks/pulldown-cmark/benchmark.wasm

  No difference in performance.

  [358282063 404458268.52 568579026] cache.so
  [367557812 399158971.64 551388853] main.so

compilation :: cycles :: benchmarks/bz2/benchmark.wasm

  No difference in performance.

  [194765165 225581880.12 437412507] cache.so
  [197703277 222917937.46 281571195] main.so

execution :: cycles :: benchmarks/spidermonkey/benchmark.wasm

  No difference in performance.

  [895587497 923280792.22 972838288] cache.so
  [891917856 927697965.55 1121115203] main.so

compilation :: cycles :: benchmarks/spidermonkey/benchmark.wasm

  No difference in performance.

  [8325711261 8502522129.94 8732981273] cache.so
  [8370183347 8530524550.49 9984684780] main.so

instantiation :: cycles :: benchmarks/pulldown-cmark/benchmark.wasm

  No difference in performance.

  [214742 289658.22 800685] cache.so
  [204642 290351.57 854625] main.so

I think it is safe to say that this doesn't have an affect on our regular suite, and I expect that the JS speed ups you are reporting rely on weval'ing first.

view this post on Zulip Wasmtime GitHub notifications bot (Apr 30 2024 at 16:20):

fitzgen submitted PR review:

Overall looks great, but I have a few nitpicks, mostly related to improving things for future code readers. Thanks!

view this post on Zulip Wasmtime GitHub notifications bot (Apr 30 2024 at 16:20):

fitzgen submitted PR review:

Overall looks great, but I have a few nitpicks, mostly related to improving things for future code readers. Thanks!

view this post on Zulip Wasmtime GitHub notifications bot (Apr 30 2024 at 16:20):

fitzgen created PR review comment:

Care to add a brief comment to this test file about what we are aiming to exercise here, so that future changes that happen to tweak this disassembly know what property to check is preserved in their changes?

view this post on Zulip Wasmtime GitHub notifications bot (Apr 30 2024 at 16:20):

fitzgen created PR review comment:

All the rest of our offset getters are using typed indices; can we define a CallIndirectSiteIndex (feel free to word smith) type for these? It just takes three lines in crates/types/src/lib.rs:

https://github.com/bytecodealliance/wasmtime/blob/0f4c0d4a358187f99f23897d306c8b2a3c94287e/crates/types/src/lib.rs#L833-L846

view this post on Zulip Wasmtime GitHub notifications bot (Apr 30 2024 at 16:20):

fitzgen created PR review comment:

That's a lot of "index", perhaps we could reword this a little and rename the index parameter to call_site? I think this is a little more clear:

    /// Return the offset from `self` of `self.call_indirect_caches[call_site].index`.

(And similar above)

view this post on Zulip Wasmtime GitHub notifications bot (Apr 30 2024 at 16:20):

fitzgen created PR review comment:

(ditto for all these tests)

view this post on Zulip Wasmtime GitHub notifications bot (Apr 30 2024 at 16:20):

fitzgen created PR review comment:

Could you add some comments here for future readers and possibly factor this scan out into a helper method to help prevent this function from getting even more gigantic?

view this post on Zulip Wasmtime GitHub notifications bot (Apr 30 2024 at 16:20):

fitzgen created PR review comment:

This could really use some more detail. As a user, why shouldn't I just enable this all the time? What are the trade offs? What is the default? What does it do? When does it apply? How does it affect memory footprint and pooling allocator integration? Etc...

The automatic github actions comment has a good template in the first check box: https://github.com/bytecodealliance/wasmtime/pull/8509#issuecomment-2084268706

view this post on Zulip Wasmtime GitHub notifications bot (Apr 30 2024 at 16:20):

fitzgen created PR review comment:

indireaching? This is a typo and not an intentional contraction, right?

view this post on Zulip Wasmtime GitHub notifications bot (Apr 30 2024 at 16:20):

fitzgen created PR review comment:

Ditto

view this post on Zulip Wasmtime GitHub notifications bot (Apr 30 2024 at 16:20):

fitzgen created PR review comment:

Could we factor this one out into smaller helpers and add comments describing the code and blocks we are emitting as well? I'm finding it really hard to read this giant block of clif-building code directly. See for example the overview comment and comments for each block we generate here:

https://github.com/bytecodealliance/wasmtime/blob/0f4c0d4a358187f99f23897d306c8b2a3c94287e/crates/cranelift/src/gc/enabled.rs#L341-L449

view this post on Zulip Wasmtime GitHub notifications bot (Apr 30 2024 at 16:20):

fitzgen created PR review comment:

Speaking of that comment, the second item doesn't seem to be satisfied in this PR either: adding support for fuzzing this option. It should be just a couple-line change in these places:

view this post on Zulip Wasmtime GitHub notifications bot (Apr 30 2024 at 20:11):

cfallin updated PR #8509.

view this post on Zulip Wasmtime GitHub notifications bot (Apr 30 2024 at 20:15):

cfallin updated PR #8509.

view this post on Zulip Wasmtime GitHub notifications bot (Apr 30 2024 at 20:15):

cfallin submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Apr 30 2024 at 20:15):

cfallin created PR review comment:

Done!

view this post on Zulip Wasmtime GitHub notifications bot (Apr 30 2024 at 20:15):

cfallin submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Apr 30 2024 at 20:15):

cfallin created PR review comment:

Done!

view this post on Zulip Wasmtime GitHub notifications bot (Apr 30 2024 at 20:15):

cfallin submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Apr 30 2024 at 20:15):

cfallin created PR review comment:

Done!

view this post on Zulip Wasmtime GitHub notifications bot (Apr 30 2024 at 20:15):

cfallin submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Apr 30 2024 at 20:15):

cfallin created PR review comment:

Done!

view this post on Zulip Wasmtime GitHub notifications bot (Apr 30 2024 at 20:15):

cfallin submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Apr 30 2024 at 20:15):

cfallin created PR review comment:

For sure, added explanations here and below giving the reasons for expecting/not expecting the opt, and the key bits of IR we're asserting.

view this post on Zulip Wasmtime GitHub notifications bot (Apr 30 2024 at 20:15):

cfallin submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Apr 30 2024 at 20:15):

cfallin created PR review comment:

Updated!

view this post on Zulip Wasmtime GitHub notifications bot (Apr 30 2024 at 20:16):

cfallin submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Apr 30 2024 at 20:16):

cfallin created PR review comment:

Updated!

view this post on Zulip Wasmtime GitHub notifications bot (Apr 30 2024 at 20:16):

cfallin submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Apr 30 2024 at 20:16):

cfallin created PR review comment:

Huh, should be "indirect call caching"; somehow my fingers or a misplaced editor command swallowed the middle bits of that, sorry!

view this post on Zulip Wasmtime GitHub notifications bot (Apr 30 2024 at 20:16):

cfallin submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Apr 30 2024 at 20:16):

cfallin created PR review comment:

Added a similar comment and factored out a little bit of the logic (and reduced the CFG by one block while I was at it) -- hopefully clearer now!

view this post on Zulip Wasmtime GitHub notifications bot (Apr 30 2024 at 20:30):

cfallin updated PR #8509.

view this post on Zulip Wasmtime GitHub notifications bot (Apr 30 2024 at 20:31):

cfallin commented on PR #8509:

Updated, let me know what you think!

view this post on Zulip Wasmtime GitHub notifications bot (Apr 30 2024 at 20:33):

cfallin commented on PR #8509:

(Oh, and forgot to mention: thanks for the thorough benchmarks on this! I've gotta dig into why Sightglass is so finicky for me...)

view this post on Zulip Wasmtime GitHub notifications bot (Apr 30 2024 at 20:36):

cfallin updated PR #8509.

view this post on Zulip Wasmtime GitHub notifications bot (Apr 30 2024 at 22:44):

github-actions[bot] commented on PR #8509:

Subscribe to Label Action

cc @fitzgen

<details>
This issue or pull request has been labeled: "fuzzing", "wasmtime:ref-types"

Thus the following users have been cc'd because of the following labels:

To subscribe or unsubscribe from this label, edit the <code>.github/subscribe-to-label.json</code> configuration file.

Learn more.
</details>

view this post on Zulip Wasmtime GitHub notifications bot (May 01 2024 at 16:14):

fitzgen commented on PR #8509:

Oh also, I mentioned this in the Cranelift meeting, but wanted to put it down here for posterity too: it seems like the limit on the number of caches isn't actually implemented.

view this post on Zulip Wasmtime GitHub notifications bot (May 01 2024 at 16:53):

cfallin updated PR #8509.

view this post on Zulip Wasmtime GitHub notifications bot (May 01 2024 at 16:53):

cfallin commented on PR #8509:

Yep, I missed that entirely in this iteration of the patch -- sorry about that! Just updated. I ended up wiring the limit through to a new config option (with docs, CLI flag, fuzzing) so that I could easily test it without a huge test input.

view this post on Zulip Wasmtime GitHub notifications bot (May 01 2024 at 16:55):

cfallin updated PR #8509.

view this post on Zulip Wasmtime GitHub notifications bot (May 01 2024 at 21:20):

fitzgen submitted PR review:

:rocket:

view this post on Zulip Wasmtime GitHub notifications bot (May 01 2024 at 21:20):

fitzgen submitted PR review:

:rocket:

view this post on Zulip Wasmtime GitHub notifications bot (May 01 2024 at 21:20):

fitzgen created PR review comment:

This is great, thank you. I really appreciate having this stuff factored out into bite-sized chunks.

view this post on Zulip Wasmtime GitHub notifications bot (May 01 2024 at 23:15):

cfallin updated PR #8509.

view this post on Zulip Wasmtime GitHub notifications bot (May 01 2024 at 23:17):

cfallin commented on PR #8509:

The merge queue did its job (nice!) and found a conflict with #8515 which merged after this PR's CI ran. I pushed a fix which is fairly small but @fitzgen maybe PTAL at least commit just to be sure?

view this post on Zulip Wasmtime GitHub notifications bot (May 01 2024 at 23:19):

cfallin edited a comment on PR #8509:

The merge queue did its job (nice!) and found a conflict with #8515 which merged after this PR's CI ran. I pushed a fix which is fairly small but @fitzgen maybe PTAL at least the last commit just to be sure?

view this post on Zulip Wasmtime GitHub notifications bot (May 01 2024 at 23:21):

cfallin commented on PR #8509:

(or @jameysharp feel free to review last commit too as it interacts with your change in #8515!)

view this post on Zulip Wasmtime GitHub notifications bot (May 02 2024 at 03:47):

fitzgen submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (May 02 2024 at 03:47):

fitzgen created PR review comment:

Oh I just noticed this new index type is in between the definition of VMSharedTypeIndex and the impl block defining methods for it. Can we move it before or after so that those two are together?

view this post on Zulip Wasmtime GitHub notifications bot (May 02 2024 at 03:47):

fitzgen submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (May 02 2024 at 03:47):

fitzgen created PR review comment:

Is this allowed to return false even when the value is actually zero? Jamey is adding support for the extended const proposal and it extends const expressions with addition, subtraction, and multiplication, so there will soon be many ways to evaluate to zero. Doesn’t mean we can’t land this PR now but we should be clear about the semantics of this method

view this post on Zulip Wasmtime GitHub notifications bot (May 02 2024 at 06:11):

cfallin updated PR #8509.

view this post on Zulip Wasmtime GitHub notifications bot (May 02 2024 at 06:11):

cfallin submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (May 02 2024 at 06:11):

cfallin created PR review comment:

Clarified a bit -- renamed to provably_nonzero_i32 (inverted from is-possibly-nonzero -- clearer this way IMHO) and commented more. It's always allowed to return false (can't prove anything), but if true, must actually be a case we know is always nonzero (currently only an i32.const with nonzero constant).

view this post on Zulip Wasmtime GitHub notifications bot (May 02 2024 at 06:11):

cfallin submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (May 02 2024 at 06:11):

cfallin created PR review comment:

Ah, yeah, didn't notice that; moved down a bit.

view this post on Zulip Wasmtime GitHub notifications bot (May 02 2024 at 06:12):

cfallin has enabled auto merge for PR #8509.

view this post on Zulip Wasmtime GitHub notifications bot (May 02 2024 at 06:17):

jameysharp submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (May 02 2024 at 06:17):

jameysharp created PR review comment:

I would suggest naming this with the opposite sense, like: is_definitely_nonzero, which returns true for I32Const(x) if x != 0, and may also return true if there are any other circumstances where it can prove this, but otherwise returns false. Then, extended constants or any other strange things will safely return false, meaning that you can't be sure your optimization is valid, and you'll only apply the optimization when it's definitely okay.

Further, when we follow Nick's suggestion for making the ConstExpr evaluator try to run even before an Instance is available, it will be clear how to invoke that to implement this. If it fails because it needs inputs which aren't known until instantiation, or it succeeds but returns 0, then we should return that the expression is not "definitely nonzero".

view this post on Zulip Wasmtime GitHub notifications bot (May 02 2024 at 06:19):

jameysharp submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (May 02 2024 at 06:19):

jameysharp created PR review comment:

Hah, raced; I'm glad we're on the same page.

view this post on Zulip Wasmtime GitHub notifications bot (May 02 2024 at 06:49):

cfallin merged PR #8509.

view this post on Zulip Wasmtime GitHub notifications bot (May 03 2024 at 18:29):

alexcrichton submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (May 03 2024 at 18:29):

alexcrichton created PR review comment:

Could these be moved to -O options as "tuning" or "optimizaion" options?

The WasmOptions group here is intended for semantic differences in the generated code and these options shouldn't affect any generated code.

(thanks for adding cli flags though!)

view this post on Zulip Wasmtime GitHub notifications bot (May 03 2024 at 18:32):

cfallin created PR review comment:

Ah, yeah, for sure -- will do in a followup PR.

view this post on Zulip Wasmtime GitHub notifications bot (May 03 2024 at 18:32):

cfallin submitted PR review.


Last updated: Dec 23 2024 at 12:05 UTC