Stream: git-wasmtime

Topic: wasmtime / PR #8881 Wasmtime: remove indirect-call caching.


view this post on Zulip Wasmtime GitHub notifications bot (Jun 27 2024 at 15:51):

cfallin requested wasmtime-core-reviewers for a review on PR #8881.

view this post on Zulip Wasmtime GitHub notifications bot (Jun 27 2024 at 15:51):

cfallin requested elliottt for a review on PR #8881.

view this post on Zulip Wasmtime GitHub notifications bot (Jun 27 2024 at 15:51):

cfallin opened PR #8881 from cfallin:remove-indirect-call-cache to bytecodealliance:main:

In the original development of this feature, guided by JS AOT compilation to Wasm of a microbenchmark heavily focused on IC sites, I was seeing a ~20% speedup. However, in more recent measurements, on full programs (e.g., the Octane benchmark suite), the benefit is more like 5%.

Moreover, in #8870, I attempted to switch over to a direct-mapped cache, to address a current shortcoming of the design, namely that it has a hard-capped number of callsites it can apply to (50k) to limit impact on VMContext struct size. With all of the needed checks for correctness, though, that change results in a 2.5% slowdown relative to no caching at all, so it was dropped.

In the process of thinking through that, I discovered the current design on main incorrectly handles null funcrefs: it invokes a null code pointer, rather than loading a field from a null struct pointer. The latter was specifically designed to cause the necessary Wasm trap in #8159, but I had missed that the call to a null code pointer would not have the same effect. As a result, we actually can crash the VM (safely at least, but still no good vs. a proper Wasm trap!) with the feature enabled. (It's off by default still.) That could be fixed too, but at this point with the small benefit on real programs, together with the limitation on module size for full benefit, I think I'd rather opt for simplicity and remove the cache entirely.

Thus, this PR removes call-indirect caching. It's not a direct revert because the original PR refactored the call-indirect generation into smaller helpers and IMHO it's a bit nicer to keep that. But otherwise all traces of the setting, code pre-scan during compilation and special conditions tracked on tables, and codegen changes are gone.

<!--
Please make sure you include the following information:

Our development process is documented in the Wasmtime book:
https://docs.wasmtime.dev/contributing-development-process.html

Please ensure all communication follows the code of conduct:
https://github.com/bytecodealliance/wasmtime/blob/main/CODE_OF_CONDUCT.md
-->

view this post on Zulip Wasmtime GitHub notifications bot (Jun 27 2024 at 15:52):

cfallin edited PR #8881:

In the original development of this feature, guided by JS AOT compilation to Wasm of a microbenchmark heavily focused on IC sites, I was seeing a ~20% speedup. However, in more recent measurements, on full programs (e.g., the Octane benchmark suite), the benefit is more like 5%.

Moreover, in #8870, I attempted to switch over to a direct-mapped cache, to address a current shortcoming of the design, namely that it has a hard-capped number of callsites it can apply to (50k) to limit impact on VMContext struct size. With all of the needed checks for correctness, though, that change results in a 2.5% slowdown relative to no caching at all, so it was dropped.

In the process of thinking through that, I discovered the current design on main incorrectly handles null funcrefs (EDIT: thanks @alexcrichton for finding the actual bug): it invokes a null code pointer, rather than loading a field from a null struct pointer. The latter was specifically designed to cause the necessary Wasm trap in #8159, but I had missed that the call to a null code pointer would not have the same effect. As a result, we actually can crash the VM (safely at least, but still no good vs. a proper Wasm trap!) with the feature enabled. (It's off by default still.) That could be fixed too, but at this point with the small benefit on real programs, together with the limitation on module size for full benefit, I think I'd rather opt for simplicity and remove the cache entirely.

Thus, this PR removes call-indirect caching. It's not a direct revert because the original PR refactored the call-indirect generation into smaller helpers and IMHO it's a bit nicer to keep that. But otherwise all traces of the setting, code pre-scan during compilation and special conditions tracked on tables, and codegen changes are gone.

<!--
Please make sure you include the following information:

Our development process is documented in the Wasmtime book:
https://docs.wasmtime.dev/contributing-development-process.html

Please ensure all communication follows the code of conduct:
https://github.com/bytecodealliance/wasmtime/blob/main/CODE_OF_CONDUCT.md
-->

view this post on Zulip Wasmtime GitHub notifications bot (Jun 27 2024 at 15:58):

cfallin updated PR #8881.

view this post on Zulip Wasmtime GitHub notifications bot (Jun 27 2024 at 17:10):

elliottt submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Jun 27 2024 at 17:11):

elliottt commented on PR #8881:

Looks like the fuzzer is still setting the flag, but otherwise this looks good!

view this post on Zulip Wasmtime GitHub notifications bot (Jun 27 2024 at 17:12):

cfallin requested fitzgen for a review on PR #8881.

view this post on Zulip Wasmtime GitHub notifications bot (Jun 27 2024 at 17:12):

cfallin requested wasmtime-fuzz-reviewers for a review on PR #8881.

view this post on Zulip Wasmtime GitHub notifications bot (Jun 27 2024 at 17:12):

cfallin updated PR #8881.

view this post on Zulip Wasmtime GitHub notifications bot (Jun 27 2024 at 17:12):

cfallin has enabled auto merge for PR #8881.

view this post on Zulip Wasmtime GitHub notifications bot (Jun 27 2024 at 17:36):

cfallin merged PR #8881.

view this post on Zulip Wasmtime GitHub notifications bot (Jun 27 2024 at 18:13):

fitzgen submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Jun 27 2024 at 18:14):

fitzgen commented on PR #8881:

(Whoops, I guess I'm late to the party!)


Last updated: Nov 22 2024 at 16:03 UTC