jameysharp requested elliottt for a review on PR #8457.
jameysharp requested wasmtime-compiler-reviewers for a review on PR #8457.
jameysharp opened PR #8457 from jameysharp:cache-allocatable
to bytecodealliance:main
:
The implementation of
PRegSet::from(self.vcode.machine_env())
does a surprising amount of work: it lazily initializes aOnceLock
in the backend, then loops over six vectors of registers and adds them all to thePRegSet
.We could likely implement that better, but in the meantime at least we can avoid repeating that work for every single machine instruction. The
PRegSet
itself only takes a few words to store so it's cheap to just keep it around.I discovered this because when the call to
self.vcode.machine_env()
is in the middle of the loop, that prevents holding a mutable borrow on parts ofself.vcode
, which I want to be able to do in another PR.
cfallin commented on PR #8457:
This one really surprised me so I went looking at history to figure out if we ever measured it: it looks like first appeared in #6957; previously the
PRegSet
was computed once when the lowering backend was created; and we didn't benchmark that. The PR did note caching theMachineEnv
but we missed caching the derived information.Out of curiosity have you seen a perf improvement with this PR? (We should do it even if not, but it seems to me it should improve compile time)
cfallin submitted PR review:
also I'll go ahead and r+ since I dug into this (sorry to preempt reviewers!).
jameysharp commented on PR #8457:
According to hyperfine, this commit is 1.02 ± 0.01 times faster when running
wasmtime compile -C cache=n ../sightglass/benchmarks/spidermonkey/benchmark.wasm
. That's actually a bigger effect than I expected since I thought operand collection is a pretty small part of the overall time.
jameysharp merged PR #8457.
Last updated: Jan 24 2025 at 00:11 UTC