fitzgen requested cfallin for a review on PR #12230.
fitzgen opened PR #12230 from fitzgen:inst-set-cost-function to bytecodealliance:main:
Fixes https://github.com/bytecodealliance/wasmtime/issues/12156
fitzgen requested wasmtime-compiler-reviewers for a review on PR #12230.
fitzgen requested wasmtime-default-reviewers for a review on PR #12230.
fitzgen updated PR #12230.
cfallin commented on PR #12230:
Thanks for the PR! (Nick knows but for anyone else watching) I'm out on PTO then at a conference, so I'll be able to review this Mon, Jan 12 or later.
cfallin submitted PR review:
Got time today to review this actually so: LGTM and I'm really excited about this!
If you don't mind, could we post Sightglass results here before merging? Specifically I'd like to see
cyclesmeasured for compilation and execution time; hopefully we see no or few regressions (as I know you've already seen during some experimentation).
fitzgen commented on PR #12230:
Hrm, yeah
cyclesis looking very different from what I previously saw withinsts-retired, unfortunately. Here is the default and spidermonkey suite's results, with statistically insignificant entries removed:compilation :: cycles :: benchmarks/bz2/benchmark.wasm Δ = 38816006.88 ± 4557789.09 (confidence = 95%) main.so is 1.22x to 1.28x faster than inst-set.so! [173565996 191159750.88 226710599] inst-set.so [134897941 152343744.00 181128243] main.so compilation :: cycles :: benchmarks/pulldown-cmark/benchmark.wasm Δ = 30708277.27 ± 5797934.73 (confidence = 95%) main.so is 1.11x to 1.16x faster than inst-set.so! [227209069 254002445.00 308181456] inst-set.so [194211038 223294167.73 282640832] main.so compilation :: cycles :: benchmarks/spidermonkey/spidermonkey-regex.wasm Δ = 632063507.40 ± 84792443.24 (confidence = 95%) main.so is 1.09x to 1.11x faster than inst-set.so! [6551514230 7032180956.90 7890587513] inst-set.so [6198102237 6400117449.50 7008087519] main.so compilation :: cycles :: benchmarks/spidermonkey/spidermonkey-markdown.wasm Δ = 628333545.93 ± 98363544.32 (confidence = 95%) main.so is 1.08x to 1.11x faster than inst-set.so! [6494087498 7067526581.28 7805742475] inst-set.so [6158498649 6439193035.35 7315715259] main.so compilation :: cycles :: benchmarks/spidermonkey/spidermonkey-json.wasm Δ = 562993166.38 ± 115506969.41 (confidence = 95%) main.so is 1.07x to 1.10x faster than inst-set.so! [6486850237 7033140774.35 8340342367] inst-set.so [6044689306 6470147607.97 7743603904] main.so execution :: cycles :: benchmarks/spidermonkey/spidermonkey-json.wasm Δ = 6534858.80 ± 5891940.54 (confidence = 95%) main.so is 1.00x to 1.02x faster than inst-set.so! [504126227 528608965.58 576654072] inst-set.so [496952074 522074106.78 604679005] main.soI guess instructions retired isn't a good proxy for wall time in this case because we are switching from pretty much an array-of-scalars representation of cost to an array-of-trees representation, which is going to involve more pointer chasing and memory latency that isn't reflected in instructions retired.
Pretty clear to me that this PR cannot land as-is. Might be worth some profiling to see if there is any low-hanging fruit, but if we can't just engineer our way out of this would-be regression, then it is probably best to stop pursuing the instruction-set cost function any further :-/
fitzgen edited a comment on PR #12230:
Hrm, yeah
cyclesis looking very different from what I previously saw withinsts-retired, unfortunately. Here is the default and spidermonkey suites' results, with statistically insignificant entries removed:compilation :: cycles :: benchmarks/bz2/benchmark.wasm Δ = 38816006.88 ± 4557789.09 (confidence = 95%) main.so is 1.22x to 1.28x faster than inst-set.so! [173565996 191159750.88 226710599] inst-set.so [134897941 152343744.00 181128243] main.so compilation :: cycles :: benchmarks/pulldown-cmark/benchmark.wasm Δ = 30708277.27 ± 5797934.73 (confidence = 95%) main.so is 1.11x to 1.16x faster than inst-set.so! [227209069 254002445.00 308181456] inst-set.so [194211038 223294167.73 282640832] main.so compilation :: cycles :: benchmarks/spidermonkey/spidermonkey-regex.wasm Δ = 632063507.40 ± 84792443.24 (confidence = 95%) main.so is 1.09x to 1.11x faster than inst-set.so! [6551514230 7032180956.90 7890587513] inst-set.so [6198102237 6400117449.50 7008087519] main.so compilation :: cycles :: benchmarks/spidermonkey/spidermonkey-markdown.wasm Δ = 628333545.93 ± 98363544.32 (confidence = 95%) main.so is 1.08x to 1.11x faster than inst-set.so! [6494087498 7067526581.28 7805742475] inst-set.so [6158498649 6439193035.35 7315715259] main.so compilation :: cycles :: benchmarks/spidermonkey/spidermonkey-json.wasm Δ = 562993166.38 ± 115506969.41 (confidence = 95%) main.so is 1.07x to 1.10x faster than inst-set.so! [6486850237 7033140774.35 8340342367] inst-set.so [6044689306 6470147607.97 7743603904] main.so execution :: cycles :: benchmarks/spidermonkey/spidermonkey-json.wasm Δ = 6534858.80 ± 5891940.54 (confidence = 95%) main.so is 1.00x to 1.02x faster than inst-set.so! [504126227 528608965.58 576654072] inst-set.so [496952074 522074106.78 604679005] main.soI guess instructions retired isn't a good proxy for wall time in this case because we are switching from pretty much an array-of-scalars representation of cost to an array-of-trees representation, which is going to involve more pointer chasing and memory latency that isn't reflected in instructions retired.
Pretty clear to me that this PR cannot land as-is. Might be worth some profiling to see if there is any low-hanging fruit, but if we can't just engineer our way out of this would-be regression, then it is probably best to stop pursuing the instruction-set cost function any further :-/
fitzgen updated PR #12230.
cfallin commented on PR #12230:
Ah, that's too bad! Very worthwhile exploration regardless...
fitzgen commented on PR #12230:
Idea: We could potentially try to do the current scalar cost first, but bail out if we ever saturate to infinity, and only then use the instruction-set cost function. Kind of heavy-handed, but might avoid compile-time regressions.
Last updated: Jan 09 2026 at 13:15 UTC