fitzgen requested alexcrichton for a review on PR #10643.
fitzgen opened PR #10643 from fitzgen:speed-up-call-benches to bytecodealliance:main:
This provides an improvement across the board for our
sync/no-hookbenchmarks:<details>
<summary>Benchmark Results</summary>
$ cargo bench --profile profiling --bench call '\bsync/no-hook' -- --baseline main Finished `profiling` profile [optimized + debuginfo] target(s) in 0.28s Running benches/call.rs (target/profiling/deps/call-b0a2bedd3336ad76) sync/no-hook/core - host-to-wasm - typed - nop time: [27.334 ns 27.499 ns 27.668 ns] change: [-16.388% -14.870% -13.479%] (p = 0.00 < 0.05) Performance has improved. Found 7 outliers among 100 measurements (7.00%) 5 (5.00%) high mild 2 (2.00%) high severe sync/no-hook/core - host-to-wasm - untyped - nop time: [44.141 ns 44.429 ns 44.757 ns] change: [-18.380% -17.041% -15.670%] (p = 0.00 < 0.05) Performance has improved. Found 3 outliers among 100 measurements (3.00%) 1 (1.00%) high mild 2 (2.00%) high severe sync/no-hook/core - host-to-wasm - unchecked - nop time: [29.731 ns 29.983 ns 30.262 ns] change: [-25.104% -22.176% -19.159%] (p = 0.00 < 0.05) Performance has improved. Found 7 outliers among 100 measurements (7.00%) 5 (5.00%) high mild 2 (2.00%) high severe sync/no-hook/core - host-to-wasm - typed - nop-params-and-results time: [28.990 ns 29.143 ns 29.303 ns] change: [-25.804% -24.562% -23.372%] (p = 0.00 < 0.05) Performance has improved. Found 6 outliers among 100 measurements (6.00%) 3 (3.00%) high mild 3 (3.00%) high severe sync/no-hook/core - host-to-wasm - untyped - nop-params-and-results time: [110.00 ns 110.65 ns 111.46 ns] change: [-11.967% -9.0070% -6.1347%] (p = 0.00 < 0.05) Performance has improved. Found 7 outliers among 100 measurements (7.00%) 2 (2.00%) high mild 5 (5.00%) high severe sync/no-hook/core - host-to-wasm - unchecked - nop-params-and-results time: [58.828 ns 59.089 ns 59.418 ns] change: [-15.596% -13.573% -11.484%] (p = 0.00 < 0.05) Performance has improved. Found 3 outliers among 100 measurements (3.00%) 3 (3.00%) high severe sync/no-hook/core - wasm-to-host - typed - nop time: [6.6209 ns 6.6615 ns 6.7077 ns] change: [-53.555% -52.878% -52.116%] (p = 0.00 < 0.05) Performance has improved. Found 6 outliers among 100 measurements (6.00%) 5 (5.00%) high mild 1 (1.00%) high severe sync/no-hook/core - wasm-to-host - typed - nop-params-and-results time: [7.9783 ns 8.0173 ns 8.0611 ns] change: [-54.341% -53.947% -53.505%] (p = 0.00 < 0.05) Performance has improved. Found 3 outliers among 100 measurements (3.00%) 3 (3.00%) high severe sync/no-hook/core - wasm-to-host - untyped - nop time: [18.306 ns 18.393 ns 18.491 ns] change: [-29.104% -28.127% -27.171%] (p = 0.00 < 0.05) Performance has improved. Found 7 outliers among 100 measurements (7.00%) 3 (3.00%) high mild 4 (4.00%) high severe sync/no-hook/core - wasm-to-host - untyped - nop-params-and-results time: [67.741 ns 68.120 ns 68.601 ns] change: [-26.453% -25.061% -23.663%] (p = 0.00 < 0.05) Performance has improved. Found 12 outliers among 100 measurements (12.00%) 6 (6.00%) high mild 6 (6.00%) high severe sync/no-hook/core - wasm-to-host - unchecked - nop time: [6.8379 ns 6.8915 ns 6.9566 ns] change: [-55.623% -55.062% -54.481%] (p = 0.00 < 0.05) Performance has improved. Found 7 outliers among 100 measurements (7.00%) 5 (5.00%) high mild 2 (2.00%) high severe sync/no-hook/core - wasm-to-host - unchecked - nop-params-and-results time: [27.856 ns 28.024 ns 28.214 ns] change: [-17.320% -16.103% -15.038%] (p = 0.00 < 0.05) Performance has improved. Found 10 outliers among 100 measurements (10.00%) 6 (6.00%) high mild 4 (4.00%) high severe sync/no-hook/component - host-to-wasm - typed - nop time: [55.126 ns 55.506 ns 55.932 ns] change: [-19.458% -18.098% -16.736%] (p = 0.00 < 0.05) Performance has improved. Found 8 outliers among 100 measurements (8.00%) 2 (2.00%) high mild 6 (6.00%) high severe sync/no-hook/component - host-to-wasm - untyped - nop time: [101.42 ns 102.06 ns 102.82 ns] change: [-15.679% -14.108% -12.523%] (p = 0.00 < 0.05) Performance has improved. Found 9 outliers among 100 measurements (9.00%) 7 (7.00%) high mild 2 (2.00%) high severe sync/no-hook/component - host-to-wasm - typed - nop-params-and-results time: [61.482 ns 62.017 ns 62.591 ns] change: [-16.576% -15.100% -13.595%] (p = 0.00 < 0.05) Performance has improved. Found 10 outliers among 100 measurements (10.00%) 9 (9.00%) high mild 1 (1.00%) high severe sync/no-hook/component - host-to-wasm - untyped - nop-params-and-results time: [223.50 ns 224.72 ns 226.05 ns] change: [-21.732% -20.178% -18.679%] (p = 0.00 < 0.05) Performance has improved. Found 4 outliers among 100 measurements (4.00%) 1 (1.00%) high mild 3 (3.00%) high severe sync/no-hook/component - wasm-to-host - typed - nop time: [39.115 ns 39.295 ns 39.500 ns] change: [-15.139% -13.886% -12.721%] (p = 0.00 < 0.05) Performance has improved. Found 8 outliers among 100 measurements (8.00%) 1 (1.00%) low mild 2 (2.00%) high mild 5 (5.00%) high severe sync/no-hook/component - wasm-to-host - typed - nop-params-and-results time: [47.234 ns 47.458 ns 47.745 ns] change: [-13.833% -11.951% -9.8784%] (p = 0.00 < 0.05) Performance has improved. Found 9 outliers among 100 measurements (9.00%) 3 (3.00%) high mild 6 (6.00%) high severe sync/no-hook/component - wasm-to-host - untyped - nop time: [52.311 ns 52.556 ns 52.817 ns] change: [-12.736% -11.712% -10.693%] (p = 0.00 < 0.05) Performance has improved. Found 9 outliers among 100 measurements (9.00%) 4 (4.00%) high mild 5 (5.00%) high severe sync/no-hook/component - wasm-to-host - untyped - nop-params-and-results time: [239.71 ns 241.59 ns 244.11 ns] change: [-29.804% -28.173% -26.415%] (p = 0.00 < 0.05) Performance has improved. Found 9 outliers among 100 measurements (9.00%) 4 (4.00%) high mild 5 (5.00%) high severe</details>
Depends on https://github.com/bytecodealliance/wasmtime/pull/10626
fitzgen requested wasmtime-core-reviewers for a review on PR #10643.
fitzgen requested wasmtime-default-reviewers for a review on PR #10643.
alexcrichton submitted PR review:
I'm a bit surprised about the
#[inline]on dynamically typed things since there are so many other inefficiences related to that, but otherwise at a high level looks good
alexcrichton created PR review comment:
This seems related to the dynamic calls case which we don't tend to try to put too much effort into optimizing?
alexcrichton created PR review comment:
This is a pretty big function, does inlining this really help that much?
alexcrichton created PR review comment:
Mind leaving a coment on this as to why?
alexcrichton created PR review comment:
Since this is dealing with a dynamic
Valdoes#[inline]really help much here?
fitzgen commented on PR #10643:
I'm a bit surprised about the
#[inline]on dynamically typed things since there are so many other inefficiences related to that, but otherwise at a high level looks goodI didn't measure every
#[inline]independently, I just looked at which functions were at the top of theperf reportand marked them#[inline](and some of their callees) if it seemed like it made sense, and then I measured the benchmarks with all the changes together. So it is possible that some of these are not strictly necessary, but I think that is fine, since testing all combinations of#[inline]s here would take forever.
fitzgen submitted PR review.
fitzgen created PR review comment:
This one was indeed showing up in profiles.
fitzgen submitted PR review.
fitzgen created PR review comment:
This one was also showing up in profiles.
fitzgen submitted PR review.
fitzgen created PR review comment:
This one was just an intuition that we didn't want to inline anything related to the interpreter. But it also wasn't getting inlined in the first place, afaik, so I'll just remove it.
fitzgen updated PR #10643.
fitzgen submitted PR review.
fitzgen created PR review comment:
It was hot in a benchmark that was regressing compared to
main; easy to fix; don't see why we wouldn't do this.
fitzgen updated PR #10643.
fitzgen requested alexcrichton for a review on PR #10643.
alexcrichton submitted PR review:
I know that I'm far more inline-averse than most others, but at the same time I don't think that I'm un-founded when I push back against
#[inline]either. For example I don't doubt that dynamicVal-style call paths can benefit from some inlining, but that's also not the purpose of those code paths since they're explicitly de-optimized in many other ways as well. Personally I'd prefer to avoid manual#[inline]annotations there and let the compiler figure things out. If the compiler isn't the best at figuring things out, which it isn't for these benchmarks, then personally I'd prefer to just shrug and move on rather than add annotations to optimize an explicitly non-optimal path.Another reason I'm personally pretty inline-averse is that it's such a brittle metric. For example
TypedFunc::call_rawis a pretty big function here to be inlining. I don't doubt that it's in the profile but I'm also unsure whether inlining shows benefit, e.g. if it basically just shifted all its cost to the caller without further benefitting from inlining. This sort of code golfing is IMO brittle in the long-term where just because it improves the micro-benchmark now doesn't mean it'll continue to improve further into the future.Overall I'm personally fine with
#[inline]on "obvious small functions" like one-liners or similar. Bigger functions I'm less sure about and honestly IMO are basically just not worth annotation. In theory we should be measuring the attribute on big functions to see if it's worth it and most of the time I suspect that the effort to measure far outweighs the effort gained from a possible inlining.At the end of the day I would relatively strongly prefer that
#[inline]were removed from anything related toVal. I would then slightly prefer to remove#[inline]on anything bigger than one line basically. For this latter point I'll leave it up to you though since I realize the cost is not really worth debating over as well. Although that's also not quite true since if we#[inline]everything it's obviously quite bad for compile times and binary sizes, so there's some line somewhere I just tend to be far more conservative than others I think.
fitzgen updated PR #10643.
fitzgen has enabled auto merge for PR #10643.
fitzgen commented on PR #10643:
I've removed
#[inline]from the larger methods.
fitzgen merged PR #10643.
Last updated: Dec 06 2025 at 06:05 UTC