fitzgen edited PR #4618 from dont-canonicalize-multiple-times
to main
:
It was previously canonicalizing the struct-return pointer internally, but some callers already had a canonicalized signature that they were giving it. This resulted in some unnecessary no-op copies of the signature, and therefore unnecessary heap allocations. Now, instead, we make it an invariant that callers always pass a canonicalized signature.
Gives a ~5% speedup to compilation on the Sightglass's Spidermonkey benchmark, and no difference in performance on the others:
compilation :: nanoseconds :: benchmarks/spidermonkey/benchmark.wasm Δ = 120607087.58 ± 33517739.42 (confidence = 99%) main.so is 0.95x to 0.97x faster than feature.so! feature.so is 1.03x to 1.06x faster than main.so! [2768415639 2868809793.49 3149141782] main.so [2479287976 2748202705.91 3044819409] feature.so compilation :: nanoseconds :: benchmarks/bz2/benchmark.wasm No difference in performance. [76865944 84866240.58 95562639] main.so [75639139 84154259.01 96727948] feature.so compilation :: nanoseconds :: benchmarks/pulldown-cmark/benchmark.wasm No difference in performance. [113503142 133806222.94 150176738] main.so [115003031 133491736.64 164840112] feature.so
<!--
Please ensure that the following steps are all taken care of before submitting
the PR.
[ ] This has been discussed in issue #..., or if not, please tell us why
here.[ ] A short description of what this does, why it is needed; if the
description becomes long, the matter should probably be discussed in an issue
first.[ ] This PR contains test cases, if meaningful.
- [ ] A reviewer from the core maintainer team has been assigned for this PR.
If you don't know who could review this, please indicate so. The list of
suggested reviewers on the right can help you.Please ensure all communication adheres to the code of conduct.
-->
fitzgen edited PR #4618 from dont-canonicalize-multiple-times
to main
:
We used to legalize multi-value returns to using struct-return pointers where
callees would store result values into the given struct-return buffer and
callers would provide the struct-return buffer when calling and then load the
results out of it. We haven't done that for a while and just rely on the calling
convention's normal method of returning multiple values now. The only special
casing thatArgumentPurpose::StructReturn
has now is
We legalize signatures that have a
StructReturn
parameter but no
StructReturn
result to add the missingStructReturn
resultWe automatically insert a copy from a function's
StructReturn
argument to
itsStructReturn
result valueThis isn't really useful for Cranelift embedders anymore since it doesn't even
handle putting the return values into the struct-return buffer or getting them
out again, has maintenance and cruft overhead for Cranelift hackers, and the
above signature legalization in (1) also imposes performance overhead on all
Cranelift compiles regardless of whether they use struct returns or not. It's
time we removed the vestigialArgumentPurpose::StructReturn
.Finally, here are the Sightglass benchmark wins we get from this removal:
compilation :: nanoseconds :: benchmarks/spidermonkey/benchmark.wasm Δ = 214956202.90 ± 31700992.96 (confidence = 99%) main.so is 0.91x to 0.94x faster than no-sret.so! no-sret.so is 1.07x to 1.09x faster than main.so! [2765571620 2866580329.79 3085702646] main.so [2396129997 2651624126.89 2923726602] no-sret.so compilation :: nanoseconds :: benchmarks/pulldown-cmark/benchmark.wasm Δ = 4176509.17 ± 2835408.05 (confidence = 99%) main.so is 0.95x to 0.99x faster than no-sret.so! no-sret.so is 1.01x to 1.05x faster than main.so! [115737735 133448206.82 149712338] main.so [108735836 129271697.65 166386156] no-sret.so compilation :: nanoseconds :: benchmarks/bz2/benchmark.wasm No difference in performance. [77356671 85735828.56 96331117] main.so [75824588 84176414.51 94308652] no-sret.so
fitzgen updated PR #4618 from dont-canonicalize-multiple-times
to main
.
fitzgen closed without merge PR #4618.
Last updated: Dec 23 2024 at 12:05 UTC