github-actions[bot] commented on issue #4618:
Subscribe to Label Action
cc @cfallin, @fitzgen
<details>
This issue or pull request has been labeled: "cranelift", "cranelift:area:machinst", "cranelift:area:x64", "isle"Thus the following users have been cc'd because of the following labels:
- cfallin: isle
- fitzgen: isle
To subscribe or unsubscribe from this label, edit the <code>.github/subscribe-to-label.json</code> configuration file.
Learn more.
</details>
fitzgen commented on issue #4618:
Returning a
Cow
doesn't work because the signature gets stored inABICalle{e,r}Impl
and I tried threading a lifetime through those structs but then run into borrow errors where something wants unique access to the thing the signature is inside but the signature is shared borrowed byABICalle{e,r}Impl
s.
fitzgen commented on issue #4618:
That said, I am investigating moving this legalization out of the mach inst code and into the actual legalizer pass we have, which should just modify the signatures in place, instead of doing any
clone
s at all
cfallin commented on issue #4618:
Alternately: give
ABISig
an iterator-only interface, and an implementation that sneakily inserts the extra arg in the case where we're mutating now. That's the most efficient I think, at the cost of some refactoring at use-sites...
fitzgen commented on issue #4618:
I've rebased this on top of https://github.com/bytecodealliance/wasmtime/pull/4621 and rewritten it to do this signature legalization in-place in the legalization pass. This gives a nice phase separation and puts responsibility for legalization in the expected place. It is also an even larger speed up (against the remove-unused-fields PR) than the original PR was (against main):
compilation :: nanoseconds :: benchmarks/spidermonkey/benchmark.wasm Δ = 181420458.93 ± 26515667.61 (confidence = 99%) scratch/libwasmtime_bench_api.main.so is 0.93x to 0.95x faster than wasmtime/target/release/libwasmtime_bench_api.so! wasmtime/target/release/libwasmtime_bench_api.so is 1.06x to 1.08x faster than scratch/libwasmtime_bench_api.main.so! [2819380338 2928780796.54 3289265318] scratch/libwasmtime_bench_api.main.so [2417114451 2747360337.61 3380763132] wasmtime/target/release/libwasmtime_bench_api.so compilation :: nanoseconds :: benchmarks/pulldown-cmark/benchmark.wasm Δ = 2749394.66 ± 1902797.53 (confidence = 99%) scratch/libwasmtime_bench_api.main.so is 0.97x to 0.99x faster than wasmtime/target/release/libwasmtime_bench_api.so! wasmtime/target/release/libwasmtime_bench_api.so is 1.01x to 1.04x faster than scratch/libwasmtime_bench_api.main.so! [115751117 134432921.19 157572316] scratch/libwasmtime_bench_api.main.so [113527530 131683526.53 161295688] wasmtime/target/release/libwasmtime_bench_api.so compilation :: nanoseconds :: benchmarks/bz2/benchmark.wasm No difference in performance. [76366357 86553210.56 97936025] scratch/libwasmtime_bench_api.main.so [76233349 85885027.75 100047281] wasmtime/target/release/libwasmtime_bench_api.so
However, I starting looking into how
StructReturn
was actually used, and... it isn't? Originally, we used to legalize multi-value returns intoStructReturn
(I originally wrote that code way back when) but that hasn't happened for a while. So I'm wondering: why do we even haveStructReturn
anymore? Can we completely remove it? It does seem likecg_clif
uses it, but it isn't clear to me thatcg_clif
can't move to regular multi-value returns and let the backends' normal calling convention code handle this? Or if it needs something custom, then it can legalize calls to use struct return pointers when generating the input CLIF. It really seems likeStructReturn
is useless since cranelift isn't loading return values out of the struct-return space after the call for embedders. Literally the only special handling of it that Cranelift does is to generate a move from the struct return argument to the struct return result. Doesn't seem like this is providing much value to embedders, and it is a maintenance/performance burden for Cranelift itself...Or if removing
StructReturn
is unworkable for some reason, we can probably just make it a CLIF error (and enforce this in the verifier) to have aStructReturn
parameter but not aStructReturn
result, so that we don't have to do this signature legalization anymore. Would prefer to completely removeStructReturn
if possible, though.cc @bjorn3
fitzgen commented on issue #4618:
Updating this PR to be about removing
ArgumentPurpose::StructReturn
.We used to legalize multi-value returns to using struct-return pointers where
callees would store result values into the given struct-return buffer and
callers would provide the struct-return buffer when calling and then load the
results out of it. We haven't done that for a while and just rely on the calling
convention's normal method of returning multiple values now. The only special
casing thatArgumentPurpose::StructReturn
has now is
We legalize signatures that have a
StructReturn
parameter but no
StructReturn
result to add the missingStructReturn
resultWe automatically insert a copy from a function's
StructReturn
argument to
itsStructReturn
result valueThis isn't really useful for Cranelift embedders anymore since it doesn't even
handle putting the return values into the struct-return buffer or getting them
out again, has maintenance and cruft overhead for Cranelift hackers, and the
above signature legalization in (1) also imposes performance overhead on all
Cranelift compiles regardless of whether they use struct returns or not. It's
time we removed the vestigialArgumentPurpose::StructReturn
.Finally, here are the Sightglass benchmark wins we get from this removal:
compilation :: nanoseconds :: benchmarks/spidermonkey/benchmark.wasm Δ = 214956202.90 ± 31700992.96 (confidence = 99%) main.so is 0.91x to 0.94x faster than no-sret.so! no-sret.so is 1.07x to 1.09x faster than main.so! [2765571620 2866580329.79 3085702646] main.so [2396129997 2651624126.89 2923726602] no-sret.so compilation :: nanoseconds :: benchmarks/pulldown-cmark/benchmark.wasm Δ = 4176509.17 ± 2835408.05 (confidence = 99%) main.so is 0.95x to 0.99x faster than no-sret.so! no-sret.so is 1.01x to 1.05x faster than main.so! [115737735 133448206.82 149712338] main.so [108735836 129271697.65 166386156] no-sret.so compilation :: nanoseconds :: benchmarks/bz2/benchmark.wasm No difference in performance. [77356671 85735828.56 96331117] main.so [75824588 84176414.51 94308652] no-sret.so
Last updated: Dec 23 2024 at 12:05 UTC