Stream: git-wasmtime

Topic: wasmtime / issue #4618 Cranelift: `ABISig::from_func_sig`...


view this post on Zulip Wasmtime GitHub notifications bot (Aug 04 2022 at 23:12):

github-actions[bot] commented on issue #4618:

Subscribe to Label Action

cc @cfallin, @fitzgen

<details>
This issue or pull request has been labeled: "cranelift", "cranelift:area:machinst", "cranelift:area:x64", "isle"

Thus the following users have been cc'd because of the following labels:

To subscribe or unsubscribe from this label, edit the <code>.github/subscribe-to-label.json</code> configuration file.

Learn more.
</details>

view this post on Zulip Wasmtime GitHub notifications bot (Aug 04 2022 at 23:24):

fitzgen commented on issue #4618:

Returning a Cow doesn't work because the signature gets stored in ABICalle{e,r}Impl and I tried threading a lifetime through those structs but then run into borrow errors where something wants unique access to the thing the signature is inside but the signature is shared borrowed by ABICalle{e,r}Impls.

view this post on Zulip Wasmtime GitHub notifications bot (Aug 04 2022 at 23:24):

fitzgen commented on issue #4618:

That said, I am investigating moving this legalization out of the mach inst code and into the actual legalizer pass we have, which should just modify the signatures in place, instead of doing any clones at all

view this post on Zulip Wasmtime GitHub notifications bot (Aug 04 2022 at 23:26):

cfallin commented on issue #4618:

Alternately: give ABISig an iterator-only interface, and an implementation that sneakily inserts the extra arg in the case where we're mutating now. That's the most efficient I think, at the cost of some refactoring at use-sites...

view this post on Zulip Wasmtime GitHub notifications bot (Aug 05 2022 at 16:53):

fitzgen commented on issue #4618:

I've rebased this on top of https://github.com/bytecodealliance/wasmtime/pull/4621 and rewritten it to do this signature legalization in-place in the legalization pass. This gives a nice phase separation and puts responsibility for legalization in the expected place. It is also an even larger speed up (against the remove-unused-fields PR) than the original PR was (against main):

compilation :: nanoseconds :: benchmarks/spidermonkey/benchmark.wasm

  Δ = 181420458.93 ± 26515667.61 (confidence = 99%)

  scratch/libwasmtime_bench_api.main.so is 0.93x to 0.95x faster than wasmtime/target/release/libwasmtime_bench_api.so!
  wasmtime/target/release/libwasmtime_bench_api.so is 1.06x to 1.08x faster than scratch/libwasmtime_bench_api.main.so!

  [2819380338 2928780796.54 3289265318] scratch/libwasmtime_bench_api.main.so
  [2417114451 2747360337.61 3380763132] wasmtime/target/release/libwasmtime_bench_api.so

compilation :: nanoseconds :: benchmarks/pulldown-cmark/benchmark.wasm

  Δ = 2749394.66 ± 1902797.53 (confidence = 99%)

  scratch/libwasmtime_bench_api.main.so is 0.97x to 0.99x faster than wasmtime/target/release/libwasmtime_bench_api.so!
  wasmtime/target/release/libwasmtime_bench_api.so is 1.01x to 1.04x faster than scratch/libwasmtime_bench_api.main.so!

  [115751117 134432921.19 157572316] scratch/libwasmtime_bench_api.main.so
  [113527530 131683526.53 161295688] wasmtime/target/release/libwasmtime_bench_api.so

compilation :: nanoseconds :: benchmarks/bz2/benchmark.wasm

  No difference in performance.

  [76366357 86553210.56 97936025] scratch/libwasmtime_bench_api.main.so
  [76233349 85885027.75 100047281] wasmtime/target/release/libwasmtime_bench_api.so

However, I starting looking into how StructReturn was actually used, and... it isn't? Originally, we used to legalize multi-value returns into StructReturn (I originally wrote that code way back when) but that hasn't happened for a while. So I'm wondering: why do we even have StructReturn anymore? Can we completely remove it? It does seem like cg_clif uses it, but it isn't clear to me that cg_clif can't move to regular multi-value returns and let the backends' normal calling convention code handle this? Or if it needs something custom, then it can legalize calls to use struct return pointers when generating the input CLIF. It really seems like StructReturn is useless since cranelift isn't loading return values out of the struct-return space after the call for embedders. Literally the only special handling of it that Cranelift does is to generate a move from the struct return argument to the struct return result. Doesn't seem like this is providing much value to embedders, and it is a maintenance/performance burden for Cranelift itself...

Or if removing StructReturn is unworkable for some reason, we can probably just make it a CLIF error (and enforce this in the verifier) to have a StructReturn parameter but not a StructReturn result, so that we don't have to do this signature legalization anymore. Would prefer to completely remove StructReturn if possible, though.

cc @bjorn3

view this post on Zulip Wasmtime GitHub notifications bot (Aug 05 2022 at 21:47):

fitzgen commented on issue #4618:

Updating this PR to be about removing ArgumentPurpose::StructReturn.

We used to legalize multi-value returns to using struct-return pointers where
callees would store result values into the given struct-return buffer and
callers would provide the struct-return buffer when calling and then load the
results out of it. We haven't done that for a while and just rely on the calling
convention's normal method of returning multiple values now. The only special
casing that ArgumentPurpose::StructReturn has now is

  1. We legalize signatures that have a StructReturn parameter but no
    StructReturn result to add the missing StructReturn result

  2. We automatically insert a copy from a function's StructReturn argument to
    its StructReturn result value

This isn't really useful for Cranelift embedders anymore since it doesn't even
handle putting the return values into the struct-return buffer or getting them
out again, has maintenance and cruft overhead for Cranelift hackers, and the
above signature legalization in (1) also imposes performance overhead on all
Cranelift compiles regardless of whether they use struct returns or not. It's
time we removed the vestigial ArgumentPurpose::StructReturn.

Finally, here are the Sightglass benchmark wins we get from this removal:

compilation :: nanoseconds :: benchmarks/spidermonkey/benchmark.wasm

  Δ = 214956202.90 ± 31700992.96 (confidence = 99%)

  main.so is 0.91x to 0.94x faster than no-sret.so!
  no-sret.so is 1.07x to 1.09x faster than main.so!

  [2765571620 2866580329.79 3085702646] main.so
  [2396129997 2651624126.89 2923726602] no-sret.so

compilation :: nanoseconds :: benchmarks/pulldown-cmark/benchmark.wasm

  Δ = 4176509.17 ± 2835408.05 (confidence = 99%)

  main.so is 0.95x to 0.99x faster than no-sret.so!
  no-sret.so is 1.01x to 1.05x faster than main.so!

  [115737735 133448206.82 149712338] main.so
  [108735836 129271697.65 166386156] no-sret.so

compilation :: nanoseconds :: benchmarks/bz2/benchmark.wasm

  No difference in performance.

  [77356671 85735828.56 96331117] main.so
  [75824588 84176414.51 94308652] no-sret.so


Last updated: Nov 22 2024 at 16:03 UTC