Stream: git-wasmtime

Topic: wasmtime / issue #6843 egraphs: Add some `select` optimiz...


view this post on Zulip Wasmtime GitHub notifications bot (Aug 13 2023 at 23:44):

github-actions[bot] commented on issue #6843:

Subscribe to Label Action

cc @cfallin, @fitzgen

<details>
This issue or pull request has been labeled: "cranelift", "isle"

Thus the following users have been cc'd because of the following labels:

To subscribe or unsubscribe from this label, edit the <code>.github/subscribe-to-label.json</code> configuration file.

Learn more.
</details>

view this post on Zulip Wasmtime GitHub notifications bot (Aug 14 2023 at 18:12):

abrown commented on issue #6843:

I'll take a closer look after you do whatever changes you have planned to fix CI; just out of curiosity, though, what motivated looking more at select? I vaguely recall it being problematic in the past...

view this post on Zulip Wasmtime GitHub notifications bot (Aug 14 2023 at 20:37):

afonso360 commented on issue #6843:

I'll take a closer look after you do whatever changes you have planned to fix CI; just out of curiosity, though, what motivated looking more at select? I vaguely recall it being problematic in the past...

Huh, I don't remember there being issues with select, I think select_spectre_guard is the instruction that shall not be touched, right? Or are you referring to something else?

I'm looking at select because I was looking at some benchmarks for a project that uses cranelift and noticed some CLIF code that was missing some optimization opportunities. These optimizations aren't really what makes the difference, but It's just something that jumped out at me from looking at that code.

view this post on Zulip Wasmtime GitHub notifications bot (Aug 16 2023 at 10:42):

afonso360 commented on issue #6843:

I'm still looking into the CI failure, but I've had to back out one of the optimizations, (bmask (fcmp x)) -> (fcmp x) is wrong since for scalars icmp/fcmp actually return 1 or 0. Only for vectors do they return -1, which is slightly confusing but ¯\\\_(ツ)\_\/¯

I think there is a somewhat similar optimization that we can pursue with (select c 1 0) however we no longer have the convenient intermediary bint to replace bmask. Doing something like (select c 1 0) -> (and (bmask c) 1) would work, but I'm not entirely sure how great that is.

It might be worth thinking about this and submitting a future PR more targeted towards that since this one is getting kinda big.

view this post on Zulip Wasmtime GitHub notifications bot (Aug 17 2023 at 17:50):

jameysharp commented on issue #6843:

This is looking great! I hadn't thought of bswap/bitrev/popcount also being in the "truthiness-preserving" category.

It might help to introduce a "truthy" term that extracts the argument from a "truthiness-preserving" operation, then use that in both the bmask and select cases.

(decl multi truthy ...)
(rule (truthy (sextend _ x)) x)
(rule (truthy (uextend _ x)) x)
(rule (truthy (bmask _ x)) x)
(rule (truthy (ineg _ x)) x)
(rule (truthy (bswap _ x)) x)
(rule (truthy (bitrev _ x)) x)
(rule (truthy (popcnt _ x)) x)

(rule (simplify (bmask ty v)) (if-let x (truthy v)) (bmask ty x))
(rule (simplify (select ty v lhs rhs)) (if-let c (truthy v)) (select ty c lhs rhs))

The (select c 1 0) case could be generalized with another truthy rule, matching whenever the true branch is a non-zero constant and the false branch is a zero constant. That might be more useful than rewriting (select c -1 0) to bmask, even?

Definitely feel free to stop at any point though; you're right that there's quite a bit going on here already.

view this post on Zulip Wasmtime GitHub notifications bot (Aug 17 2023 at 18:17):

jameysharp commented on issue #6843:

Oh, it occurs to me that rotate left/right also preserves truthiness. Now I'm wondering how to systematically find all instructions where the result is zero if and only if the input is zero...

view this post on Zulip Wasmtime GitHub notifications bot (Aug 18 2023 at 11:06):

afonso360 commented on issue #6843:

It might help to introduce a "truthy" term that extracts the argument from a "truthiness-preserving" operation, then use that in both the bmask and select cases.

Yeah, that's a nice simplification of these rules!

Now I'm wondering how to systematically find all instructions where the result is zero if and only if the input is zero...

I always think about Z3 with these things, but I have no idea how to actually do that. And I think we would need a way to rewrite CLIF ops into Z3 expressions, right?

view this post on Zulip Wasmtime GitHub notifications bot (Aug 18 2023 at 17:34):

jameysharp commented on issue #6843:

If I'm not mistaken, the test case that's failing in CI is

(module
  (func (export "f32.no_fold_lt_select") (param $x f32) (param $y f32) (result f32)
    (select (local.get $x) (local.get $y) (f32.lt (local.get $x) (local.get $y)))
))
(assert_return (invoke "f32.no_fold_lt_select" (f32.const 0.0) (f32.const nan)) (f32.const nan))

According to wasmtime compile --emit-clif that compiles to this:

function u0:0(i64 vmctx, i64, f32, f32) -> f32 fast {
    gv0 = vmctx
    gv1 = load.i64 notrap aligned readonly gv0+8
    gv2 = load.i64 notrap aligned gv1
    stack_limit = gv2

                                block0(v0: i64, v1: i64, v2: f32, v3: f32):
@003d                               v5 = fcmp lt v2, v3
@003d                               v6 = uextend.i32 v5
@003e                               v7 = select v6, v2, v3
@003f                               jump block1(v7)

                                block1(v4: f32):
@003f                               return v4
}

With this PR, after the uextend is removed, the select/fcmp pair matches the egraph rules to rewrite to fmin_pseudo.f32, and I guess that produces the wrong result on this input?

view this post on Zulip Wasmtime GitHub notifications bot (Aug 18 2023 at 18:55):

afonso360 commented on issue #6843:

With this PR, after the uextend is removed, the select/fcmp pair matches the egraph rules to rewrite to fmin_pseudo.f32, and I guess that produces the wrong result on this input?

Yeah, that's pretty much it! I was waiting for an OK to make sure this wouldn't affect wasmtime before publishing the PR removing the f{min,max}_pseudo rules, which are wrong. Sorry for the delay.

view this post on Zulip Wasmtime GitHub notifications bot (Aug 18 2023 at 18:55):

afonso360 edited a comment on issue #6843:

With this PR, after the uextend is removed, the select/fcmp pair matches the egraph rules to rewrite to fmin_pseudo.f32, and I guess that produces the wrong result on this input?

Yeah, that's pretty much it! I was waiting for an OK to make sure this wouldn't affect wasmtime before publishing the PR removing the f{min,max}_pseudo rules, which are wrong. Sorry for the delay.

Once this is rebased on top of #6859 it should start passing the tests.

view this post on Zulip Wasmtime GitHub notifications bot (Aug 21 2023 at 19:04):

afonso360 commented on issue #6843:

With https://github.com/bytecodealliance/wasmtime/pull/6859 now merged this should start passing CI.

I've also added a entrypoint to the thruthy rule for (icmp ne v 0), and also added that as a truthiness preserving expression.

view this post on Zulip Wasmtime GitHub notifications bot (Aug 21 2023 at 20:39):

afonso360 commented on issue #6843:

Looks like the bench-api crate is failing to build, but I've ran sightglass manually, and sadly we don't get any improvements.

<details>
<summary>Results</summary>

     Running `target/debug/sightglass-cli benchmark --processes 5 --iterations-per-process 5 --engine ./main.so --engine ./opt-select.so`

execution :: cycles :: benchmarks/pulldown-cmark/benchmark.wasm

  No difference in performance.

  [9966144 10602864.64 13498656] main.so
  [9952064 11139933.44 26247104] opt-select.so

compilation :: cycles :: benchmarks/pulldown-cmark/benchmark.wasm

  No difference in performance.

  [715466496 839635031.68 1025560928] main.so
  [749480032 821139647.16 1066050816] opt-select.so

instantiation :: cycles :: benchmarks/bz2/benchmark.wasm

  No difference in performance.

  [199200 244094.72 403680] main.so
  [191488 249337.60 390976] opt-select.so

instantiation :: cycles :: benchmarks/spidermonkey/benchmark.wasm

  No difference in performance.

  [539328 646485.76 1049376] main.so
  [556704 634736.68 798432] opt-select.so

execution :: cycles :: benchmarks/spidermonkey/benchmark.wasm

  No difference in performance.

  [1427180032 1502991738.48 1606635646] main.so
  [1447102577 1527005523.92 1715667119] opt-select.so

instantiation :: cycles :: benchmarks/pulldown-cmark/benchmark.wasm

  No difference in performance.

  [271264 329072.64 416160] main.so
  [267264 334138.88 644704] opt-select.so

compilation :: cycles :: benchmarks/spidermonkey/benchmark.wasm

  No difference in performance.

  [18601799200 19312205390.44 19938090464] main.so
  [18438769888 19192694238.76 20311795325] opt-select.so

compilation :: cycles :: benchmarks/bz2/benchmark.wasm

  No difference in performance.

  [271569902 329694138.04 502450045] main.so
  [257652896 327933358.56 541305504] opt-select.so

execution :: cycles :: benchmarks/bz2/benchmark.wasm

  No difference in performance.

  [123519936 133585408.76 146265844] main.so
  [121208800 132945476.16 176276736] opt-select.so

</details>

view this post on Zulip Wasmtime GitHub notifications bot (Aug 21 2023 at 20:43):

jameysharp commented on issue #6843:

Oh well, thanks for checking. My approval still stands; feel free to merge!


Last updated: Nov 22 2024 at 16:03 UTC