github-actions[bot] commented on issue #6843:
Subscribe to Label Action
cc @cfallin, @fitzgen
<details>
This issue or pull request has been labeled: "cranelift", "isle"Thus the following users have been cc'd because of the following labels:
- cfallin: isle
- fitzgen: isle
To subscribe or unsubscribe from this label, edit the <code>.github/subscribe-to-label.json</code> configuration file.
Learn more.
</details>
abrown commented on issue #6843:
I'll take a closer look after you do whatever changes you have planned to fix CI; just out of curiosity, though, what motivated looking more at
select
? I vaguely recall it being problematic in the past...
afonso360 commented on issue #6843:
I'll take a closer look after you do whatever changes you have planned to fix CI; just out of curiosity, though, what motivated looking more at select? I vaguely recall it being problematic in the past...
Huh, I don't remember there being issues with
select
, I thinkselect_spectre_guard
is the instruction that shall not be touched, right? Or are you referring to something else?I'm looking at
select
because I was looking at some benchmarks for a project that uses cranelift and noticed some CLIF code that was missing some optimization opportunities. These optimizations aren't really what makes the difference, but It's just something that jumped out at me from looking at that code.
afonso360 commented on issue #6843:
I'm still looking into the CI failure, but I've had to back out one of the optimizations,
(bmask (fcmp x)) -> (fcmp x)
is wrong since for scalarsicmp/fcmp
actually return 1 or 0. Only for vectors do they return -1, which is slightly confusing but ¯\\\_(ツ)\_\/¯I think there is a somewhat similar optimization that we can pursue with
(select c 1 0)
however we no longer have the convenient intermediarybint
to replacebmask
. Doing something like(select c 1 0) -> (and (bmask c) 1)
would work, but I'm not entirely sure how great that is.It might be worth thinking about this and submitting a future PR more targeted towards that since this one is getting kinda big.
jameysharp commented on issue #6843:
This is looking great! I hadn't thought of bswap/bitrev/popcount also being in the "truthiness-preserving" category.
It might help to introduce a "truthy" term that extracts the argument from a "truthiness-preserving" operation, then use that in both the
bmask
andselect
cases.(decl multi truthy ...) (rule (truthy (sextend _ x)) x) (rule (truthy (uextend _ x)) x) (rule (truthy (bmask _ x)) x) (rule (truthy (ineg _ x)) x) (rule (truthy (bswap _ x)) x) (rule (truthy (bitrev _ x)) x) (rule (truthy (popcnt _ x)) x) (rule (simplify (bmask ty v)) (if-let x (truthy v)) (bmask ty x)) (rule (simplify (select ty v lhs rhs)) (if-let c (truthy v)) (select ty c lhs rhs))
The
(select c 1 0)
case could be generalized with anothertruthy
rule, matching whenever the true branch is a non-zero constant and the false branch is a zero constant. That might be more useful than rewriting(select c -1 0)
to bmask, even?Definitely feel free to stop at any point though; you're right that there's quite a bit going on here already.
jameysharp commented on issue #6843:
Oh, it occurs to me that rotate left/right also preserves truthiness. Now I'm wondering how to systematically find all instructions where the result is zero if and only if the input is zero...
afonso360 commented on issue #6843:
It might help to introduce a "truthy" term that extracts the argument from a "truthiness-preserving" operation, then use that in both the bmask and select cases.
Yeah, that's a nice simplification of these rules!
Now I'm wondering how to systematically find all instructions where the result is zero if and only if the input is zero...
I always think about Z3 with these things, but I have no idea how to actually do that. And I think we would need a way to rewrite CLIF ops into Z3 expressions, right?
jameysharp commented on issue #6843:
If I'm not mistaken, the test case that's failing in CI is
(module (func (export "f32.no_fold_lt_select") (param $x f32) (param $y f32) (result f32) (select (local.get $x) (local.get $y) (f32.lt (local.get $x) (local.get $y))) )) (assert_return (invoke "f32.no_fold_lt_select" (f32.const 0.0) (f32.const nan)) (f32.const nan))
According to
wasmtime compile --emit-clif
that compiles to this:function u0:0(i64 vmctx, i64, f32, f32) -> f32 fast { gv0 = vmctx gv1 = load.i64 notrap aligned readonly gv0+8 gv2 = load.i64 notrap aligned gv1 stack_limit = gv2 block0(v0: i64, v1: i64, v2: f32, v3: f32): @003d v5 = fcmp lt v2, v3 @003d v6 = uextend.i32 v5 @003e v7 = select v6, v2, v3 @003f jump block1(v7) block1(v4: f32): @003f return v4 }
With this PR, after the
uextend
is removed, the select/fcmp pair matches the egraph rules to rewrite tofmin_pseudo.f32
, and I guess that produces the wrong result on this input?
afonso360 commented on issue #6843:
With this PR, after the uextend is removed, the select/fcmp pair matches the egraph rules to rewrite to fmin_pseudo.f32, and I guess that produces the wrong result on this input?
Yeah, that's pretty much it! I was waiting for an OK to make sure this wouldn't affect wasmtime before publishing the PR removing the
f{min,max}_pseudo
rules, which are wrong. Sorry for the delay.
afonso360 edited a comment on issue #6843:
With this PR, after the uextend is removed, the select/fcmp pair matches the egraph rules to rewrite to fmin_pseudo.f32, and I guess that produces the wrong result on this input?
Yeah, that's pretty much it! I was waiting for an OK to make sure this wouldn't affect wasmtime before publishing the PR removing the
f{min,max}_pseudo
rules, which are wrong. Sorry for the delay.Once this is rebased on top of #6859 it should start passing the tests.
afonso360 commented on issue #6843:
With https://github.com/bytecodealliance/wasmtime/pull/6859 now merged this should start passing CI.
I've also added a entrypoint to the
thruthy
rule for(icmp ne v 0)
, and also added that as a truthiness preserving expression.
afonso360 commented on issue #6843:
Looks like the bench-api crate is failing to build, but I've ran sightglass manually, and sadly we don't get any improvements.
<details>
<summary>Results</summary>Running `target/debug/sightglass-cli benchmark --processes 5 --iterations-per-process 5 --engine ./main.so --engine ./opt-select.so` execution :: cycles :: benchmarks/pulldown-cmark/benchmark.wasm No difference in performance. [9966144 10602864.64 13498656] main.so [9952064 11139933.44 26247104] opt-select.so compilation :: cycles :: benchmarks/pulldown-cmark/benchmark.wasm No difference in performance. [715466496 839635031.68 1025560928] main.so [749480032 821139647.16 1066050816] opt-select.so instantiation :: cycles :: benchmarks/bz2/benchmark.wasm No difference in performance. [199200 244094.72 403680] main.so [191488 249337.60 390976] opt-select.so instantiation :: cycles :: benchmarks/spidermonkey/benchmark.wasm No difference in performance. [539328 646485.76 1049376] main.so [556704 634736.68 798432] opt-select.so execution :: cycles :: benchmarks/spidermonkey/benchmark.wasm No difference in performance. [1427180032 1502991738.48 1606635646] main.so [1447102577 1527005523.92 1715667119] opt-select.so instantiation :: cycles :: benchmarks/pulldown-cmark/benchmark.wasm No difference in performance. [271264 329072.64 416160] main.so [267264 334138.88 644704] opt-select.so compilation :: cycles :: benchmarks/spidermonkey/benchmark.wasm No difference in performance. [18601799200 19312205390.44 19938090464] main.so [18438769888 19192694238.76 20311795325] opt-select.so compilation :: cycles :: benchmarks/bz2/benchmark.wasm No difference in performance. [271569902 329694138.04 502450045] main.so [257652896 327933358.56 541305504] opt-select.so execution :: cycles :: benchmarks/bz2/benchmark.wasm No difference in performance. [123519936 133585408.76 146265844] main.so [121208800 132945476.16 176276736] opt-select.so
</details>
jameysharp commented on issue #6843:
Oh well, thanks for checking. My approval still stands; feel free to merge!
Last updated: Dec 23 2024 at 12:05 UTC