github-actions[bot] commented on issue #6851:
Subscribe to Label Action
cc @cfallin, @fitzgen
<details>
This issue or pull request has been labeled: "cranelift", "isle"Thus the following users have been cc'd because of the following labels:
- cfallin: isle
- fitzgen: isle
To subscribe or unsubscribe from this label, edit the <code>.github/subscribe-to-label.json</code> configuration file.
Learn more.
</details>
gurry commented on issue #6851:
@afonso360 The following two wasmtime spec tests are failing for the PR:
wast::Cranelift::spec::simd_splat wast::Cranelift::spec::simd_splat_pooling
They are failing for the following operators:
sadd_sat
,usub_sat
&sshr
. If I remove splat simplify opts for these operators the tests pass.Can you please help me find the root cause of this failure? Do I need to tweak the opts for the above mentioned operators somehow to make the tests pass?
afonso360 commented on issue #6851:
This ended up being quite a long reply, sorry! I just wanted to note down the steps that I use to debug these sort of stuff, it might be useful in the future! (There's a TLDR at the bottom)
So, running the tests individually shows us what went wrong. These test suites contain a bunch of individual tests so there is more than one failure. Here's what I got:
running 1 test test wast::Cranelift::spec::simd_splat_pooling ... FAILED failures: ---- wast::Cranelift::spec::simd_splat_pooling stdout ---- thread '<unnamed>' panicked at 'called `Option::unwrap()` on a `None` value', cranelift/codegen/src/isa/x64/lower/isle.rs:550:21 note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace thread '<unnamed>' panicked at 'should be implemented in ISLE: inst = `v10 = sadd_sat.i8 v5, v7`, type = `Some(types::I8)`', cranelift/codegen/src/machinst/lower.rs:749:21 thread '<unnamed>' panicked at 'should be implemented in ISLE: inst = `v11 = sadd_sat.i16 v5, v7`, type = `Some(types::I16)`', cranelift/codegen/src/machinst/lower.rs:749:21 thread '<unnamed>' panicked at 'should be implemented in ISLE: inst = `v10 = usub_sat.i8 v5, v7`, type = `Some(types::I8)`', cranelift/codegen/src/machinst/lower.rs:749:21 thread '<unnamed>' panicked at 'should be implemented in ISLE: inst = `v11 = usub_sat.i16 v5, v7`, type = `Some(types::I16)`', cranelift/codegen/src/machinst/lower.rs:749:21 thread '<unnamed>' panicked at 'called `Option::unwrap()` on a `None` value', cranelift/codegen/src/isa/x64/lower/isle.rs:550:21 thread '<unnamed>' panicked at 'called `Option::unwrap()` on a `None` value', cranelift/codegen/src/isa/x64/lower/isle.rs:550:21
Starting with the easy ones:
sadd_sat.{i8,i16}
andusub_sat.{i8,i16}
are not implemented in ISLE. That means that we haven't yet implemented a machine code lowering for them in the x64 backend (ISLE is just the framework that we use for this!). But more importantly when reading the docs it looks like they aren't supported for scalars!This sometimes happens, we allow an operation for vectors but not for scalars. Usually because they don't make sense for scalars, I think in this case it's just that no one has felt the need to implement it yet.
I think we should be able to remove the optimizations for both of these opcodes, and the rest of the
*_sat
ones as well since they look like they only support vectors.
Now onto the hard one. I ran the testsuite individually with
RUST_LOG=trace cargo run -- wast --disable-cache tests/spec_testsuite/simd_splat.wast
and pulled one of the offending functions out of the log. Essentially I just looked for any function withsshr
since I didn't expect there to be duplicates in these files.I got this function out (I've slightly cleaned it up):
function u0:22(i64 vmctx, i64, i32, i32) -> i8x16 fast { block0(v0: i64, v1: i64, v2: i32, v3: i32): v5 = ireduce.i8 v2 v6 = splat.i8x16 v5 v7 = sshr v6, v3 return v7 }
It's really neat that it matches our optimization pretty much exactly! After optimizations it looks like this:
block0(v0: i64, v1: i64, v2: i32, v3: i32): v5 = ireduce.i8 v2 v8 = sshr v5, v3 v9 = splat.i8x16 v8 return v9
So, now that we have a small reproducible example of what is going wrong, lets debug it. My first step was compiling both the pre-optimization function and the post optimization function. And something immediately went wrong!
The optimized version compiles cleanly! Which is really weird, since I got that out of the optimizer from the first version. I haven't seen this before but I do have an idea of what might be going wrong.
We don't print the type of the
sshr
in the CLIF code. What I think might be happening is that we are saying to CLIF: Build me ansshr
that takes ani8
andi32
and produces ani8x16
and things blow up later in the pipeline.Changing the optimization source to explicitly request the
lane_type
fixes it! We are now requesting an i8 out of thesshr
(rule (simplify (sshr ty (splat ty x) y)) (splat ty (sshr (lane_type ty) x y)))
This is a really weird issue, and it probably should have been caught earlier in the pipeline instead of letting it propagate all the way to the backend which really just confuses things.
This relied a lot on intuition and also knowledge of how cranelift works, so don't worry if it's hard to figure it out! We really should have had better error messages to help people out on these issues.
TLDR:
* We should remove the{u,s}{add,sub}_sat
optimizations since these opcodes only exist for vectors but not for scalars.
* We should pass in thelane_type
into thesshr
operation, otherwise the compiler gets really confused. (this also applies toushr
/ishl
/rotr
/rotl
)
gurry commented on issue #6851:
Thanks @afonso360 for the detailed explanation and showing how to debug such failures in the future :+1:.
I've made the suggested changes now. Let's hope the checks pass.
gurry commented on issue #6851:
Checks passed. Please review.
jameysharp commented on issue #6851:
I assume you'll merge this, @afonso360? This PR looks great to me, thank you @gurry!
I feel like all these bugs should have been caught by the CLIF verifier, which I guess means that the verifier did not run after the egraph pass. Would that have helped more quickly pin down the causes of these bugs? Actually I'm confused about when the verifier runs and whether we already have a way to turn it on. Maybe we should enable it for all filetests?
jameysharp commented on issue #6851:
And the very next thing in my queue of GitHub notifications was #6855 which says exactly that, thank you @afonso360 :laughing:
Last updated: Nov 22 2024 at 17:03 UTC