fitzgen requested elliottt for a review on PR #7859.
fitzgen requested alexcrichton for a review on PR #7859.
fitzgen opened PR #7859 from fitzgen:egraph-cost-fix-point
to bytecodealliance:main
:
Fixes #7857
<!--
Please make sure you include the following information:
If this work has been discussed elsewhere, please include a link to that
conversation. If it was discussed in an issue, just mention "issue #...".Explain why this change is needed. If the details are in an issue already,
this can be brief.Our development process is documented in the Wasmtime book:
https://docs.wasmtime.dev/contributing-development-process.htmlPlease ensure all communication follows the code of conduct:
https://github.com/bytecodealliance/wasmtime/blob/main/CODE_OF_CONDUCT.md
-->
fitzgen requested wasmtime-compiler-reviewers for a review on PR #7859.
elliottt submitted PR review:
This looks great!
elliottt submitted PR review:
This looks great!
elliottt created PR review comment:
What do you think about making this check if the updated
best[value]
is infinity, rather than if it changed? Given the way that infinity saturates, I think we can assume that a finite result is the best we'll do here.
elliottt created PR review comment:
:tada:
elliottt created PR review comment:
Really glad you added this one!
elliottt submitted PR review.
elliottt created PR review comment:
Though the change I suggested hinges on all values definitely becoming non-infinite, so I think looking for the change in best value is better. Never mind!
cfallin submitted PR review.
cfallin created PR review comment:
Sorry for the drive-by from the sidelines, just a possible clarification request here though after thinking about cost updates during a long drive today:
It's not immediately obvious to me why this (once finite, then final) property is the case; I'm curious what reasoning y'all have gone through on this and/or what you've observed? I think a node's cost can continue to decrease as we discover more finite costs (consider a union node:
min(20, infinity) == 20
in first pass,min(20, 10) == 10
in second pass; then another node that uses that as an arg). Or is there an argument we can make why this shouldn't happen in practice?
elliottt submitted PR review.
elliottt created PR review comment:
This is a great point Chris! When @fitzgen and I were discussing the fixpoint change yesterday, we reasoned that it was okay to skip finite values because we were assuming two things:
- We would remove the behavior where
Cost
addition would saturate toMAX_COST
, notinfinity()
- As we can't produce cycles, a fixpoint would cause everything to eventually settle out to finite cost
As you pointed out, the flaw with this reasoning is that the handling of
Union
values will not behave this way, instead preferring finite values to infinite.Since addition now saturates to infinity which will ensure that
Result
nodes don't appear finite until all their dependencies have been processed, what do you think about only computing the min if both arguments to aUnion
are finite? I think that change would make more concrete our use of theinfinity()
cost: it's a marker for where all the arguments have not yet been processed.
fitzgen submitted PR review.
fitzgen created PR review comment:
In order for a
union(a, b)
to be finite but not in its final form would require one ofa
orb
to finite and the other infinite, but the only way we can still have an infinite cost for an operand value when computing the cost of the current value is if the operand value's index is larger than the current value's index. That cannot happen for union values, since they are only added to the DFG after their operands.This is, however, a pretty subtle argument, so I'd be fine skipping this early-
continue
optimization. I'll land this PR without it, because that is pretty obviously correct, and if we want to experiment with different approaches to optimizing the loop from there, we can open follow up PRs.
fitzgen updated PR #7859.
fitzgen has enabled auto merge for PR #7859.
elliottt submitted PR review.
elliottt created PR review comment:
Good point Nick, sorry for muddying the waters there.
fitzgen submitted PR review.
fitzgen created PR review comment:
Okay actually I was wrong, thanks Trevor for asking very pointed questions in private chat :-p
The union's operand values are always defined before the union, but if one of those operand values is a funky one where its operands are out of order, then the operand could still be infinite by the time we get to the union, and then the union's
min
would drop the infinite. That would be a finite cost that is potentially not in its final form, depending on the cost we still need to compute for the still-infinite operand.So this "optimization" of early-continuing was not correct! Bullet dodged.
This ae-graphs code is all very subtle, and we should spend some time thinking about what we can do to make things more obviously correct, even if it is just adding additional debug asserts and comments. It shouldn't take 3.5 engineers who are all intimately familiar with Cranelift a full day to diagnose and fix this kind of bug and still introduce subtle flaws in the fix.
fitzgen submitted PR review.
fitzgen created PR review comment:
And for clarity: since we are doing the "full" fixpoint now, even if we "drop" an operand's infinite cost via
min
in one iteration of the loop, we will consider that operand's value again on the next iteration of the fix point, and eventually, as the fixpoint is reached, we will have the correct costs for everything.
fitzgen commented on PR #7859:
(Re-adding to merge queue after misunderstanding regarding https://github.com/bytecodealliance/wasmtime/pull/7859#discussion_r1476296142)
cfallin submitted PR review.
cfallin created PR review comment:
Thanks @fitzgen and @elliottt (and @alexcrichton) for taking this on and sorry for not realizing this subtle case originally!
A further optimization (which I can take on when I'm back) that occurred to me today: we could track whether we see any "forward references" (perhaps integrate this into the fixpoint loop itself, though it won't change between iterations), and exit the loop after one iteration if none exist. This is the common case, and it would avoid doing a second (no-changes) pass. This extra cost is totally fine for now IMHO (correctness first!).
I agree the code is pretty subtle; to some degree I think that's inherent to the problem, and it's already pretty comment-dense in many (not all!) areas, but I can also try to add some more top-level documentation on invariants and the like when I'm back. I'd like to try to do some more semi-formal proofs too, similar to MachBuffer's comments, to convince us that we don't have any more issues lurking (and to help understanding).
fitzgen submitted PR review.
fitzgen created PR review comment:
Agreed, and not trying to point fingers or anything, just trying to improve the situation for everyone. I think something like https://github.com/bytecodealliance/wasmtime/issues/7856 would help a lot too.
fitzgen updated PR #7859.
fitzgen has enabled auto merge for PR #7859.
alexcrichton commented on PR #7859:
Given that the same riscv64 failure happened twice in a row my guess is that it's probably a deterministic failure rather than a spurious failure. That may mean that a preexisting riscv64 lowering rule is buggy and this is starting to expose that. I'll note though that I haven't attempted to reproduce locally yet.
alexcrichton commented on PR #7859:
Ah yes I can reproduce locally:
---- wasi_http_hash_all_with_override stdout ---- thread 'wasi_http_hash_all_with_override' panicked at cranelift/codegen/src/egraph/elaborate.rs:296:17: assertion failed: best[value].0.is_finite() ---- wasi_http_double_echo stdout ---- thread 'wasi_http_double_echo' panicked at cranelift/codegen/src/egraph/elaborate.rs:296:17: assertion failed: best[value].0.is_finite() ---- wasi_http_hash_all stdout ---- thread 'wasi_http_hash_all' panicked at cranelift/codegen/src/egraph/elaborate.rs:296:17: assertion failed: best[value].0.is_finite() note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace ---- wasi_http_echo stdout ---- thread 'wasi_http_echo' panicked at cranelift/codegen/src/egraph/elaborate.rs:296:17: assertion failed: best[value].0.is_finite()
No output on CI due to https://github.com/rayon-rs/rayon/issues/1066 I think, not that it's actually a bug in rayon but an unfortunate consequence.
afonso360 commented on PR #7859:
I ran the
fuzzgen-icache
fuzzer to try and find a small reproducible example for the riscv bug, but it found a similar error for s390x:test compile set opt_level=speed target s390x function u1:0() -> f32x4 system_v { const0 = 0x00000000000000000000000000000000 block0: v27 = vconst.f32x4 const0 v57 = fma v27, v27, v27 ; v27 = const0, v27 = const0, v27 = const0 v58 = vconst.i32x4 const0 v60 = vconst.f32x4 const0 v61 = bitcast.f32x4 v58 ; v58 = const0 v28 = bitselect v61, v60, v57 ; v60 = const0 v62 = fma v28, v28, v28 v63 = fcmp ne v62, v62 v65 = vconst.f32x4 const0 v66 = bitcast.f32x4 v63 v29 = bitselect v66, v65, v62 ; v65 = const0 v67 = fma v29, v29, v29 v68 = fcmp ne v67, v67 v70 = vconst.f32x4 const0 v71 = bitcast.f32x4 v68 v30 = bitselect v71, v70, v67 ; v70 = const0 v72 = fma v30, v30, v30 v73 = fcmp ne v72, v72 v75 = vconst.f32x4 const0 v76 = bitcast.f32x4 v73 v31 = bitselect v76, v75, v72 ; v75 = const0 v77 = fma v31, v31, v31 v78 = fcmp ne v77, v77 v80 = vconst.f32x4 const0 v81 = bitcast.f32x4 v78 v32 = bitselect v81, v80, v77 ; v80 = const0 v82 = fma v32, v32, v32 v83 = fcmp ne v82, v82 v85 = vconst.f32x4 const0 v86 = bitcast.f32x4 v83 v33 = bitselect v86, v85, v82 ; v85 = const0 v87 = fma v33, v33, v33 v88 = fcmp ne v87, v87 v90 = vconst.f32x4 const0 v91 = bitcast.f32x4 v88 v34 = bitselect v91, v90, v87 ; v90 = const0 return v34 }
I'm still going to try to find a smaller one before trying to figure out which rule is causing issues
fitzgen commented on PR #7859:
Thanks Afonso!
afonso360 commented on PR #7859:
Here's another case that it found, this one for AArch64.
<details>
<summary>Testcase</summary>test compile set opt_level=speed target aarch64 function u1:0(f64x2, f64x2) -> f64x2, f64x2 tail { sig0 = (f64x2, f64x2) -> f64x2, f64x2 tail fn0 = colocated u2:0 sig0 block0(v0: f64x2, v1: f64x2): v2 = iconst.i8 0 v3 = iconst.i16 0 v4 = iconst.i32 0 v5 = iconst.i64 0 v6 = uextend.i128 v5 ; v5 = 0 v7 = func_addr.i64 fn0 return_call_indirect sig0, v7(v1, v1) block1 cold: v62 = f64const 0.0 v63 = splat.f64x2 v62 ; v62 = 0.0 v9, v10 = call fn0(v63, v63) v11, v12 = call fn0(v10, v10) v13, v14 = call fn0(v12, v12) v15, v16 = call fn0(v14, v14) v17, v18 = call fn0(v16, v16) v19, v20 = call fn0(v18, v18) v21, v22 = call fn0(v20, v20) v23, v24 = call fn0(v22, v22) v25, v26 = call fn0(v24, v24) v27, v28 = call fn0(v26, v26) v29, v30 = call fn0(v28, v28) v31, v32 = call fn0(v30, v30) v33, v34 = call fn0(v32, v32) v35, v36 = call fn0(v34, v34) v37, v38 = call fn0(v36, v36) v39, v40 = call fn0(v38, v38) v41, v42 = call fn0(v40, v40) v43, v44 = call fn0(v42, v42) v45, v46 = call fn0(v44, v44) v47, v48 = call fn0(v46, v46) v49, v50 = call fn0(v48, v48) return v49, v49 }
</details>
This one is interesting to me because almost all of this is dead code, but if we minimize it, it no longer crashes :eyes: . The trace log states the following:
TRACE cranelift_codegen::context > About to optimize with egraph phase: function u1:0(f64x2, f64x2) -> f64x2, f64x2 tail { sig0 = (f64x2, f64x2) -> f64x2, f64x2 tail fn0 = colocated u2:0 sig0 block0(v0: f64x2, v1: f64x2): v7 = func_addr.i64 fn0 return_call_indirect sig0, v7(v1, v1) }
So it does optimize away the deadcode internally, but then still tries to elaborate some of the previously eliminated instructions. Which doesn't make sense to me, but I haven't kept up with the inner workings of the egraphs stuff.
I'm not familiar enough with egraphs to be able to debug this, but if you need any help reworking one of the lowering rules let me know!
fitzgen updated PR #7859.
fitzgen updated PR #7859.
jameysharp submitted PR review:
Looks good to me! Just one optional suggestion.
jameysharp submitted PR review:
Looks good to me! Just one optional suggestion.
jameysharp created PR review comment:
Can I suggest adding the same assertion with the operands swapped?
b+a
should also be infinity.
fitzgen updated PR #7859.
fitzgen submitted PR review.
fitzgen created PR review comment:
Good idea! Done.
fitzgen has enabled auto merge for PR #7859.
fitzgen merged PR #7859.
Last updated: Nov 22 2024 at 16:03 UTC