wasmtime / PR #7859 Cranelift: Use a fixpoint loop to com... · git-wasmtime

Stream: git-wasmtime

Topic: wasmtime / PR #7859 Cranelift: Use a fixpoint loop to com...

Wasmtime GitHub notifications bot (Feb 02 2024 at 03:40):

fitzgen requested elliottt for a review on PR #7859.

Wasmtime GitHub notifications bot (Feb 02 2024 at 03:40):

fitzgen requested alexcrichton for a review on PR #7859.

Wasmtime GitHub notifications bot (Feb 02 2024 at 03:40):

fitzgen opened PR #7859 from fitzgen:egraph-cost-fix-point to bytecodealliance:main:

Fixes #7857

Wasmtime GitHub notifications bot (Feb 02 2024 at 03:40):

fitzgen requested wasmtime-compiler-reviewers for a review on PR #7859.

Wasmtime GitHub notifications bot (Feb 02 2024 at 04:01):

elliottt submitted PR review:

This looks great!

Wasmtime GitHub notifications bot (Feb 02 2024 at 04:01):

elliottt submitted PR review:

This looks great!

Wasmtime GitHub notifications bot (Feb 02 2024 at 04:01):

elliottt created PR review comment:

What do you think about making this check if the updated best[value] is infinity, rather than if it changed? Given the way that infinity saturates, I think we can assume that a finite result is the best we'll do here.

Wasmtime GitHub notifications bot (Feb 02 2024 at 04:01):

elliottt created PR review comment:

:tada:

Wasmtime GitHub notifications bot (Feb 02 2024 at 04:01):

elliottt created PR review comment:

Really glad you added this one!

Wasmtime GitHub notifications bot (Feb 02 2024 at 04:12):

elliottt submitted PR review.

Wasmtime GitHub notifications bot (Feb 02 2024 at 04:12):

elliottt created PR review comment:

Though the change I suggested hinges on all values definitely becoming non-infinite, so I think looking for the change in best value is better. Never mind!

Wasmtime GitHub notifications bot (Feb 02 2024 at 07:12):

cfallin submitted PR review.

Wasmtime GitHub notifications bot (Feb 02 2024 at 07:12):

cfallin created PR review comment:

Sorry for the drive-by from the sidelines, just a possible clarification request here though after thinking about cost updates during a long drive today:

It's not immediately obvious to me why this (once finite, then final) property is the case; I'm curious what reasoning y'all have gone through on this and/or what you've observed? I think a node's cost can continue to decrease as we discover more finite costs (consider a union node: min(20, infinity) == 20 in first pass, min(20, 10) == 10 in second pass; then another node that uses that as an arg). Or is there an argument we can make why this shouldn't happen in practice?

Wasmtime GitHub notifications bot (Feb 02 2024 at 13:20):

elliottt submitted PR review.

Wasmtime GitHub notifications bot (Feb 02 2024 at 13:20):

elliottt created PR review comment:

This is a great point Chris! When @fitzgen and I were discussing the fixpoint change yesterday, we reasoned that it was okay to skip finite values because we were assuming two things:

We would remove the behavior where Cost addition would saturate to MAX_COST, not infinity()

As we can't produce cycles, a fixpoint would cause everything to eventually settle out to finite cost

As you pointed out, the flaw with this reasoning is that the handling of Union values will not behave this way, instead preferring finite values to infinite.

Since addition now saturates to infinity which will ensure that Result nodes don't appear finite until all their dependencies have been processed, what do you think about only computing the min if both arguments to a Union are finite? I think that change would make more concrete our use of the infinity() cost: it's a marker for where all the arguments have not yet been processed.

Wasmtime GitHub notifications bot (Feb 02 2024 at 15:55):

fitzgen submitted PR review.

Wasmtime GitHub notifications bot (Feb 02 2024 at 15:55):

fitzgen created PR review comment:

In order for a union(a, b) to be finite but not in its final form would require one of a or b to finite and the other infinite, but the only way we can still have an infinite cost for an operand value when computing the cost of the current value is if the operand value's index is larger than the current value's index. That cannot happen for union values, since they are only added to the DFG after their operands.

This is, however, a pretty subtle argument, so I'd be fine skipping this early-continue optimization. I'll land this PR without it, because that is pretty obviously correct, and if we want to experiment with different approaches to optimizing the loop from there, we can open follow up PRs.

Wasmtime GitHub notifications bot (Feb 02 2024 at 16:00):

fitzgen updated PR #7859.

Wasmtime GitHub notifications bot (Feb 02 2024 at 16:01):

fitzgen has enabled auto merge for PR #7859.

Wasmtime GitHub notifications bot (Feb 02 2024 at 16:08):

elliottt submitted PR review.

Wasmtime GitHub notifications bot (Feb 02 2024 at 16:08):

elliottt created PR review comment:

Good point Nick, sorry for muddying the waters there.

Wasmtime GitHub notifications bot (Feb 02 2024 at 16:39):

fitzgen submitted PR review.

Wasmtime GitHub notifications bot (Feb 02 2024 at 16:39):

fitzgen created PR review comment:

Okay actually I was wrong, thanks Trevor for asking very pointed questions in private chat :-p

The union's operand values are always defined before the union, but if one of those operand values is a funky one where its operands are out of order, then the operand could still be infinite by the time we get to the union, and then the union's min would drop the infinite. That would be a finite cost that is potentially not in its final form, depending on the cost we still need to compute for the still-infinite operand.

So this "optimization" of early-continuing was not correct! Bullet dodged.

This ae-graphs code is all very subtle, and we should spend some time thinking about what we can do to make things more obviously correct, even if it is just adding additional debug asserts and comments. It shouldn't take 3.5 engineers who are all intimately familiar with Cranelift a full day to diagnose and fix this kind of bug and still introduce subtle flaws in the fix.

Wasmtime GitHub notifications bot (Feb 02 2024 at 16:43):

fitzgen submitted PR review.

Wasmtime GitHub notifications bot (Feb 02 2024 at 16:43):

fitzgen created PR review comment:

And for clarity: since we are doing the "full" fixpoint now, even if we "drop" an operand's infinite cost via min in one iteration of the loop, we will consider that operand's value again on the next iteration of the fix point, and eventually, as the fixpoint is reached, we will have the correct costs for everything.

Wasmtime GitHub notifications bot (Feb 02 2024 at 16:44):

fitzgen commented on PR #7859:

(Re-adding to merge queue after misunderstanding regarding https://github.com/bytecodealliance/wasmtime/pull/7859#discussion_r1476296142)

Wasmtime GitHub notifications bot (Feb 02 2024 at 21:22):

cfallin submitted PR review.

Wasmtime GitHub notifications bot (Feb 02 2024 at 21:22):

cfallin created PR review comment:

Thanks @fitzgen and @elliottt (and @alexcrichton) for taking this on and sorry for not realizing this subtle case originally!

A further optimization (which I can take on when I'm back) that occurred to me today: we could track whether we see any "forward references" (perhaps integrate this into the fixpoint loop itself, though it won't change between iterations), and exit the loop after one iteration if none exist. This is the common case, and it would avoid doing a second (no-changes) pass. This extra cost is totally fine for now IMHO (correctness first!).

I agree the code is pretty subtle; to some degree I think that's inherent to the problem, and it's already pretty comment-dense in many (not all!) areas, but I can also try to add some more top-level documentation on invariants and the like when I'm back. I'd like to try to do some more semi-formal proofs too, similar to MachBuffer's comments, to convince us that we don't have any more issues lurking (and to help understanding).

Wasmtime GitHub notifications bot (Feb 05 2024 at 17:03):

fitzgen submitted PR review.

Wasmtime GitHub notifications bot (Feb 05 2024 at 17:03):

fitzgen created PR review comment:

Agreed, and not trying to point fingers or anything, just trying to improve the situation for everyone. I think something like https://github.com/bytecodealliance/wasmtime/issues/7856 would help a lot too.

Wasmtime GitHub notifications bot (Feb 05 2024 at 17:12):

fitzgen updated PR #7859.

Wasmtime GitHub notifications bot (Feb 05 2024 at 17:13):

fitzgen has enabled auto merge for PR #7859.

Wasmtime GitHub notifications bot (Feb 05 2024 at 17:57):

alexcrichton commented on PR #7859:

Given that the same riscv64 failure happened twice in a row my guess is that it's probably a deterministic failure rather than a spurious failure. That may mean that a preexisting riscv64 lowering rule is buggy and this is starting to expose that. I'll note though that I haven't attempted to reproduce locally yet.

Wasmtime GitHub notifications bot (Feb 05 2024 at 18:12):

alexcrichton commented on PR #7859:

Ah yes I can reproduce locally:

---- wasi_http_hash_all_with_override stdout ----
thread 'wasi_http_hash_all_with_override' panicked at cranelift/codegen/src/egraph/elaborate.rs:296:17:
assertion failed: best[value].0.is_finite()

---- wasi_http_double_echo stdout ----
thread 'wasi_http_double_echo' panicked at cranelift/codegen/src/egraph/elaborate.rs:296:17:
assertion failed: best[value].0.is_finite()

---- wasi_http_hash_all stdout ----
thread 'wasi_http_hash_all' panicked at cranelift/codegen/src/egraph/elaborate.rs:296:17:
assertion failed: best[value].0.is_finite()
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace

---- wasi_http_echo stdout ----
thread 'wasi_http_echo' panicked at cranelift/codegen/src/egraph/elaborate.rs:296:17:
assertion failed: best[value].0.is_finite()

No output on CI due to https://github.com/rayon-rs/rayon/issues/1066 I think, not that it's actually a bug in rayon but an unfortunate consequence.

Wasmtime GitHub notifications bot (Feb 05 2024 at 18:42):

afonso360 commented on PR #7859:

I ran the fuzzgen-icache fuzzer to try and find a small reproducible example for the riscv bug, but it found a similar error for s390x:

test compile
set opt_level=speed
target s390x

function u1:0() -> f32x4 system_v {
    const0 = 0x00000000000000000000000000000000

block0:
    v27 = vconst.f32x4 const0
    v57 = fma v27, v27, v27  ; v27 = const0, v27 = const0, v27 = const0
    v58 = vconst.i32x4 const0
    v60 = vconst.f32x4 const0
    v61 = bitcast.f32x4 v58  ; v58 = const0
    v28 = bitselect v61, v60, v57  ; v60 = const0
    v62 = fma v28, v28, v28
    v63 = fcmp ne v62, v62
    v65 = vconst.f32x4 const0
    v66 = bitcast.f32x4 v63
    v29 = bitselect v66, v65, v62  ; v65 = const0
    v67 = fma v29, v29, v29
    v68 = fcmp ne v67, v67
    v70 = vconst.f32x4 const0
    v71 = bitcast.f32x4 v68
    v30 = bitselect v71, v70, v67  ; v70 = const0
    v72 = fma v30, v30, v30
    v73 = fcmp ne v72, v72
    v75 = vconst.f32x4 const0
    v76 = bitcast.f32x4 v73
    v31 = bitselect v76, v75, v72  ; v75 = const0
    v77 = fma v31, v31, v31
    v78 = fcmp ne v77, v77
    v80 = vconst.f32x4 const0
    v81 = bitcast.f32x4 v78
    v32 = bitselect v81, v80, v77  ; v80 = const0
    v82 = fma v32, v32, v32
    v83 = fcmp ne v82, v82
    v85 = vconst.f32x4 const0
    v86 = bitcast.f32x4 v83
    v33 = bitselect v86, v85, v82  ; v85 = const0
    v87 = fma v33, v33, v33
    v88 = fcmp ne v87, v87
    v90 = vconst.f32x4 const0
    v91 = bitcast.f32x4 v88
    v34 = bitselect v91, v90, v87  ; v90 = const0
    return v34
}

I'm still going to try to find a smaller one before trying to figure out which rule is causing issues

Wasmtime GitHub notifications bot (Feb 05 2024 at 18:58):

fitzgen commented on PR #7859:

Thanks Afonso!

Wasmtime GitHub notifications bot (Feb 05 2024 at 19:47):

afonso360 commented on PR #7859:

Here's another case that it found, this one for AArch64.

<details>
<summary>Testcase</summary>

test compile
set opt_level=speed
target aarch64

function u1:0(f64x2, f64x2) -> f64x2, f64x2 tail {
    sig0 = (f64x2, f64x2) -> f64x2, f64x2 tail
    fn0 = colocated u2:0 sig0

block0(v0: f64x2, v1: f64x2):
    v2 = iconst.i8 0
    v3 = iconst.i16 0
    v4 = iconst.i32 0
    v5 = iconst.i64 0
    v6 = uextend.i128 v5  ; v5 = 0
    v7 = func_addr.i64 fn0
    return_call_indirect sig0, v7(v1, v1)

block1 cold:
    v62 = f64const 0.0
    v63 = splat.f64x2 v62  ; v62 = 0.0
    v9, v10 = call fn0(v63, v63)
    v11, v12 = call fn0(v10, v10)
    v13, v14 = call fn0(v12, v12)
    v15, v16 = call fn0(v14, v14)
    v17, v18 = call fn0(v16, v16)
    v19, v20 = call fn0(v18, v18)
    v21, v22 = call fn0(v20, v20)
    v23, v24 = call fn0(v22, v22)
    v25, v26 = call fn0(v24, v24)
    v27, v28 = call fn0(v26, v26)
    v29, v30 = call fn0(v28, v28)
    v31, v32 = call fn0(v30, v30)
    v33, v34 = call fn0(v32, v32)
    v35, v36 = call fn0(v34, v34)
    v37, v38 = call fn0(v36, v36)
    v39, v40 = call fn0(v38, v38)
    v41, v42 = call fn0(v40, v40)
    v43, v44 = call fn0(v42, v42)
    v45, v46 = call fn0(v44, v44)
    v47, v48 = call fn0(v46, v46)
    v49, v50 = call fn0(v48, v48)
    return v49, v49
}

</details>

This one is interesting to me because almost all of this is dead code, but if we minimize it, it no longer crashes :eyes: . The trace log states the following:

 TRACE cranelift_codegen::context              > About to optimize with egraph phase:
function u1:0(f64x2, f64x2) -> f64x2, f64x2 tail {
    sig0 = (f64x2, f64x2) -> f64x2, f64x2 tail
    fn0 = colocated u2:0 sig0

block0(v0: f64x2, v1: f64x2):
    v7 = func_addr.i64 fn0
    return_call_indirect sig0, v7(v1, v1)
}

So it does optimize away the deadcode internally, but then still tries to elaborate some of the previously eliminated instructions. Which doesn't make sense to me, but I haven't kept up with the inner workings of the egraphs stuff.

I'm not familiar enough with egraphs to be able to debug this, but if you need any help reworking one of the lowering rules let me know!

Wasmtime GitHub notifications bot (Feb 05 2024 at 22:09):

fitzgen updated PR #7859.

Wasmtime GitHub notifications bot (Feb 05 2024 at 22:16):

fitzgen updated PR #7859.

Wasmtime GitHub notifications bot (Feb 05 2024 at 22:18):

jameysharp submitted PR review:

Looks good to me! Just one optional suggestion.

Wasmtime GitHub notifications bot (Feb 05 2024 at 22:18):

jameysharp submitted PR review:

Looks good to me! Just one optional suggestion.

Wasmtime GitHub notifications bot (Feb 05 2024 at 22:18):

jameysharp created PR review comment:

Can I suggest adding the same assertion with the operands swapped? b+a should also be infinity.

Wasmtime GitHub notifications bot (Feb 05 2024 at 22:25):

fitzgen updated PR #7859.

Wasmtime GitHub notifications bot (Feb 05 2024 at 22:25):

fitzgen submitted PR review.

Wasmtime GitHub notifications bot (Feb 05 2024 at 22:25):

fitzgen created PR review comment:

Good idea! Done.

Wasmtime GitHub notifications bot (Feb 05 2024 at 22:31):

fitzgen has enabled auto merge for PR #7859.

Wasmtime GitHub notifications bot (Feb 05 2024 at 23:22):

fitzgen merged PR #7859.

Last updated: Apr 18 2025 at 10:03 UTC