wasmtime / PR #8588 riscv64: Add `fmsub`/`fnmsub`/`fnmadd... · git-wasmtime

Stream: git-wasmtime

Topic: wasmtime / PR #8588 riscv64: Add `fmsub`/`fnmsub`/`fnmadd...

Wasmtime GitHub notifications bot (May 09 2024 at 10:12):

afonso360 opened PR #8588 from afonso360:riscv-fma-v2 to bytecodealliance:main:

:wave: Hey,

This PR adds lowering rules for the fmsub/fnmsub/fnmadd instructions.

These instructions were already implemented in the backend, but had no lowering rules associated with them, so they were never emitted.

Wasmtime GitHub notifications bot (May 09 2024 at 10:12):

afonso360 requested abrown for a review on PR #8588.

Wasmtime GitHub notifications bot (May 09 2024 at 10:12):

afonso360 requested wasmtime-compiler-reviewers for a review on PR #8588.

Wasmtime GitHub notifications bot (May 09 2024 at 15:32):

alexcrichton submitted PR review:

Nice!

Wasmtime GitHub notifications bot (May 09 2024 at 15:55):

alexcrichton merged PR #8588.

Wasmtime GitHub notifications bot (May 10 2024 at 03:03):

jameysharp commented on PR #8588:

This is neat!

Out of curiosity, can't you use these same fnmsub/fnmadd instructions if y is negated instead of x? Similarly if both x and y are negated, I think you can use fmadd/fmsub, depending on whether z is negated or not. And of course you could also match on (fneg (fma ...)), leading to 16 possible rules in total.

That said, I'm a little worried about whether fusing negation into an fma can change its results. I'm hoping the answer is "no", but do either of you happen to know for sure and can you explain it to me? I'm reasonably confident that negating the result is equivalent to negating both terms and both cases are safe to fuse without loss of precision. I'm less confident that, for example, fmadd x y (fneg z) will always give the same results as fmsub x y z if the magnitude of z is very different than x*y.

Anyway, I thought way too much about this and I think you can fold those 16 cases into four rules based on how many of x, y, z, or the result are wrapped in fneg:

Even number of fnegs in (result, x, y): use fm*

Odd number of fnegs in (result, x, y): use fnm*

Even number of fnegs in (x, y, z): use *add

Odd number of fnegs in (x, y, z): use *sub

With the right helpers you can write the four rules all at the same priority. This is a little awkward and would be easier if ISLE had or-patterns or match expressions or something, but I think the following should work and generate about as good of a pattern-matching tree as we're capable of right now.
(type IsFneg (enum (Result (negate u64) (value Value))))

(decl is_fneg (Value) IsFneg)
(rule 1 (is_fneg (fneg x)) (IsFneg.Result 1 x))
(rule 0 (is_fneg x) (IsFneg.Result 0 x))

(rule (lower (has_type ty (fma x y z)))
  (if-let (IsFneg.Result neg_x x) (is_fneg x))
  (if-let (IsFneg.Result neg_y y) (is_fneg y))
  (if-let (IsFneg.Result neg_z z) (is_fneg z))
  (rv_fma ty (u64_xor neg_x neg_y) neg_z x y z))

; this rule will need a rule priority to not overlap with other top-level fneg rules
(rule (lower (has_type ty (fneg (fma x y z))))
  (if-let (IsFneg.Result neg_x x) (is_fneg x))
  (if-let (IsFneg.Result neg_y y) (is_fneg y))
  (if-let (IsFneg.Result neg_z z) (is_fneg z))
  (rv_fma ty (u64_xor 1 (u64_xor neg_x neg_y)) (u64_xor 1 neg_z) x y z))

; parity arguments indicate whether to negate the x*y term or the z term, respectively
(decl rv_fma (Type u64 u64 Value Value Value) InstOutput)
(rule 0 (rv_fma (ty_scalar_float ty) 0 0 x y z) (rv_fmadd ty (FRM.RME) x y z))
(rule 0 (rv_fma (ty_scalar_float ty) 0 1 x y z) (rv_fmsub ty (FRM.RME) x y z))
(rule 0 (rv_fma (ty_scalar_float ty) 1 0 x y z) (rv_fnmsub ty (FRM.RME) x y z))
(rule 0 (rv_fma (ty_scalar_float ty) 1 1 x y z) (rv_fnmadd ty (FRM.RME) x y z))
(rule 1 (rv_fma (ty_vec_fits_in_register ty) 0 0 x y z) (rv_vfmacc_vv z y x (unmasked) ty))
(rule 1 (rv_fma (ty_vec_fits_in_register ty) 0 1 x y z) (rv_vfmsac_vv z y x (unmasked) ty))
(rule 1 (rv_fma (ty_vec_fits_in_register ty) 1 0 x y z) (rv_vfnmsac_vv z y x (unmasked) ty))
(rule 1 (rv_fma (ty_vec_fits_in_register ty) 1 1 x y z) (rv_vfnmacc_vv z y x (unmasked) ty))
(rule 2 (rv_fma (ty_vec_fits_in_register ty) 0 0 (splat x) y z) (rv_vfmacc_vf z y x (unmasked) ty))
(rule 2 (rv_fma (ty_vec_fits_in_register ty) 0 1 (splat x) y z) (rv_vfmsac_vf z y x (unmasked) ty))
(rule 2 (rv_fma (ty_vec_fits_in_register ty) 1 0 (splat x) y z) (rv_vfnmsac_vf z y x (unmasked) ty))
(rule 2 (rv_fma (ty_vec_fits_in_register ty) 1 1 (splat x) y z) (rv_vfnmacc_vf z y x (unmasked) ty))
Also I think you might be able to handle (splat y) as well because I guess floating-point multiplication is commutative assuming NaN canonicalization. So you can safely swap x and y, if I'm not mistaken…

Wasmtime GitHub notifications bot (May 10 2024 at 09:56):

afonso360 commented on PR #8588:

I'm not entirely sure, for these 4 instructions I pretty much copied the LLVM lowerings for them. I also briefly looked at our x64 backend and noted that it had some similar rules for fnmadd. I can't really explain these transformations in great detail, FP Math makes my head hurt :confounded: .

I'm less confident that, for example, fmadd x y (fneg z) will always give the same results as fmsub x y z if the magnitude of z is very different than x*y.

That case isn't represented in our backend, and I'm not entirely sure how to prove it. My z3 skills are non existent, but I probably can implement it for the x64 backend and fuzz it for a few days.

(fneg (fma ...)) I think is a no-go since LLVM has this rule, but only with the nsz (No Signed Zero) flag, which means it probably swaps -0.0 with 0.0 or vice-versa.

Adding the fneg y variations seems like it would work, at least we have those rules in the x64 backend. And de-duplicating those rules with the vector rules seems really neat, since we have pretty much the same instructions for both.

Wasmtime GitHub notifications bot (May 10 2024 at 09:59):

afonso360 edited a comment on PR #8588:

I'm not entirely sure, for these 4 instructions I pretty much copied the LLVM lowerings for them. I also briefly looked at our x64 backend and noted that it had some similar rules for fnmadd. I can't really explain these transformations in great detail, FP Math makes my head hurt :confounded: .

I'm less confident that, for example, fmadd x y (fneg z) will always give the same results as fmsub x y z if the magnitude of z is very different than x*y.

That case isn't represented in our backend, and I'm not entirely sure how to prove it. My z3 skills are non existent, but I probably can implement it for the x64 backend and fuzz it for a few days.

Edit: I was searching for related fma transformations and found this alive2 testcase that seems like a good starting point to try to play around and prove these transformations. I've never used alive, but I'll give it a go this afternoon, seems like it should be fairly easy.

(fneg (fma ...)) I think is a no-go since LLVM has this rule, but only with the nsz (No Signed Zero) flag, which means it probably swaps -0.0 with 0.0 or vice-versa.

Adding the fneg y variations seems like it would work, at least we have those rules in the x64 backend. And de-duplicating those rules with the vector rules seems really neat, since we have pretty much the same instructions for both.

Wasmtime GitHub notifications bot (May 10 2024 at 10:01):

afonso360 edited a comment on PR #8588:

I'm not entirely sure, for these 4 instructions I pretty much copied the LLVM lowerings for them. I also briefly looked at our x64 backend and noted that it had some similar rules for fnmadd. I can't really explain these transformations in great detail, FP Math makes my head hurt :confounded: .

I'm less confident that, for example, fmadd x y (fneg z) will always give the same results as fmsub x y z if the magnitude of z is very different than x*y.

That case isn't represented in our backend, and I'm not entirely sure how to prove it. My z3 skills are non existent, but I probably can implement it for the x64 backend and fuzz it for a few days.

Edit: I was searching for related fma transformations and found this alive2 testcase that seems like a good starting point to try to play around and prove these transformations. I've never used alive, but I'll give it a go this afternoon, seems like it should be fairly easy.

(fneg (fma ...)) I think is a no-go since LLVM has this rule, but only with the nsz (No Signed Zero) flag, which means it probably swaps -0.0 with 0.0 or vice-versa.

Adding the fneg y variations seems like it would work, at least we have those rules in the x64 backend. And de-duplicating those rules with the vector rules seems really neat, since we have pretty much the same instructions for both. I'll try and merge those rules together like you suggested.

Wasmtime GitHub notifications bot (May 10 2024 at 10:08):

afonso360 edited a comment on PR #8588:

I'm not entirely sure, for these 4 instructions I pretty much copied the LLVM lowerings for them. I also briefly looked at our x64 backend and noted that it had some similar rules for fnmadd. I can't really explain these transformations in great detail, FP Math makes my head hurt :confounded: .

I'm less confident that, for example, fmadd x y (fneg z) will always give the same results as fmsub x y z if the magnitude of z is very different than x*y.

That case isn't represented in our X64 backend, and I'm not entirely sure how to prove it. My z3 skills are non existent, but I probably can implement it for the x64 backend and fuzz it for a few days.

Edit: I was searching for related fma transformations and found this alive2 testcase that seems like a good starting point to try to play around and prove these transformations. I've never used alive, but I'll give it a go this afternoon, seems like it should be fairly easy.

(fneg (fma ...)) I think is a no-go since LLVM has this rule, but only with the nsz (No Signed Zero) flag, which means it probably swaps -0.0 with 0.0 or vice-versa.

Adding the fneg y variations seems like it would work, at least we have those rules in the x64 backend. And de-duplicating those rules with the vector rules seems really neat, since we have pretty much the same instructions for both. I'll try and merge those rules together like you suggested.

Wasmtime GitHub notifications bot (May 10 2024 at 10:31):

afonso360 edited a comment on PR #8588:

I'm not entirely sure, for these 4 instructions I pretty much copied the LLVM lowerings for them. I also briefly looked at our x64 backend and noted that it had some similar rules for fnmadd. I can't really explain these transformations in great detail, FP Math makes my head hurt :confounded: .

I'm less confident that, for example, fmadd x y (fneg z) will always give the same results as fmsub x y z if the magnitude of z is very different than x*y.

That case isn't represented in our X64 backend, and I'm not entirely sure how to prove it. My z3 skills are non existent, but I probably can implement it for the x64 backend and fuzz it for a few days.

I also don't think magnitudes should matter here, fneg is a bitwise operation, it shouldn't do any early rounding that would be affected by magnitude.

Edit: I was searching for related fma transformations and found this alive2 testcase that seems like a good starting point to try to play around and prove these transformations. I've never used alive, but I'll give it a go this afternoon, seems like it should be fairly easy.

(fneg (fma ...)) I think is a no-go since LLVM has this rule, but only with the nsz (No Signed Zero) flag, which means it probably swaps -0.0 with 0.0 or vice-versa.

Adding the fneg y variations seems like it would work, at least we have those rules in the x64 backend. And de-duplicating those rules with the vector rules seems really neat, since we have pretty much the same instructions for both. I'll try and merge those rules together like you suggested.

Wasmtime GitHub notifications bot (May 11 2024 at 07:51):

afonso360 commented on PR #8588:

So, I played around with alive for a bit. Really cool tool! Unfortunately the compiler explorer for alive2 has a short timeout so I had to run these locally.

Out of curiosity, can't you use these same fnmsub/fnmadd instructions if y is negated instead of x?

Yes you can! Here's the proof that I built for this and it does validate.
<details>
<summary>Result</summary>
afonso@DESKTOP-1AHKMV2:~/git/alive2/build$ ./alive-tv --disable-undef-input --smt-to=1800000  ./tgt.ll

----------------------------------------
define half @src(half %x, half %y, half %z) {
#0:
  %negx = fneg half %x
  %fma = fma half %negx, half %y, half %z
  ret half %fma
}
=>
define half @tgt(half %x, half %y, half %z) {
#0:
  %negy = fneg half %y
  %fma = fma half %x, half %negy, half %z
  ret half %fma
}
Transformation seems to be correct!
</details>

I'm reasonably confident that negating the result is equivalent to negating both terms and both cases are safe to fuse without loss of precision.

This one doesn't seem to pan out, it seems to have the issue with the signed zero. Proof

<details>
<summary>Result</summary>
afonso@DESKTOP-1AHKMV2:~/git/alive2/proofs$ ../build/alive-tv --disable-undef-input --smt-to=1800000 ./neg_fma_eq_fma_negxz.ll

----------------------------------------
define half @src(half %x, half %y, half %z) {
#0:
  %fma = fma half %x, half %y, half %z
  %n = fneg half %fma
  ret half %n
}
=>
define half @tgt(half %x, half %y, half %z) {
#0:
  %negy = fneg half %y
  %negz = fneg half %z
  %fma = fma half %x, half %negy, half %negz
  ret half %fma
}
Transformation doesn't verify!

ERROR: Value mismatch

Example:
half %x = #x0000 (+0.0)
half %y = #x8100 (-0.000015258789?)
half %z = #x0000 (+0.0)

Source:
half %fma = #x0000 (+0.0)
half %n = #x8000 (-0.0)

Target:
half %negy = #x0100 (0.000015258789?)
half %negz = #x8000 (-0.0)
half %fma = #x0000 (+0.0)
Source value: #x8000 (-0.0)
Target value: #x0000 (+0.0)
</details>

I'm less confident that, for example, fmadd x y (fneg z) will always give the same results as fmsub x y z if the magnitude of z is very different than x*y

I can't really test this using alive, since it doesn't have a fmsub intrinsic. But I did something similar using fmul+f{add,sub}, which has 2 rounding steps instead of one, which is what I think really matters. In this case the proof does check out and you can remove an intermediary fneg. Proof

<details>
<summary>Result</summary>
afonso@DESKTOP-1AHKMV2:~/git/alive2/proofs$ ../build/alive-tv --disable-undef-input --smt-to=1800000 ./fma_split_neg.ll

----------------------------------------
define half @src(half %x, half %y, half %z) {
#0:
  %mul = fmul half %x, %y
  %negz = fneg half %z
  %res = fadd half %mul, %negz
  ret half %res
}
=>
define half @tgt(half %x, half %y, half %z) {
#0:
  %mul = fmul half %x, %y
  %res = fsub half %mul, %z
  ret half %res
}
Transformation seems to be correct!
</details>

We should probably add a mid end rule for this case!

Similarly if both x and y are negated, I think you can use fmadd/fmsub, depending on whether z is negated or not.

I'm not too worried about this case because it's already covered by a mid end rule. That being said, it looks like It comes pretty much for free with the solution that you proposed above, so might as well have it.

I'm going to write up a PR using that. Thanks for looking at this in so much detail!

Wasmtime GitHub notifications bot (May 11 2024 at 07:52):

afonso360 edited a comment on PR #8588:

So, I played around with alive for a bit. Really cool tool! Unfortunately the compiler explorer for alive2 has a short timeout so I had to run these locally.

Out of curiosity, can't you use these same fnmsub/fnmadd instructions if y is negated instead of x?

Yes you can! Here's the proof that I built for this and it does validate.
<details>
<summary>Result</summary>
afonso@DESKTOP-1AHKMV2:~/git/alive2/build$ ./alive-tv --disable-undef-input --smt-to=1800000  ./tgt.ll

----------------------------------------
define half @src(half %x, half %y, half %z) {
#0:
  %negx = fneg half %x
  %fma = fma half %negx, half %y, half %z
  ret half %fma
}
=>
define half @tgt(half %x, half %y, half %z) {
#0:
  %negy = fneg half %y
  %fma = fma half %x, half %negy, half %z
  ret half %fma
}
Transformation seems to be correct!
</details>

I'm reasonably confident that negating the result is equivalent to negating both terms and both cases are safe to fuse without loss of precision.

This one doesn't seem to pan out, it seems to have the issue with the signed zero. Proof

<details>
<summary>Result</summary>
afonso@DESKTOP-1AHKMV2:~/git/alive2/proofs$ ../build/alive-tv --disable-undef-input --smt-to=1800000 ./neg_fma_eq_fma_negxz.ll

----------------------------------------
define half @src(half %x, half %y, half %z) {
#0:
  %fma = fma half %x, half %y, half %z
  %n = fneg half %fma
  ret half %n
}
=>
define half @tgt(half %x, half %y, half %z) {
#0:
  %negy = fneg half %y
  %negz = fneg half %z
  %fma = fma half %x, half %negy, half %negz
  ret half %fma
}
Transformation doesn't verify!

ERROR: Value mismatch

Example:
half %x = #x0000 (+0.0)
half %y = #x8100 (-0.000015258789?)
half %z = #x0000 (+0.0)

Source:
half %fma = #x0000 (+0.0)
half %n = #x8000 (-0.0)

Target:
half %negy = #x0100 (0.000015258789?)
half %negz = #x8000 (-0.0)
half %fma = #x0000 (+0.0)
Source value: #x8000 (-0.0)
Target value: #x0000 (+0.0)
</details>

I'm less confident that, for example, fmadd x y (fneg z) will always give the same results as fmsub x y z if the magnitude of z is very different than x*y

I can't really test this using alive, since it doesn't have a fmsub intrinsic. But I did something similar using fmul+f{add,sub}, which has 2 rounding steps instead of one, which is what I think really matters here. In this case the proof does check out and you can remove an intermediary fneg. Proof

<details>
<summary>Result</summary>
afonso@DESKTOP-1AHKMV2:~/git/alive2/proofs$ ../build/alive-tv --disable-undef-input --smt-to=1800000 ./fma_split_neg.ll

----------------------------------------
define half @src(half %x, half %y, half %z) {
#0:
  %mul = fmul half %x, %y
  %negz = fneg half %z
  %res = fadd half %mul, %negz
  ret half %res
}
=>
define half @tgt(half %x, half %y, half %z) {
#0:
  %mul = fmul half %x, %y
  %res = fsub half %mul, %z
  ret half %res
}
Transformation seems to be correct!
</details>

We should probably add a mid end rule for this case!

Similarly if both x and y are negated, I think you can use fmadd/fmsub, depending on whether z is negated or not.

I'm not too worried about this case because it's already covered by a mid end rule. That being said, it looks like It comes pretty much for free with the solution that you proposed above, so might as well have it.

I'm going to write up a PR using that. Thanks for looking at this in so much detail!

Last updated: Apr 20 2025 at 01:31 UTC