wasmtime / PR #8954 Cranelift: Constant propagate floats · git-wasmtime

Stream: git-wasmtime

Topic: wasmtime / PR #8954 Cranelift: Constant propagate floats

Wasmtime GitHub notifications bot (Jul 13 2024 at 21:37):

primoly edited PR #8954.

Wasmtime GitHub notifications bot (Jul 13 2024 at 22:44):

github-actions[bot] commented on PR #8954:

Subscribe to Label Action

cc @cfallin, @fitzgen

<details>
This issue or pull request has been labeled: "cranelift", "isle"

Thus the following users have been cc'd because of the following labels:

cfallin: isle

fitzgen: isle

To subscribe or unsubscribe from this label, edit the <code>.github/subscribe-to-label.json</code> configuration file.

Learn more.
</details>

Wasmtime GitHub notifications bot (Jul 14 2024 at 07:27):

primoly edited PR #8954:

Adds constant propagation of the following instructions for $F32 and $F64:
fadd, fsub, fmul, fdiv, sqrt, ceil, floor, trunc, nearest.

fmin and fmax are still missing. Those are tricky due to their handling and propagation of NaNs: cranelift_codegen::ir::InstBuilder::fmin

Wasmtime GitHub notifications bot (Jul 15 2024 at 10:53):

primoly updated PR #8954.

Wasmtime GitHub notifications bot (Jul 15 2024 at 19:30):

fitzgen submitted PR review:

Thanks! Looks good to me, modulo one clarification below.

Wasmtime GitHub notifications bot (Jul 15 2024 at 19:30):

fitzgen submitted PR review:

Thanks! Looks good to me, modulo one clarification below.

Wasmtime GitHub notifications bot (Jul 15 2024 at 19:30):

fitzgen created PR review comment:

fma, and any other relaxed SIMD opcode that is non-deterministic, is a little bit tricky. We would want to double check the required Wasm semantics, and whether the non-determinism is allowed to "change" throughout the program or not. If it is spec'd to be non-deterministic, but always has the same behavior for a given evaluation of the N non-deterministic choices, then const folding could be problematic if we are compiling on an aarch64 machine but running on an x64 machine, for example. It would essentially be observable to the program whether the constant folding was happening or not, and I'm not sure if that is allowed by the spec.

So we should answer this question, and then document the answer here.

Wasmtime GitHub notifications bot (Jul 15 2024 at 19:45):

bjorn3 submitted PR review.

Wasmtime GitHub notifications bot (Jul 15 2024 at 19:45):

bjorn3 created PR review comment:

If it is spec'd to be non-deterministic, but always has the same behavior for a given evaluation of the N non-deterministic choices, then const folding could be problematic if we are compiling on an aarch64 machine but running on an x64 machine, for example.

IMHO it would be problematic either way. It would break reproducibility of the compiled executable/.cwasm file across compiler host architectures.

Wasmtime GitHub notifications bot (Jul 15 2024 at 19:54):

fitzgen submitted PR review.

Wasmtime GitHub notifications bot (Jul 15 2024 at 19:54):

fitzgen created PR review comment:

FWIW, I tried to answer this question for myself and didn't come away with anything firm.

Filed https://github.com/WebAssembly/relaxed-simd/issues/155 to ask for clarification.

I think this TODO comment can just add a note about the potential gotcha with non-deterministic instructions and link to that issue.

Wasmtime GitHub notifications bot (Jul 15 2024 at 19:54):

afonso360 submitted PR review.

Wasmtime GitHub notifications bot (Jul 15 2024 at 19:54):

afonso360 created PR review comment:

I think we may already be able to observe const propagation with this PR already. At least on RISC-V any NaN producing operation is guaranteed to produce the canonical NaN, but I think some other platforms preserve the payload bits.

Wasmtime GitHub notifications bot (Jul 15 2024 at 20:02):

fitzgen created PR review comment:

At least on RISC-V any NaN producing operation is guaranteed to produce the canonical NaN, but I think some other platforms preserve the payload bits.

NaNs don't have any guarantee around having the same behavior across instructions, AFAIK. But the relaxed SIMD instructions might, or at least I'm not sure, which is why I filed that issue.

It would break reproducibility of the compiled executable/.cwasm file across compiler host architectures.

I guess this is a philosophical choice. It isn't clear to me whether we want to provide the weaker (deterministic cross compilation given the same host architecture) or the stronger (deterministic cross compilation regardless of host architecture) guarantee.

If the latter, then we can't do compile-time evaluation of any floating point operation that could involve NaNs.

Wasmtime GitHub notifications bot (Jul 15 2024 at 20:02):

fitzgen submitted PR review.

Wasmtime GitHub notifications bot (Jul 15 2024 at 20:11):

primoly submitted PR review.

Wasmtime GitHub notifications bot (Jul 15 2024 at 20:11):

primoly created PR review comment:

I read the docs as fma in Cranelift being deterministic with regards to rounding. However I don’t know if this is actually honoured by codegen, since properly emulating fma’s rounding might be too expensive. If it turns out that fma is (either set or list) non-deterministic (ignoring propagation of NaNs), then the documentation in InstructionBuilder has to be updated.

Wasmtime GitHub notifications bot (Jul 15 2024 at 20:17):

bjorn3 submitted PR review.

Wasmtime GitHub notifications bot (Jul 15 2024 at 20:17):

bjorn3 created PR review comment:

If the latter, then we can't do compile-time evaluation of any floating point operation that could involve NaNs.

You could still attempt evaluation and bail out optimizing if the output turns out to be NaN.

Wasmtime GitHub notifications bot (Jul 15 2024 at 20:21):

fitzgen submitted PR review.

Wasmtime GitHub notifications bot (Jul 15 2024 at 20:21):

fitzgen created PR review comment:

Yeah that's true. We could make all these external constructors partial and have them return None if the operation is given or produces a NaN.

Wasmtime GitHub notifications bot (Jul 17 2024 at 09:17):

primoly commented on PR #8954:

Reading the WebAssembly spec with regards to NaNs[1][2] here are some remarks and questions.

When all the NaNs in the input are canonical (NaNcanon), then the output is canonical as well.[1] So there is no non-determinism in this case.[3] If at least one of the inputs is non-canonical, the output is non-deterministic, but must be an arithmetic NaN (which of course can be a NaNcanon).[2] I don’t know if this is actually handled correctly by Cranelift, because always propagating NaNs is incorrect, since non-arithmetic NaNs must be turned into arithmetic ones. Unless it is guaranteed that Ieee32 and Ieee64 NaNs are always arithmetic, but I don’t think this is the case. (opened https://github.com/bytecodealliance/wasmtime/issues/8967)

In Cranelift the functions used by the constant propagation of float in this PR (as well as https://github.com/bytecodealliance/wasmtime/pull/8625) convert the Ieee32/Ieee64 to Rust f32/f64, perform the operation and then convert the result back to Ieee32/Ieee64 by reinterpreting their bits. So the question: What happens to different kinds of NaNs during both calculation and conversions? Do they propagate, are they canonicalised or is this non-deterministic?

To avoid const prop leading to different results than before @bjorn3 mentioned that well could just ignore const prop for all instructions involving NaNs. Actually you could still include operations that involve only NaNcanon, since there propagation and canonicalisation would be the same. Except there still remains the problem with the sign.[3]

Personally, I would prefer to do the propagation in all cases, since programs should never assume platform specific behaviour (list non-determinism) for instructions that are defined to be set non-deterministic. I except that constant NaNs (and especially non NaNcanon) will almost never occur in practice, so my reason is just the avoidance of to much special casing.

[1] https://webassembly.github.io/spec/core/exec/numerics.html#nan-propagation

[2] https://webassembly.github.io/spec/core/syntax/values.html#syntax-float

[3] Not quite: The sign is not part of the NaNcanon, so there are effectively two NaNcanon: +NaNcanon and −NaNcanon. The sign of the output is non-deterministic for all operations except fneg, fabs and fcopysign.

Wasmtime GitHub notifications bot (Jul 17 2024 at 09:20):

primoly edited a comment on PR #8954:

Reading the WebAssembly spec with regards to NaNs[1][2] here are some remarks and questions.

When all the NaNs in the input are canonical (NaNcanon), then the output is canonical as well.[1] So there is no non-determinism in this case.[3] If at least one of the inputs is non-canonical, the output is non-deterministic, but must be an arithmetic NaN (which of course can be a NaNcanon).[2] I don’t know if this is actually handled correctly by Cranelift, because always propagating NaNs is incorrect, since non-arithmetic NaNs must be turned into arithmetic ones. Unless it is guaranteed that Ieee32 and Ieee64 NaNs are always arithmetic, but I don’t think this is the case. (opened https://github.com/bytecodealliance/wasmtime/issues/8967)

In Cranelift the functions used by the constant propagation of float in this PR (as well as https://github.com/bytecodealliance/wasmtime/pull/8625) convert the Ieee32/Ieee64 to Rust f32/f64, perform the operation and then convert the result back to Ieee32/Ieee64 by reinterpreting their bits. So the question: What happens to different kinds of NaNs during both calculation and conversions? Do they propagate, are they canonicalised or is this non-deterministic?

To avoid const prop leading to different results than before @bjorn3 mentioned that well could just ignore const prop for all instructions involving NaNs. Actually you could still include operations that involve only NaNcanon, since there propagation and canonicalisation would be the same. Except there still remains the problem with the sign.[3]

Personally, I would prefer to do the constant propagation in all cases (and just pick one spec compliant way of dealing with NaNs), since programs should never assume platform specific behaviour (list non-determinism) for instructions that are defined to be set non-deterministic. I except that constant NaNs (and especially non NaNcanon) will almost never occur in practice, so my reason is just the avoidance of to much special casing.

[1] https://webassembly.github.io/spec/core/exec/numerics.html#nan-propagation

[2] https://webassembly.github.io/spec/core/syntax/values.html#syntax-float

[3] Not quite: The sign is not part of the NaNcanon, so there are effectively two NaNcanon: +NaNcanon and −NaNcanon. The sign of the output is non-deterministic for all operations except fneg, fabs and fcopysign.

Wasmtime GitHub notifications bot (Jul 17 2024 at 16:24):

fitzgen commented on PR #8954:

I would be more comfortable with aborting the compile-time evaluation if either given or producing a NaN. It is more conservative, and we can always have follow up discussions/PRs focused on just that issue. I also suspect there isn't too much additional performance to gain w.r.t NaNs.

FWIW, it seems like compile-time evaluating relaxed SIMD instructions like fma is probably a no-go: https://github.com/WebAssembly/relaxed-simd/issues/155#issuecomment-2232721287

Wasmtime GitHub notifications bot (Jul 17 2024 at 22:50):

primoly updated PR #8954.

Wasmtime GitHub notifications bot (Jul 18 2024 at 18:11):

primoly updated PR #8954.

Wasmtime GitHub notifications bot (Jul 18 2024 at 18:17):

primoly commented on PR #8954:

I’ve updated the PR to always check whether the result is NaN and don’t do constant folding in that case. I also changed neg abs and copysign to not convert to Rust f32/f64 but operate on the bits directly, since otherwise NaNs could have their payload and even sign changed. I also added fmin and fmax folding.

Regarding fma: I still believe that it is a deterministic instruction when not involving NaNs, but since this PR doesn’t include it, this is something we can discuss in the future.

Wasmtime GitHub notifications bot (Jul 18 2024 at 18:24):

fitzgen commented on PR #8954:

Regarding fma: I still believe that it is a deterministic instruction when not involving NaNs, but since this PR doesn’t include it, this is something we can discuss in the future.

Yes, my b I was thinking of the relaxed_madd Wasm instruction.

Wasmtime GitHub notifications bot (Jul 18 2024 at 19:15):

fitzgen submitted PR review:

Looks good modulo a rebase to address conflicts with main and a couple tiny nitpicks. Thanks!!

Wasmtime GitHub notifications bot (Jul 18 2024 at 19:15):

fitzgen submitted PR review:

Looks good modulo a rebase to address conflicts with main and a couple tiny nitpicks. Thanks!!

Wasmtime GitHub notifications bot (Jul 18 2024 at 19:15):

fitzgen created PR review comment:

Can you follow this with a sentence saying why that is? Something like

We want the NaN bit patterns produced by an instruction to be consistent, and compile-time evaluation in a cross-compilation scenario risks producing different NaN bit patterns than the target would have at run-time.

Wasmtime GitHub notifications bot (Jul 18 2024 at 19:15):

fitzgen created PR review comment:

I think it is worth worth defining helpers like
impl Ieee{32,64} {
 pub(crate) fn non_nan(self) -> Option<Self> {
 Some(self).filter(|f| !f.is_nan())
 }
}
to clean up some of this repetition.

Wasmtime GitHub notifications bot (Jul 18 2024 at 20:08):

primoly updated PR #8954.

Wasmtime GitHub notifications bot (Jul 18 2024 at 20:23):

fitzgen submitted PR review:

Perfect! Thanks again!

Wasmtime GitHub notifications bot (Jul 18 2024 at 20:38):

primoly updated PR #8954.

Wasmtime GitHub notifications bot (Jul 19 2024 at 09:38):

primoly updated PR #8954.

Wasmtime GitHub notifications bot (Jul 19 2024 at 18:44):

fitzgen submitted PR review.

Wasmtime GitHub notifications bot (Jul 19 2024 at 18:59):

fitzgen merged PR #8954.

Last updated: Apr 17 2025 at 05:03 UTC