primoly edited PR #8954.
github-actions[bot] commented on PR #8954:
Subscribe to Label Action
cc @cfallin, @fitzgen
<details>
This issue or pull request has been labeled: "cranelift", "isle"Thus the following users have been cc'd because of the following labels:
- cfallin: isle
- fitzgen: isle
To subscribe or unsubscribe from this label, edit the <code>.github/subscribe-to-label.json</code> configuration file.
Learn more.
</details>
primoly edited PR #8954:
Adds constant propagation of the following instructions for
$F32
and$F64
:
fadd
,fsub
,fmul
,fdiv
,sqrt
,ceil
,floor
,trunc
,nearest
.
fmin
andfmax
are still missing. Those are tricky due to their handling and propagation of NaNs: cranelift_codegen::ir::InstBuilder::fmin
primoly updated PR #8954.
fitzgen submitted PR review:
Thanks! Looks good to me, modulo one clarification below.
fitzgen submitted PR review:
Thanks! Looks good to me, modulo one clarification below.
fitzgen created PR review comment:
fma
, and any other relaxed SIMD opcode that is non-deterministic, is a little bit tricky. We would want to double check the required Wasm semantics, and whether the non-determinism is allowed to "change" throughout the program or not. If it is spec'd to be non-deterministic, but always has the same behavior for a given evaluation of the N non-deterministic choices, then const folding could be problematic if we are compiling on an aarch64 machine but running on an x64 machine, for example. It would essentially be observable to the program whether the constant folding was happening or not, and I'm not sure if that is allowed by the spec.So we should answer this question, and then document the answer here.
bjorn3 submitted PR review.
bjorn3 created PR review comment:
If it is spec'd to be non-deterministic, but always has the same behavior for a given evaluation of the N non-deterministic choices, then const folding could be problematic if we are compiling on an aarch64 machine but running on an x64 machine, for example.
IMHO it would be problematic either way. It would break reproducibility of the compiled executable/
.cwasm
file across compiler host architectures.
fitzgen submitted PR review.
fitzgen created PR review comment:
FWIW, I tried to answer this question for myself and didn't come away with anything firm.
Filed https://github.com/WebAssembly/relaxed-simd/issues/155 to ask for clarification.
I think this
TODO
comment can just add a note about the potential gotcha with non-deterministic instructions and link to that issue.
afonso360 submitted PR review.
afonso360 created PR review comment:
I think we may already be able to observe const propagation with this PR already. At least on RISC-V any NaN producing operation is guaranteed to produce the canonical NaN, but I think some other platforms preserve the payload bits.
fitzgen created PR review comment:
At least on RISC-V any NaN producing operation is guaranteed to produce the canonical NaN, but I think some other platforms preserve the payload bits.
NaN
s don't have any guarantee around having the same behavior across instructions, AFAIK. But the relaxed SIMD instructions might, or at least I'm not sure, which is why I filed that issue.It would break reproducibility of the compiled executable/
.cwasm
file across compiler host architectures.I guess this is a philosophical choice. It isn't clear to me whether we want to provide the weaker (deterministic cross compilation given the same host architecture) or the stronger (deterministic cross compilation regardless of host architecture) guarantee.
If the latter, then we can't do compile-time evaluation of any floating point operation that could involve
NaN
s.
fitzgen submitted PR review.
primoly submitted PR review.
primoly created PR review comment:
I read the docs as
fma
in Cranelift being deterministic with regards to rounding. However I don’t know if this is actually honoured by codegen, since properly emulatingfma
’s rounding might be too expensive. If it turns out thatfma
is (either set or list) non-deterministic (ignoring propagation of NaNs), then the documentation inInstructionBuilder
has to be updated.
bjorn3 submitted PR review.
bjorn3 created PR review comment:
If the latter, then we can't do compile-time evaluation of any floating point operation that could involve NaNs.
You could still attempt evaluation and bail out optimizing if the output turns out to be NaN.
fitzgen submitted PR review.
fitzgen created PR review comment:
Yeah that's true. We could make all these external constructors
partial
and have them returnNone
if the operation is given or produces aNaN
.
primoly commented on PR #8954:
Reading the WebAssembly spec with regards to NaNs<sup>[1][2]</sup> here are some remarks and questions.
When all the NaNs in the input are canonical (NaN<sub>canon</sub>), then the output is canonical as well.<sup>[1]</sup> So there is no non-determinism in this case.<sup>[3]</sup> If at least one of the inputs is non-canonical, the output is non-deterministic, but must be an arithmetic NaN (which of course can be a NaN<sub>canon</sub>).<sup>[2]</sup> I don’t know if this is actually handled correctly by Cranelift, because always propagating NaNs is incorrect, since non-arithmetic NaNs must be turned into arithmetic ones. Unless it is guaranteed that
Ieee32
andIeee64
NaNs are always arithmetic, but I don’t think this is the case. (opened https://github.com/bytecodealliance/wasmtime/issues/8967)In Cranelift the functions used by the constant propagation of float in this PR (as well as https://github.com/bytecodealliance/wasmtime/pull/8625) convert the
Ieee32
/Ieee64
to Rustf32
/f64
, perform the operation and then convert the result back toIeee32
/Ieee64
by reinterpreting their bits. So the question: What happens to different kinds of NaNs during both calculation and conversions? Do they propagate, are they canonicalised or is this non-deterministic?To avoid const prop leading to different results than before @bjorn3 mentioned that well could just ignore const prop for all instructions involving NaNs. Actually you could still include operations that involve only NaN<sub>canon</sub>, since there propagation and canonicalisation would be the same. Except there still remains the problem with the sign.<sup>[3]</sup>
Personally, I would prefer to do the propagation in all cases, since programs should never assume platform specific behaviour (list non-determinism) for instructions that are defined to be set non-deterministic. I except that constant NaNs (and especially non NaN<sub>canon</sub>) will almost never occur in practice, so my reason is just the avoidance of to much special casing.
[1] https://webassembly.github.io/spec/core/exec/numerics.html#nan-propagation
[2] https://webassembly.github.io/spec/core/syntax/values.html#syntax-float
[3] Not quite: The sign is not part of the NaN<sub>canon</sub>, so there are effectively two NaN<sub>canon</sub>: +NaN<sub>canon</sub> and −NaN<sub>canon</sub>. The sign of the output is non-deterministic for all operations except
fneg
,fabs
andfcopysign
.
primoly edited a comment on PR #8954:
Reading the WebAssembly spec with regards to NaNs<sup>[1][2]</sup> here are some remarks and questions.
When all the NaNs in the input are canonical (NaN<sub>canon</sub>), then the output is canonical as well.<sup>[1]</sup> So there is no non-determinism in this case.<sup>[3]</sup> If at least one of the inputs is non-canonical, the output is non-deterministic, but must be an arithmetic NaN (which of course can be a NaN<sub>canon</sub>).<sup>[2]</sup> I don’t know if this is actually handled correctly by Cranelift, because always propagating NaNs is incorrect, since non-arithmetic NaNs must be turned into arithmetic ones. Unless it is guaranteed that
Ieee32
andIeee64
NaNs are always arithmetic, but I don’t think this is the case. (opened https://github.com/bytecodealliance/wasmtime/issues/8967)In Cranelift the functions used by the constant propagation of float in this PR (as well as https://github.com/bytecodealliance/wasmtime/pull/8625) convert the
Ieee32
/Ieee64
to Rustf32
/f64
, perform the operation and then convert the result back toIeee32
/Ieee64
by reinterpreting their bits. So the question: What happens to different kinds of NaNs during both calculation and conversions? Do they propagate, are they canonicalised or is this non-deterministic?To avoid const prop leading to different results than before @bjorn3 mentioned that well could just ignore const prop for all instructions involving NaNs. Actually you could still include operations that involve only NaN<sub>canon</sub>, since there propagation and canonicalisation would be the same. Except there still remains the problem with the sign.<sup>[3]</sup>
Personally, I would prefer to do the constant propagation in all cases (and just pick one spec compliant way of dealing with NaNs), since programs should never assume platform specific behaviour (list non-determinism) for instructions that are defined to be set non-deterministic. I except that constant NaNs (and especially non NaN<sub>canon</sub>) will almost never occur in practice, so my reason is just the avoidance of to much special casing.
[1] https://webassembly.github.io/spec/core/exec/numerics.html#nan-propagation
[2] https://webassembly.github.io/spec/core/syntax/values.html#syntax-float
[3] Not quite: The sign is not part of the NaN<sub>canon</sub>, so there are effectively two NaN<sub>canon</sub>: +NaN<sub>canon</sub> and −NaN<sub>canon</sub>. The sign of the output is non-deterministic for all operations except
fneg
,fabs
andfcopysign
.
fitzgen commented on PR #8954:
I would be more comfortable with aborting the compile-time evaluation if either given or producing a
NaN
. It is more conservative, and we can always have follow up discussions/PRs focused on just that issue. I also suspect there isn't too much additional performance to gain w.r.tNaN
s.
FWIW, it seems like compile-time evaluating relaxed SIMD instructions like
fma
is probably a no-go: https://github.com/WebAssembly/relaxed-simd/issues/155#issuecomment-2232721287
primoly updated PR #8954.
primoly updated PR #8954.
primoly commented on PR #8954:
I’ve updated the PR to always check whether the result is NaN and don’t do constant folding in that case. I also changed
neg
abs
andcopysign
to not convert to Rustf32
/f64
but operate on the bits directly, since otherwise NaNs could have their payload and even sign changed. I also addedfmin
andfmax
folding.Regarding
fma
: I still believe that it is a deterministic instruction when not involving NaNs, but since this PR doesn’t include it, this is something we can discuss in the future.
fitzgen commented on PR #8954:
Regarding
fma
: I still believe that it is a deterministic instruction when not involving NaNs, but since this PR doesn’t include it, this is something we can discuss in the future.Yes, my b I was thinking of the
relaxed_madd
Wasm instruction.
fitzgen submitted PR review:
Looks good modulo a rebase to address conflicts with
main
and a couple tiny nitpicks. Thanks!!
fitzgen submitted PR review:
Looks good modulo a rebase to address conflicts with
main
and a couple tiny nitpicks. Thanks!!
fitzgen created PR review comment:
Can you follow this with a sentence saying why that is? Something like
We want the NaN bit patterns produced by an instruction to be consistent, and compile-time evaluation in a cross-compilation scenario risks producing different NaN bit patterns than the target would have at run-time.
fitzgen created PR review comment:
I think it is worth worth defining helpers like
impl Ieee{32,64} { pub(crate) fn non_nan(self) -> Option<Self> { Some(self).filter(|f| !f.is_nan()) } }
to clean up some of this repetition.
primoly updated PR #8954.
fitzgen submitted PR review:
Perfect! Thanks again!
primoly updated PR #8954.
primoly updated PR #8954.
fitzgen submitted PR review.
fitzgen merged PR #8954.
Last updated: Nov 22 2024 at 16:03 UTC