r-near opened PR #12939 from r-near:winch-nan-canonicalization to bytecodealliance:main:
The
enable_nan_canonicalizationflag already flows through to Winch via the sharedFlags, but Winch was ignoring it. This adds acanonicalize_nanmethod to theMasmtrait that, when the flag is set, emits a compare-with-self + conditional branch to replace NaN results with the canonical quiet NaN after each float arithmetic op.Covered operations: add, sub, mul, div, min, max, sqrt, ceil, floor, trunc, nearest, demote, and promote. Implemented for x64 and aarch64. Includes a scalar wast test (counterpart to
simd/canonicalize-nan.wast).
r-near requested alexcrichton for a review on PR #12939.
r-near requested wasmtime-compiler-reviewers for a review on PR #12939.
r-near requested Copilot for a review on PR #12939.
r-near requested wasmtime-core-reviewers for a review on PR #12939.
r-near converted PR #12939 winch: respect the enable_nan_canonicalization setting to a draft.
Copilot submitted PR review:
Pull request overview
This PR ensures Winch respects the existing
enable_nan_canonicalizationshared setting by adding acanonicalize_nanhook to theMacroAssemblertrait and invoking it after scalar floating-point operations so NaN results are normalized to the canonical quiet-NaN bit pattern.Changes:
- Add
MacroAssembler::canonicalize_nanand implement it for x64 and aarch64.- Invoke NaN canonicalization after scalar float arithmetic/conversion ops (and after rounding paths in the visitor).
- Add a scalar WAST test to validate canonical NaN bit patterns.
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 4 comments.
<details>
<summary>Show a summary per file</summary>
File Description winch/codegen/src/visitor.rsInserts canonicalize_nanafter scalar float ops and adds helper for rounding results.winch/codegen/src/masm.rsExtends the MacroAssemblertrait withcanonicalize_nan.winch/codegen/src/isa/x64/masm.rsImplements NaN detection + replacement with canonical NaN on x64. winch/codegen/src/isa/aarch64/masm.rsImplements NaN detection + replacement with canonical NaN on aarch64; stores shared flags in the masm. tests/misc_testsuite/canonicalize-nan-scalar.wastAdds scalar coverage for canonical NaN bit patterns and propagation. </details>
:light_bulb: <a href="/bytecodealliance/wasmtime/new/main?filename=.github/instructions/*.instructions.md" class="Link--inTextBlock" target="_blank" rel="noopener noreferrer">Add Copilot custom instructions</a> for smarter, more guided reviews. <a href="https://docs.github.com/en/copilot/customizing-copilot/adding-repository-custom-instructions-for-github-copilot" class="Link--inTextBlock" target="_blank" rel="noopener noreferrer">Learn how to get started</a>.
Copilot created PR review comment:
canonicalize_nan_for_roundalways pops the result into a register and pushes it back, even when NaN canonicalization is disabled (because the flag check happens insidemasm.canonicalize_nan). This can add unnecessary loads/moves/register pressure in the default configuration; consider gating beforepop_to_reg(e.g., by moving canonicalization into thefloat_roundimplementations where the value is already in a reg, or by exposing a flag query onMacroAssembler).
Copilot created PR review comment:
canonicalize_nancurrently emits a conditional jump tonan_labeland then an unconditional jump todone_label, which means every float op pays for an extra branch even in the common non-NaN case. Consider inverting the condition and using fallthrough (e.g., jump todone_labelwhen not NaN, otherwise fall through into the canonical-NaN load) to avoid the unconditional jump and reduce code size/branch overhead.let done_label = self.asm.buffer_mut().get_label(); self.asm.ucomis(reg.to_reg(), reg.to_reg(), size); // If the value is not NaN (parity flag not set), skip the canonical-NaN load. self.asm.jmp_if(CC::NP, done_label);
Copilot created PR review comment:
This test exercises scalar canonicalization, but it doesn't cover the newly-instrumented f64 add/sub/mul/min/max or any f64 rounding ops (ceil/floor/trunc/nearest). Adding a few f64 cases would better guard the new visitor paths and ensure the correct
OperandSize::S64canonicalization is applied across the whole set of operations mentioned in the PR description.
Copilot created PR review comment:
Same as x64:
canonicalize_nandoes a conditional branch tonan_labelfollowed by an unconditional branch todone_label, so the non-NaN path still executes an unconditional jump. Restructure to a single conditional branch with fallthrough to reduce branch overhead since this runs after every float op when canonicalization is enabled.let done_label = self.asm.buffer_mut().get_label(); self.asm.fcmp(reg.to_reg(), reg.to_reg(), size); // If the result is not NaN (VS flag not set), skip canonicalization. self.asm.jmp_if_not(Cond::Vs, done_label);
r-near updated PR #12939.
r-near updated PR #12939.
r-near created PR review comment:
Done, inverted to use CC::NP with fallthrough.
r-near submitted PR review.
r-near submitted PR review.
r-near created PR review comment:
switched to Cond::Vc
r-near submitted PR review.
r-near created PR review comment:
Left as-is, the overhead is a single reg move, and gating it would mean exposing a flag query on the MacroAssembler trait which seems more invasive than warranted for this
r-near submitted PR review.
r-near created PR review comment:
Added f64 coverage for add/sub/mul/min/max and ceil/floor/trunc/nearest
r-near has marked PR #12939 as ready for review.
github-actions[bot] added the label winch on PR #12939.
github-actions[bot] commented on PR #12939:
Subscribe to Label Action
cc @saulecabrera
<details>
This issue or pull request has been labeled: "winch"Thus the following users have been cc'd because of the following labels:
- saulecabrera: winch
To subscribe or unsubscribe from this label, edit the <code>.github/subscribe-to-label.json</code> configuration file.
Learn more.
</details>
r-near updated PR #12939.
r-near commented on PR #12939:
@cfallin Thanks for merging #12940! Would you mind taking a look at this one as well?
r-near edited a comment on PR #12939:
@cfallin Thanks for reviewing #12940! Would you mind taking a look at this one as well?
cfallin commented on PR #12939:
@r-near we usually either stick with the assignment bot's suggestions (the reason I reviewed your last PR) or, in cases such as this one, tag a sub-area reviewer who knows part of the code best -- in this case @saulecabrera might be best to review Winch-specific work?
saulecabrera commented on PR #12939:
I'll review.
saulecabrera requested saulecabrera for a review on PR #12939.
saulecabrera unassigned alexcrichton from PR #12939 winch: respect the enable_nan_canonicalization setting.
saulecabrera submitted PR review:
Looking good, some comments inline.
saulecabrera created PR review comment:
Given that the canonicalization here is optional, let's rename this method to
maybe_canonicalize_nan.
saulecabrera created PR review comment:
Could we also implement this functionality for the non-scalar counterparts? I believe that all vector instructions are implemented in order to respect canonicalization. Once implemented we should be able to fully test
simd/canonicalize-nan.wastin x86-64, by removing this line https://github.com/bytecodealliance/wasmtime/blob/main/crates/test-util/src/wast.rs#L614
saulecabrera created PR review comment:
To avoid repeating these immediate values across ISAs, can we introduce a constant at the Masm layer and import them here instead?
saulecabrera created PR review comment:
Also, could you expand a bit on the rationale for adding a per-Masm implementation instead of an ISA-agnostic one? A priori, it seems that the MacroAssember exposes enough functionality to express canonicalization one layer above.
saulecabrera created PR review comment:
Could we replace this function with a a call to
self.masm.canonicalize_nanpassing in the return register of the builtin function instead? The register holding the return value should be accessible via thebuiltinsstruct. That way we can avoid a function call (tocanonicalize_nan_for_roundand going through all the hoops involved inpop_to_reg) and more importantly we reduce the risk of this function getting used in context different to round-result canonicalization, for which as far as I can tell, we are not enforcing any particular invariants.
r-near updated PR #12939.
r-near submitted PR review.
r-near created PR review comment:
The trait doesn't have float-and-compare branch or float-constant-load, so an agnostic version would need to go through
float_cmp_with_setplus an integer branch, which is a few more instructions than the native path each backend can do.I was leaning per-ISA, but perhaps we just add those primitives to the trait?
BTW I added the shared constants so that should at least cover the duplication concern.
r-near created PR review comment:
Renamed the method here:
7cdd58e(this PR)
r-near submitted PR review.
r-near submitted PR review.
r-near created PR review comment:
Yep, here:
75cecea(this PR)
r-near submitted PR review.
r-near created PR review comment:
Ok here we go:
cdef987(this PR).Added
maybe_canonicalize_v128_nanto the masm trait and implemented it for x64I removed the
skipand it looks like the tests pass :tada:
r-near submitted PR review.
r-near created PR review comment:
Ended up removing the function here:
1722807(this PR).I kept the pop/canonicalize/push inline since
pop_to_regis essentially a no-op when the value is already in a register... idk, it looked like wiring up the return register throughfloat_roundfelt more invasive, but that might be a more purist solution
r-near edited PR #12939:
The
enable_nan_canonicalizationflag already flows through to Winch via the sharedFlags, but Winch was ignoring it. This addsmaybe_canonicalize_nan(scalar) andmaybe_canonicalize_v128_nan(SIMD) methods to theMasmtrait that, when the flag is set, emit a NaN-detecting sequence to replace results with the canonical quiet NaN.
- Scalar: compare-with-self + conditional branch to load canonical NaN. Implemented for x64 and aarch64
- SIMD (x64): unordered compare to build a NaN lane mask, then bitwise select with canonical NaN from the constant pool
Covers add, sub, mul, div, min, max, sqrt, ceil, floor, trunc, nearest, demote, and promote for both scalar and SIMD. Canonical NaN constants are shared across ISAs. Removes the
canonicalize-nan.wastWinch skip.
saulecabrera submitted PR review:
Looks good to me, thanks for iterating on this. One last request and then we can land this one.
saulecabrera created PR review comment:
I wonder if it would be worth expressing the scalar version as the vector counterpart (using a single lane), to make it branchless. We'd probably need to fallback to the branch version when avx instructions are not supported though. For now let's leave a comment here stating that in the future if a more performant version is needed, a variation of the v128 branchless could be considered for scalars.
r-near updated PR #12939.
ricky-sb created PR review comment:
Added it
63ad118(this PR)
ricky-sb submitted PR review.
ricky-sb deleted PR review comment.
r-near submitted PR review.
r-near created PR review comment:
Done!
63ad118(this PR)
saulecabrera added PR #12939 winch: respect the enable_nan_canonicalization setting to the merge queue.
r-near updated PR #12939.
github-merge-queue[bot] removed PR #12939 winch: respect the enable_nan_canonicalization setting from the merge queue.
r-near commented on PR #12939:
@saulecabrera could you re-add it to the merge queue? There was a test failure
r-near edited a comment on PR #12939:
@saulecabrera could you re-add it to the merge queue? I fixed the test failure
saulecabrera commented on PR #12939:
That is the right fix, yeah.
saulecabrera added PR #12939 winch: respect the enable_nan_canonicalization setting to the merge queue.
saulecabrera merged PR #12939.
saulecabrera removed PR #12939 winch: respect the enable_nan_canonicalization setting from the merge queue.
Last updated: Apr 12 2026 at 23:10 UTC