wasmtime / PR #12939 winch: respect the enable_nan_canoni... · git-wasmtime

Stream: git-wasmtime

Topic: wasmtime / PR #12939 winch: respect the enable_nan_canoni...

Wasmtime GitHub notifications bot (Apr 02 2026 at 04:50):

r-near opened PR #12939 from r-near:winch-nan-canonicalization to bytecodealliance:main:

The enable_nan_canonicalization flag already flows through to Winch via the shared Flags, but Winch was ignoring it. This adds a canonicalize_nan method to the Masm trait that, when the flag is set, emits a compare-with-self + conditional branch to replace NaN results with the canonical quiet NaN after each float arithmetic op.

Covered operations: add, sub, mul, div, min, max, sqrt, ceil, floor, trunc, nearest, demote, and promote. Implemented for x64 and aarch64. Includes a scalar wast test (counterpart to simd/canonicalize-nan.wast).

Wasmtime GitHub notifications bot (Apr 02 2026 at 04:50):

r-near requested alexcrichton for a review on PR #12939.

Wasmtime GitHub notifications bot (Apr 02 2026 at 04:50):

r-near requested wasmtime-compiler-reviewers for a review on PR #12939.

Wasmtime GitHub notifications bot (Apr 02 2026 at 04:50):

r-near requested Copilot for a review on PR #12939.

Wasmtime GitHub notifications bot (Apr 02 2026 at 04:50):

r-near requested wasmtime-core-reviewers for a review on PR #12939.

Wasmtime GitHub notifications bot (Apr 02 2026 at 04:52):

r-near converted PR #12939 winch: respect the enable_nan_canonicalization setting to a draft.

Wasmtime GitHub notifications bot (Apr 02 2026 at 04:54):

Copilot submitted PR review:

Pull request overview

This PR ensures Winch respects the existing enable_nan_canonicalization shared setting by adding a canonicalize_nan hook to the MacroAssembler trait and invoking it after scalar floating-point operations so NaN results are normalized to the canonical quiet-NaN bit pattern.

Changes:

Add MacroAssembler::canonicalize_nan and implement it for x64 and aarch64.

Invoke NaN canonicalization after scalar float arithmetic/conversion ops (and after rounding paths in the visitor).

Add a scalar WAST test to validate canonical NaN bit patterns.

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 4 comments.

<details>
<summary>Show a summary per file</summary>

File Description

winch/codegen/src/visitor.rs Inserts canonicalize_nan after scalar float ops and adds helper for rounding results.

winch/codegen/src/masm.rs Extends the MacroAssembler trait with canonicalize_nan.

winch/codegen/src/isa/x64/masm.rs Implements NaN detection + replacement with canonical NaN on x64.

winch/codegen/src/isa/aarch64/masm.rs Implements NaN detection + replacement with canonical NaN on aarch64; stores shared flags in the masm.

tests/misc_testsuite/canonicalize-nan-scalar.wast Adds scalar coverage for canonical NaN bit patterns and propagation.

</details>

:light_bulb: <a href="/bytecodealliance/wasmtime/new/main?filename=.github/instructions/*.instructions.md" class="Link--inTextBlock" target="_blank" rel="noopener noreferrer">Add Copilot custom instructions</a> for smarter, more guided reviews. <a href="https://docs.github.com/en/copilot/customizing-copilot/adding-repository-custom-instructions-for-github-copilot" class="Link--inTextBlock" target="_blank" rel="noopener noreferrer">Learn how to get started</a>.

File	Description
`winch/codegen/src/visitor.rs`	Inserts `canonicalize_nan` after scalar float ops and adds helper for rounding results.
`winch/codegen/src/masm.rs`	Extends the `MacroAssembler` trait with `canonicalize_nan`.
`winch/codegen/src/isa/x64/masm.rs`	Implements NaN detection + replacement with canonical NaN on x64.
`winch/codegen/src/isa/aarch64/masm.rs`	Implements NaN detection + replacement with canonical NaN on aarch64; stores shared flags in the masm.
`tests/misc_testsuite/canonicalize-nan-scalar.wast`	Adds scalar coverage for canonical NaN bit patterns and propagation.
</details>

Wasmtime GitHub notifications bot (Apr 02 2026 at 04:54):

Copilot created PR review comment:

canonicalize_nan_for_round always pops the result into a register and pushes it back, even when NaN canonicalization is disabled (because the flag check happens inside masm.canonicalize_nan). This can add unnecessary loads/moves/register pressure in the default configuration; consider gating before pop_to_reg (e.g., by moving canonicalization into the float_round implementations where the value is already in a reg, or by exposing a flag query on MacroAssembler).

Wasmtime GitHub notifications bot (Apr 02 2026 at 04:54):

Copilot created PR review comment:

canonicalize_nan currently emits a conditional jump to nan_label and then an unconditional jump to done_label, which means every float op pays for an extra branch even in the common non-NaN case. Consider inverting the condition and using fallthrough (e.g., jump to done_label when not NaN, otherwise fall through into the canonical-NaN load) to avoid the unconditional jump and reduce code size/branch overhead.
        let done_label = self.asm.buffer_mut().get_label();

        self.asm.ucomis(reg.to_reg(), reg.to_reg(), size);
        // If the value is not NaN (parity flag not set), skip the canonical-NaN load.
        self.asm.jmp_if(CC::NP, done_label);

Wasmtime GitHub notifications bot (Apr 02 2026 at 04:54):

Copilot created PR review comment:

This test exercises scalar canonicalization, but it doesn't cover the newly-instrumented f64 add/sub/mul/min/max or any f64 rounding ops (ceil/floor/trunc/nearest). Adding a few f64 cases would better guard the new visitor paths and ensure the correct OperandSize::S64 canonicalization is applied across the whole set of operations mentioned in the PR description.

Wasmtime GitHub notifications bot (Apr 02 2026 at 04:54):

Copilot created PR review comment:

Same as x64: canonicalize_nan does a conditional branch to nan_label followed by an unconditional branch to done_label, so the non-NaN path still executes an unconditional jump. Restructure to a single conditional branch with fallthrough to reduce branch overhead since this runs after every float op when canonicalization is enabled.
        let done_label = self.asm.buffer_mut().get_label();

        self.asm.fcmp(reg.to_reg(), reg.to_reg(), size);
        // If the result is not NaN (VS flag not set), skip canonicalization.
        self.asm.jmp_if_not(Cond::Vs, done_label);

Wasmtime GitHub notifications bot (Apr 02 2026 at 04:58):

r-near updated PR #12939.

Wasmtime GitHub notifications bot (Apr 02 2026 at 04:59):

r-near updated PR #12939.

Wasmtime GitHub notifications bot (Apr 02 2026 at 05:01):

r-near created PR review comment:

Done, inverted to use CC::NP with fallthrough.

Wasmtime GitHub notifications bot (Apr 02 2026 at 05:01):

r-near submitted PR review.

Wasmtime GitHub notifications bot (Apr 02 2026 at 05:01):

r-near submitted PR review.

Wasmtime GitHub notifications bot (Apr 02 2026 at 05:01):

r-near created PR review comment:

switched to Cond::Vc

Wasmtime GitHub notifications bot (Apr 02 2026 at 05:02):

r-near submitted PR review.

Wasmtime GitHub notifications bot (Apr 02 2026 at 05:02):

r-near created PR review comment:

Left as-is, the overhead is a single reg move, and gating it would mean exposing a flag query on the MacroAssembler trait which seems more invasive than warranted for this

Wasmtime GitHub notifications bot (Apr 02 2026 at 05:02):

r-near submitted PR review.

Wasmtime GitHub notifications bot (Apr 02 2026 at 05:02):

r-near created PR review comment:

Added f64 coverage for add/sub/mul/min/max and ceil/floor/trunc/nearest

Wasmtime GitHub notifications bot (Apr 02 2026 at 05:13):

r-near has marked PR #12939 as ready for review.

Wasmtime GitHub notifications bot (Apr 02 2026 at 08:04):

github-actions[bot] added the label winch on PR #12939.

Wasmtime GitHub notifications bot (Apr 02 2026 at 08:04):

github-actions[bot] commented on PR #12939:

Subscribe to Label Action

cc @saulecabrera

<details>
This issue or pull request has been labeled: "winch"

Thus the following users have been cc'd because of the following labels:

saulecabrera: winch

To subscribe or unsubscribe from this label, edit the <code>.github/subscribe-to-label.json</code> configuration file.

Learn more.
</details>

Wasmtime GitHub notifications bot (Apr 02 2026 at 16:56):

r-near updated PR #12939.

Wasmtime GitHub notifications bot (Apr 02 2026 at 20:40):

r-near commented on PR #12939:

@cfallin Thanks for merging #12940! Would you mind taking a look at this one as well?

Wasmtime GitHub notifications bot (Apr 02 2026 at 20:40):

r-near edited a comment on PR #12939:

@cfallin Thanks for reviewing #12940! Would you mind taking a look at this one as well?

Wasmtime GitHub notifications bot (Apr 02 2026 at 21:01):

cfallin commented on PR #12939:

@r-near we usually either stick with the assignment bot's suggestions (the reason I reviewed your last PR) or, in cases such as this one, tag a sub-area reviewer who knows part of the code best -- in this case @saulecabrera might be best to review Winch-specific work?

Wasmtime GitHub notifications bot (Apr 02 2026 at 22:04):

saulecabrera commented on PR #12939:

I'll review.

Wasmtime GitHub notifications bot (Apr 06 2026 at 13:49):

saulecabrera requested saulecabrera for a review on PR #12939.

Wasmtime GitHub notifications bot (Apr 06 2026 at 13:49):

saulecabrera unassigned alexcrichton from PR #12939 winch: respect the enable_nan_canonicalization setting.

Wasmtime GitHub notifications bot (Apr 06 2026 at 17:43):

saulecabrera submitted PR review:

Looking good, some comments inline.

Wasmtime GitHub notifications bot (Apr 06 2026 at 17:43):

saulecabrera created PR review comment:

Given that the canonicalization here is optional, let's rename this method to maybe_canonicalize_nan.

Wasmtime GitHub notifications bot (Apr 06 2026 at 17:43):

saulecabrera created PR review comment:

Could we also implement this functionality for the non-scalar counterparts? I believe that all vector instructions are implemented in order to respect canonicalization. Once implemented we should be able to fully test simd/canonicalize-nan.wast in x86-64, by removing this line https://github.com/bytecodealliance/wasmtime/blob/main/crates/test-util/src/wast.rs#L614

Wasmtime GitHub notifications bot (Apr 06 2026 at 17:43):

saulecabrera created PR review comment:

To avoid repeating these immediate values across ISAs, can we introduce a constant at the Masm layer and import them here instead?

Wasmtime GitHub notifications bot (Apr 06 2026 at 17:43):

saulecabrera created PR review comment:

Also, could you expand a bit on the rationale for adding a per-Masm implementation instead of an ISA-agnostic one? A priori, it seems that the MacroAssember exposes enough functionality to express canonicalization one layer above.

Wasmtime GitHub notifications bot (Apr 06 2026 at 17:43):

saulecabrera created PR review comment:

Could we replace this function with a a call to self.masm.canonicalize_nan passing in the return register of the builtin function instead? The register holding the return value should be accessible via the builtins struct. That way we can avoid a function call (to canonicalize_nan_for_roundand going through all the hoops involved in pop_to_reg) and more importantly we reduce the risk of this function getting used in context different to round-result canonicalization, for which as far as I can tell, we are not enforcing any particular invariants.

Wasmtime GitHub notifications bot (Apr 06 2026 at 18:15):

r-near updated PR #12939.

Wasmtime GitHub notifications bot (Apr 06 2026 at 18:45):

r-near submitted PR review.

Wasmtime GitHub notifications bot (Apr 06 2026 at 18:45):

r-near created PR review comment:

The trait doesn't have float-and-compare branch or float-constant-load, so an agnostic version would need to go through float_cmp_with_set plus an integer branch, which is a few more instructions than the native path each backend can do.

I was leaning per-ISA, but perhaps we just add those primitives to the trait?

BTW I added the shared constants so that should at least cover the duplication concern.

Wasmtime GitHub notifications bot (Apr 06 2026 at 18:45):

r-near created PR review comment:

Renamed the method here: 7cdd58e (this PR)

Wasmtime GitHub notifications bot (Apr 06 2026 at 18:45):

r-near submitted PR review.

Wasmtime GitHub notifications bot (Apr 06 2026 at 18:45):

r-near submitted PR review.

Wasmtime GitHub notifications bot (Apr 06 2026 at 18:45):

r-near created PR review comment:

Yep, here: 75cecea (this PR)

Wasmtime GitHub notifications bot (Apr 06 2026 at 18:47):

r-near submitted PR review.

Wasmtime GitHub notifications bot (Apr 06 2026 at 18:47):

r-near created PR review comment:

Ok here we go: cdef987 (this PR).

Added maybe_canonicalize_v128_nan to the masm trait and implemented it for x64

I removed the skip and it looks like the tests pass :tada:

Wasmtime GitHub notifications bot (Apr 06 2026 at 18:52):

r-near submitted PR review.

Wasmtime GitHub notifications bot (Apr 06 2026 at 18:52):

r-near created PR review comment:

Ended up removing the function here: 1722807 (this PR).

I kept the pop/canonicalize/push inline since pop_to_reg is essentially a no-op when the value is already in a register... idk, it looked like wiring up the return register through float_round felt more invasive, but that might be a more purist solution

Wasmtime GitHub notifications bot (Apr 06 2026 at 18:57):

r-near edited PR #12939:

The enable_nan_canonicalization flag already flows through to Winch via the shared Flags, but Winch was ignoring it. This adds maybe_canonicalize_nan (scalar) and maybe_canonicalize_v128_nan (SIMD) methods to the Masm trait that, when the flag is set, emit a NaN-detecting sequence to replace results with the canonical quiet NaN.

Scalar: compare-with-self + conditional branch to load canonical NaN. Implemented for x64 and aarch64

SIMD (x64): unordered compare to build a NaN lane mask, then bitwise select with canonical NaN from the constant pool

Covers add, sub, mul, div, min, max, sqrt, ceil, floor, trunc, nearest, demote, and promote for both scalar and SIMD. Canonical NaN constants are shared across ISAs. Removes the canonicalize-nan.wast Winch skip.

Wasmtime GitHub notifications bot (Apr 07 2026 at 17:54):

saulecabrera submitted PR review:

Looks good to me, thanks for iterating on this. One last request and then we can land this one.

Wasmtime GitHub notifications bot (Apr 07 2026 at 17:54):

saulecabrera created PR review comment:

I wonder if it would be worth expressing the scalar version as the vector counterpart (using a single lane), to make it branchless. We'd probably need to fallback to the branch version when avx instructions are not supported though. For now let's leave a comment here stating that in the future if a more performant version is needed, a variation of the v128 branchless could be considered for scalars.