Stream: git-wasmtime

Topic: wasmtime / PR #9889 Winch: implement fpu to int conversio...


view this post on Zulip Wasmtime GitHub notifications bot (Dec 21 2024 at 17:56):

MarinPostma edited PR #9889.

view this post on Zulip Wasmtime GitHub notifications bot (Jan 01 2025 at 17:47):

saulecabrera commented on PR #9889:

I can help reviewing this one.

view this post on Zulip Wasmtime GitHub notifications bot (Jan 01 2025 at 17:47):

saulecabrera requested saulecabrera for a review on PR #9889.

view this post on Zulip Wasmtime GitHub notifications bot (Jan 02 2025 at 18:12):

saulecabrera submitted PR review:

Looking good. Left some comments inline.

As a general comment, the emit_* naming convention is not standard across the assembler implementations (as noted and fixed in https://github.com/bytecodealliance/wasmtime/pull/9918), so I´d suggest dropping the emit_ prefix in the assembler methods introduced in this PR.

view this post on Zulip Wasmtime GitHub notifications bot (Jan 02 2025 at 18:12):

saulecabrera created PR review comment:

Similar comment here, I think it should be possible to omit the OperandSize prefix in both match arms.

view this post on Zulip Wasmtime GitHub notifications bot (Jan 02 2025 at 18:12):

saulecabrera created PR review comment:

Nit: I believe that here and in the next match arm below you can omit the OperandSize:: prefix given that you're already importing it above (e.g., use OperandSize::*)

view this post on Zulip Wasmtime GitHub notifications bot (Jan 02 2025 at 18:12):

saulecabrera created PR review comment:

    /// Load the max value for an integer of size out_size, as a floating-point
    /// of size `in_size`, into register `rd`.

view this post on Zulip Wasmtime GitHub notifications bot (Jan 02 2025 at 18:12):

saulecabrera created PR review comment:

I´d suggest avoiding passing a placeholder register here. The temp register that is passed to unsigned_truncate is an allocatable register, therefore there's very little chance of it getting unintentionally clobbered. Instead having a single implementation for both cases, I´d suggest breaking down the implementation in the assembler into two methods, one for each case (signed/unsigned), then in each implementation you can decide to use the scratch register, immediately before emission, without the risk of unintentionally clobbering it. This is similar to how the x64 assembler implementation works.

view this post on Zulip Wasmtime GitHub notifications bot (Jan 02 2025 at 18:12):

saulecabrera created PR review comment:

If we're writing to tmp_reg, I think this should be defined in the signature, similar to dst: tmp_reg: Writable<Reg>

view this post on Zulip Wasmtime GitHub notifications bot (Jan 02 2025 at 18:12):

saulecabrera created PR review comment:

Here we don´t need to duplicate this method, fcmp already exists.

view this post on Zulip Wasmtime GitHub notifications bot (Jan 03 2025 at 14:36):

MarinPostma submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Jan 03 2025 at 14:36):

MarinPostma created PR review comment:

To make sure I understand, similarly to how x86 implementation, you would like to have two methods, fpu_to_sint and fpu_to_uint in asm, and also, those methods should be passed a temporary register in masm, rather than using the provided one in the case of unsigned_truncate?

I have a few questions/remarks:

view this post on Zulip Wasmtime GitHub notifications bot (Jan 03 2025 at 14:41):

MarinPostma commented on PR #9889:

@saulecabrera I addressed most of the review comments but left a few questions

view this post on Zulip Wasmtime GitHub notifications bot (Jan 03 2025 at 14:45):

MarinPostma updated PR #9889.

view this post on Zulip Wasmtime GitHub notifications bot (Jan 03 2025 at 14:46):

MarinPostma updated PR #9889.

view this post on Zulip Wasmtime GitHub notifications bot (Jan 04 2025 at 16:38):

saulecabrera submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Jan 04 2025 at 16:38):

saulecabrera created PR review comment:

The purpose of the allocatable temporary register is to provide an extra
register to aid in the emission of the instruction sequence needed for the
unsigned case; in x64, 4 registers are required for the instruction sequence:
1 for the destination and 3 temporary registers. Since in the case of x64 the
instruction sequence is completely handled by Cranelift, there's no risk of
unintentionally clobbering any of the registers passed to it, the unintentional
clobbering risk is particularly important in the case of scratch registers in Winch: even
though they can be used for any purpose, one important detail about them is that
they are not tracked by Winch's regalloc therefore they must be used sparingly
and with extreme caution: a misuse could result in unintentional clobbering and
therefore subtle bugs. Ideally, the live range of the scratch registers should be
as short as possible to avoid potential bugs.

In this particular case the reason why I am suggestion improving the usage of
the scratch register is to minimize the risk of unintentional clobbering of such
registers: once a scratch register is passed as a parameter, its live range is
extended and the risk of misuse is greater. A concrete example in this case of
a potential risk of unintentional clobbering is passing the scratch register to
fpu_to_int and then using the float scratch register in any of the methods
needed for the instruction sequence needed to fulfill the truncate instructions
(this is not the case today, but by passing the float scratch register around as
if it were any other register, it becomes harder to reason and audit in the
future).

There are some alternatives that I've thought about to improve the
handling/usage of the scratch registers to address some of the challenges
exposed in the previous paragraph. We could discuss those in a separate
issue/PR (e.g., relying on the type system to identify/audit scratch register usage and/or
providing exclusive access to the scratch registers, similar to what we have for
the allocatable registers). Care needs to be taken here to ensure that none of
the solutions explored affect compilation performance.

Taking a deeper look at your changes for the unsigned case, only 3 registers are
needed here compared to the x64 implementation which demands 4. This means that
you can effectively ignore the temporary register defined in the Masm
unsigned_truncate signature and only rely on the float scratch register.

Taking this into account, my concrete suggestion to move forward here is to:

fn unsigned_truncate(
    &mut self,
    context: &mut CodeGenContext<Emission>,
    src_size: OperandSize,
    dst_size: OperandSize,
    kind: TruncKind
) -> Result<()>;

This allows maximum flexibility to encode the ISA differences and more
importantly reduces register pressure in cases where only 3 registers are
needed, reducing unnecessary register pressure where possible is important to
minimize memory traffic.

This is similar to how we handle the div instruction, given that the
requirements vary between ISAs.

fn fpu_to_int(
    &mut self,
    dst: Writable<Reg>,
    src: Reg,
    src_size: OperandSize,
    dst_size: OperandSize,
    kind: TruncKind,
    signed: bool
) { ... };

Avoiding passing the scratch registers as params and shortening their live
range as much as possible by using them explicitly in each of the helper
functions that you have to fulfill the instruction sequence.


I believe that with this approach, my concerns will be addressed and there won't
be a need to split the signed/unsigned implementations (initially I was confused
with the TODO comment in the code and had assumed that the implementation was
making use of a scratch register that wasn't actually needed, although that's
not the case).

view this post on Zulip Wasmtime GitHub notifications bot (Jan 04 2025 at 16:38):

saulecabrera edited PR review comment.

view this post on Zulip Wasmtime GitHub notifications bot (Jan 04 2025 at 17:49):

MarinPostma submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Jan 04 2025 at 17:49):

MarinPostma created PR review comment:

Thanks a lot for the detailed writeup! This makes things a lot clearer. I'll make the changes :+1:

view this post on Zulip Wasmtime GitHub notifications bot (Jan 04 2025 at 23:47):

MarinPostma updated PR #9889.

view this post on Zulip Wasmtime GitHub notifications bot (Jan 04 2025 at 23:51):

MarinPostma commented on PR #9889:

@saulecabrera I have made the requested changes

view this post on Zulip Wasmtime GitHub notifications bot (Jan 05 2025 at 17:31):

saulecabrera submitted PR review:

Looks great! Thanks for all your patience iterating on this. Left two minor comments and then we should be able to land this one.

view this post on Zulip Wasmtime GitHub notifications bot (Jan 05 2025 at 17:31):

saulecabrera created PR review comment:

Similar here, you can use the appropriate error constructor.

view this post on Zulip Wasmtime GitHub notifications bot (Jan 05 2025 at 17:31):

saulecabrera created PR review comment:

Here you can use CodeGenError::unexpected_operand_size() instead. https://github.com/bytecodealliance/wasmtime/blob/main/winch/codegen/src/codegen/error.rs#L139

view this post on Zulip Wasmtime GitHub notifications bot (Jan 07 2025 at 14:37):

MarinPostma updated PR #9889.

view this post on Zulip Wasmtime GitHub notifications bot (Jan 07 2025 at 14:38):

MarinPostma commented on PR #9889:

It should be good now @saulecabrera

view this post on Zulip Wasmtime GitHub notifications bot (Jan 07 2025 at 15:12):

saulecabrera submitted PR review:

Thanks!

view this post on Zulip Wasmtime GitHub notifications bot (Jan 07 2025 at 15:30):

saulecabrera merged PR #9889.


Last updated: Jan 24 2025 at 00:11 UTC