wasmtime / issue #3682 x64: port select to ISLE · git-wasmtime

Stream: git-wasmtime

Topic: wasmtime / issue #3682 x64: port select to ISLE

Wasmtime GitHub notifications bot (Jan 11 2022 at 22:11):

This is a complex lowering due to not only the tree-matching of the CLIF IR but also the sequences required by x64. Because of this the PR will need to go through a few more changes:

[ ] add a way to emit a sequence like the following for FloatCC.Equal and FloatCC.NotEqual: UNORD* + MOV + JNP + MOV + JZ + MOV--because this sequence must reuse the same flags (and avoid modifications to them, @cfallin suggested implementing a new Inst::XmmCmoveOr (but that also implies building a new Inst::Cmove as well)

[ ] add a way to emit conditional moves for multiple register values

[ ] implement the integer comparison side of these instructions (both the tree-matching version and the non-tree-matching versions)

I'm putting this PR up as a draft to see if I can get some help with the above and early feedback on any suggested refactoring to the rules.

Wasmtime GitHub notifications bot (Jan 11 2022 at 22:11):

abrown edited a comment on issue #3682:

This is a complex lowering due to not only the tree-matching of the CLIF IR but also the sequences required by x64. Because of this the PR will need to go through a few more changes:

[ ] add a way to emit a sequence like the following for FloatCC.Equal and FloatCC.NotEqual: UNORD* + MOV + JNP + MOV + JZ + MOV--because this sequence must reuse the same flags (and avoid modifications to them, @cfallin suggested implementing a new Inst::XmmCmoveOr (but that also implies building a new Inst::CmoveOr as well)

[ ] add a way to emit conditional moves for multiple register values

[ ] implement the integer comparison side of these instructions (both the tree-matching version and the non-tree-matching versions)

I'm putting this PR up as a draft to see if I can get some help with the above and early feedback on any suggested refactoring to the rules.

Wasmtime GitHub notifications bot (Jan 11 2022 at 22:32):

github-actions[bot] commented on issue #3682:

Subscribe to Label Action

cc @cfallin, @fitzgen

<details>
This issue or pull request has been labeled: "cranelift", "cranelift:area:x64", "isle"

Thus the following users have been cc'd because of the following labels:

cfallin: isle

fitzgen: isle

To subscribe or unsubscribe from this label, edit the <code>.github/subscribe-to-label.json</code> configuration file.

Learn more.
</details>

Wasmtime GitHub notifications bot (Jan 15 2022 at 00:56):

abrown commented on issue #3682:

@cfallin, @fitzgen: I am finding some very strange behavior with ISLE and I'm not sure what is going on. Take, e.g., the following runtest:
test run
target x86_64

function %select_le_i32(f32, f32) -> i32 {
block0(v0: f32, v1: f32):
    v2 = fcmp le v0, v1
    v3 = iconst.i32 1
    v4 = iconst.i32 0
    v5 = select v2, v3, v4
    return v5
}
; run: %select_le_i32(0x42.42, 0.0) == 0
; run: %select_le_i32(0.0, 0.0) == 1
; run: %select_le_i32(0x0.0, 0x42.42) == 1
; run: %select_le_i32(0x0.0, NaN) == 0

function %select_le_f32(f32, f32) -> f32 {
block0(v0: f32, v1: f32):
    v2 = fcmp le v0, v1
    v3 = f32const 0x1.0
    v4 = f32const 0x0.0
    v5 = select v2, v3, v4
    return v5
}
; run: %select_le_f32(0x42.42, 0.0) == 0x0.0
; run: %select_le_f32(0.0, 0.0) == 0x1.0
; run: %select_le_f32(0x0.0, 0x42.42) == 0x1.0
; run: %select_le_f32(0x0.0, NaN) == 0x0.0
If we run that (RUST_LOG=cranelift_codegen::machinst::compile=trace cargo run -p cranelift-tools -- test -dv scratch.clif), the second set of tests fail:
FAIL scratch.clif: run

Caused by:
    Failed test: run: %select_le_f32(0x1.090800p6, 0.0) == 0.0, actual: 0x1.000000p
The reason for this is that %xmm0 and %xmm1 are magically "flipped" when passed to the comparison instruction, ucomiss. The first function, select_le_i32, has the correct order (ucomiss %xmm0, %xmm1) whereas the second does not (ucomiss %xmm1, %xmm0). This is visible in the log statements for each (in order):
 TRACE cranelift_codegen::machinst::compile > vcode after regalloc: final version:
VCode_ShowWithRRU {{
  Entry block: 0
Block 0:
  (original IR block: block0)
  (instruction range: 0 .. 10)
  Inst 0:   pushq   %rbp
  Inst 1:   movq    %rsp, %rbp
  Inst 2:   xorl    %esi, %esi
  Inst 3:   movl    $1, %edi
  Inst 4:   ucomiss %xmm0, %xmm1
  Inst 5:   cmovnbl %edi, %esi
  Inst 6:   movq    %rsi, %rax
  Inst 7:   movq    %rbp, %rsp
  Inst 8:   popq    %rbp
  Inst 9:   ret
}}
 TRACE cranelift_codegen::machinst::compile > vcode after regalloc: final version:
VCode_ShowWithRRU {{
  Entry block: 0
Block 0:
  (original IR block: block0)
  (instruction range: 0 .. 11)
  Inst 0:   pushq   %rbp
  Inst 1:   movq    %rsp, %rbp
  Inst 2:   xorps   %xmm2, %xmm2
  Inst 3:   movl    $1065353216, %esi
  Inst 4:   movd    %esi, %xmm3
  Inst 5:   ucomiss %xmm1, %xmm0
  Inst 6:   jb $next; movss %xmm3, %xmm2; $next:
  Inst 7:   movaps  %xmm2, %xmm0
  Inst 8:   movq    %rbp, %rsp
  Inst 9:   popq    %rbp
  Inst 10:   ret
}}
In the code on this branch, both lowerings use the fpcmp helper for the comparison and the cmove helper to emit either a true CMOV in the first case or a jump version in the second. I would expect both to have the same ucomiss ordering but they do not.

I then try to determine if in fact the "less than or equal" rule is firing in both cases. Using a well-placed panic! I now see that it is not firing in the first case but is in the second (this implies that I will have to rather unintuitively flip the operands in the rule to make the lowering come out right).

But why is the rule not firing in the first case? After debugging through generated_code.rs for a while, it becomes clear that it is bailing out in the fpcmp helper. Ok, I guess the type being passed in to the helper should be the comparison type, not the select type. [...spends time trying to figure out how to grab that type]. I guess first that has_type wraps def_inst--nope. Does it wrap the fcmp itself? Nope. How about wrapping one of fcmp's operands? That fails with an Unknown variable 'a'.

Any help?

Wasmtime GitHub notifications bot (Jan 18 2022 at 17:50):

fitzgen commented on issue #3682:

Ok, I guess the type being passed in to the helper should be the comparison type, not the select type. [...spends time trying to figure out how to grab that type].

You can use value_type to get the type of a value, e.g. an operand to the instruction you are lowering:
(rule (lower (select val @ (value_type val_ty) ...))
      ...)

Wasmtime GitHub notifications bot (Jan 18 2022 at 17:51):

fitzgen edited a comment on issue #3682:

Ok, I guess the type being passed in to the helper should be the comparison type, not the select type. [...spends time trying to figure out how to grab that type].

You can use value_type to extract the type of a value, e.g. an operand to the instruction you are lowering:
(rule (lower (select val @ (value_type val_ty) ...))
      ...)

Wasmtime GitHub notifications bot (Jan 29 2022 at 01:46):

abrown commented on issue #3682:

@fitzgen, @cfallin: this should be ready for review. The lower.isle code is a lot more readable now but I suspect there are better ways to do things; concretely, to somehow avoid the "use values instead of regs" pattern and the hard-coding in the i128 versions of cmove.

I've also opened https://github.com/bytecodealliance/wasmtime/issues/3743 and https://github.com/bytecodealliance/wasmtime/issues/3744 as a result of this work.

Wasmtime GitHub notifications bot (Jan 29 2022 at 01:51):

abrown edited a comment on issue #3682:

@fitzgen, @cfallin: this should be ready for review. The lower.isle code is a lot more readable now but I suspect there are better ways to do things; concretely, to somehow avoid the "use values instead of regs" pattern and the hard-coding in the i128 versions of cmove.

I've also opened https://github.com/bytecodealliance/wasmtime/issues/3743 and https://github.com/bytecodealliance/wasmtime/issues/3744 as a result of this work. Work that is still TODO but could probably go in separate PRs is to translate the other side of the select, the integer side, to ISLE and upgrade the CLIF fuzzer to generate these kinds of instructions (it only emits simple numerics now).

Wasmtime GitHub notifications bot (Feb 01 2022 at 21:20):

abrown edited a comment on issue #3682:

This is a complex lowering due to not only the tree-matching of the CLIF IR but also the sequences required by x64. Because of this the PR will need to go through a few more changes:

[x] add a way to emit a sequence like the following for FloatCC.Equal and FloatCC.NotEqual: UNORD* + MOV + JNP + MOV + JZ + MOV--because this sequence must reuse the same flags (and avoid modifications to them, @cfallin suggested implementing a new Inst::XmmCmoveOr (but that also implies building a new Inst::CmoveOr as well)

[ ] add a way to emit conditional moves for multiple register values

[ ] implement the integer comparison side of these instructions (both the tree-matching version and the non-tree-matching versions)

I'm putting this PR up as a draft to see if I can get some help with the above and early feedback on any suggested refactoring to the rules.

Wasmtime GitHub notifications bot (Feb 01 2022 at 21:20):

abrown edited a comment on issue #3682:

This is a complex lowering due to not only the tree-matching of the CLIF IR but also the sequences required by x64. Because of this the PR will need to go through a few more changes:

[x] add a way to emit a sequence like the following for FloatCC.Equal and FloatCC.NotEqual: UNORD* + MOV + JNP + MOV + JZ + MOV--because this sequence must reuse the same flags (and avoid modifications to them, @cfallin suggested implementing a new Inst::XmmCmoveOr (but that also implies building a new Inst::CmoveOr as well)

[x] add a way to emit conditional moves for multiple register values

[ ] implement the integer comparison side of these instructions (both the tree-matching version and the non-tree-matching versions)

I'm putting this PR up as a draft to see if I can get some help with the above and early feedback on any suggested refactoring to the rules.

Wasmtime GitHub notifications bot (Feb 01 2022 at 21:35):

abrown commented on issue #3682:

How confident are you in these lowerings? We talked before about getting the clif interpreter and fuzzer working for f32/f64 constants, fcmp, and select. Should we do that before landing these changes?

I would vote "no." ISLE basically creates a conflict for every rebased commit requiring an ISLE rebuild and Git twiddling to ensure things are right before moving on. For those of use who try to keep a clean history with atomic commits, this additional burden makes keeping a long-running PR like this pretty painful. As you can see, I'm now just committing review patches in hopes that I can do one final rebase before merging.

Also, the CLIF interpreter has not been extended for any of the other ISLE lowerings so doing it here is extra work that no other PRs have had to do. I don't mind doing this kind of thing but the "build up the infrastructure before merging" tasks are starting to pile up: bringing FloatCC to ISLE (which turned into bringing all enums to ISLE), create a new machinst variant for CmoveOr, create ISLE helpers for various things (cmove, FP comparison, register class checks), create CLIF run tests for select, find and report bugs with extractors, and (just recently added) figure out a way to express that two instructions consume the same flags. So I was hoping to not have to build up a fuzz target to get this merged.

Wasmtime GitHub notifications bot (Feb 01 2022 at 21:41):

cfallin commented on issue #3682:

@abrown @fitzgen from my own perspective at least, I think it's fair to merge this at its current state and take some followup PRs/tasks -- I agree with @abrown that this pretty much maintains our status-quo in the sense that it passes existing tests. Deferring to @fitzgen on the detailed review discussions above of course...

Wasmtime GitHub notifications bot (Feb 18 2022 at 21:38):

abrown commented on issue #3682:

Not sure why aarch64 and s390x CLIF tests are failing--they shouldn't be running!

Last updated: Apr 18 2025 at 14:03 UTC