abrown commented on issue #3682:
This is a complex lowering due to not only the tree-matching of the CLIF IR but also the sequences required by x64. Because of this the PR will need to go through a few more changes:
- [ ] add a way to emit a sequence like the following for
FloatCC.Equal
andFloatCC.NotEqual
:UNORD* + MOV + JNP + MOV + JZ + MOV
--because this sequence must reuse the same flags (and avoid modifications to them, @cfallin suggested implementing a newInst::XmmCmoveOr
(but that also implies building a newInst::Cmove
as well)- [ ] add a way to emit conditional moves for multiple register values
- [ ] implement the integer comparison side of these instructions (both the tree-matching version and the non-tree-matching versions)
I'm putting this PR up as a draft to see if I can get some help with the above and early feedback on any suggested refactoring to the rules.
abrown edited a comment on issue #3682:
This is a complex lowering due to not only the tree-matching of the CLIF IR but also the sequences required by x64. Because of this the PR will need to go through a few more changes:
- [ ] add a way to emit a sequence like the following for
FloatCC.Equal
andFloatCC.NotEqual
:UNORD* + MOV + JNP + MOV + JZ + MOV
--because this sequence must reuse the same flags (and avoid modifications to them, @cfallin suggested implementing a newInst::XmmCmoveOr
(but that also implies building a newInst::CmoveOr
as well)- [ ] add a way to emit conditional moves for multiple register values
- [ ] implement the integer comparison side of these instructions (both the tree-matching version and the non-tree-matching versions)
I'm putting this PR up as a draft to see if I can get some help with the above and early feedback on any suggested refactoring to the rules.
github-actions[bot] commented on issue #3682:
Subscribe to Label Action
cc @cfallin, @fitzgen
<details>
This issue or pull request has been labeled: "cranelift", "cranelift:area:x64", "isle"Thus the following users have been cc'd because of the following labels:
- cfallin: isle
- fitzgen: isle
To subscribe or unsubscribe from this label, edit the <code>.github/subscribe-to-label.json</code> configuration file.
Learn more.
</details>
abrown commented on issue #3682:
@cfallin, @fitzgen: I am finding some very strange behavior with ISLE and I'm not sure what is going on. Take, e.g., the following runtest:
test run target x86_64 function %select_le_i32(f32, f32) -> i32 { block0(v0: f32, v1: f32): v2 = fcmp le v0, v1 v3 = iconst.i32 1 v4 = iconst.i32 0 v5 = select v2, v3, v4 return v5 } ; run: %select_le_i32(0x42.42, 0.0) == 0 ; run: %select_le_i32(0.0, 0.0) == 1 ; run: %select_le_i32(0x0.0, 0x42.42) == 1 ; run: %select_le_i32(0x0.0, NaN) == 0 function %select_le_f32(f32, f32) -> f32 { block0(v0: f32, v1: f32): v2 = fcmp le v0, v1 v3 = f32const 0x1.0 v4 = f32const 0x0.0 v5 = select v2, v3, v4 return v5 } ; run: %select_le_f32(0x42.42, 0.0) == 0x0.0 ; run: %select_le_f32(0.0, 0.0) == 0x1.0 ; run: %select_le_f32(0x0.0, 0x42.42) == 0x1.0 ; run: %select_le_f32(0x0.0, NaN) == 0x0.0
If we run that (
RUST_LOG=cranelift_codegen::machinst::compile=trace cargo run -p cranelift-tools -- test -dv scratch.clif
), the second set of tests fail:FAIL scratch.clif: run Caused by: Failed test: run: %select_le_f32(0x1.090800p6, 0.0) == 0.0, actual: 0x1.000000p
The reason for this is that
%xmm0
and%xmm1
are magically "flipped" when passed to the comparison instruction,ucomiss
. The first function,select_le_i32
, has the correct order (ucomiss %xmm0, %xmm1
) whereas the second does not (ucomiss %xmm1, %xmm0
). This is visible in the log statements for each (in order):TRACE cranelift_codegen::machinst::compile > vcode after regalloc: final version: VCode_ShowWithRRU {{ Entry block: 0 Block 0: (original IR block: block0) (instruction range: 0 .. 10) Inst 0: pushq %rbp Inst 1: movq %rsp, %rbp Inst 2: xorl %esi, %esi Inst 3: movl $1, %edi Inst 4: ucomiss %xmm0, %xmm1 Inst 5: cmovnbl %edi, %esi Inst 6: movq %rsi, %rax Inst 7: movq %rbp, %rsp Inst 8: popq %rbp Inst 9: ret }}
TRACE cranelift_codegen::machinst::compile > vcode after regalloc: final version: VCode_ShowWithRRU {{ Entry block: 0 Block 0: (original IR block: block0) (instruction range: 0 .. 11) Inst 0: pushq %rbp Inst 1: movq %rsp, %rbp Inst 2: xorps %xmm2, %xmm2 Inst 3: movl $1065353216, %esi Inst 4: movd %esi, %xmm3 Inst 5: ucomiss %xmm1, %xmm0 Inst 6: jb $next; movss %xmm3, %xmm2; $next: Inst 7: movaps %xmm2, %xmm0 Inst 8: movq %rbp, %rsp Inst 9: popq %rbp Inst 10: ret }}
In the code on this branch, both lowerings use the
fpcmp
helper for the comparison and thecmove
helper to emit either a trueCMOV
in the first case or a jump version in the second. I would expect both to have the sameucomiss
ordering but they do not.I then try to determine if in fact the "less than or equal" rule is firing in both cases. Using a well-placed
panic!
I now see that it is not firing in the first case but is in the second (this implies that I will have to rather unintuitively flip the operands in the rule to make the lowering come out right).But why is the rule not firing in the first case? After debugging through
generated_code.rs
for a while, it becomes clear that it is bailing out in thefpcmp
helper. Ok, I guess the type being passed in to the helper should be the comparison type, not the select type. [...spends time trying to figure out how to grab that type]. I guess first thathas_type
wrapsdef_inst
--nope. Does it wrap thefcmp
itself? Nope. How about wrapping one offcmp
's operands? That fails with anUnknown variable 'a'
.Any help?
fitzgen commented on issue #3682:
Ok, I guess the type being passed in to the helper should be the comparison type, not the select type. [...spends time trying to figure out how to grab that type].
You can use
value_type
to get the type of a value, e.g. an operand to the instruction you are lowering:(rule (lower (select val @ (value_type val_ty) ...)) ...)
fitzgen edited a comment on issue #3682:
Ok, I guess the type being passed in to the helper should be the comparison type, not the select type. [...spends time trying to figure out how to grab that type].
You can use
value_type
to extract the type of a value, e.g. an operand to the instruction you are lowering:(rule (lower (select val @ (value_type val_ty) ...)) ...)
abrown commented on issue #3682:
@fitzgen, @cfallin: this should be ready for review. The
lower.isle
code is a lot more readable now but I suspect there are better ways to do things; concretely, to somehow avoid the "use values instead of regs" pattern and the hard-coding in thei128
versions ofcmove
.I've also opened https://github.com/bytecodealliance/wasmtime/issues/3743 and https://github.com/bytecodealliance/wasmtime/issues/3744 as a result of this work.
abrown edited a comment on issue #3682:
@fitzgen, @cfallin: this should be ready for review. The
lower.isle
code is a lot more readable now but I suspect there are better ways to do things; concretely, to somehow avoid the "use values instead of regs" pattern and the hard-coding in thei128
versions ofcmove
.I've also opened https://github.com/bytecodealliance/wasmtime/issues/3743 and https://github.com/bytecodealliance/wasmtime/issues/3744 as a result of this work. Work that is still TODO but could probably go in separate PRs is to translate the other side of the
select
, the integer side, to ISLE and upgrade the CLIF fuzzer to generate these kinds of instructions (it only emits simple numerics now).
abrown edited a comment on issue #3682:
This is a complex lowering due to not only the tree-matching of the CLIF IR but also the sequences required by x64. Because of this the PR will need to go through a few more changes:
- [x] add a way to emit a sequence like the following for
FloatCC.Equal
andFloatCC.NotEqual
:UNORD* + MOV + JNP + MOV + JZ + MOV
--because this sequence must reuse the same flags (and avoid modifications to them, @cfallin suggested implementing a newInst::XmmCmoveOr
(but that also implies building a newInst::CmoveOr
as well)- [ ] add a way to emit conditional moves for multiple register values
- [ ] implement the integer comparison side of these instructions (both the tree-matching version and the non-tree-matching versions)
I'm putting this PR up as a draft to see if I can get some help with the above and early feedback on any suggested refactoring to the rules.
abrown edited a comment on issue #3682:
This is a complex lowering due to not only the tree-matching of the CLIF IR but also the sequences required by x64. Because of this the PR will need to go through a few more changes:
- [x] add a way to emit a sequence like the following for
FloatCC.Equal
andFloatCC.NotEqual
:UNORD* + MOV + JNP + MOV + JZ + MOV
--because this sequence must reuse the same flags (and avoid modifications to them, @cfallin suggested implementing a newInst::XmmCmoveOr
(but that also implies building a newInst::CmoveOr
as well)- [x] add a way to emit conditional moves for multiple register values
- [ ] implement the integer comparison side of these instructions (both the tree-matching version and the non-tree-matching versions)
I'm putting this PR up as a draft to see if I can get some help with the above and early feedback on any suggested refactoring to the rules.
abrown commented on issue #3682:
How confident are you in these lowerings? We talked before about getting the clif interpreter and fuzzer working for f32/f64 constants, fcmp, and select. Should we do that before landing these changes?
I would vote "no." ISLE basically creates a conflict for every rebased commit requiring an ISLE rebuild and Git twiddling to ensure things are right before moving on. For those of use who try to keep a clean history with atomic commits, this additional burden makes keeping a long-running PR like this pretty painful. As you can see, I'm now just committing review patches in hopes that I can do one final rebase before merging.
Also, the CLIF interpreter has not been extended for any of the other ISLE lowerings so doing it here is extra work that no other PRs have had to do. I don't mind doing this kind of thing but the "build up the infrastructure before merging" tasks are starting to pile up: bringing
FloatCC
to ISLE (which turned into bringing all enums to ISLE), create a new machinst variant forCmoveOr
, create ISLE helpers for various things (cmove, FP comparison, register class checks), create CLIF run tests forselect
, find and report bugs with extractors, and (just recently added) figure out a way to express that two instructions consume the same flags. So I was hoping to not have to build up a fuzz target to get this merged.
cfallin commented on issue #3682:
@abrown @fitzgen from my own perspective at least, I think it's fair to merge this at its current state and take some followup PRs/tasks -- I agree with @abrown that this pretty much maintains our status-quo in the sense that it passes existing tests. Deferring to @fitzgen on the detailed review discussions above of course...
abrown commented on issue #3682:
Not sure why aarch64 and s390x CLIF tests are failing--they shouldn't be running!
Last updated: Dec 23 2024 at 12:05 UTC