afonso360 opened PR #6112 from riscv-icmp to main:
:wave: Hey,
This PR improves our current lowerings for
icmpand moves them to ISLE. We currently emit a 4 instruction sequence that moves 1 or 0 to a register based on a branch. This PR changes the lowerings to use the dedicatedicmpinstructions.These are:
| Instruction| Description |
| ------------ | ------------ |
|slt rd, rs1, rs2| Set Less Than |
|sltu rd, rs1, rs2| Set Less Than Unsigned |
|slti rd, rs1, imm| Set Less Than (Immediate) |
|sltu rd, rs1, imm| Set Less Than Unsigned (Immediate) |All other
IntCC's are handled with some variation of the above. Additionally I've also added some optimizations when the comparision is done with an immediate.This has been fuzzing for most of today (~8h) without any issues.
afonso360 submitted PR review.
afonso360 created PR review comment:
I'm not sure this is the best way of dealing with this. I'd ideally like to do something that would solve this for all users of
Imm12, but I can't think of a way to do that without introducing a bunch of unnecessary details everywhere.The testcase that raised this point was
runtests/icmp.imm(cc: #3056):; This test is also a regression test for aarch64. ; We were not correctly handling the fact that the rhs constant value ; overflows its type when viewed as a signed value, and thus encoding the wrong ; value into the resulting instruction. function %overflow_rhs_const(i8) -> i8 { block0(v0: i8): v1 = iconst.i8 192 v2 = icmp sge v0, v1 return v2 } ; run: %overflow_rhs_const(49) == 1 ; run: %overflow_rhs_const(-65) == 0
afonso360 edited PR review comment.
cfallin submitted PR review.
cfallin submitted PR review.
cfallin created PR review comment:
It seems like the basic primitive we want here is sign extension? I agree that special-casing i8 here doesn't feel right otherwise.
jameysharp submitted PR review.
jameysharp submitted PR review.
jameysharp created PR review comment:
Would it make sense to reuse the
ExtendOpenum instead of introducing aSignednessenum? If you have a preference, I could easily be convinced either way.
jameysharp created PR review comment:
Why does comparing the low half always use unsigned comparisons? Is that correct? If so I think a comment explaining why would be nice.
jameysharp created PR review comment:
I guess I128 could be handled without a pseudo-instruction if you use
SelectRegor something. Those are also pseudo-instructions hiding branches but at least they're of more general use.That would be something like this, right?
(gen_select_reg IntCC.Equal x_hi y_hi (rv_slt x_lo y_lo) (rv_slt x_hi y_hi)I guess that evaluates both
sltinstructions unconditionally, so it implements I128 comparisons in a way that is slightly slower than your version. Maybe that performance benefit isn't worth having a dedicated pseudo-instruction though. What do you think?
jameysharp created PR review comment:
I think the pattern we should always use is to either sign-extend or zero-extend constants according to how they're being used. We have
i64_sextend_imm64andu64_uextend_imm64constructors for this. I guess you haveImm12at this point but maybe you should change that?
jameysharp created PR review comment:
I'd kind of prefer not to add more magic predicates on
IntCC. In the just-merged PR #6095 we used bitfield representations ofIntCCto avoid this. Maybe you can do something similar here?
jameysharp created PR review comment:
Just to check: If the midend rules to flip constants to the right-hand side don't fire, perhaps because the egraphs pass doesn't get run, after this PR the backend will still produce correct code, right? It just won't be as efficient as it could be?
afonso360 submitted PR review.
afonso360 created PR review comment:
Yes, exactly! Most of our runtests run in that mode. However I don't know how many have LHS constants.
afonso360 submitted PR review.
afonso360 created PR review comment:
It felt a bit weird using the
ExtendOpfor something that is not an extension. But I don't really mind either way.
jameysharp submitted PR review.
jameysharp created PR review comment:
If you decide to reuse
SelectReginstead of adding a new pseudo-instruction then I suppose this question doesn't matter. Otherwise, feel free to leave theSignednessenum as-is.
jameysharp submitted PR review.
jameysharp created PR review comment:
While looking today at other discussions about commutativity, I remembered that I don't expect this to be reliable even when the egraph pass runs.
Those mid-end rules declare that both commutative alternatives are equivalent. But the cost function will assign the same cost to both choices, since they're the same opcode and use the same operands.
When two choices have equal costs I can't guarantee anything about which one the egraph extraction phase will choose and send on to the backend for lowering.
afonso360 submitted PR review.
afonso360 created PR review comment:
Makes sense, I'll add both versions of this rule. Thanks for looking into it!
afonso360 requested fitzgen for a review on PR #6112.
afonso360 requested wasmtime-compiler-reviewers for a review on PR #6112.
afonso360 updated PR #6112.
afonso360 submitted PR review.
afonso360 created PR review comment:
I've added that, let me know if it's not clear or something!
afonso360 submitted PR review.
afonso360 created PR review comment:
:+1: Its a two instruction difference, I'm not too worried about i128's and it cleans up a bunch of code! All great reasons to remove it!
afonso360 submitted PR review.
afonso360 created PR review comment:
Sorry for taking a while to get back to this! I've played around a bunch with this, I think I found something that I like along these lines.
afonso360 submitted PR review.
afonso360 created PR review comment:
Hey, I tried to get this working and keeping an extractor interface for these, but I think I might be missing something about how extractors work. Here's the outlines of what I tried (Here's the full commit https://github.com/afonso360/wasmtime/commit/4f118c37cd2e95b6507cde18d7008ad9bd3165f1):
(decl pure intcc_bit (IntCC) u64) (rule ...) ;; Returns true if the first IntCC is equal to any of the following IntCCs. (decl pure partial intcc_any2 (IntCC IntCC IntCC) bool) (rule ...) (decl pure partial intcc_sle_or_ule (IntCC) IntCC) (extractor (intcc_sle_or_ule cc) (if (intcc_any2 cc (IntCC.SignedLessThanOrEqual) (IntCC.UnsignedLessThanOrEqual))) cc)Declaring
intcc_sle_or_ulelike this is invalid, but I don't really understand why? I would expect it to work. I think I'm just not understanding how extractors work.I tried going back to the original RFC, but that didn't really help. Is there any other documentation about this?
afonso360 updated PR #6112.
afonso360 updated PR #6112.
jameysharp submitted PR review.
jameysharp created PR review comment:
Oh, I think internal extractors don't support
if/if-let. They act like macros. In this example, anywhere that you use(intcc_sle_or_ule foo), it's expanded in-place to the extractor's right-hand side, withccreplaced byfoo. Sinceif-letisn't valid inside a pattern, it doesn't work here.What you're trying to write is an or-pattern, which I really want to add ISLE support for, but it's not there yet. However, internal constructors can simulate or-patterns in many cases, so if you're willing to move the uses of
intcc_sle_or_uleto anif-letin every rule, that could be an option.It's not the end of the world to add more external extractors. I just want to avoid it if there's a good way to do so. Maybe there isn't in this case.
afonso360 updated PR #6112.
jameysharp submitted PR review.
jameysharp created PR review comment:
Oh, I'm sorry I wasn't clear, but this isn't what I meant.
Before I get into what I did mean, I'll mention there's an existing external constructor called
intcc_unsignedthat turns aSignedcondition code into itsUnsignedequivalent. So most of these conditions can be written inline without introducing any new helper terms, like this:(if-let (IntCC.UnsignedLessThanOrEqual) (intcc_unsigned cc))I think that's the clearest way to express those cases, instead of
intcc_sle_or_uleetc. Then you can deleteintcc_bitandintcc_any2.When you do need to match on one of several options, such as with matching either
>or>=forintcc_greater_than, and if you're willing to use a constructor instead of an extractor, you can write this:;; A constructor that maches any *GreaterThan or *GreaterThanOrEqual IntCC (decl pure partial intcc_greater_than (IntCC) Unit) (rule (intcc_greater_than (IntCC.SignedGreaterThan)) unit) (rule (intcc_greater_than (IntCC.UnsignedGreaterThan)) unit) (rule (intcc_greater_than (IntCC.SignedGreaterThanOrEqual)) unit) (rule (intcc_greater_than (IntCC.UnsignedGreaterThanOrEqual)) unit)I've written this to return
()instead ofboolbecause it's declaredpartial. Apartialterm returningboolactually returnsOption<bool>, which I think is likely to lead to confusion. Alternatively, you could remove thepartialflag and add a wildcard rule to this term at priority -1 that returns$false. I'm happy with either of those choices.I've put more thought into the other alternative I suggested, namely, the bitfield trick we used in #6095. I don't think it's worth the confusion it would cause here. It would give us a way to describe the cases in terms of decompositions of the condition code. But it ends up being just as many cases, and the cases are more complicated to reason about because bitfields make anyone's head hurt.
afonso360 updated PR #6112.
afonso360 submitted PR review.
afonso360 created PR review comment:
Sorry I didn't get that first time around! That makes a lot more sense, thanks for explaining it! :heart:️
jameysharp submitted PR review.
jameysharp submitted PR review.
jameysharp created PR review comment:
Similarly, I think this rule is wrong if adding 1 overflows and
tyisI64.
jameysharp created PR review comment:
Adding 1 to the constant may overflow. It isn't immediately obvious to me that this rule is always equivalent in that case.
If the constant is the maximum value for the given
tyand the signedness ofcc, andccis*GreaterThan, then for allxthisicmpmust return false.If we add 1 modulo the width of
ty, then the result wraps around to the minimum forty/cc. Then for allx,!(x < min)will be true. Which would be wrong.However this doesn't add modulo the width of
ty. Instead, it's a signed 64-bit addition. Iftyis narrower than 64-bit, thenxis always less than the result (whether it's compared signed or unsigned), so the inverse of that result is always false, which is correct.If
tyisI64andccisSignedGreaterThan, then the signed 64-bit addition overflows. Then checking signed less-than will be false, the final result will be true, so the rule is incorrect.If
tyisI64andccisUnsignedGreaterThan, then the signed 64-bit addition does not overflow, but interpreting it as unsigned does overflow. Then checking unsigned less-than is false, etc.So I think this rule is wrong in case of overflow when
tyisI64, isn't it?
jameysharp created PR review comment:
Looks like some trailing whitespace got added here. You can use
git diff --checkto check a range of commits for any whitespace issues like that.
jameysharp created PR review comment:
Why is sign-extending the constant the right thing to do for an unsigned comparison?
jameysharp created PR review comment:
Given that the operands were just extended to I64, should
gen_icmp_innerbe called with$I64instead ofty?
afonso360 submitted PR review.
afonso360 created PR review comment:
Yeah, this case used to be covered when we used Imm12 instead of i64. Since
imm12_addis partial we would have that automatically checked for us. I forgot to check for that when changing it to i64.
afonso360 closed without merge PR #6112.
Last updated: Dec 13 2025 at 19:03 UTC