afonso360 opened PR #6112 from riscv-icmp
to main
:
:wave: Hey,
This PR improves our current lowerings for
icmp
and moves them to ISLE. We currently emit a 4 instruction sequence that moves 1 or 0 to a register based on a branch. This PR changes the lowerings to use the dedicatedicmp
instructions.These are:
| Instruction| Description |
| ------------ | ------------ |
|slt rd, rs1, rs2
| Set Less Than |
|sltu rd, rs1, rs2
| Set Less Than Unsigned |
|slti rd, rs1, imm
| Set Less Than (Immediate) |
|sltu rd, rs1, imm
| Set Less Than Unsigned (Immediate) |All other
IntCC
's are handled with some variation of the above. Additionally I've also added some optimizations when the comparision is done with an immediate.This has been fuzzing for most of today (~8h) without any issues.
afonso360 submitted PR review.
afonso360 created PR review comment:
I'm not sure this is the best way of dealing with this. I'd ideally like to do something that would solve this for all users of
Imm12
, but I can't think of a way to do that without introducing a bunch of unnecessary details everywhere.The testcase that raised this point was
runtests/icmp.imm
(cc: #3056):; This test is also a regression test for aarch64. ; We were not correctly handling the fact that the rhs constant value ; overflows its type when viewed as a signed value, and thus encoding the wrong ; value into the resulting instruction. function %overflow_rhs_const(i8) -> i8 { block0(v0: i8): v1 = iconst.i8 192 v2 = icmp sge v0, v1 return v2 } ; run: %overflow_rhs_const(49) == 1 ; run: %overflow_rhs_const(-65) == 0
afonso360 edited PR review comment.
cfallin submitted PR review.
cfallin submitted PR review.
cfallin created PR review comment:
It seems like the basic primitive we want here is sign extension? I agree that special-casing i8 here doesn't feel right otherwise.
jameysharp submitted PR review.
jameysharp submitted PR review.
jameysharp created PR review comment:
Would it make sense to reuse the
ExtendOp
enum instead of introducing aSignedness
enum? If you have a preference, I could easily be convinced either way.
jameysharp created PR review comment:
Why does comparing the low half always use unsigned comparisons? Is that correct? If so I think a comment explaining why would be nice.
jameysharp created PR review comment:
I guess I128 could be handled without a pseudo-instruction if you use
SelectReg
or something. Those are also pseudo-instructions hiding branches but at least they're of more general use.That would be something like this, right?
(gen_select_reg IntCC.Equal x_hi y_hi (rv_slt x_lo y_lo) (rv_slt x_hi y_hi)
I guess that evaluates both
slt
instructions unconditionally, so it implements I128 comparisons in a way that is slightly slower than your version. Maybe that performance benefit isn't worth having a dedicated pseudo-instruction though. What do you think?
jameysharp created PR review comment:
I think the pattern we should always use is to either sign-extend or zero-extend constants according to how they're being used. We have
i64_sextend_imm64
andu64_uextend_imm64
constructors for this. I guess you haveImm12
at this point but maybe you should change that?
jameysharp created PR review comment:
I'd kind of prefer not to add more magic predicates on
IntCC
. In the just-merged PR #6095 we used bitfield representations ofIntCC
to avoid this. Maybe you can do something similar here?
jameysharp created PR review comment:
Just to check: If the midend rules to flip constants to the right-hand side don't fire, perhaps because the egraphs pass doesn't get run, after this PR the backend will still produce correct code, right? It just won't be as efficient as it could be?
afonso360 submitted PR review.
afonso360 created PR review comment:
Yes, exactly! Most of our runtests run in that mode. However I don't know how many have LHS constants.
afonso360 submitted PR review.
afonso360 created PR review comment:
It felt a bit weird using the
ExtendOp
for something that is not an extension. But I don't really mind either way.
jameysharp submitted PR review.
jameysharp created PR review comment:
If you decide to reuse
SelectReg
instead of adding a new pseudo-instruction then I suppose this question doesn't matter. Otherwise, feel free to leave theSignedness
enum as-is.
jameysharp submitted PR review.
jameysharp created PR review comment:
While looking today at other discussions about commutativity, I remembered that I don't expect this to be reliable even when the egraph pass runs.
Those mid-end rules declare that both commutative alternatives are equivalent. But the cost function will assign the same cost to both choices, since they're the same opcode and use the same operands.
When two choices have equal costs I can't guarantee anything about which one the egraph extraction phase will choose and send on to the backend for lowering.
afonso360 submitted PR review.
afonso360 created PR review comment:
Makes sense, I'll add both versions of this rule. Thanks for looking into it!
afonso360 requested fitzgen for a review on PR #6112.
afonso360 requested wasmtime-compiler-reviewers for a review on PR #6112.
afonso360 updated PR #6112.
afonso360 submitted PR review.
afonso360 created PR review comment:
I've added that, let me know if it's not clear or something!
afonso360 submitted PR review.
afonso360 created PR review comment:
:+1: Its a two instruction difference, I'm not too worried about i128's and it cleans up a bunch of code! All great reasons to remove it!
afonso360 submitted PR review.
afonso360 created PR review comment:
Sorry for taking a while to get back to this! I've played around a bunch with this, I think I found something that I like along these lines.
afonso360 submitted PR review.
afonso360 created PR review comment:
Hey, I tried to get this working and keeping an extractor interface for these, but I think I might be missing something about how extractors work. Here's the outlines of what I tried (Here's the full commit https://github.com/afonso360/wasmtime/commit/4f118c37cd2e95b6507cde18d7008ad9bd3165f1):
(decl pure intcc_bit (IntCC) u64) (rule ...) ;; Returns true if the first IntCC is equal to any of the following IntCCs. (decl pure partial intcc_any2 (IntCC IntCC IntCC) bool) (rule ...) (decl pure partial intcc_sle_or_ule (IntCC) IntCC) (extractor (intcc_sle_or_ule cc) (if (intcc_any2 cc (IntCC.SignedLessThanOrEqual) (IntCC.UnsignedLessThanOrEqual))) cc)
Declaring
intcc_sle_or_ule
like this is invalid, but I don't really understand why? I would expect it to work. I think I'm just not understanding how extractors work.I tried going back to the original RFC, but that didn't really help. Is there any other documentation about this?
afonso360 updated PR #6112.
afonso360 updated PR #6112.
jameysharp submitted PR review.
jameysharp created PR review comment:
Oh, I think internal extractors don't support
if
/if-let
. They act like macros. In this example, anywhere that you use(intcc_sle_or_ule foo)
, it's expanded in-place to the extractor's right-hand side, withcc
replaced byfoo
. Sinceif-let
isn't valid inside a pattern, it doesn't work here.What you're trying to write is an or-pattern, which I really want to add ISLE support for, but it's not there yet. However, internal constructors can simulate or-patterns in many cases, so if you're willing to move the uses of
intcc_sle_or_ule
to anif-let
in every rule, that could be an option.It's not the end of the world to add more external extractors. I just want to avoid it if there's a good way to do so. Maybe there isn't in this case.
afonso360 updated PR #6112.
jameysharp submitted PR review.
jameysharp created PR review comment:
Oh, I'm sorry I wasn't clear, but this isn't what I meant.
Before I get into what I did mean, I'll mention there's an existing external constructor called
intcc_unsigned
that turns aSigned
condition code into itsUnsigned
equivalent. So most of these conditions can be written inline without introducing any new helper terms, like this:(if-let (IntCC.UnsignedLessThanOrEqual) (intcc_unsigned cc))
I think that's the clearest way to express those cases, instead of
intcc_sle_or_ule
etc. Then you can deleteintcc_bit
andintcc_any2
.When you do need to match on one of several options, such as with matching either
>
or>=
forintcc_greater_than
, and if you're willing to use a constructor instead of an extractor, you can write this:;; A constructor that maches any *GreaterThan or *GreaterThanOrEqual IntCC (decl pure partial intcc_greater_than (IntCC) Unit) (rule (intcc_greater_than (IntCC.SignedGreaterThan)) unit) (rule (intcc_greater_than (IntCC.UnsignedGreaterThan)) unit) (rule (intcc_greater_than (IntCC.SignedGreaterThanOrEqual)) unit) (rule (intcc_greater_than (IntCC.UnsignedGreaterThanOrEqual)) unit)
I've written this to return
()
instead ofbool
because it's declaredpartial
. Apartial
term returningbool
actually returnsOption<bool>
, which I think is likely to lead to confusion. Alternatively, you could remove thepartial
flag and add a wildcard rule to this term at priority -1 that returns$false
. I'm happy with either of those choices.I've put more thought into the other alternative I suggested, namely, the bitfield trick we used in #6095. I don't think it's worth the confusion it would cause here. It would give us a way to describe the cases in terms of decompositions of the condition code. But it ends up being just as many cases, and the cases are more complicated to reason about because bitfields make anyone's head hurt.
afonso360 updated PR #6112.
afonso360 submitted PR review.
afonso360 created PR review comment:
Sorry I didn't get that first time around! That makes a lot more sense, thanks for explaining it! :heart:️
jameysharp submitted PR review.
jameysharp submitted PR review.
jameysharp created PR review comment:
Similarly, I think this rule is wrong if adding 1 overflows and
ty
isI64
.
jameysharp created PR review comment:
Adding 1 to the constant may overflow. It isn't immediately obvious to me that this rule is always equivalent in that case.
If the constant is the maximum value for the given
ty
and the signedness ofcc
, andcc
is*GreaterThan
, then for allx
thisicmp
must return false.If we add 1 modulo the width of
ty
, then the result wraps around to the minimum forty
/cc
. Then for allx
,!(x < min)
will be true. Which would be wrong.However this doesn't add modulo the width of
ty
. Instead, it's a signed 64-bit addition. Ifty
is narrower than 64-bit, thenx
is always less than the result (whether it's compared signed or unsigned), so the inverse of that result is always false, which is correct.If
ty
isI64
andcc
isSignedGreaterThan
, then the signed 64-bit addition overflows. Then checking signed less-than will be false, the final result will be true, so the rule is incorrect.If
ty
isI64
andcc
isUnsignedGreaterThan
, then the signed 64-bit addition does not overflow, but interpreting it as unsigned does overflow. Then checking unsigned less-than is false, etc.So I think this rule is wrong in case of overflow when
ty
isI64
, isn't it?
jameysharp created PR review comment:
Looks like some trailing whitespace got added here. You can use
git diff --check
to check a range of commits for any whitespace issues like that.
jameysharp created PR review comment:
Why is sign-extending the constant the right thing to do for an unsigned comparison?
jameysharp created PR review comment:
Given that the operands were just extended to I64, should
gen_icmp_inner
be called with$I64
instead ofty
?
afonso360 submitted PR review.
afonso360 created PR review comment:
Yeah, this case used to be covered when we used Imm12 instead of i64. Since
imm12_add
is partial we would have that automatically checked for us. I forgot to check for that when changing it to i64.
afonso360 closed without merge PR #6112.
Last updated: Dec 23 2024 at 12:05 UTC