Stream: git-wasmtime

Topic: wasmtime / PR #6112 riscv64: Improve `icmp` codegen


view this post on Zulip Wasmtime GitHub notifications bot (Mar 28 2023 at 16:36):

afonso360 opened PR #6112 from riscv-icmp to main:

:wave: Hey,

This PR improves our current lowerings for icmp and moves them to ISLE. We currently emit a 4 instruction sequence that moves 1 or 0 to a register based on a branch. This PR changes the lowerings to use the dedicated icmp instructions.

These are:
| Instruction| Description |
| ------------ | ------------ |
| slt rd, rs1, rs2 | Set Less Than |
| sltu rd, rs1, rs2 | Set Less Than Unsigned |
| slti rd, rs1, imm | Set Less Than (Immediate) |
| sltu rd, rs1, imm | Set Less Than Unsigned (Immediate) |

All other IntCC's are handled with some variation of the above. Additionally I've also added some optimizations when the comparision is done with an immediate.

This has been fuzzing for most of today (~8h) without any issues.

view this post on Zulip Wasmtime GitHub notifications bot (Mar 28 2023 at 16:41):

afonso360 submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Mar 28 2023 at 16:41):

afonso360 created PR review comment:

I'm not sure this is the best way of dealing with this. I'd ideally like to do something that would solve this for all users of Imm12, but I can't think of a way to do that without introducing a bunch of unnecessary details everywhere.

The testcase that raised this point was runtests/icmp.imm (cc: #3056):

; This test is also a regression test for aarch64.
; We were not correctly handling the fact that the rhs constant value
; overflows its type when viewed as a signed value, and thus encoding the wrong
; value into the resulting instruction.
function %overflow_rhs_const(i8) -> i8 {
block0(v0: i8):
    v1 = iconst.i8 192
    v2 = icmp sge v0, v1
    return v2
}
; run: %overflow_rhs_const(49) == 1
; run: %overflow_rhs_const(-65) == 0

view this post on Zulip Wasmtime GitHub notifications bot (Mar 28 2023 at 16:41):

afonso360 edited PR review comment.

view this post on Zulip Wasmtime GitHub notifications bot (Mar 29 2023 at 00:35):

cfallin submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Mar 29 2023 at 00:35):

cfallin submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Mar 29 2023 at 00:35):

cfallin created PR review comment:

It seems like the basic primitive we want here is sign extension? I agree that special-casing i8 here doesn't feel right otherwise.

view this post on Zulip Wasmtime GitHub notifications bot (Mar 29 2023 at 00:50):

jameysharp submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Mar 29 2023 at 00:50):

jameysharp submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Mar 29 2023 at 00:50):

jameysharp created PR review comment:

Would it make sense to reuse the ExtendOp enum instead of introducing a Signedness enum? If you have a preference, I could easily be convinced either way.

view this post on Zulip Wasmtime GitHub notifications bot (Mar 29 2023 at 00:50):

jameysharp created PR review comment:

Why does comparing the low half always use unsigned comparisons? Is that correct? If so I think a comment explaining why would be nice.

view this post on Zulip Wasmtime GitHub notifications bot (Mar 29 2023 at 00:50):

jameysharp created PR review comment:

I guess I128 could be handled without a pseudo-instruction if you use SelectReg or something. Those are also pseudo-instructions hiding branches but at least they're of more general use.

That would be something like this, right? (gen_select_reg IntCC.Equal x_hi y_hi (rv_slt x_lo y_lo) (rv_slt x_hi y_hi)

I guess that evaluates both slt instructions unconditionally, so it implements I128 comparisons in a way that is slightly slower than your version. Maybe that performance benefit isn't worth having a dedicated pseudo-instruction though. What do you think?

view this post on Zulip Wasmtime GitHub notifications bot (Mar 29 2023 at 00:50):

jameysharp created PR review comment:

I think the pattern we should always use is to either sign-extend or zero-extend constants according to how they're being used. We have i64_sextend_imm64 and u64_uextend_imm64 constructors for this. I guess you have Imm12 at this point but maybe you should change that?

view this post on Zulip Wasmtime GitHub notifications bot (Mar 29 2023 at 00:50):

jameysharp created PR review comment:

I'd kind of prefer not to add more magic predicates on IntCC. In the just-merged PR #6095 we used bitfield representations of IntCC to avoid this. Maybe you can do something similar here?

view this post on Zulip Wasmtime GitHub notifications bot (Mar 29 2023 at 00:50):

jameysharp created PR review comment:

Just to check: If the midend rules to flip constants to the right-hand side don't fire, perhaps because the egraphs pass doesn't get run, after this PR the backend will still produce correct code, right? It just won't be as efficient as it could be?

view this post on Zulip Wasmtime GitHub notifications bot (Mar 29 2023 at 09:42):

afonso360 submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Mar 29 2023 at 09:42):

afonso360 created PR review comment:

Yes, exactly! Most of our runtests run in that mode. However I don't know how many have LHS constants.

view this post on Zulip Wasmtime GitHub notifications bot (Mar 29 2023 at 13:53):

afonso360 submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Mar 29 2023 at 13:53):

afonso360 created PR review comment:

It felt a bit weird using the ExtendOp for something that is not an extension. But I don't really mind either way.

view this post on Zulip Wasmtime GitHub notifications bot (Mar 29 2023 at 15:22):

jameysharp submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Mar 29 2023 at 15:22):

jameysharp created PR review comment:

If you decide to reuse SelectReg instead of adding a new pseudo-instruction then I suppose this question doesn't matter. Otherwise, feel free to leave the Signedness enum as-is.

view this post on Zulip Wasmtime GitHub notifications bot (Mar 30 2023 at 22:14):

jameysharp submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Mar 30 2023 at 22:14):

jameysharp created PR review comment:

While looking today at other discussions about commutativity, I remembered that I don't expect this to be reliable even when the egraph pass runs.

Those mid-end rules declare that both commutative alternatives are equivalent. But the cost function will assign the same cost to both choices, since they're the same opcode and use the same operands.

When two choices have equal costs I can't guarantee anything about which one the egraph extraction phase will choose and send on to the backend for lowering.

view this post on Zulip Wasmtime GitHub notifications bot (Mar 31 2023 at 09:04):

afonso360 submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Mar 31 2023 at 09:04):

afonso360 created PR review comment:

Makes sense, I'll add both versions of this rule. Thanks for looking into it!

view this post on Zulip Wasmtime GitHub notifications bot (Apr 07 2023 at 12:37):

afonso360 requested fitzgen for a review on PR #6112.

view this post on Zulip Wasmtime GitHub notifications bot (Apr 07 2023 at 12:37):

afonso360 requested wasmtime-compiler-reviewers for a review on PR #6112.

view this post on Zulip Wasmtime GitHub notifications bot (Apr 07 2023 at 12:37):

afonso360 updated PR #6112.

view this post on Zulip Wasmtime GitHub notifications bot (Apr 07 2023 at 12:37):

afonso360 submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Apr 07 2023 at 12:37):

afonso360 created PR review comment:

I've added that, let me know if it's not clear or something!

view this post on Zulip Wasmtime GitHub notifications bot (Apr 07 2023 at 12:38):

afonso360 submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Apr 07 2023 at 12:38):

afonso360 created PR review comment:

:+1: Its a two instruction difference, I'm not too worried about i128's and it cleans up a bunch of code! All great reasons to remove it!

view this post on Zulip Wasmtime GitHub notifications bot (Apr 07 2023 at 12:40):

afonso360 submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Apr 07 2023 at 12:40):

afonso360 created PR review comment:

Sorry for taking a while to get back to this! I've played around a bunch with this, I think I found something that I like along these lines.

view this post on Zulip Wasmtime GitHub notifications bot (Apr 07 2023 at 12:47):

afonso360 submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Apr 07 2023 at 12:47):

afonso360 created PR review comment:

Hey, I tried to get this working and keeping an extractor interface for these, but I think I might be missing something about how extractors work. Here's the outlines of what I tried (Here's the full commit https://github.com/afonso360/wasmtime/commit/4f118c37cd2e95b6507cde18d7008ad9bd3165f1):

(decl pure intcc_bit (IntCC) u64)
(rule ...)

;; Returns true if the first IntCC is equal to any of the following IntCCs.
(decl pure partial intcc_any2 (IntCC IntCC IntCC) bool)
(rule ...)

(decl pure partial intcc_sle_or_ule (IntCC) IntCC)
(extractor (intcc_sle_or_ule cc)
  (if (intcc_any2 cc (IntCC.SignedLessThanOrEqual) (IntCC.UnsignedLessThanOrEqual)))
  cc)

Declaring intcc_sle_or_ule like this is invalid, but I don't really understand why? I would expect it to work. I think I'm just not understanding how extractors work.

I tried going back to the original RFC, but that didn't really help. Is there any other documentation about this?

view this post on Zulip Wasmtime GitHub notifications bot (Apr 07 2023 at 12:59):

afonso360 updated PR #6112.

view this post on Zulip Wasmtime GitHub notifications bot (Apr 07 2023 at 13:01):

afonso360 updated PR #6112.

view this post on Zulip Wasmtime GitHub notifications bot (Apr 10 2023 at 20:20):

jameysharp submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Apr 10 2023 at 20:20):

jameysharp created PR review comment:

Oh, I think internal extractors don't support if/if-let. They act like macros. In this example, anywhere that you use (intcc_sle_or_ule foo), it's expanded in-place to the extractor's right-hand side, with cc replaced by foo. Since if-let isn't valid inside a pattern, it doesn't work here.

What you're trying to write is an or-pattern, which I really want to add ISLE support for, but it's not there yet. However, internal constructors can simulate or-patterns in many cases, so if you're willing to move the uses of intcc_sle_or_ule to an if-let in every rule, that could be an option.

It's not the end of the world to add more external extractors. I just want to avoid it if there's a good way to do so. Maybe there isn't in this case.

view this post on Zulip Wasmtime GitHub notifications bot (Apr 11 2023 at 10:20):

afonso360 updated PR #6112.

view this post on Zulip Wasmtime GitHub notifications bot (Apr 11 2023 at 18:02):

jameysharp submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Apr 11 2023 at 18:02):

jameysharp created PR review comment:

Oh, I'm sorry I wasn't clear, but this isn't what I meant.

Before I get into what I did mean, I'll mention there's an existing external constructor called intcc_unsigned that turns a Signed condition code into its Unsigned equivalent. So most of these conditions can be written inline without introducing any new helper terms, like this:

(if-let (IntCC.UnsignedLessThanOrEqual) (intcc_unsigned cc))

I think that's the clearest way to express those cases, instead of intcc_sle_or_ule etc. Then you can delete intcc_bit and intcc_any2.

When you do need to match on one of several options, such as with matching either > or >= for intcc_greater_than, and if you're willing to use a constructor instead of an extractor, you can write this:

;; A constructor that maches any *GreaterThan or *GreaterThanOrEqual IntCC
(decl pure partial intcc_greater_than (IntCC) Unit)
(rule (intcc_greater_than (IntCC.SignedGreaterThan)) unit)
(rule (intcc_greater_than (IntCC.UnsignedGreaterThan)) unit)
(rule (intcc_greater_than (IntCC.SignedGreaterThanOrEqual)) unit)
(rule (intcc_greater_than (IntCC.UnsignedGreaterThanOrEqual)) unit)

I've written this to return () instead of bool because it's declared partial. A partial term returning bool actually returns Option<bool>, which I think is likely to lead to confusion. Alternatively, you could remove the partial flag and add a wildcard rule to this term at priority -1 that returns $false. I'm happy with either of those choices.

I've put more thought into the other alternative I suggested, namely, the bitfield trick we used in #6095. I don't think it's worth the confusion it would cause here. It would give us a way to describe the cases in terms of decompositions of the condition code. But it ends up being just as many cases, and the cases are more complicated to reason about because bitfields make anyone's head hurt.

view this post on Zulip Wasmtime GitHub notifications bot (Apr 13 2023 at 13:52):

afonso360 updated PR #6112.

view this post on Zulip Wasmtime GitHub notifications bot (Apr 13 2023 at 13:57):

afonso360 submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Apr 13 2023 at 13:57):

afonso360 created PR review comment:

Sorry I didn't get that first time around! That makes a lot more sense, thanks for explaining it! :heart:

view this post on Zulip Wasmtime GitHub notifications bot (Apr 13 2023 at 20:16):

jameysharp submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Apr 13 2023 at 20:16):

jameysharp submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Apr 13 2023 at 20:16):

jameysharp created PR review comment:

Similarly, I think this rule is wrong if adding 1 overflows and ty is I64.

view this post on Zulip Wasmtime GitHub notifications bot (Apr 13 2023 at 20:16):

jameysharp created PR review comment:

Adding 1 to the constant may overflow. It isn't immediately obvious to me that this rule is always equivalent in that case.

If the constant is the maximum value for the given ty and the signedness of cc, and cc is *GreaterThan, then for all x this icmp must return false.

If we add 1 modulo the width of ty, then the result wraps around to the minimum for ty/cc. Then for all x, !(x < min) will be true. Which would be wrong.

However this doesn't add modulo the width of ty. Instead, it's a signed 64-bit addition. If ty is narrower than 64-bit, then x is always less than the result (whether it's compared signed or unsigned), so the inverse of that result is always false, which is correct.

If ty is I64 and cc is SignedGreaterThan, then the signed 64-bit addition overflows. Then checking signed less-than will be false, the final result will be true, so the rule is incorrect.

If ty is I64 and cc is UnsignedGreaterThan, then the signed 64-bit addition does not overflow, but interpreting it as unsigned does overflow. Then checking unsigned less-than is false, etc.

So I think this rule is wrong in case of overflow when ty is I64, isn't it?

view this post on Zulip Wasmtime GitHub notifications bot (Apr 13 2023 at 20:16):

jameysharp created PR review comment:

Looks like some trailing whitespace got added here. You can use git diff --check to check a range of commits for any whitespace issues like that.

view this post on Zulip Wasmtime GitHub notifications bot (Apr 13 2023 at 20:16):

jameysharp created PR review comment:

Why is sign-extending the constant the right thing to do for an unsigned comparison?

view this post on Zulip Wasmtime GitHub notifications bot (Apr 13 2023 at 20:16):

jameysharp created PR review comment:

Given that the operands were just extended to I64, should gen_icmp_inner be called with $I64 instead of ty?

view this post on Zulip Wasmtime GitHub notifications bot (Apr 13 2023 at 20:37):

afonso360 submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Apr 13 2023 at 20:37):

afonso360 created PR review comment:

Yeah, this case used to be covered when we used Imm12 instead of i64. Since imm12_add is partial we would have that automatically checked for us. I forgot to check for that when changing it to i64.

view this post on Zulip Wasmtime GitHub notifications bot (Sep 29 2023 at 21:52):

afonso360 closed without merge PR #6112.


Last updated: Dec 23 2024 at 12:05 UTC