alexcrichton opened PR #6008 from x64-div-in-isle
to main
:
This commit migrates most of the logic of the div/rem CLIF instructions to ISLE for the x64 backend. Previously most of the logic was encapsulated between the
emit_div_rem_seq
method and theCheckedDivOrRemSeq
pseudo-instruction. Nick and I found last week, however, that this poorly interacts with division-by-constants because despite being a constant checks were still emitted to ensure it wasn't a divide-by-zero or similar. To improve this constant case I figured it'd be a good opportunity to move things to ISLE.Unfortunately though these instructions couldn't be moved 100% to ISLE. Signed division, for example, needs to access labels/jumps to check for
INT_MIN / -1
. Signed remainder needs to not actually executeidiv
when the divisor is -1 and instead return 0 as well. This means that while the bulk of the logic now lives in ISLE but a chunk still lives inside the pseudo-instructions inemit.rs
.Specifically the changes in this PR are:
- The
CheckedDivOrRemSeq
pseudo-instruction and its helpers were removed.- The
Div
instruction was split intoDiv
andDiv8
since 8-bit division has different regalloc behavior.- New
ValidateSdivDivisor{,64}
instructions were added. The 64-bit one is different because it needs a temporary register as input.- New
CheckedSRem{,8}
instructions were added. These assume a check-for-zero has already happened but internally do branching to handle a -1 divisor. The 8-bit version is different due to different regalloc behavior.- The ISLE for x64 now special-cases div/rem by constants to elide checks if it can.
Overall despite replacing one pseudo-op with 4 I feel that this is a simplification relative to prior. The majority of the logic now resides in ISLE and the isel optimizations around constants are more obvious and easier to understand. Additionally the fiddly-bits about how idiv/div work on x64 are encoded in ISLE rules too.
I still think there's room to improve here in the long run. For example Wasmtime currently sets
avoid_div_traps=true
which isn't actually necessary. Most checks can be elided here to defer to signal handling to catch issues for x64. I hope to do this in a follow-up PR later on.
fitzgen submitted PR review.
fitzgen submitted PR review.
fitzgen created PR review comment:
Want to fill out this TODO a little?
fitzgen created PR review comment:
;; Two registers are returned through `ValueRegs` where the first is the
cfallin submitted PR review.
cfallin submitted PR review.
cfallin created PR review comment:
s/use/used/
cfallin created PR review comment:
s/wher ethe/where the/
cfallin created PR review comment:
The
rax
here is outdated (left over from when we used pinned vregs rather then SSA temps); just "into the destination" is correct I think.
cfallin created PR review comment:
It doesn't look like these instructions were removed -- any reason the tests are (left over from older iteration maybe)?
cfallin created PR review comment:
Slightly unclear comment here -- not quite "equal to itself" but "zero flag set" (Z is only eq for cmp). Maybe just "and trap if zero"?
cfallin created PR review comment:
Let's add a comment here why: on x64, INT_MIN / -1 will cause a hardware trap, and this is a legal way to signal the out-of-bounds error with
avoid_div_traps=false
.
cfallin created PR review comment:
It's a little thing, but the
bool
here got fairly confusing to read in uses of this rule elsewhere, out of context -- can we define a little helper enumDivSignedness
or similar and use that in the inst?
alexcrichton created PR review comment:
Ah yeah this ended up going away in a later commit (was sort of figuring out my way through things as I went along)
alexcrichton submitted PR review.
alexcrichton submitted PR review.
alexcrichton created PR review comment:
Oh I forgot to update this, but I'll leave it as rax/rdx since this is still using fixed registers for the
idiv
instruction it embeds. For 8-bit srem this'll be%rax
and for 16-to-64-bits this is%rdx
. It's a bit subtle where this opcode always produces the remainder (due to this branch) but only conditionally produces the quotient due to the branch here not defining the quotient. That works for the lowerings today (since only srem uses this which uses the remainder)Is there a better way to model this with regalloc constraints though? Should I remove the
dst_quotient
parameter to the instruction and inform regalloc that otherwise rax/rdx are both defined during the instruction?
alexcrichton submitted PR review.
alexcrichton created PR review comment:
Oh for these tests I've started removing some when I get conflicts due to the capstone integration which is much easier to update and I figured served the same purpose. If you'd like though I can add them back here to be individually tested
cfallin submitted PR review.
cfallin created PR review comment:
Ah, no, now that I think of it the disassemblies we have in the precise-output tests are verifying the same thing; so removing this is fine :+1:
alexcrichton updated PR #6008 from x64-div-in-isle
to main
.
cfallin submitted PR review.
cfallin created PR review comment:
Ah, no, the constraints are the right way; the "separate reservation of rax/rdx" was the pre-SSA methodology that carries other issues and that we've moved away from.
I guess it's just a matter of semantics; the shift in mindset has us referring to sources/dests symbolically everywhere except at the constraints, even if here at emit time it is always rax. Happy to keep the comment here though if you'd prefer it to be clearer/more explicit.
alexcrichton updated PR #6008 from x64-div-in-isle
to main
.
alexcrichton submitted PR review.
alexcrichton created PR review comment:
Ok makes sense. I've reworked the comments in this area to hopefully be a bit more precise about what's going on on both halves of this branching and why
dst_quotient
isn't initialized on one branch.
alexcrichton updated PR #6008 from x64-div-in-isle
to main
.
alexcrichton has enabled auto merge for PR #6008.
alexcrichton merged PR #6008.
Last updated: Dec 23 2024 at 12:05 UTC