cfallin edited PR #4080 from x64-amodes
to main
:
This draft PR modifies the x64 backend's addressing-mode lowering to use both scaled indexing, e.g.
[rax+rbx*4]
, and to consolidate offsets, sox+y+8+12
can become [rax+rbx+20]`.Unfortunately this doesn't yet appear to be quite enough to improve on this really annoying sequence that occurs frequently in SpiderMonkey.wasm's hot blocks:
0xC064911: movq %rsi,%rdx 0xC064914: addl $3, %edx 0xC064917: movzbq 0(%r9,%rdx),%rdx
the issue being that
edx+3
, as a 32-bit add, can wrap around and so this is not quite equivalent to3(%r9, %rsi)
(assuming we could prove rsi is zero-extended). @abrown or others, if you have ideas on a better lowering here, I'd be happy to hear them!This PR builds on top of #4072, #4078, and #4079; only the last commit is new.
abrown submitted PR review.
abrown submitted PR review.
abrown created PR review comment:
I think the rules following this would benefit from a section similar to the one just removed. I don't necessarily mean pseudo-code, I mean a section describing the high-level transformations being made, why they're being made, what limitations are involved, etc. For example, I don't completely understand why we need
amode_initial
to kick the transformation off with aninvalid_reg
. And I don't really understand why the priorities are the way they are--what I mean is that the logic doesn't quite jump off the page when reading the rules. Perhaps some better documentation here would be helpful?
fitzgen submitted PR review.
fitzgen submitted PR review.
fitzgen created PR review comment:
At this point, we might want to just make the ISLE definition of
Amode
the canonical one (like we do forInst
) and then import it from ISLE where it is used elsewhere, rather than have duplicate definitions.
fitzgen created PR review comment:
Don't we (didn't we?) have a
is_invalid
method or something like that onReg
? It seems like an abstraction violation to be talking about sentinels outside of the type definition. (e.g. what if there was more than one bit pattern forReg
that is invalid?)If we don't have that method anymore, can we add it and use it here? It would also make this function one line shorter since we wouldn't need the
use
import.
fitzgen created PR review comment:
Nitpick: align the pattern arguments:
(iconst (simm32 c)))
fitzgen created PR review comment:
Ditto here and more below.
fitzgen created PR review comment:
While you're messing with the DSL, it would be nice to be able to override variables rather than pretend we're erlang :-p
cfallin updated PR #4080 from x64-amodes
to main
.
cfallin submitted PR review.
cfallin created PR review comment:
Done. We still need to import the trait (it's a provided trait method) but we don't depend on the "equals the one sentinel value" assumption anymore.
cfallin submitted PR review.
cfallin created PR review comment:
Done, this was me adding priorities and not re-indenting, sorry!
cfallin submitted PR review.
cfallin created PR review comment:
Fixed.
cfallin submitted PR review.
cfallin created PR review comment:
I agree it'd be nice! Looks like we have a tracking issue (#3526) already so I'll resolve this comment for now...
cfallin submitted PR review.
cfallin created PR review comment:
Done!
cfallin submitted PR review.
cfallin created PR review comment:
I added a bunch of comments here; hopefully it's clearer! Let me know if anything is still unclear -- I'll wait for your r+ as well before merging.
abrown submitted PR review.
abrown created PR review comment:
;; load/store's built-in `Offset32` and `invalid_reg` as the
abrown submitted PR review.
abrown created PR review comment:
;; - Case 2: When we have an `ImmReg` with a valid register already,
abrown created PR review comment:
;; `amode_add`, for each operand of the `iadd`. This is what allows us
cfallin updated PR #4080 from x64-amodes
to main
.
cfallin updated PR #4080 from x64-amodes
to main
.
cfallin updated PR #4080 from x64-amodes
to main
.
cfallin updated PR #4080 from x64-amodes
to main
.
cfallin merged PR #4080.
Last updated: Nov 22 2024 at 16:03 UTC