alexcrichton opened PR #5976 from aarch64-constant-shift
to main
:
This commit adds special lowering rules for
vector-shifts-by-constant-amounts to use dedicated instructions which cuts down on the codegen here quite a bit for constant values.<!--
Please ensure that the following steps are all taken care of before submitting
the PR.
[ ] This has been discussed in issue #..., or if not, please tell us why
here.[ ] A short description of what this does, why it is needed; if the
description becomes long, the matter should probably be discussed in an issue
first.[ ] This PR contains test cases, if meaningful.
- [ ] A reviewer from the core maintainer team has been assigned for this PR.
If you don't know who could review this, please indicate so. The list of
suggested reviewers on the right can help you.Please ensure all communication adheres to the code of conduct.
-->
cfallin submitted PR review.
cfallin submitted PR review.
cfallin created PR review comment:
Orthogonal to your changes here but I'm surprised we haven't moved from inline constants we branch around to the
VCodeConstant
infrastructure; I guess I am confusing this with the x64 backend, where we do have that. Using the end-of-func constant pool here as well might be a nice improvement for a followup PR, especially if the data lands in the middle of an inner loop and pushes the body into another icache line or fetch cycle...
alexcrichton submitted PR review.
alexcrichton created PR review comment:
Agreed! The fallback for
shuffle
does an inline load like this as well, and I'm curious enough to hopefully try to give this a stab in the near future
alexcrichton updated PR #5976 from aarch64-constant-shift
to main
.
cfallin submitted PR review.
cfallin submitted PR review.
cfallin created PR review comment:
I might be missing something obvious but is there any reason we can't match this left-shift-by-0 (when masked) in the same way and pass the value straight through?
alexcrichton submitted PR review.
alexcrichton created PR review comment:
This seems to me like something ideally left to an egraph-style optimization which removes the shift before codegen, and I ideally wanted to do the same thing for the other lowerings but it was just the restrictions on arm64 instruction incoded which prevented ignoring the shift-right-by-zero case. That being said I may as well add it to be consistent with the other rules!
alexcrichton updated PR #5976 from aarch64-constant-shift
to main
.
alexcrichton has enabled auto merge for PR #5976.
cfallin submitted PR review.
cfallin created PR review comment:
Ah, yes, that's true -- it's a reasonable algebraic simplification that we should include in the mid-end as well (might increase reach of other opts etc).
jameysharp submitted PR review.
jameysharp created PR review comment:
I'm confused. We do have egraph rules for removing shifts and rotates by 0. Are they not firing here? Are these filetests running with optimization disabled, maybe?
If you need to match on shifts by zero to avoid generating illegal instructions in a particular backend, that's important since we don't necessarily run the egraph pass. But I would be inclined not to bother special-casing anything in the backend that isn't necessary for correctness if it can be done in the egraph pass.
alexcrichton submitted PR review.
alexcrichton created PR review comment:
The shifts here are by the full bit width, e.g. a shift by 8 for
i8x16
, which I don't think is accounted for in the egraph rules (but I just checked to see the checks for 0-shifts, so this would "just" be an extension of that rule). I don't know whether filetests run with or without optimizations, but I would guess without.Avoiding the 0-case is required for the shift-right cases for correctness since it looks like right-shifts with zero-width can't be encoded as such an instruction (I think this has to do with how right-shift is a negative left-shift on AArch64 or something like that, so maybe this is a too-strict assertion somewhere, not sure, I didn't dig deeply enough to go that far).
I went ahead and added the case here as requested for shift-lefts even though it isn't strictly necessary as required by the backend. In some sense though it does sort of feel like if someone reads that rule it depends on the time of day as to whether they would judge whether or not the rule should be included, so in that sense I don't know whether to leave it in or not as it's not obviously correct nor obviously incorrect.
cfallin submitted PR review.
cfallin created PR review comment:
I agree with @jameysharp in general that we shouldn't pull too many opts into lowering, because of cost in non-optimizing mode and because they can't interact with the main rewriting fixpoint when we're already here. I don't feel too strongly about this case; I guess I would note that it's a nice symmetry with the right-shift case here (which is why it popped out at me originally). On the other hand, we don't special-case
x + 0
in the backends, nor should we. So... I'll happily hand the baton to @jameysharp for final judgment here and only request that we extend the mid-end rule to understand masking and cover this case.
jameysharp submitted PR review.
jameysharp created PR review comment:
Well, if I have the baton…
- I think our general principle should be that algebraic simplifications should not be in backend rules. We should save our backend complexity budget for special cases where we can select a better instruction sequence due to target-specific features.
- I don't feel strongly enough about this principle to block merging this PR. I'd lean toward removing the special case for left-shifts but you're welcome to merge as-is if you want to on the basis of @cfallin's r+.
We'll have to figure out how best to handle the wrapping semantics of shifts in the mid-end rules. We should review any existing mid-end rules that match on constant shift amounts to make sure they're handling this correctly, too.
alexcrichton updated PR #5976 from aarch64-constant-shift
to main
.
alexcrichton submitted PR review.
alexcrichton created PR review comment:
Works for me! I've removed the left-shift case and rebased as well :+1:
alexcrichton has enabled auto merge for PR #5976.
alexcrichton updated PR #5976 from aarch64-constant-shift
to main
.
alexcrichton has enabled auto merge for PR #5976.
alexcrichton merged PR #5976.
Last updated: Dec 23 2024 at 13:07 UTC