cfallin requested elliottt for a review on PR #4830.
cfallin requested akirilov-arm for a review on PR #4830.
cfallin opened PR #4830 from aarch64-ra2-semantics
to main
:
This PR removes all uses of modify-operands in the aarch64 backend,
replacing them with reused-input operands instead. This has the nice
effect of removing a bunch of move instructions and more clearly
representing inputs and outputs.This PR also removes the explicit use of pinned vregs in the aarch64
backend, instead using fixed-register constraints on the operands when
insts or pseudo-inst sequences require certain registers.This is the second PR in the regalloc-semantics cleanup series; after
the remaining backend (s390x) and the ABI code are cleaned up as well,
we'll be able to simplify the regalloc2 frontend.<!--
Please ensure that the following steps are all taken care of before submitting
the PR.
[ ] This has been discussed in issue #..., or if not, please tell us why
here.[ ] A short description of what this does, why it is needed; if the
description becomes long, the matter should probably be discussed in an issue
first.[ ] This PR contains test cases, if meaningful.
- [ ] A reviewer from the core maintainer team has been assigned for this PR.
If you don't know who could review this, please indicate so. The list of
suggested reviewers on the right can help you.Please ensure all communication adheres to the code of conduct.
-->
akirilov-arm submitted PR review.
akirilov-arm submitted PR review.
akirilov-arm created PR review comment:
How about renaming this
Inst
variant toMovK
(while removing the latter fromMoveWideOp
) and removing this field?
akirilov-arm created PR review comment:
Similarly here - how about introducing a separate enum, say
FpuRRIModOp
, for these operations?
akirilov-arm submitted PR review.
akirilov-arm created PR review comment:
Is there a particular reason to deviate from the architecturally-defined disassembly, especially for the simple, single instruction cases? This looks really weird to anyone who is familiar with AArch64 assembly; similarly for the rest of the pretty printing cases you have modified.
This also results in a lot of noise in the diff.
akirilov-arm submitted PR review.
akirilov-arm created PR review comment:
Same as
AtomicRMWLoop
.
akirilov-arm created PR review comment:
BTW while this is unfortunately a step back (in the sense that the previous implementation prints the actual assembly code that is generated), since the operation is quite complex, I think that this simplification is acceptable.
akirilov-arm created PR review comment:
I don't really like this change because these are valid addressing modes with arbitrary registers, which admittedly we don't use at the moment. IMHO a cleaner approach is probably to introduce
SPPreIndexed
andSPPostIndexed
variants that only have an immediate parameter (hopefully on top of PR #4832 :smile:) - we can also remove the current ones, to be on the safe side.
akirilov-arm created PR review comment:
If I remember correctly, there isn't a real need to use fixed registers here, but I don't mind leaving it like this.
cfallin updated PR #4830 from aarch64-ra2-semantics
to main
.
cfallin submitted PR review.
cfallin created PR review comment:
Good idea! Done.
cfallin submitted PR review.
cfallin created PR review comment:
Done!
cfallin submitted PR review.
cfallin created PR review comment:
You mean the extra input register? This is how we've done it across the backends for the SSA-ification of instructions. While it looks a bit odd once regalloc is done and all registers are physical registers, the input and output sides of a "mod" can actually be different vregs (almost always are in fact) pre-regalloc and IMHO that is important to represent in the VCode printout. Otherwise we have hidden aspects of the IR that we can't see when debugging.
cfallin updated PR #4830 from aarch64-ra2-semantics
to main
.
cfallin submitted PR review.
cfallin created PR review comment:
Renamed to
SPPreIndexed
/SPPostIndexed
. Happy to eventually make it more general again; my concern here was mainly to avoid the need to include input/output registers for now, for simplicity.
cfallin submitted PR review.
cfallin created PR review comment:
The handwritten expansion is pretty gnarly and I was afraid of accidentally breaking it if I tried to generalize it -- so I left it as-is for now :-) Eventually it'd be great to find a way to expand this in a lowering rule rather than having a single pseudoinst (I need to think about the control-flow implications more!).
cfallin created PR review comment:
Indeed, would love to eventually rework this into ISLE as well.
cfallin submitted PR review.
cfallin submitted PR review.
cfallin created PR review comment:
Indeed, I was having trouble keeping it up-to-date as I was adding in the registers and chose instead to simplify it here; it felt odd to have a separate representation that wasn't checked against the actual emitted code.
akirilov-arm created PR review comment:
I see, I suppose you regard this as more of some kind of intermediate representation printout than a generated assembly printout, which is pretty much as I do, and that's why I would prefer if all these printouts are as close to the architecturally specified assembly language as possible (possibly making exceptions for complex VCode operations and things that are out of scope of the ISA spec such as how constant literals are represented), especially since the same logic is used by the CLIF compilation tests, where we are looking at the final generated code post-regalloc. However, I understand your use case, so I think it is acceptable to leave it as it is.
akirilov-arm submitted PR review.
akirilov-arm submitted PR review.
akirilov-arm created PR review comment:
TBH the LSE lowering obsoletes this, while suffering from virtually none of the issues, and resulting in a significantly smaller code size. Even though recent processors support LSE, we still need this variant, though, in case we compile for base Armv8-A hardware (e.g. Raspberry Pi 4).
akirilov-arm submitted PR review.
akirilov-arm submitted PR review.
akirilov-arm created PR review comment:
Nit - use
self.clone()
as in the match arms that follow?
akirilov-arm submitted PR review.
akirilov-arm created PR review comment:
Ditto.
cfallin created PR review comment:
Yeah, it definitely started out that way; actually early on I had been testing the aarch64 backend by feeding the vcode pretty-print into the assembler. But by now the semantics of VCode are not quite the same as machine code and I think it's more important for the textual view to show an accurate representation of the data structure than to try to match an assembler output. We also have
clif-util compile -D
that disassembles via Capstone for use-cases where we actually really want an independent verification of the machine code output :-)
cfallin submitted PR review.
cfallin updated PR #4830 from aarch64-ra2-semantics
to main
.
cfallin updated PR #4830 from aarch64-ra2-semantics
to main
.
cfallin created PR review comment:
Done!
cfallin submitted PR review.
cfallin has enabled auto merge for PR #4830.
cfallin merged PR #4830.
Last updated: Dec 23 2024 at 12:05 UTC