github-actions[bot] commented on Issue #1718:
Subscribe to Label Action
cc @bnjbvr
<details>
This issue or pull request has been labeled: "cranelift", "cranelift:area:aarch64", "cranelift:area:isel", "cranelift:area:x64"Thus the following users have been cc'd because of the following labels:
- bnjbvr: cranelift
To subscribe or unsubscribe from this label, edit the <code>.github/subscribe-to-label.json</code> configuration file.
Learn more.
</details>
cfallin commented on Issue #1718:
Updated: rebased onto latest master; brought x64 backend up-to-date; updated all aarch64 filetests; and added some more tests for MachBuffer. Should be ready for review now.
bjorn3 commented on Issue #1718:
I see debug! calls in potentially hottish places, eg MachBuffer::put1(). Are we sure those become zero cost in release builds?
trace!
is probably more appropriate.
cfallin commented on Issue #1718:
Thanks a bunch for the review! Hopefully I've addressed everything. A few responses to the questions below:
There are a few comments in-line, but I have some larger semantics-level and maintainability questions here:
* > The MachBuffer also implements branch-island support. [..] * How is this tested, considering it is low probability path stuff? Given that long range jumps are only for > 1 MB on arm64, I feel like there's a bunch of paths here with very low probability of being taken. Which is a verification hazard. Is that the case? If so, how is this stuff being tested? (Later) I see tests at the bottom of machinst/buffer.rs. Are these adequate?
The tests at the end of the file are all we have for now, but I had a thought today: another way to exercise this might be to artificially turn down the allowable branch range to force "real" use of islands/veneers in plausibly-large functions, e.g. maybe 1KB or so. Then we could do the usual correctness tests with our benchmarks. I'll have to think a bit about how to automate this; it ties into the idea Dan suggested before of an "evil mode", where we change some target config option(s) to make pessimistic choices everywhere (randomize block order, lower every phi as explicit moves, clobber every callee-save intentionally, etc.) and then fuzz the heck out of things. For now, perhaps I can do this manually and run the spec testsuite?
* (Islands-and-Deadlines algorithm further comment): This is clearly a complex bit of machinery. I didn't see any single block comment explaining what the whole algorithm is. Can you add one? Not every detail, but at least some top level description.
Yup, added a big block comment -- thanks; more docs are always better!
* (revised isel driver): again, this is now more complex than it was. Question: is lookback past block boundaries still allowed? And how does this interact with the colouring machinery? My impression is that lookbacks past block boundaries are still allowed, however the colours always change across block boundaries. Hence this restricts lookbacks across boundaries to pure value trees. So it's correct. But if that analysis is correct, that's a non-obvious interaction that it would be good to document.
Yes, that's right; the predicate for allowing a lookback (which in practice means giving the instruction reference in the result of
get_input()
) is "same color, or producing op is pure". This is in the doc comment forget_input()
, but perhaps I can write more high-level docs as well. I hope to write some sort of "backend writer's guide" that communicates these principles and invariants that one needs to know, eventually...* (revised isel driver more): regarding lookbacks and colouring, I would like to see at least some implementation of movzx/movsx applied to loads, preferably in this patch, or at least very soon in a followup. This is partly to improve code quality but primarily to demonstrate that the colouring infrastructure is sound (viz, to test it to some extent).
Sure; perhaps separately from this patch, as that's part of the x64 backend work (and I'm trying to touch it as little as possible in this, to avoid stepping on other ongoing work)?
* Is it correct to understand that the new blockorder.rs maintains the old invariant that there are no dead blocks in the output? Or, at least, that the invariant is maintained by whatever means, that there are no dead blocks in the input to regalloc?
Yes, exactly: the output of
BlockLoweringOrder::new()
will never have dead blocks (even if the input CLIF does; the RPO naturally ensures this). Caught a few bugs with the regalloc's liveness checker so I'm somewhat confident this is the case now!* I see debug! calls in potentially hottish places, eg MachBuffer::put1(). Are we sure those become zero cost in release builds?
Fixed, thanks!
cfallin edited a comment on Issue #1718:
Thanks a bunch for the review! Hopefully I've addressed everything. A few responses to the questions below:
There are a few comments in-line, but I have some larger semantics-level and maintainability questions here:
The MachBuffer also implements branch-island support. [..]
- How is this tested, considering it is low probability path stuff? Given that long range jumps are only for > 1 MB on arm64, I feel like there's a bunch of paths here with very low probability of being taken. Which is a verification hazard. Is that the case? If so, how is this stuff being tested?
(Later) I see tests at the bottom of machinst/buffer.rs. Are these adequate?The tests at the end of the file are all we have for now, but I had a thought today: another way to exercise this might be to artificially turn down the allowable branch range to force "real" use of islands/veneers in plausibly-large functions, e.g. maybe 1KB or so. Then we could do the usual correctness tests with our benchmarks. I'll have to think a bit about how to automate this; it ties into the idea Dan suggested before of an "evil mode", where we change some target config option(s) to make pessimistic choices everywhere (randomize block order, lower every phi as explicit moves, clobber every callee-save intentionally, etc.) and then fuzz the heck out of things. For now, perhaps I can do this manually and run the spec testsuite?
- (Islands-and-Deadlines algorithm further comment): This is clearly a complex bit of machinery. I didn't see any single block comment explaining what the whole algorithm is. Can you add one? Not every detail, but at least some top level description.
Yup, added a big block comment -- thanks; more docs are always better!
- (revised isel driver): again, this is now more complex than it was. Question: is lookback past block boundaries still allowed? And how does this interact with the colouring machinery? My impression is that lookbacks past block boundaries are still allowed, however the colours always change across block boundaries. Hence this restricts lookbacks across boundaries to pure value trees. So it's correct. But if that analysis is correct, that's a non-obvious interaction that it would be good to document.
Yes, that's right; the predicate for allowing a lookback (which in practice means giving the instruction reference in the result of
get_input()
) is "same color, or producing op is pure". This is in the doc comment forget_input()
, but perhaps I can write more high-level docs as well. I hope to write some sort of "backend writer's guide" that communicates these principles and invariants that one needs to know, eventually...
- (revised isel driver more): regarding lookbacks and colouring, I would like to see at least some implementation of movzx/movsx applied to loads, preferably in this patch, or at least very soon in a followup. This is partly to improve code quality but primarily to demonstrate that the colouring infrastructure is sound (viz, to test it to some extent).
Sure; perhaps separately from this patch, as that's part of the x64 backend work (and I'm trying to touch it as little as possible in this, to avoid stepping on other ongoing work)?
- Is it correct to understand that the new blockorder.rs maintains the old invariant that there are no dead blocks in the output? Or, at least, that the invariant is maintained by whatever means, that there are no dead blocks in the input to regalloc?
Yes, exactly: the output of
BlockLoweringOrder::new()
will never have dead blocks (even if the input CLIF does; the RPO naturally ensures this). Caught a few bugs with the regalloc's liveness checker so I'm somewhat confident this is the case now!
- I see debug! calls in potentially hottish places, eg MachBuffer::put1(). Are we sure those become zero cost in release builds?
Fixed, thanks!
julian-seward1 commented on Issue #1718:
One other comment (not to block landing):
(revised isel driver): again, this is now more complex than it was. Question: is lookback past block boundaries still allowed? [..]
Yes, that's right; [..]
FTR, I am still of the opinion that lookback outside the block, even for just pure values, will turn out to be a net loss in the end, in the sense that the extra register pressure will cause much more of a loss than any minor improvements in insn selection that might result. That said, I don't have any Actual Evidence to substantiate my claim, at least currently.
Last updated: Nov 22 2024 at 17:03 UTC