abrown commented on Issue #2278:
@abrown, would you be willing to help with this (and then we can land it all at once)?
Yes, but I might not get around to it until next week? If the delay is not a problem, I don't mind doing that so that the issue is resolved all at once. (I think I can push to this PR, right?)
cfallin commented on Issue #2278:
That sounds fine to me at least!
akirilov-arm commented on Issue #2278:
I don't mind a one week delay either.
akirilov-arm commented on Issue #2278:
I added the necessary legalization rule to the old x86 backend.
akirilov-arm edited a comment on Issue #2278:
I added the necessary legalization rule to the old x86 backend. I also removed the temporary work-around in
cranelift-wasm
, so the lowering logic is no longer conditional.
akirilov-arm edited a comment on Issue #2278:
I added the necessary legalization rule to the old x86 backend. I also removed the temporary work-around in
cranelift-wasm
, so the lowering logic is no longer target-specific.
abrown commented on Issue #2278:
@akirilov-arm, nice! So I'm tracking the only thing I need to do next week is lower
LoadSplat
in the x64 backend, right? (Unless you get to it first...).
akirilov-arm commented on Issue #2278:
@abrown Yes, that's the only thing remaining, and no, I don't intend to do any other x86 changes (unless I am addressing review feedback, of course).
However, I have noticed that the peephole optimizer tests are failing and according to the logs the reason seems unrelated to my changes...
abrown commented on Issue #2278:
@akirilov-arm, @cfallin: I committed an implementation of
LoadSplat
on x64--please take a look. Unfortunately I had to threadSourceLoc
s through moreInst
formats in c9c8de6 and I feel there may a better way to do this so I opened #2290.
abrown commented on Issue #2278:
Looking at the failures in the CI, I think something is behaving weirdly re: Git. The lines it claims need extra parameters should have those added by c9c8de6 and
ci/run-experimental-x64-ci.sh
. Perhaps we should rebase onmain
and re-push?
akirilov-arm commented on Issue #2278:
@abrown I have just rebased, but the tests are still failing. Your changes look fine to me, but I am not particularly familiar with the x64 backend.
akirilov-arm edited a comment on Issue #2278:
@abrown I have just rebased, but the tests are still failing. Your changes look fine to me otherwise, but I am not particularly familiar with the x64 backend.
akirilov-arm commented on Issue #2278:
I am not sure why, but I have moved the new IR instruction definition to the end of the function defining all the operations in
cranelift/codegen/meta/src/shared/instructions.rs
and now the peephole optimizer tests are no longer failing.cc @fitzgen
akirilov-arm commented on Issue #2278:
@abrown It looks like you missed updating a couple of
Inst::xmm_rm_r
andInst::xmm_rm_r_imm
call sites, probably because my branch was outdated when you started working on it. I just rebased, so would you be able to fix it?
julian-seward1 commented on Issue #2278:
I apologise for getting to this so late in the day.
I would like to request that this be reconsidered. I believe the approach discussed so far works against the design goals of the new backend framework, and that an alternative approach would be preferable.
A primary goal of CLIF, in the new backend framework, is to provide a way to conveniently do redundancy elimination. One key aspect is to provide only a single way to represent any computation. This change runs counter to those goals. For example, consider GVN: with this change in place, there is no way to GVN a single
LoadSplat
CLIF with an equivalent two-CLIF load and splat sequence.Of course I want this to become a single instruction on AArch64. The instruction selection framework in the new backend has been designed from the start to allow matching multiple CLIF nodes into a single machine instruction, and so we should use that here, and not add the proposed
LoadSplat
CLIF node.@akirilov-arm writes:
In order to generate it, it is necessary to merge the Load and the Splat in the backend, which is not possible because the load may have side effects.
I don't follow .. can you clarify this? What instruction are you referring to?
akirilov-arm commented on Issue #2278:
What instruction are you referring to?
Opcode::Load
andOpcode::Splat
. I actually tried what you are talking about and it didn't work - refer to the discussion in #1175.
akirilov-arm edited a comment on Issue #2278:
What instruction are you referring to?
Opcode::Load
andOpcode::Splat
. I actually tried what you are talking about and it didn't work - refer to the discussion in #1175.
julian-seward1 commented on Issue #2278:
Ah, I see. Yes, it's true; the new framework currently doesn't allow what you tried -- it only allows a load to be at the root of a pattern tree. Whereas here, the splat would be at the root.
Having said that, @cfallin was talking recently about lifting that restriction in the framework. This isn't of much significance in the AArch64 case (apart from here, obviously) since AArch64 have very few load-op instructions. But for x64, almost all instructions have a load-op form, and so it's important that the framework can support that idiom effectively. Hence it might be the case that this limitation gets lifted sooner rather than later.
julian-seward1 edited a comment on Issue #2278:
Ah, I see. Yes, it's true; the new framework currently doesn't allow what you tried -- it only allows a load to be at the root of a pattern tree. Whereas here, the splat would be at the root.
Having said that, @cfallin was talking recently about lifting that restriction in the framework. This isn't of much significance in the AArch64 case (apart from here, obviously) since AArch64 has very few load-op instructions. But for x64, almost all instructions have a load-op form, and so it's important that the framework can support that idiom effectively. Hence it might be the case that this limitation gets lifted sooner rather than later.
abrown commented on Issue #2278:
@akirilov-arm: good call, now d990dd4 should have source locations in more places. @julian-seward1: regardless of what we decide to do with this PR, you should also take a look at #2290--we need a good way to trap at all of the places an x64 instruction could load (which could be quite a few) so I suggest a possible alternative in that issue.
akirilov-arm commented on Issue #2278:
Thanks, @abrown!
@cfallin @julian-seward1 How are we going to proceed with this? The implementation is now complete (covering all three major backends) and passes all tests. Based on the discussion in #1175, I believed that this would be an acceptable approach, but even if it is not, most of my code is actually independent of whether there is a new CLIF operation or not, and I think that's the case for @abrown's code as well. Are there plans to change the new framework soon, so that it supports the type of pattern matching that would be applicable to this case?
cfallin commented on Issue #2278:
@akirilov-arm @julian-seward1 @abrown so, I'm off for this week and next (but I'll try to keep up with this when I can). At a high level, I am fine with the pragmatic approach here (and in #1175) of creating these merged opcodes for now, and then eventually removing them when we have the story for load-related code motion sorted out.
In other words, I don't necessarily think that we are so close to having the "right thing" built that we should hold back true codegen improvements in the meantime and create a dependence on a to-be-built part of the framework; I think the cost of removing these opcodes later is minimal, and can be done as part of a PR that introduces the proper pattern-matching. It's not the ideal end-state, but it lets us move forward in the meantime.
@julian-seward1, what do you think?
cfallin commented on Issue #2278:
So, @julian-seward1 and I discussed this last week; apologies for taking so long to get back to it (I'm back to work again, albeit somewhat busy with onboarding stuff.)
In general I agree with @julian-seward1 (and it seems others would be on board with) a better approach eventually that can actually handle pattern-matching load sources to instructions. This has been on my to-do list for a while; but in theory is possible if we loosen the load-motion conditions a bit.
However, I think that having the better codegen as a starting point is valuable; if we take this as-is for now, then we have a baseline whose codegen output we should match when we do have the proper pattern-matching in place.
Thus, I'm happy to take this, and I'll commit to removing the
LoadSplat
as part of my pattern-matching fixup later, as long as @julian-seward1 doesn't strongly object to this transitional state. Thoughts?
akirilov-arm commented on Issue #2278:
@cfallin Sounds good to me. It's always better to have a good test case before adding non-trivial functionality and, if I understand correctly, previously there were no actual use cases for the enhanced pattern matching.
julian-seward1 commented on Issue #2278:
Well, I can't say I'm enthusiastic; I think it moves the design in the wrong direction. But if it's only temporary, and since it's useful to land the rest of it anyway, I won't stand in the way.
I should add .. Today I implemented the proposed wasm instructions
v128.load32_zero
andv128.load64_zero
, although the patch isn't on any PR yet. These are almost identical to the load-splat insns, except that all lanes other than lane zero are zeroed out, not made into clones of it. I used a two-CLIF-insn scheme -- a scalar load, and a move-scalar-to-lowest-lane-and-zero-others. Once the isel framework can do load-op merging, I'll add a matching rule so as to only generate a single insn.So .. if/when that lands, it'll create further CLIF-level inconsistency, at least in the short term :-/
cfallin commented on Issue #2278:
@julian-seward1 understood, and yes, not an ideal situation; but I think this should be a useful test case in the sense that if a new matching framework can handle this case (allow
LoadSplat
to be removed while retaining identical output), it will have succeeded. I definitely want to get to that point as soon as possible :-)I'll go ahead and approve and merge now; thanks for the discussion, all!
Last updated: Jan 24 2025 at 00:11 UTC