alexcrichton opened PR #7321 from alexcrichton:legalize-isle
to bytecodealliance:main
:
This PR is the result of some discussion in the Cranelift weekly meeting a few weeks ago at this point. Currently backends are required to cope with any shape of IR thrown at them, modulo some minor platform-agnostic legalization. This can be difficult at times in cases such as:
- Many backends handle some constructs in the same way, for example zero-extending a 32-bit value to a 128-bit value. This is done as "easy" combinations of other already-implemented instructions, but each backend also has to have 128-bit specific logic. This is applicable to many instructions other than
uextend
as well.- Backends don't have the ability to create new basic blocks or control flow during lowering. This gives rise to pseudo-instructions which bake in the control flow to them, but at least personally I feel that's pretty cumbersome and brittle.
- The work that backends do to transform CLIF to something that can be emitted natively is not subject to optimizations. For example many backends insert sign/zero extensions for various operations which can theoretically be GVN'd with previous ones if they already exist, but this is so late in the pipeline GVN doesn't run.
As a broad solution to the above issues as well as being in the spirit of "let's write more stuff in ISLE as it's much easier to extend", this PR migrates all legalization into ISLE. The general setup of this PR is that there's a shared set of legalizations which represent the prior backend-agnostic legalizations (e.g. replacing
iadd_imm x, y
withiadd x, (iconst y)
). Each backend then has its own set of legalizations which are added to this set of legalizations to create one largelegalize
function. Additionally the NaN canonicalization pass is now moved into ISLE as well to happen as part of the legalization step (ok so maybe "legalize" isn't the best term for it any more at that point, please bikeshed this or tell me it's a bad idea)Existing legalizations are ported for examples, in addition to two target-specific legalizations:
- On RISC-V some sample legalizations are done where
uextend
-to-i128 no longer needs to be handled in the backend as it's handled in the legalization phase instead. Other backends can in theory include these rules as well, but that's left for a future PR.- On x86_64 the
fcvt_from_uint
instruction's lowering for 64-bit types is no longer a pseudo-instruction in the backend. Instead the pseudo-instruction is deleted and replaced with a legalization. This brings a few benefits such as being able to use AVX instructions in the lowering in addition to breaking false dependencies, neither of which previously happened in the lowering.My main goal with this is to not necessarily solve all the preexisting issues with the backends but instead provide a framework to move forward. When CFG edits are required for lowerings or when constructs would be good to throw at egraph-based optimization it's now easy to add a rule to
legalize.isle
and have it thrown in. This can help shift some complexity around and avoid consolidating it all in one pass.A downside of this approach, however, is that it doesn't provide a means to remove all pseudo-instructions. For example the x64 backend has pseudo-instructions for converting floats to integers, and they can't be legalized with this technique. The reason for this is that the native instruction has no corresponding CLIF instruction which can express it. While it's possible to add
x86_*
instructions to CLIF that seems best left for later only if truly necessary. In the meantime my hope is that backends, as maintenance continues over time, can start sharing more logic in these legalizations related to 128-bit integers for example or otherwise legalizing larger lowerings through this technique rather than all via ISLE lowerings.
alexcrichton requested elliottt for a review on PR #7321.
alexcrichton requested wasmtime-compiler-reviewers for a review on PR #7321.
alexcrichton updated PR #7321.
alexcrichton updated PR #7321.
cfallin requested cfallin for a review on PR #7321.
alexcrichton updated PR #7321.
alexcrichton updated PR #7321.
jameysharp commented on PR #7321:
I forgot this PR existed and got excited again while reading over the discussion history. I think that, with some work, there's a clean solution to the problem of the egraph pass undoing target-specific legalizations: Delay elaboration until lowering. I've written that up in #8529 because I think it's useful independently.
There's still a problem if the legalized version gets eliminated by
subsume
. We can delete all uses ofsubsume
in the egraph rules if we want, but conceivably the auto-subsume behavior from #8006 could be a problem too.The above is all stuff we could do separately from this PR. I think it would be worthwhile to remove the parts of this PR that change any lowering rules, and just convert the existing legalization pass to ISLE.
alexcrichton commented on PR #7321:
A bit belated getting back to this, but that sounds plausible to me! I'll probably leave this be though for now since we're not really changing that much related to legalization nowadays and the main motivation here was to open up the door to simplifying various rules in backends (e.g. backend-specific legalization). Once there's a clearer direction on #8529 I think it'd make sense to start moving in that direction.
fitzgen commented on PR #7321:
FYI: we now take
trap[n]z
instructions all the way to the backend, so there are no more control-flow changes during legalization, so moving legalization to ISLE should be easier now.But I also kind of feel like we shouldn't have to do any legalization in the mid-end. Instead, I think we should remove the instructions that get legalized and replace them with
InstructionBuilder
convenience methods, as we've talked about previously (in other issues, or cranelift meetings, can't remember). That is, we'd remove thestack_store
instruction but keep the ability to dobuilder.ins().stack_store(...)
and it would just eagerly generatev123 = stack_addr ...; store ..., v123;
clif instructions.
The reason I'm using stack_load and stack_store in cg_clif is to simplify the printed clif ir to make it easier to read. As well as to make future optimizations that depend on stack pointers not leaking easier. stack_addr would leak stack pointers as Cranelift doesn't have pointer provenance.
alexcrichton commented on PR #7321:
But I also kind of feel like we shouldn't have to do any legalization in the mid-end.
One day I'd like to have a property where backends don't have to implement lowering for
uextend.i128 x
for example because they all already supporticoncat
andiconst.i64 0
, so I'd at least personally still like to see legalization used for "massage this input IR until all operations the backend doesn't support are removed"
cfallin commented on PR #7321:
"massage this input IR until all operations the backend doesn't support are removed"
+1, that's the "good" usage of legalization IMHO. Historically it was used for (i) "unconditional on all backends" rewrites (stack loads/stores as a good example), (ii) actual instruction lowering until it reached CLIF insts that corresponded one-to-one to machine insts on the target; (iii) "reduce to a previously solved problem" rewrites (narrowing and polyfills).
Options (i) and (ii) are "inefficient" in the sense that we can do them better by (i) emitting the final sequence right away, as suggested above, and (ii) using our one-pass lowering scheme that we introduced with MachInsts and VCode. But Option (iii) is eminently reasonable; I too think that there's unnecessary duplication in our 128-bit ops handling, for example. We've talked about "architecture-specific mid-end rules" and this is a use-case for that IMHO -- a backend can opt into the "narrow 128 to 64 bits" ruleset, or selectively opt in but gradually implement more efficient lowerings over time, etc.
fitzgen commented on PR #7321:
Agreed that use case (iii) continues to make sense. We just don't currently do it at all.
(i) and (ii) I think should be removed.
@bjorn3
The reason I'm using stack_load and stack_store in cg_clif is to simplify the printed clif ir to make it easier to read. As well as to make future optimizations that depend on stack pointers not leaking easier. stack_addr would leak stack pointers as Cranelift doesn't have pointer provenance.
If you only use the proposed
stack_{store,load}
helper methods, the stack addresses will not leak beyond their associated stores/loads, which is exactly the same situation we have today.It is true that the clif text dumps will have two operations where it used to have one (pre-legalization and optimization, at least). I think that is ultimately acceptable, and a small price to pay for simplifying the IR and compiler internals, but it is a downside.
cfallin commented on PR #7321:
For a little more on escape analysis and stack addresses, re: @bjorn3's concerns: it's true that Cranelift doesn't have a notion of pointer provenance, so an arbitrary integer computation that happens to produce a correct stack address could technically access the stack legally; but that's already the case today, under the hood, in the sense that we do translate
stack_load
/_store
to regular loads/stores and we do not specify that the stack is "outside the ordinarily accessible address space".Short of full provenance though I think we can still have a notion of "leakage" that is formally definable: an address becomes visible (an arbitrary integer computation can produce a pointer that can access it) once used/exposed by any operation other than as the address of a load/store. So
stack_addr
->stack_load
can be pattern-matched today, and an escape analysis can see (if no other uses of thestack_addr
) that the address does not enter the rest of the program dataflow. Such a line of reasoning should be reasonable to take, IMHO, because the only way that a computation could otherwise happen upon the address is by guessing or having inside knowledge of the system (stack layout, etc). So in other words we avoid the full complexity of maintaining pointer provenance through IR transforms (all the things we've talked about before) while still having a notion of "exposed addresses".
Last updated: Nov 22 2024 at 17:03 UTC