fitzgen requested alexcrichton for a review on PR #11210.
fitzgen requested wasmtime-compiler-reviewers for a review on PR #11210.
fitzgen opened PR #11210 from fitzgen:cranelift-inliner to bytecodealliance:main:
This comit adds "inlining as a library" to Cranelift; it does _not_ provide a complete, off-the-shelf inlining solution. Cranelift's compilation context is per-function and does not encompass the full call graph. It does not know which functions are hot and which are cold, which have been marked the equivalent of
#[inline(always)]versus#[inline(never)], etc... Only the Cranelift user can understand these aspects of the full compilation pipeline, and these things can be very different between (say) Wasmtime andcg_clif. Therefore, this infrastructure does not attempt to define hueristics for when inlining a particular call is likely beneficial. This module only provides hooks for the Cranelift user to tell Cranelift whether a given call should be inlined or not, and the mechanics to inline a callee into a particular call site when the user directs Cranelift to do so.This commit also creates a new kind of filetest that will always inline calls to functions that have already been defined in the file. This lets us exercise the inliner in filetests.
Fixes https://github.com/bytecodealliance/wasmtime/issues/4127
<!--
Please make sure you include the following information:
If this work has been discussed elsewhere, please include a link to that
conversation. If it was discussed in an issue, just mention "issue #...".Explain why this change is needed. If the details are in an issue already,
this can be brief.Our development process is documented in the Wasmtime book:
https://docs.wasmtime.dev/contributing-development-process.htmlPlease ensure all communication follows the code of conduct:
https://github.com/bytecodealliance/wasmtime/blob/main/CODE_OF_CONDUCT.md
-->
fitzgen requested cfallin for a review on PR #11210.
cfallin created PR review comment:
Would it make sense to provide a
Cow<'a, ir::Function>here? I can imagine use-cases that might want to synthesize a body to inline (for example: intrinsics, perhaps even dynamically generated based on parameters) rather than holding ownership of an existing function elsewhere.
cfallin created PR review comment:
This feels like a somewhat error-prone API -- it imposes an ordering requirement (or rather, the first
push_catchis not really a catch but the normal return). Perhaps we could have a separate setter for the normal return entry starting from an empty exception table, and then assert here that the normal return is present?
cfallin created PR review comment:
This seems like a reasonable case to optimize here (will be a very frequent case) but it might be worth noting that it could go away, simplifying logic here, if we add a jump-threading optimization pass later...
cfallin submitted PR review:
This looks great overall -- thanks a bunch for building this and the care taken in handling all the combinations of kinds of calls!
A few minor thoughts below.
vmctxhandling in particular is interesting and I'm wondering if we should punt some complexity by leaning on legalization instead; could be convinced either way though.
cfallin created PR review comment:
A thought -- not sure how I feel about it, but worth noting to see what you think:
The vmctx handling here feels a little awkward, and certainly costs something; what do you think about requiring the inlined function to have been legalized first, and erroring out if we see a
global_valueoperator during inlining?On the one hand, it's a little less user-friendly (not just any function body can be inlined -- one must ensure it's in a particular state), but on the other hand, it feels much more orthogonal and well-factored from a compiler-pass point of view -- the point of legalization is to lower the implicit function-scoped GVs to ordinary (compositional, fully explicit) operators, so we might as well use that logic.
bjorn3 created PR review comment:
I don't think we can inline
try_callat all without frontend specific knowledge. For example forcg_clif, a regular call to_Unwind_Resumeis used to resume unwinding to the caller of the current function. The inliner would need to know to replace_Unwind_Resumecalls with a jump to the right exception table entry.
bjorn3 submitted PR review.
bjorn3 edited PR review comment.
fitzgen submitted PR review.
fitzgen created PR review comment:
Good idea, will do :+1:
fitzgen submitted PR review.
fitzgen created PR review comment:
Yeah I wavered back and forth on this myself, but ultimately it didn't seem too hard to support global values, even with the different-
vmctxcase, so I went ahead and did it. I guess unless you feel strongly that we should require legalization before inlining, I would rather not put effort into changing what is already there, given that I don't think there is a clear winner between the two choices. But I am willing to make this change if you really think it is the better approach.
fitzgen submitted PR review.
fitzgen created PR review comment:
True, will add a comment
fitzgen created PR review comment:
Will add the assert. I don't have any use case for actually building a new exception table from scratch after it has been cleared, and no other place in the code base does either, so I'll avoid the push-normal-return method for now.
I think ideally this type would internally have the equivalent of
enum { Cleared, Full(...) }so that if we have the vecs then we know we aren't cleared and can always rely on their length relationship invariant, but that is a bit outside the scope of this PR, IMO.
fitzgen submitted PR review.
fitzgen submitted PR review.
fitzgen created PR review comment:
without frontend specific knowledge
I think that should be fine because the frontend is in control of deciding whether to inline any given call, and has the information about which function is being called and with which call opcode is being used (
try_callvs regularcallvsreturn_call), so the frontend can simply tell Cranelift not to inline any call that would be problematic in the ways you describe, or in any other ways. This inliner just needs to be able to do the actual inlining, given the precondition that it is valid to do said inlining (which is what the frontend is promising by saying "yes inline this, here is a body").
fitzgen submitted PR review.
fitzgen created PR review comment:
(Furthermore, because there is no "throw" instruction in CLIF, the only way of throwing an exception is via a call into some runtime function (libunwind, the Wasmtime runtime, C++'s
_Unwind*runtime, etc...), and therefore it is invalid to inline that particular call because then the throw could not happen, which breaks the equivalent-semantics promise that the frontend is making by saying "please inline this call with this callee body." On the other hand, it is fine to inline atry_callof a function that does not throw and does not make any calls that could throw; in this case we won't end up using the originaltry_call's exception edges at all, which is the correct thing to do as those exception edges are unreachable from the inlined call's body -- although the exception edges' target blocks may be reachable from other edges -- as we can statically see that there is no call that could throw and unwind along those exceptional edges.)
fitzgen updated PR #11210.
fitzgen has enabled auto merge for PR #11210.
cfallin submitted PR review.
cfallin created PR review comment:
On a little further thought: I do think it would actually be better to remove the vmctx-specific logic here, for a few maintenance- and risk-related reasons:
- We're spreading lowering knowledge around -- we're embodying the mapping between particular kinds of GVs and, say, loads or adds here as well as in the legalization phase. If we add any other sorts of GVs later (I know we want to go in the opposite direction, but...) this adds some more inlining-specific work that needs careful thought.
- The
vmctx-related GVs in particular are pretty critical for sandboxing logic in Wasmtime, and I'd rather keep that as contained as possible -- i.e., our very well curatedbounds_check.rsplus the one bit of legalization logic. The risk (however remote) that we mess up something here, now or later, and cross wires with the wrong vmctx exists; and if/when we try to push something like PCC further, it seems much more difficult if we're doing ad-hoc translation here as well as in the legalization phase.- If we try to move legalization to ISLE later then we'd also have the question whether to reuse that ISLE context here, or do something else; as above, IMHO it would be better to encapsulate all the knowledge in one place.
Finally I suspect that this requirement wouldn't place too much burden on "typical" use-cases: either one is doing a bottom-up-on-SCC-callgraph general inlining pass, and one will have done optimizations (which start with legalization) already on bodies that are considered for inlining; or one is building a specific compile-time builtin / intrinsification mechanism and has full mutable access to the special function bodies one is inlining as they are prepared. In other words I don't think it will be common to be stuck with a
&ir::Functionthat's not legalized and have to clone it or whatnot.All that said: still happy to discuss more if there are things I've missed!
cfallin edited PR review comment.
cfallin commented on PR #11210:
(I think your add-to-queue crossed paths with my comment above -- see discussion on vmctx)
fitzgen updated PR #11210.
fitzgen has enabled auto merge for PR #11210.
fitzgen merged PR #11210.
Last updated: Jan 09 2026 at 13:15 UTC