Does cranelift/wasmtime perform any degree of inlining for call instructions?
No, it does not. It depends on the generator of the input CLIF IR or Wasm (e.g. LLVM) to do the inlining. Every function is compiled individually, and possibly in parallel.
@Gabor Greif adding an inliner to our Cranelift middle-end is something I want to do eventually; it'll become more important especially with module linking and interface types in Wasm, and likely really useful for other frontends as well; we just need to sit down and do the work
It could be a performance gain to inline very short single basic blocks, or short conditional/comparison code followed by an if when one of the alternatives is very short, i.e. get x; get y; cmp; if; <long> else const 1-type code. The <long> could then be extracted into a new function and call-ed.
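To make the shape of that idea concrete, here is a minimal source-level sketch in Rust. It is only an illustration of the transformation described above, not something Cranelift does; `callee`, `callee_long`, `caller_before` and `caller_after` are hypothetical names, and the loop is just a stand-in for the <long> alternative.

```rust
// Hypothetical callee before any transformation: a cheap comparison guards
// one long alternative and one trivial alternative (`const 1`).
fn callee(x: i64, y: i64) -> i64 {
    if x < y {
        // Stand-in for the <long> alternative.
        let mut acc = 0i64;
        for i in x..y {
            acc = acc.wrapping_add(i.wrapping_mul(i));
        }
        acc
    } else {
        1
    }
}

// The <long> alternative extracted into its own function, so that only the
// cheap guard and the trivial alternative need to be copied into callers.
#[inline(never)]
fn callee_long(x: i64, y: i64) -> i64 {
    let mut acc = 0i64;
    for i in x..y {
        acc = acc.wrapping_add(i.wrapping_mul(i));
    }
    acc
}

// Caller before the transformation: an ordinary call to the whole callee.
fn caller_before(x: i64, y: i64) -> i64 {
    callee(x, y) + 1
}

// Caller after the transformation: the comparison and the `const 1`
// alternative are inlined, and only the long path still pays for a call.
fn caller_after(x: i64, y: i64) -> i64 {
    let r = if x < y { callee_long(x, y) } else { 1 };
    r + 1
}
```

Both callers compute the same value for every input; the difference is that `caller_after` avoids the call entirely whenever the short alternative is taken.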
Chris Fallin said:
Gabor Greif adding an inliner to our Cranelift middle-end is something I want to do eventually; it'll become more important especially with module linking and interface types in Wasm, and likely really useful for other frontends as well; we just need to sit down and do the work
we have to be careful how we do this, of course, because most small wasm functions within a module would have already been inlined by LLVM if it were profitable to do so, and so the remaining ones are things that were marked cold/noinline, but those attributes aren't present in the Wasm anymore
that said, I agree we will want cross-module inlining, but we just have to be careful in our implementation
Yes, for sure, the heuristic will be somewhat special. Might be worth studying what SpiderMonkey does for intra-module inlining (e.g. if they just prohibit it, or have a heuristic that's more conservative, or...); I don't remember off the top of my head
Ah, yes, when the call-site is already in an if, that's a tell-tale sign that the call might be on a cold path. Also: not every Wasm module comes from LLVM :-) In Motoko, e.g., I don't want to inline too much (even for hot code) because it may distort (i.e. worsen) the instruction count metrics. Space is precious and cycles as well. I am in the process of running benchmarks to gauge the effect.
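The considerations raised in this thread could be folded into a heuristic along the following lines. This is a rough sketch under stated assumptions, not an existing Cranelift or SpiderMonkey policy: `CallSite`, `INTRA_MODULE_MAX_INSTS`, `should_inline` and all thresholds are made up for illustration. It assumes intra-module callees were already considered by the producer (so only tiny ones are taken), treats a call under a conditional as a hint that the path is cold, and lets the frontend pass a `budget` so that a size-sensitive producer can stay conservative.

```rust
/// Hypothetical summary of one call site; none of these names exist in Cranelift.
#[derive(Clone, Copy, Debug)]
struct CallSite {
    /// Instruction count of the callee body.
    callee_insts: u32,
    /// True if the call sits inside an `if`, taken here as a cold-path hint.
    under_conditional: bool,
    /// True if caller and callee come from different modules.
    cross_module: bool,
}

/// Assumed cap for intra-module callees, which the producer (e.g. LLVM)
/// most likely already declined to inline.
const INTRA_MODULE_MAX_INSTS: u32 = 4;

/// Decide whether to inline, given a frontend-chosen size budget.
/// A budget of 0 disables inlining of any non-empty callee.
fn should_inline(site: CallSite, budget: u32) -> bool {
    // Cross-module calls get the full budget; intra-module calls are capped.
    let mut limit = if site.cross_module {
        budget
    } else {
        budget.min(INTRA_MODULE_MAX_INSTS)
    };
    // A call under a conditional is more likely cold, so halve the limit.
    if site.under_conditional {
        limit /= 2;
    }
    site.callee_insts <= limit
}
```

With a budget of 32, a 20-instruction cross-module callee would be inlined at a straight-line call site but not at one nested in an if, and the same callee would never be inlined within a module.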
Last updated: Jan 24 2025 at 00:11 UTC