In Wasmi I was experimenting with a new macro-op fusion of call
instructions in tail position.
While this optimization is semantics preserving the Wasm spec testsuite has some exhaustion checks which started to fail and state the following:
;; Stack exhaustion
;; Implementations are required to have every call consume some abstract
;; resource towards exhausting some abstract finite limit, such that
;; infinitely recursive test cases reliably trap in finite time. This is
;; because otherwise applications could come to depend on it on those
;; implementations and be incompatible with implementations that don't do
;; it (or don't do it under the same circumstances).
With fuel consumption enabled we still exhaust some abstract resources even with this tail call optimization enabled.
Thus I conclude that optimizations like this are only possible when fuel consumption is enabled. I wondered: does Wasmtime/Cranelift already apply this optimization conditionally? Would it be a good idea? Why? Why not?
The way I read specification is that it must be a finite resource that tracks call depth -- so that recursing too deeply traps -- and one can't "reload" the resource. Fuel on the other hand is meant to be a metering mechanism that one uses to, say, pause execution every so often; it's underneath the level of Wasm semantics.
Explored with a particular example: if we rely only on fuel to limit us in this sense, then both a regular infinite loop ((loop (br 0))
) and a tail-recursive loop should fail in the same way; but what the specification above is trying to say (the spirit of its intent I think) is that regular infinite loops must be allowed but tail-recursive loops must not
To me there is no statement of relation between infinite loops and infinite recursion from the above sentence. It simply states that every call has to consume some abstract finite resource which to me is given if fuel metering is enabled. Therefore I assumed that with fuel metering enabled an optimization like this should be fine. If fuel metering is disabled this optimization would not be applied and thus the max (call) stack depth acts as this finite abstract resource that is consumed for each function call.
OK. I think that's a mis-reading, in context. I'm going from this:
;; This is
;; because otherwise applications could come to depend on it on those
;; implementations and be incompatible with implementations that don't do
;; it (or don't do it under the same circumstances).
as I mentioned, fuel is often used in a way that lets one periodically interrupt; that wouldn't prevent a producer from using tail-recursion to implement iteration. The intent here seems to me to be to push iteration toward intra-function loops rather than tail recursion, for community-dynamics reasons.
If you disagree I'd recommend filing an issue on the tail-calls repo and seeing if we can get a canonical answer...
hey @Robin Freyler
@Dan Gohman would know more of the history here, especially with regards to this spec test, but my understanding is that optimizing regular call
s in tail position into return_call
s is frowned upon (perhaps even disallowed?) at the Wasm engine level. the fear was that if one engine does it, then applications might start to rely upon it, which would de facto force all other engines to do the "optimization" as well, which might not always be desirable. for example this transform isn't always beneficial from a codegen point of view, since tail calls can be more expensive at the ABI/calling convention level than regular calls.
it is, however, fair game for a Wasm-producing toolchain (eg LLVM or binaryen) to perform, since they are preserving application/language semantics rather than Wasm semantics. I believe that clang/LLVM will actually do this today when you enable tail calls. ofc the bit about it not always being a beneficial transform still applies.
An additional nuance is, the term "tail position" is ambiguous; is a call in tail position if its return value goes through a local.set/local.get? Or an add
with 0? And further, on some hardware architectures but not others, operations like i32.wrap_i64
can be done implicitly and could be compatible with tail call optimization. And another is that some wasm tools and implementations add instrumentation to the ends of functions, which would interfere with "tail position".
Especially in an ecosystem where there is one particularly dominant runtime (browsers), we didn't want toolchains depending on the details of how that particular runtime works.
@fitzgen (he/him) @Dan Gohman thank you for the further context and your replies.
I understand that optimizations like these are "frowned upon" and why they should rather be performed on the Wasm producer side. I won't implement this optimization despite the fact that even with this optimization, Wasmi probably would have never seriously outperform any of the JIT based engines such as Wasmtime. :D
The question when a call is in tail position was very interesting to me. Practically or theoretically minded persons might answer this question very differently. Finally, implementing this optimization would likely be an optimization that is going to be less effective over time with the rollout of the tail-calls
Wasm feature in more and more Wasm producers and users.
Last updated: Dec 23 2024 at 12:05 UTC