alexcrichton opened PR #6509 from alexcrichton:component-async-bug
to bytecodealliance:main
:
This commit addresses https://github.com/bytecodealliance/wasmtime/issues/6493 by fixing a soundness issue with the async
implementation of the component model. This issue has been presence
since the inception of the addition of async support to the component
model and doesn't represent a recent regression. The underlying problem
is that one of the base assumptions of the trap handling code is that
there's only one single activation in TLS that needs to be pushed/popped
when a stack is switched (e.g. a fiber is switched to or from). In the
case of the component model there might be two activations: one for an
invocation of a component function and then a second for an invocation
of arealloc
function to return results back to wasm (e.g. in the case
an imported function returns a list).This problem is fixed by changing how TLS is managed in the presence of
fibers. Previously when a fiber was suspended it would pop a single
activation from the top of the stack and save that to get pushed when
the fiber was resumed. This has the benefit of maintaining an entire
linked list of activations for the current thread but has the problem
above where it doesn't handle a fiber with multiple activations on it.
Instead now TLS management is done when a fiber is resumed instead of
suspended. Instead of pushing/popping a single activation the entire
linked list of activations is tracked for a particular fiber and stored
within the fiber itself. In this manner resuming a fiber will push
all activations onto the current thread and suspending a fiber will pop
all activations for the fiber (and store them as a new linked list in
the fiber's state itself).This end result is that all activations on a fiber should now be managed
correctly, regardless of how many there are. The main downside of this
commit is that fiber suspension and resumption is more complicated, but
the hope there is that fiber suspension typically corresponds with I/O
not being ready or similar so the order of magnitude of TLS operations
isn't too significant compared to the I/O overhead.
alexcrichton requested fitzgen for a review on PR #6509.
alexcrichton requested wasmtime-core-reviewers for a review on PR #6509.
fitzgen submitted PR review:
Mostly LGTM but I have an idea around making this slightly stronger typed and also some naming nitpicks (take these with a grain of salt, I'm not 100% sure why I hate "TLS" here so much, maybe you can think of something better tho, since I am woefully lacking in terms of actual alternatives)
fitzgen submitted PR review:
Mostly LGTM but I have an idea around making this slightly stronger typed and also some naming nitpicks (take these with a grain of salt, I'm not 100% sure why I hate "TLS" here so much, maybe you can think of something better tho, since I am woefully lacking in terms of actual alternatives)
fitzgen created PR review comment:
I guess we don't really (and shouldn't) have easy access to the top
CallThreadState
where we actually call the function, so making it a method wouldn't really help.But I still don't like the "TLS" nomenclature and find it mildly confusing.
fitzgen created PR review comment:
Instead of "reverse order" can we say "from youngest to oldest activation"?
fitzgen created PR review comment:
Can we mess around with wording a bit here? "TLS pointer" could maybe instead be "
CallThreadState
pointer" and similar changes to "TLS" elsewhere in here? Maybe even make this a method likeCallThreadState::assert_not_in_range
? Because as-written I was thinking that this relied on some kind of details of the underlying system's TLS implementation, not our per-threadCallThreadState
linked list, which is indeed always on the stack.
fitzgen created PR review comment:
Very nice
fitzgen created PR review comment:
Similarly, I wouldn't mind calling this
AsyncWasmCallState
or something along those lines.Maybe I'm taking this too far...
fitzgen created PR review comment:
So currently nothing is really stopping us from calling
push
twice, with nopop
s in between, which would lead to broken things. Do you think it is worth having two types here?
- a
PushedTlsState
for when it is pushed, and this would have afn pop(self) -> PoppedTlsState
- a
PoppedTlsState
for when it is popped, and this would have afn push(self) -> PushedTlsState
It wouldn't be too much extra boilerplate, and would help ensure that these methods are called correctly.
fitzgen created PR review comment:
The idea is that
self.state
, when theTlsState
is pushed, is the previous head of theCallThreadState
stack before thisTlsState
was pushed? I guess that is what the comment above, on the field, says. The fact that this same field is punned to hold different things (even if they are of the same type) whether we are in a pushed vs popped state is sorta subtle and is making me think that the type-per-state approach would be good. At minimum we could have more descriptive field names to make this code easier to grok at a glance, because I was having to read this pretty closely to understand the loop termination condition.
fitzgen created PR review comment:
Wait it is actually oldest to youngest, right? So that the towers-of-hanoi style pushing and popping onto different stacks works?
Concretely: can we explicitly list the order (rather than say it is the usual vs reverse order).
fitzgen created PR review comment:
And then this would become
Option<wasmtime_runtime::PoppedTlsState>
...
fitzgen created PR review comment:
And this would become
let popped = self.tls.take().unwrap(); let pushed = popped.push(); let result = self.fuber.resume(val); self.tls = Some(pushed.pop());
alexcrichton updated PR #6509.
alexcrichton updated PR #6509.
fitzgen submitted PR review:
Thanks! LGTM!
fitzgen merged PR #6509.
Last updated: Jan 24 2025 at 00:11 UTC