Stream: git-wasmtime

Topic: wasmtime / PR #6509 Fix a soundness issue with the compon...


view this post on Zulip Wasmtime GitHub notifications bot (Jun 02 2023 at 16:03):

alexcrichton opened PR #6509 from alexcrichton:component-async-bug to bytecodealliance:main:

This commit addresses https://github.com/bytecodealliance/wasmtime/issues/6493 by fixing a soundness issue with the async
implementation of the component model. This issue has been presence
since the inception of the addition of async support to the component
model and doesn't represent a recent regression. The underlying problem
is that one of the base assumptions of the trap handling code is that
there's only one single activation in TLS that needs to be pushed/popped
when a stack is switched (e.g. a fiber is switched to or from). In the
case of the component model there might be two activations: one for an
invocation of a component function and then a second for an invocation
of a realloc function to return results back to wasm (e.g. in the case
an imported function returns a list).

This problem is fixed by changing how TLS is managed in the presence of
fibers. Previously when a fiber was suspended it would pop a single
activation from the top of the stack and save that to get pushed when
the fiber was resumed. This has the benefit of maintaining an entire
linked list of activations for the current thread but has the problem
above where it doesn't handle a fiber with multiple activations on it.
Instead now TLS management is done when a fiber is resumed instead of
suspended. Instead of pushing/popping a single activation the entire
linked list of activations is tracked for a particular fiber and stored
within the fiber itself. In this manner resuming a fiber will push
all activations onto the current thread and suspending a fiber will pop
all activations for the fiber (and store them as a new linked list in
the fiber's state itself).

This end result is that all activations on a fiber should now be managed
correctly, regardless of how many there are. The main downside of this
commit is that fiber suspension and resumption is more complicated, but
the hope there is that fiber suspension typically corresponds with I/O
not being ready or similar so the order of magnitude of TLS operations
isn't too significant compared to the I/O overhead.

view this post on Zulip Wasmtime GitHub notifications bot (Jun 02 2023 at 16:03):

alexcrichton requested fitzgen for a review on PR #6509.

view this post on Zulip Wasmtime GitHub notifications bot (Jun 02 2023 at 16:03):

alexcrichton requested wasmtime-core-reviewers for a review on PR #6509.

view this post on Zulip Wasmtime GitHub notifications bot (Jun 02 2023 at 18:57):

fitzgen submitted PR review:

Mostly LGTM but I have an idea around making this slightly stronger typed and also some naming nitpicks (take these with a grain of salt, I'm not 100% sure why I hate "TLS" here so much, maybe you can think of something better tho, since I am woefully lacking in terms of actual alternatives)

view this post on Zulip Wasmtime GitHub notifications bot (Jun 02 2023 at 18:57):

fitzgen submitted PR review:

Mostly LGTM but I have an idea around making this slightly stronger typed and also some naming nitpicks (take these with a grain of salt, I'm not 100% sure why I hate "TLS" here so much, maybe you can think of something better tho, since I am woefully lacking in terms of actual alternatives)

view this post on Zulip Wasmtime GitHub notifications bot (Jun 02 2023 at 18:57):

fitzgen created PR review comment:

I guess we don't really (and shouldn't) have easy access to the top CallThreadState where we actually call the function, so making it a method wouldn't really help.

But I still don't like the "TLS" nomenclature and find it mildly confusing.

view this post on Zulip Wasmtime GitHub notifications bot (Jun 02 2023 at 18:57):

fitzgen created PR review comment:

Instead of "reverse order" can we say "from youngest to oldest activation"?

view this post on Zulip Wasmtime GitHub notifications bot (Jun 02 2023 at 18:57):

fitzgen created PR review comment:

Can we mess around with wording a bit here? "TLS pointer" could maybe instead be "CallThreadState pointer" and similar changes to "TLS" elsewhere in here? Maybe even make this a method like CallThreadState::assert_not_in_range? Because as-written I was thinking that this relied on some kind of details of the underlying system's TLS implementation, not our per-thread CallThreadState linked list, which is indeed always on the stack.

view this post on Zulip Wasmtime GitHub notifications bot (Jun 02 2023 at 18:57):

fitzgen created PR review comment:

Very nice

view this post on Zulip Wasmtime GitHub notifications bot (Jun 02 2023 at 18:57):

fitzgen created PR review comment:

Similarly, I wouldn't mind calling this AsyncWasmCallState or something along those lines.

Maybe I'm taking this too far...

view this post on Zulip Wasmtime GitHub notifications bot (Jun 02 2023 at 18:57):

fitzgen created PR review comment:

So currently nothing is really stopping us from calling push twice, with no pops in between, which would lead to broken things. Do you think it is worth having two types here?

  1. a PushedTlsState for when it is pushed, and this would have a fn pop(self) -> PoppedTlsState
  2. a PoppedTlsState for when it is popped, and this would have a fn push(self) -> PushedTlsState

It wouldn't be too much extra boilerplate, and would help ensure that these methods are called correctly.

view this post on Zulip Wasmtime GitHub notifications bot (Jun 02 2023 at 18:57):

fitzgen created PR review comment:

The idea is that self.state, when the TlsState is pushed, is the previous head of the CallThreadState stack before this TlsState was pushed? I guess that is what the comment above, on the field, says. The fact that this same field is punned to hold different things (even if they are of the same type) whether we are in a pushed vs popped state is sorta subtle and is making me think that the type-per-state approach would be good. At minimum we could have more descriptive field names to make this code easier to grok at a glance, because I was having to read this pretty closely to understand the loop termination condition.

view this post on Zulip Wasmtime GitHub notifications bot (Jun 02 2023 at 18:57):

fitzgen created PR review comment:

Wait it is actually oldest to youngest, right? So that the towers-of-hanoi style pushing and popping onto different stacks works?

Concretely: can we explicitly list the order (rather than say it is the usual vs reverse order).

view this post on Zulip Wasmtime GitHub notifications bot (Jun 02 2023 at 18:57):

fitzgen created PR review comment:

And then this would become Option<wasmtime_runtime::PoppedTlsState>...

view this post on Zulip Wasmtime GitHub notifications bot (Jun 02 2023 at 18:57):

fitzgen created PR review comment:

And this would become

let popped = self.tls.take().unwrap();
let pushed = popped.push();
let result = self.fuber.resume(val);
self.tls = Some(pushed.pop());

view this post on Zulip Wasmtime GitHub notifications bot (Jun 02 2023 at 19:23):

alexcrichton updated PR #6509.

view this post on Zulip Wasmtime GitHub notifications bot (Jun 02 2023 at 19:40):

alexcrichton updated PR #6509.

view this post on Zulip Wasmtime GitHub notifications bot (Jun 02 2023 at 20:56):

fitzgen submitted PR review:

Thanks! LGTM!

view this post on Zulip Wasmtime GitHub notifications bot (Jun 02 2023 at 21:35):

fitzgen merged PR #6509.


Last updated: Oct 23 2024 at 20:03 UTC