dicej opened PR #11807 from dicej:p3-instance-reuse to bytecodealliance:main:
This is a draft implementation of WASIp3 instance reuse. Previously, we used one store and one instance per request for both p2 and p3 handlers, discarding the store and instance immediately after the request was handled. Now we attempt to reuse p3 instances and their stores instead, both serially and concurrently.
Advantages of reuse:
- Higher throughput: We can amortize the time spent creating and disposing of instances and stores over a number of requests.
- Lower memory and address space usage: Concurrent instance reuse allows us to handle more requests with fewer instances, especially when the handler is I/O bound.
- Developers will presumably need to make changes to their applications anyway when migrating from p2 to p3, so this is an ideal time to change the default from no reuse to (some) reuse, assuming we want to do it at all.
Disadvantages of reuse:
- Reduced isolation: When the same instance is used to handle multiple requests, the blast radius of bugs in the guest is larger; a trap can affect unrelated requests, and a security flaw could leak state across requests.
- Host implementation complexity: There's more code to manage and more configuration knobs to tune.
- Guest implementation complexity: You can't just stash state in a global variable and call it a day. Similarly, shortcuts like using a leaking GC require additional thought.
Given the tradeoffs, we'll definitely need to provide components a way to opt out of reuse if desired. See https://github.com/WebAssembly/wasi-http/issues/190 for related discussion.
Implementation details:
I've moved `ProxyHandler` from `wasmtime_cli::commands::serve` to `wasmtime_wasi_http::handler` (so it can be reused by custom embedders), abstracted away the `serve`-specific parts, and added support for instance reuse via a new `ProxyHandler::push` function. The `push` function takes a work item representing an incoming request and dispatches it to a dynamically sized pool of worker tasks, each with its own store and `wasi:http/handler@0.3.x` instance. Worker tasks accept work items as capacity allows until they either reach a maximum total reuse count or hit an idle timeout, at which point they exit.
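To make the shape of this concrete, here is a minimal sketch of what a `push`-style entry point over such a pool might look like. Everything here other than the name `push` is hypothetical (the actual `ProxyHandler` internals live in the PR), but it illustrates the queue-then-maybe-spawn idea:

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use tokio::sync::mpsc;

/// Hypothetical work item: a request plus whatever is needed to respond.
struct WorkItem {}

struct Pool {
    sender: mpsc::Sender<WorkItem>,
    available_workers: AtomicUsize,
    // ...plus configuration: max reuse count, max concurrency, idle timeout
}

impl Pool {
    /// Queue the work item; if no already-idle worker can pick it up,
    /// start a new worker task with its own store and instance.
    async fn push(&self, item: WorkItem) {
        self.sender.send(item).await.expect("pool shut down");
        if self.available_workers.load(Ordering::Relaxed) == 0 {
            self.spawn_worker();
        }
    }

    fn spawn_worker(&self) {
        // tokio::spawn(async move { ...instantiate, then loop over the
        // queue until the reuse limit or idle timeout is hit... })
    }
}
```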
Performance:
I've run a few benchmarks using `wasmtime-serve-rps.sh` and hello-wasip3-http (a simple "hello, world" request handler). Highlights:
- 45% more requests per second with instance reuse enabled vs. disabled
- 64% fewer concurrent instances required for an I/O-bound handler
  - Here I added a 40ms delay to the "hello, world" handler to simulate I/O
TODOs:
- Support host-enforced request timeouts. Note that we can't simply use the epoch-based trap-on-timeout approach we've traditionally used, since a given instance may be responsible for a number of concurrent requests. This will require adding a public API to the `wasmtime` crate for cleanly cancelling a guest task without affecting other ones.
- Add CLI options for the configuration knobs (e.g. max request count per instance, idle timeout, etc.)
- Use guest signals like backpressure, explicit `wasi:cli/exit/exit`, https://github.com/WebAssembly/wasi-http/issues/190, etc. to drive reuse decisions
dicej updated PR #11807.
dicej updated PR #11807.
dicej updated PR #11807.
tschneidereit commented on PR #11807:
- 64% fewer concurrent instances required for an I/O-bound handler
A 2/3rds reduction is less than I'd expect, given the max concurrency is set to 16, and all of these instances should be able to easily absorb 16 concurrent requests, given that they spent 99+% of their time waiting for that timeout to resolve.
dicej commented on PR #11807:
- 64% fewer concurrent instances required for an I/O-bound handler
A 2/3rds reduction is less than I'd expect, given the max concurrency is set to 16, and all of these instances should be able to easily absorb 16 concurrent requests, given that they spent 99+% of their time waiting for that timeout to resolve.
I think that's at least partially due to the algorithm I'm using to decide when a worker will accept concurrent requests (and indicate it's ready to the request dispatcher), which balances the goal of concurrent reuse against the goal of maximizing throughput for guests which do little or no I/O.
Specifically, as soon as a guest has accepted a work item (i.e. a request to be handled), it marks itself unavailable to the dispatcher and will not accept another task (nor mark itself available again) until the store's event loop returns `Poll::Pending`, meaning all work items it has accepted are blocked on I/O. In practice, this means that the number of available workers will go to zero more often than if they each accepted tasks unconditionally up to the max concurrency level, and when the dispatcher sees the available worker count at zero it starts another worker with its own instance.
The advantage of the above algorithm is that new work items are never (or at least very rarely) left waiting for a worker to free up from whatever CPU-bound work item(s) it is occupied with. The disadvantage is that we use more instances. Perhaps there are ways to tweak the algorithm so it can reduce the instance count without affecting latency?
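A standalone sketch of that availability rule, with a plain `FuturesUnordered` standing in for the store's event loop (the real implementation hooks into Wasmtime's event loop; the names here are hypothetical):

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::Arc;
use std::sync::atomic::{AtomicBool, Ordering};
use std::task::Poll;

use futures::stream::{FuturesUnordered, StreamExt};

// Drive the accepted work items, advertising capacity only once *all* of
// them are blocked on I/O (i.e. the combined poll returns `Pending`).
async fn drive_requests(
    mut in_flight: FuturesUnordered<Pin<Box<dyn Future<Output = ()> + Send>>>,
    available: Arc<AtomicBool>,
) {
    futures::future::poll_fn(move |cx| loop {
        match in_flight.poll_next_unpin(cx) {
            Poll::Ready(Some(())) => continue, // one item finished; keep going
            Poll::Ready(None) => return Poll::Ready(()), // all items done
            Poll::Pending => {
                // Everything accepted so far is blocked: mark this worker
                // available to the dispatcher again.
                available.store(true, Ordering::Release);
                return Poll::Pending;
            }
        }
    })
    .await;
}
```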
tschneidereit commented on PR #11807:
I think that algorithm does make sense. My strong intuition would still be that it'd lead to better results than you're seeing. I wonder if part of the explanation could be the artificial nature of the load test, where a lot of requests come in at the same time with little variation? Maybe a load generation scheme that starts additional concurrent requests in a staggered manner would show different results?
tschneidereit commented on PR #11807:
It looks like Siege supports this, and in fact does it by default—see the `--delay` option. By default, Siege will apparently delay individual requests' start times by up to `delay` seconds, and ramp up to the defined number of concurrent clients.
alexcrichton submitted PR review.
alexcrichton created PR review comment:
Perhaps not the most critical thing, but I'm getting more and more wary of the solution of "just slap `FuturesUnordered` to get concurrency". With the addition of this we now have these layers of `FuturesUnordered`:
- In the guest, `FuturesUnordered` is used for spawned tasks and the main task.
- In the store, there's a `FuturesUnordered` for concurrent tasks in the store.
- Here there's another `FuturesUnordered` for concurrent requests.
- Tokio more-or-less has a `FuturesUnordered` for all spawned tokio tasks.
This is getting to be a bit excessive IMO and I feel like we're going to start seeing perf cliffs as a result in various scaling situations. I don't have anything concrete I know of necessarily, but I feel like we can't just keep slapping a `FuturesUnordered` everywhere and assume it'll all work out and scale well...
Unfortunately, though, I don't have a great solution off the top of my head. I realize what this is doing is maintaining concurrency within a store, and API-wise we haven't given many other options here.
dicej commented on PR #11807:
It looks like Siege supports this, and in fact does it by default—see the `--delay` option. By default, Siege will apparently delay individual requests' start times by up to `delay` seconds, and ramp up to the defined number of concurrent clients.
I just ran a test using `siege -c 50 -t 30s -d 0.5` and got the exact same peak concurrent instance count as with `hey`. Siege doesn't appear to be applying any jitter to the delay, nor does it appear to vary when each user starts, and I'm not seeing any options to enable such things. I suspect we'll either need to keep looking or write our own load test tool :shrug:
dicej updated PR #11807.
tschneidereit commented on PR #11807:
Interesting. I guess either the Siege docs are incorrect or my reading of them is. Either way, as mentioned in our call I think the results you already have are entirely sufficient performance motivation for going ahead, so there's no blocking need to get additional ones, given that I'd only ever expect them to be better.
dicej updated PR #11807.
dicej updated PR #11807.
dicej commented on PR #11807:
While implementing new CLI options to control reuse (i.e. max reuse count, max concurrent reuse count, and idle instance timeout), I noticed that setting the max reuse count to 1 (which effectively disables reuse) reduced `wasmtime-serve-rps.sh` performance by 86% vs. the `main` branch. I was able to fix that by cheating: falling back to the normal request handling path and not using the new reuse-enabled path when the max reuse count is 1.
The upshot is that if you're setting the max reuse count > 1, you need to set it pretty high (e.g. > 16) to actually get a performance benefit, since the overhead of the reuse-enabled code path will dominate otherwise. That suggests we might be leaving significant performance on the table even when the max reuse count is high. I don't know what the bottleneck is -- maybe `tokio::spawn`; maybe `FuturesUnordered` like Alex mentioned; maybe cache coherency traffic due to atomics; maybe something else :shrug:
Anyway, I'm going to mark this ready-for-review now, since the major TODO items have been addressed, and both the default settings and the reuse-disabled mode provide performance that's at least as good as `main`.
dicej has marked PR #11807 as ready for review.
dicej requested wasmtime-wasi-reviewers for a review on PR #11807.
dicej requested alexcrichton for a review on PR #11807.
dicej requested wasmtime-core-reviewers for a review on PR #11807.
dicej requested wasmtime-default-reviewers for a review on PR #11807.
dicej commented on PR #11807:
I don't know what the bottleneck is -- maybe `tokio::spawn`; maybe `FuturesUnordered` like Alex mentioned; maybe cache coherency traffic due to atomics; maybe something else :shrug:
Thinking about this a bit more last night, I realized that at least part of the problem is how the code decides whether to create a new instance for a given request; the way that currently works when a bunch of requests come in at the same time is suboptimal from a latency perspective. I'll experiment with some variations on Monday.
dicej updated PR #11807.
dicej commented on PR #11807:
I don't know what the bottleneck is -- maybe `tokio::spawn`; maybe `FuturesUnordered` like Alex mentioned; maybe cache coherency traffic due to atomics; maybe something else :shrug:
Thinking about this a bit more last night, I realized that at least part of the problem is how the code decides whether to create a new instance for a given request; the way that currently works when a bunch of requests come in at the same time is suboptimal from a latency perspective. I'll experiment with some variations on Monday.
This is mostly addressed; the performance penalty is now less than 20%. I've left the fast path in place for when reuse is completely disabled, so there's no penalty there at all. For small, but greater-than-one max reuse counts, performance is much better than before.
The tradeoff is now the code is more aggressive about creating new instances when a bunch of requests arrive simultaneously, so the peak instance count can be nearly as high as with reuse disabled. However, I think that's the correct tradeoff to maximize throughput in that scenario, and it should be pretty rare in practice anyway.
alexcrichton submitted PR review.
alexcrichton created PR review comment:
Since this already forwards all the arguments of the script itself, could these extra options be relegated to `"$@"` to avoid tampering with defaults too much?
alexcrichton created PR review comment:
With https://github.com/bytecodealliance/wasmtime/pull/11822, I think this is no longer necessary.
alexcrichton created PR review comment:
The synchronization and implementation here is subtle enough that I've got no idea, on first pass, why this is constructed the way it is. I see from Tokio's documentation, however, that this is a copy/paste of the examples they recommend for a mpmc channel. Could this function have a comment pointing to said docs indicating that this is the documented method of doing this?
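For readers without the diff in front of them: one way to get multiple consumers out of a `tokio::sync::mpsc` channel is to share the receiver behind an async mutex. A minimal sketch of that pattern (a guess at the shape being referenced here, not the PR's code):

```rust
use std::sync::Arc;
use tokio::sync::{mpsc, Mutex};

/// Cloneable handle that lets many workers pull from one queue: each
/// worker locks the shared receiver just long enough to take one item.
#[derive(Clone)]
struct SharedReceiver<T>(Arc<Mutex<mpsc::Receiver<T>>>);

impl<T> SharedReceiver<T> {
    async fn recv(&self) -> Option<T> {
        self.0.lock().await.recv().await
    }
}
```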
alexcrichton created PR review comment:
Would it be possible to go ahead and resolve this TODO in this PR? I think it'd be nice to see this in action for both p2/p3 to avoid having to maintain multiple separate paths.
alexcrichton created PR review comment:
Could this be inlined above since it's only called once?
alexcrichton created PR review comment:
In lieu of this TODO being resolved, I think that the entire store must be thrown away right? Otherwise cancelling a task leaves a store in an effectively indeterminate state?
alexcrichton created PR review comment:
Naively I'd expect a signal like this would be updated on "major events" like a task being completed or a new task coming in. Is there a reason though that this needs to be updated on each turn of the event loop more-or-less?
alexcrichton created PR review comment:
This is a fair bit of `#[cfg]` traffic; could that be relegated to a submodule, perhaps, to handle concurrency or something like that? (Basically seeking out if there's a way to reduce the total amount of `#[cfg]`.)
alexcrichton created PR review comment:
How come this spawns a worker vs letting the handler at the top-level be in charge of that?
alexcrichton created PR review comment:
Instead of duplicating code, could this call `Worker::run` directly?
alexcrichton created PR review comment:
Personally I find nested/multiple uses of `future::poll_fn` quite confusing, as I have to keep going back and forth trying to figure out what's polling what and what's doing what. For example, here I need to jump back above to look at the loop, see where it's pending, etc. Would it be possible to simplify this by managing `accept_concurrent` within the loop itself? For example, waiting for a lull between tasks or something like that.
I also find it pretty hard to reverse-engineer when this variable is set, unset, etc., and what the state transitions are for something like this.
alexcrichton created PR review comment:
I would ideally like to not have duplication like this currently has, so if possible I'd like to move the p2 handling into the handler added in the wasi-http crate. Previously the difference between these two branches was localized to just here and was, in theory, pretty easy to audit and see that they're doing the same thing. Now that they're so spread out across crates it's much more difficult to verify that both end up doing the same thing. In theory it should be possible for the new handler to pretty easily handle p2 modules, right?
alexcrichton created PR review comment:
Is this right to propagate an error here? I would expect that a task timing out results in a lot of tweaking of heuristics, as it does right now by not accepting more tasks. Otherwise, though, for the entire worker I wouldn't expect it to fail just because one task timed out.
dicej updated PR #11807.
dicej updated PR #11807.
dicej submitted PR review.
dicej created PR review comment:
It _could_, but `Worker::run` does a lot more than this code does, including polling the mpmc channel, fiddling with atomic variables, using a `FuturesUnordered`, etc., none of which is needed here.
dicej submitted PR review.
dicej created PR review comment:
Because that could lead to orphan requests: `maybe_start_worker` sees a non-zero `worker_count` and therefore makes no attempt to start a new one, but then the last worker exits right after that. This code ensures that, when the last worker exits and there is work to do, a new worker will be started.
dicej submitted PR review.
dicej created PR review comment:
The code in `set_available` short-circuits if the state hasn't actually changed, so this is a pretty light-weight operation. The reason we're doing it here is that `accept_concurrent` could change from outside the `async` block any time it's polled.
dicej created PR review comment:
Yes, and that's exactly what we do as soon as `futures.len()` has gone to zero; i.e. once every not-yet-timed-out task has finished or timed out (which will take at most one more timeout interval from the point of the first timeout), we drop the instance and store.
dicej submitted PR review.
dicej submitted PR review.
dicej created PR review comment:
I was trying to approximate the existing behavior in the non-reuse case where we return an error on timeout, but I don't have a strong opinion about doing that vs. just logging immediately and returning `Ok(())`.
dicej submitted PR review.
dicej created PR review comment:
Would it be possible to simplify this by managing `accept_concurrent` within the loop itself? For example waiting for a lull between tasks or something like that.
Currently, there's no way to determine whether the store's event loop has gone idle from within a `Future` passed to `run_concurrent` -- the only way to know is based on when the `Future` returned by `run_concurrent` itself returns `Pending`. In other words, just because `FuturesUnordered::next` returns `Pending` doesn't mean we're ready to accept another task -- it may only mean that it's waiting on deferred work which will be done during the next turn of the store's event loop.
I'll think some more about what sort of API(s) we could add to Wasmtime to expose store event loop lulls in a way that's accessible from within a `Future` passed to `run_concurrent`, but it's not obvious to me what that would look like.
dicej submitted PR review.
dicej created PR review comment:
At a minimum, I can move this code into `handler.rs`, for sure. Actually supporting p2 reuse is less trivial, but I'll give it a shot.
alexcrichton submitted PR review.
alexcrichton created PR review comment:
While true, does that factor into the 20% improvement this duplication earns? I would expect that those constructs are small peanuts, but that's also just a guess
alexcrichton submitted PR review.
alexcrichton created PR review comment:
Ah ok makes sense, mind leaving a comment to that effect?
alexcrichton submitted PR review.
alexcrichton created PR review comment:
Hm ok, I think my attempts to understand how this implementation works have basically all ended in failure. Given the comments below, I've seriously misunderstood how this loop works, and I already stared at it for an hour...
alexcrichton submitted PR review.
alexcrichton created PR review comment:
Given the comments above this one, I think I've basically reached the conclusion that I don't really have any idea how this is all piecing together. The combinations and subtleties of `poll_fn`, plus `FuturesUnordered`, plus the multiple layers here (e.g. two `poll_fn`s), plus the variables used to control behavior... I am basically just totally lost.
I do think this would be much, much more readable if this was all written without `poll_fn`. Once the transition is made from async/await to futures internals, things get extremely tricky. In contrast, it's basically trivial to read async/await code. Overall, `poll_fn` is fine IMO only when the internal closure is a handful of lines long, as opposed to taking up most of the function.
Is it possible to rewrite most of this in an async/await style? Maybe not preserving the exact same behavior, but something similar? For example, the idle time could be replaced with "here's something which takes up to N messages from the incoming queue, ending early if M time elapses between messages taken". That's not an exact correlation to the idle time implemented here (I think), but I feel like that captures the intent well enough and is in theory much easier to understand, where that entire computation could be in an `async` block and forgotten about (or something like that).
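As a sketch of that suggestion (hypothetical names; "N"/"M" become `max`/`idle`), the batch-taking idea reads naturally in async/await style:

```rust
use std::time::Duration;
use tokio::sync::mpsc;
use tokio::time::timeout;

/// Take up to `max` items from the queue, giving up early if `idle`
/// elapses between consecutive items.
async fn take_batch<T>(
    rx: &mut mpsc::Receiver<T>,
    max: usize,
    idle: Duration,
) -> Vec<T> {
    let mut batch = Vec::new();
    while batch.len() < max {
        match timeout(idle, rx.recv()).await {
            Ok(Some(item)) => batch.push(item),
            Ok(None) | Err(_) => break, // channel closed or idle timeout
        }
    }
    batch
}
```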
alexcrichton created PR review comment:
Overall I'm basically just wary of tacking on more and more features of "here's the p3 thing over there; we're going to just ignore the p2 thing over here". It spreads out the implementation burden and means that we end up having a bunch of untested code, because most things preexisting are p2.
I would naively expect integration to be pretty simple because p2 functions can be `call_concurrent`'d just the same as p3 functions. One simplification, perhaps, is that `wasmtime serve` should unconditionally depend on the `component-model-async` proposal. I don't think there's any point in supporting it with `component-model` but not `component-model-async` compiled in.
alexcrichton submitted PR review.
dicej submitted PR review.
dicej created PR review comment:
Are you proposing that we generate the p2 bindings in `wasmtime-wasi-http` with `exports: { default: async | store }`? Or generate them with `exports: { default: async }` like we do today, but also generate another set with `| store`? Or something else?
alexcrichton submitted PR review.
alexcrichton created PR review comment:
I wasn't really thinking that far myself. My thinking was that we should make it work to have one implementation of `wasmtime serve`, and other goals should all be secondary. So, for example, if a second set of bindings is needed, that seems fine to me. Either that or a "get the function from bindings" accessor so a direct call is made to the `TypedFunc`, or something like that.
Put another way, I don't personally care how the bindings work out, but I do care much more that there's only one "meat" implementation that we have to test/verify/audit/etc.
dicej updated PR #11807.
dicej updated PR #11807.
dicej submitted PR review.
dicej created PR review comment:
Done.
dicej submitted PR review.
dicej created PR review comment:
Ok, I went ahead and generated a third set of p3 bindings (in addition to the existing async and sync ones) which enable `call_concurrent`. I considered using `TypedFunc` directly, but found myself duplicating the function index pre-instance lookup code; easier just to let `wasmtime-wit-bindgen` handle that.
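For context, the `exports` setting under discussion plugs into `wasmtime::component::bindgen!` roughly like this (the `path` and `world` values are placeholders; only the `exports` line is quoted from this thread):

```rust
// Hypothetical invocation; `async | store` generates exports that can be
// driven with `call_concurrent`.
wasmtime::component::bindgen!({
    path: "wit",
    world: "proxy",
    exports: { default: async | store },
});
```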
dicej updated PR #11807.
dicej updated PR #11807.
dicej submitted PR review.
dicej created PR review comment:
I just pushed an update which bypasses the task queue when there are no already-available workers, which narrowed the performance gap to just 2%; that seems close enough to justify removing the duplication. It also points to an opportunity to switch to a faster MPMC channel if we can find (or build) one.
dicej updated PR #11807.
dicej updated PR #11807.
alexcrichton commented on PR #11807:
Unfortunately I do continue to be basically ever-yet-more-lost on the internals of the implementation here. I'm trying to poke at things locally to see if I can improve my understanding, though.
I've run across something that seems to me like a serious bug in the design of component-model-async APIs in Wasmtime. I started trying to explore the sort of "outer timeout" here and the whole `BTreeMap` that was recently added. That feels quite surprising to me, and naively I would not expect it to be necessary. I see now why it was done (sort of), but the inherent need for it, to me, indicates a serious bug in how we're exposing async things.
Currently `run_concurrent` takes an `AsyncFnOnce`, but in reality it's not actually behaving the way that I would expect an `AsyncFnOnce` to behave. Specifically, if the guest calls `std::thread::sleep`, then that blocks all other async operations in that `AsyncFnOnce`, including host timeouts (e.g. the `tokio::time::timeout` per request). That to me is a pretty big violation of what one would expect from async APIs in Rust.
Can you speak more to why this behavior is happening? I understand why all wasm would be blocked, but I don't understand why all other host interactions are also blocked.
dicej commented on PR #11807:
Can you speak more to why this behavior is happening? I understand why all wasm would be blocked, but I don't understand why all other host interactions are also blocked.
If you want to trace through the code, `run_concurrent` calls `poll_until`, which calls `handle_work_item`, which calls `resume_fiber`, which calls `fiber::resolve_or_release`. In this case, the fiber ends up suspending _without_ releasing the store, so the `resolve_or_release` `Future::poll` returns `Pending`, which bubbles all the way back up to `run_concurrent`. At this point, nothing can be done with the `Store` -- the fiber has exclusive access, so we can't run the event loop, can't call `tls::set`, can't run the `AsyncFnOnce`, etc. It's like the `Store` is locked, and there's nothing we can do except either wait for the fiber to resume and release the store or give up and drop everything; we do the latter to enforce the timeout.
One way out of this would be to rewrite all of `wasmtime-wasi`'s p2 support to use `async | store` when generating import bindings for any blocking operation, in which case the fiber would release access to the `Store` when the guest sleeps, meaning none of the above would be an issue. Not sure we have an appetite for that right now, though.
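The same hazard can be reproduced without Wasmtime at all; a minimal tokio sketch where a blocking call in one branch starves a timeout polled on the same task:

```rust
use std::time::{Duration, Instant};

#[tokio::main(flavor = "current_thread")]
async fn main() {
    let start = Instant::now();
    tokio::join!(
        // Blocks the whole thread, just like a guest `std::thread::sleep`
        // holding the store's fiber.
        async { std::thread::sleep(Duration::from_secs(2)) },
        // This 100ms timeout can't make progress until the blocking call
        // returns, so it's observed after ~2s instead.
        async {
            let _ = tokio::time::timeout(
                Duration::from_millis(100),
                std::future::pending::<()>(),
            )
            .await;
            println!("timeout observed after {:?}", start.elapsed());
        },
    );
}
```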
dicej edited a comment on PR #11807:
Can you speak more to why this behavior is happening? I understand why all wasm would be blocked, but I don't understand why all other host interactions are also blocked.
If you want to trace through the code, `run_concurrent` calls `poll_until`, which calls `handle_work_item`, which calls `resume_fiber`, which calls `fiber::resolve_or_release`. In this case, the fiber ends up suspending _without_ releasing the store, so the `resolve_or_release` `Future::poll` returns `Pending`, which bubbles all the way back up to `run_concurrent`. At this point, nothing can be done with the `Store` -- the fiber has exclusive access, so we can't run the event loop, can't call `tls::set`, can't run the `AsyncFnOnce`, etc. It's like the `Store` is locked, and there's nothing we can do except either wait for the fiber to resume and release the store or give up and drop everything; we do the latter to enforce the timeout.
One way out of this would be to rewrite parts of `wasmtime-wasi`'s p2 support to use `async | store` when generating import bindings for any blocking operation, in which case the fiber would release access to the `Store` when the guest sleeps, meaning none of the above would be an issue. Not sure we have an appetite for that right now, though.
dicej updated PR #11807.
dicej updated PR #11807.
alexcrichton updated PR #11807.
alexcrichton submitted PR review.
alexcrichton created PR review comment:
Ok, after thinking more about this, I think the main substantial change I'd like to see is the removal of this timeout. Given what we clarified in https://github.com/bytecodealliance/wasmtime/pull/11871, this timeout effectively cannot work, so I don't think we should be relying on it for anything here.
Otherwise while I'd still like to tweak things about this loop I don't think it's necessarily worth it at this time.
alexcrichton created PR review comment:
s/SeqCst/Relaxed/
alexcrichton created PR review comment:
This can be private, right? (ideally it wouldn't be exposed)
dicej submitted PR review.
dicej created PR review comment:
Are you sure about that? The ordering of this operation relative to the
Queue::pushthat happens before this must be preserved, whereasRelaxedis documented to have "No ordering constraints, only atomic operations". I imagine we could get away with something less strict thanSeqCsthere (maybeRelease?), butRelaxedseems to weak in this case.
dicej edited PR review comment.
dicej submitted PR review.
dicej created PR review comment:
`serve.rs` (which is in a different crate) needs access to it; not sure if there's a way around that.
dicej submitted PR review.
dicej created PR review comment:
(and not just `serve.rs`, but anyone who wants to use this module, presumably)
alexcrichton submitted PR review.
alexcrichton created PR review comment:
In this case `worker_count` isn't protecting anything `unsafe` or anything like that, so AFAIK `Relaxed` is all that's needed. Anything stronger is typically only required when `unsafe` is involved or it's specifically trying to synchronize with some other modification of `worker_count` to see updates made somewhere else, but in this case the state is in a mutex anyway, so I don't think that's required.
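As a generic illustration of that rule (a standalone sketch, not the PR's code): a counter that doesn't publish any other memory through its value can use `Relaxed` and still be updated atomically:

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

static WORKER_COUNT: AtomicUsize = AtomicUsize::new(0);

// No other data is synchronized through this counter, so `Relaxed` is
// enough; the increments/decrements themselves remain atomic.
fn worker_started() {
    WORKER_COUNT.fetch_add(1, Ordering::Relaxed);
}

fn worker_exited() {
    WORKER_COUNT.fetch_sub(1, Ordering::Relaxed);
}
```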
alexcrichton submitted PR review:
Talked some more with @dicej offline about this, and more-or-less it's not really worth spending more time bikeshedding things. This is working well-enough as-is and while I'm not perfectly satisfied it's not worth continuing to refine things at this point. The API is solid enough that it's fine to iterate in-tree.
dicej submitted PR review.
dicej created PR review comment:
Alex and I chatted about this synchronously. He actually meant to say he wanted to remove the _other_ timeout in this closure. I pointed out that the other timeout guarantees the invariant that we never kill an instance with at least one task that has not yet reached its timeout; it ensures that by setting `reuse_count = max_instance_reuse_count` as soon as any task reaches the timeout.
The cost of guaranteeing that invariant is that the code is more complicated than it otherwise would be, and more subtle, which is unfortunate, but abandoning the invariant would make it unusable for practical purposes, and it's not clear there's a better way to structure the code that still provides the same guarantee.
I'll add some more code comments to point out the subtleties here, which should help with future maintenance.
lann submitted PR review.
lann created PR review comment:
What is `req_id`?
dicej updated PR #11807.
dicej submitted PR review.
dicej created PR review comment:
wasmtime serveuses it to prefix lines emitted by the guest tostderrandstdout, so we plumb it through. It will only be non-Noneifmax_instance_reuse_count == 1; otherwise, we can't reliably do that prefixing given that the store will be reused (although I suppose we could make it work for sequential-only reuse with more effort).
dicej edited PR review comment.
dicej edited PR review comment.
dicej submitted PR review.
dicej created PR review comment:
I'll expand the doc comment.
dicej updated PR #11807.
dicej submitted PR review.
dicej created PR review comment:
Ok, taking another look at this, I've realized I glossed over that this is a `load` rather than a `store`, in which case what I wrote above doesn't apply. I'll study this a bit more.
dicej edited PR review comment.
lann submitted PR review.
lann created PR review comment:
It might be nice if these allowed for more flexible logic, maybe even with default impls, e.g.
```rust
fn should_reuse_instance(&self, reuse_count: usize) -> bool { true }
fn should_reuse_concurrently(&self, reuse_count: usize, current_tasks: usize) -> bool { false }
```
lann submitted PR review.
lann created PR review comment:
Actually, the things I would have in mind for this would be more global and could be managed by whatever is calling `Worker::spawn`, but now it isn't clear to me why these limits should be managed by `Worker` at all... :thinking:
lann edited PR review comment.
dicej updated PR #11807.
lann edited PR review comment.
dicej created PR review comment:
Okay, I just pushed an update with more comments and also changed a _different_
SeqCsttoRelaxed. I'm guessing the remainingSeqCsts are stronger than we actually need, but the ordering requirements between theQueueoperations involving aMutexand theworker_countAtomicUsizeoperations are a bit subtle, and I'm erring on the side of paranoia here.
dicej submitted PR review.
I'm going to hold off until Monday to merge this, just in case any other concerns come up. Thanks for the attentive review!
alexcrichton merged PR #11807.