alexcrichton opened PR #7461 from alexcrichton:less-tasks
to bytecodealliance:main
:
This commit updates the wasi-http
BodyWriteStream
implementation to avoid an extra task spawned to manage the outgoing body. Some local profiling shows that a "hello world" component currently maxes out at 500rps locally which is suprisingly low.Some investigation shows that one of the primary reasons for this is concurrency-related issues. For example
taskset -c 1 wasmtime serve ...
has a significantly better rps than withouttaskset
. My hypothesis is that the extra tasks for both reading the incoming body and writing the outgoing body are introducing opportunities for the guest to block and get work-stolen to another thread in the "happy path" where reads/writes are immediately ready and don't need blocking.For example writing to the body currently sends the chunk of bytes to a task which then sends it to another task. This means that if a guest writes some bytes and then waits for readiness it'll synchronize on these bytes by doing a round-trip with another task. This opportunity for blocking apparently lets something, be it Linux or the Tokio scheduler, to work-steal the wasm task to another thread.
Once a wasm task is work-stolen to another thread it can significantly increase the cost of VM management syscalls because any happy path the kernel might have for a single thread is no longer applicable, meaning that many costs start shooting through the roof.
The fix here in this PR is to remove a helper task when writing HTTP bodies. This extra task didn't end up doing much and was largely just a proxy into a channel, so the channel is now directly manipulated instead. Namely the
flush
implementation in the old worker task didn't actually do anything and so the only thing that needs to be managed in the stream is sending chunks along through the channel. This proved relatively easy and enables the "happy path" where if a body is written and then flushed it's immediately "ready" so no opportunity for a context switch happens and wasm stays planted on its thread.The end result of this PR is that
wasmtime serve
goes from 500rps to 5000rps (10x increase) for the simple "hello world" example that I'm working with.<!--
Please make sure you include the following information:
If this work has been discussed elsewhere, please include a link to that
conversation. If it was discussed in an issue, just mention "issue #...".Explain why this change is needed. If the details are in an issue already,
this can be brief.Our development process is documented in the Wasmtime book:
https://docs.wasmtime.dev/contributing-development-process.htmlPlease ensure all communication follows the code of conduct:
https://github.com/bytecodealliance/wasmtime/blob/main/CODE_OF_CONDUCT.md
-->
alexcrichton requested wasmtime-core-reviewers for a review on PR #7461.
alexcrichton requested pchickey for a review on PR #7461.
alexcrichton requested elliottt for a review on PR #7461.
alexcrichton requested pchickey for a review on PR #7461.
dicej submitted PR review.
pchickey submitted PR review:
Thanks for finding this big performance improvement!
I'm concerned that this will only send the pending out to the hyper body if the user takes action by blocking on the pollable, calling check-write, or flush. I can imagine an application which doesn't explicitly turn that crank and is then confused why output isnt happening. I don't see any other way to pull this off, immediately, so idk. I will spend some time thinking on it...
alexcrichton updated PR #7461.
pchickey submitted PR review:
Awesome, thank you!
elliottt submitted PR review:
Awesome perf improvement!
alexcrichton updated PR #7461.
alexcrichton has enabled auto merge for PR #7461.
alexcrichton merged PR #7461.
Last updated: Jan 24 2025 at 00:11 UTC