EthanBlackburn opened issue #11974:
I upgraded wasmtime from 34.0.2 to 35.0.0 and noticed the average latency of our wasm components increased by ~20%. On-CPU profiles look slightly cheaper in 35, but off-CPU shows a large increase in time parked in finish_task_switch.isra.0. The issue persists even when upgrading to wasmtime 38.0.3.
34.0.2 vs 35.0.0 on-cpu
34.0.2 vs 35.0.0 off-cpu
34.0.2 vs 38.0.3 on-cpu
34.0.2 vs 38.0.3 off-cpu
Test Case
Compiled with 34.0.2
Compiled with 35.0.0
Compiled with 38.0.3
Steps to Reproduce
Benchmarking a wasm component compiled from this Go code.
Build on 34.0.2
    git clone https://github.com/telophasehq/tangent
    cd tangent && cargo build --bin tangent --release
    cd ~/tangent/examples/golang
    mkdir plugins
    make run
    (in a separate window) tangent bench --config tangent.yaml --seconds 30 --payload tests/input.json
This will print some stats like
    producer bytes (consumed): 6861.75 MiB → 228.66 MiB/s
    guest: bytes_in=6854.63 MiB, avg_latency=12.824 ms (over 29179 calls)
Build on 38.0.3
    git fetch origin wasmtime-38
    git checkout wasmtime-38
    make run
    (in a separate window) tangent bench --config tangent.yaml --seconds 30 --payload tests/input.json
This will print stats showing guest latency and throughput are lower
    producer bytes (consumed): 5855.87 MiB → 195.15 MiB/s
    guest: bytes_in=5848.09 MiB, avg_latency=15.845 ms (over 24617 calls)
Expected Results
I expected guest latency to stay the same
Actual Results
Average guest latency as reported by the benchmark is ~20% higher in wasmtime 35.0.0 and later.
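For reference, computed directly from the two quoted runs: 15.845 ms / 12.824 ms ≈ 1.24 (roughly 20-25% higher latency), and 195.15 MiB/s / 228.66 MiB/s ≈ 0.85 (roughly 15% lower throughput).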
Versions and Environment
Wasmtime version or commit: 34.0.2 (good), 35.0.0 (bad), 38.0.3 (bad)
Operating system: Darwin
Architecture: arm64
I see the same issue on Linux aarch64.
Extra Info
I feel like I've misconfigured something or am not using the async API correctly, but I'm not sure where to start looking.
Additionally, if y'all suspect there is an issue in wasmtime's async support, I'm happy to poke around myself if I can get some pointers.
EthanBlackburn added the bug label to Issue #11974.
EthanBlackburn edited issue #11974.
EthanBlackburn edited issue #11974.
EthanBlackburn edited issue #11974.
alexcrichton commented on issue #11974:
Bisection shows https://github.com/bytecodealliance/wasmtime/pull/10959 as the culprit; local testing points to this Arc::clone specifically.
I'm not sure how this factors into the on/off-CPU graphs you're showing, but I've also never gotten a perf comparison to work before either. Nevertheless this makes sense to me because it's a highly contended atomic inc/dec, which could plausibly cause a slowdown like this. I'll see if I can poke around tomorrow and see what comes up.
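As a rough, standalone illustration (not Wasmtime code) of why a contended refcount is a plausible cause: every Arc::clone/drop pair is an atomic increment plus decrement on the same cache line, so many threads doing that once per call end up serializing on it. A minimal sketch:

    // Standalone sketch: threads hammering one shared Arc refcount.
    use std::sync::Arc;
    use std::thread;
    use std::time::Instant;

    fn main() {
        let shared = Arc::new(0u64);
        let start = Instant::now();
        let handles: Vec<_> = (0..8)
            .map(|_| {
                let shared = Arc::clone(&shared);
                thread::spawn(move || {
                    for _ in 0..1_000_000 {
                        // Hot path: one contended atomic increment...
                        let c = Arc::clone(&shared);
                        // ...and one contended atomic decrement on drop.
                        drop(c);
                    }
                })
            })
            .collect();
        for h in handles {
            h.join().unwrap();
        }
        println!("8 threads x 1M clone/drop pairs took {:?}", start.elapsed());
    }

Comparing against a single-threaded run of the same loop should show the per-clone cost growing substantially under contention.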
alexcrichton added the performance label to Issue #11974.
alexcrichton commented on issue #11974:
Also, I should mention, thank you for the thorough report!
EthanBlackburn commented on issue #11974:
You got it. Thank you for the fast response! I'm also seeing the problem start on that commit.
I'll test the Arc::clone theory tomorrow as well and I'm interested to see the result. My understanding is that the clone would show up as on-CPU time instead of off-CPU time, but I'm out of my depth here. Will report back.
EthanBlackburn commented on issue #11974:
You were right!
https://github.com/bytecodealliance/wasmtime/pull/11979
Before change
    producer bytes (consumed): 2798.62 MiB → 93.28 MiB/s
    guest: bytes_in=2794.94 MiB, avg_latency=19.490 ms (over 12169 calls)
After change
    producer bytes (consumed): 3731.03 MiB → 124.36 MiB/s
    guest: bytes_in=3728.28 MiB, avg_latency=13.473 ms (over 16252 calls)
alexcrichton commented on issue #11974:
Ok, https://github.com/bytecodealliance/wasmtime/pull/11987 is what I came up with for this. It still uses unsafe, but it's needed in a number of locations other than this one and is something I've wanted to do for a while as well. My hope is that it's encapsulated well and the external interface is a safe one; it's "just" an unsafe implementation.
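For illustration only (this is not the actual PR, and HotRef is a made-up name rather than a Wasmtime API), the general shape of a "safe interface over an unsafe core" that avoids per-call refcount traffic might look like:

    // Sketch: keep one owning Arc and hand out a raw pointer for hot-path
    // access, so each use doesn't touch the atomic refcount.
    use std::ptr::NonNull;
    use std::sync::Arc;

    pub struct HotRef<T> {
        // The owning Arc keeps the allocation alive as long as this struct.
        _owner: Arc<T>,
        // Raw pointer used on the hot path; no atomic inc/dec per access.
        ptr: NonNull<T>,
    }

    impl<T> HotRef<T> {
        pub fn new(owner: Arc<T>) -> Self {
            let ptr = NonNull::from(&*owner);
            HotRef { _owner: owner, ptr }
        }

        // The external interface stays safe; the invariant (the owner
        // outlives the pointer) is private to the implementation.
        pub fn get(&self) -> &T {
            // SAFETY: `_owner` keeps the pointee alive while `self` exists.
            unsafe { self.ptr.as_ref() }
        }
    }

Whatever the PR actually does, the appeal of this style is exactly what's described above: the unsafety is confined to one audited spot and callers only ever see a safe API.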
EthanBlackburn commented on issue #11974:
Let's gooo! Thanks for the fast turnaround.
BTW, since you mentioned you hadn't gotten a perf comparison to work before, here's what I did
On-CPU
    (run server with wasmtime 34.0.2)
    perf record -F 999 -g -- tangent run --config examples/golang/tangent.yaml
    perf script | ~/FlameGraph/stackcollapse-perf.pl > 34.folded
    (run server with wasmtime 38.0.3)
    perf record -F 999 -g -- tangent run --config examples/golang/tangent.yaml
    perf script | ~/FlameGraph/stackcollapse-perf.pl > 38.folded
    ~/FlameGraph/difffolded.pl 34.folded 38.folded | ~/FlameGraph/flamegraph.pl --negate --title "Wasmtime 38 vs 34 (red=regression)" > diff-34-38.svg
Off-CPU
    (while our server was running with wasmtime 34.0.2)
    sudo offcputime -f -p $(pgrep -n tangent) 30 > offcpu34.folded
    (while our server was running with wasmtime 38.0.3)
    sudo offcputime -f -p $(pgrep -n tangent) 30 > offcpu38.folded
    ~/FlameGraph/difffolded.pl offcpu34.folded offcpu38.folded | ~/FlameGraph/flamegraph.pl --negate --title "Off-CPU diff: 38 vs 34 (red = more wait)" > offcpu-diff-34-38.svg
Shoutout to Brendan Gregg for FlameGraph and offcputime
alexcrichton closed issue #11974.