Stebalien labeled issue #5037:
When
wasm_backtrace
is enabled, wasmtime is 300x slower in a specific stack-heavy test-case that:
- Pushes slightly less than 2048 stack elements (~300 recursive rust calls).
- Calls back out into the host, which recurses back into a new instance of the same module (about 1k times).
- Eventually traps and rewinds through all 1k recursive calls.
While I'll admit that this is an unusual load, it can cause wasmtime to basically just sit there for ~20 seconds while bailing out, when it should take milliseconds at most.
Proposed resolution: Keep the option to disable backtraces. While _very_ useful for debugging, the ability to disable them is still pretty useful.
Test Case
It's not quite as simple as a single wasm file. My guess is that there's some form of non-linear cost to keeping track of stack traces.
Steps to Reproduce
I can provide a reduced test-case that would help, it just might take a bit of time.
Non-reduced test-case:
- Checkout https://github.com/filecoin-project/ref-fvm/pull/943
- Disable https://github.com/filecoin-project/ref-fvm/pull/943/files#diff-f6ca4a1c384bf9a1f911827c3974f3f2b25357d937ce1d529732c09404ed6c92R129
- In
testing/integration/tests
, runcargo test --release native_stack_overflow
.Expected Results
With backtraces disabled, it runs in 0.06 seconds.
Actual Results
With backtraces enabled, it runs in 20 seconds.
Versions and Environment
Wasmtime: 1.0.1
Operating system: Linux (5.19.14)
Architecture: x86_64 (i7-1185G7)
Extra Info
Anything else you'd like to add?
Stebalien opened issue #5037:
When
wasm_backtrace
is enabled, wasmtime is 300x slower in a specific stack-heavy test-case that:
- Pushes slightly less than 2048 stack elements (~300 recursive rust calls).
- Calls back out into the host, which recurses back into a new instance of the same module (about 1k times).
- Eventually traps and rewinds through all 1k recursive calls.
While I'll admit that this is an unusual load, it can cause wasmtime to basically just sit there for ~20 seconds while bailing out, when it should take milliseconds at most.
Proposed resolution: Keep the option to disable backtraces. While _very_ useful for debugging, the ability to disable them is still pretty useful.
Test Case
It's not quite as simple as a single wasm file. My guess is that there's some form of non-linear cost to keeping track of stack traces.
Steps to Reproduce
I can provide a reduced test-case that would help, it just might take a bit of time.
Non-reduced test-case:
- Checkout https://github.com/filecoin-project/ref-fvm/pull/943
- Disable https://github.com/filecoin-project/ref-fvm/pull/943/files#diff-f6ca4a1c384bf9a1f911827c3974f3f2b25357d937ce1d529732c09404ed6c92R129
- In
testing/integration/tests
, runcargo test --release native_stack_overflow
.Expected Results
With backtraces disabled, it runs in 0.06 seconds.
Actual Results
With backtraces enabled, it runs in 20 seconds.
Versions and Environment
Wasmtime: 1.0.1
Operating system: Linux (5.19.14)
Architecture: x86_64 (i7-1185G7)
Extra Info
Anything else you'd like to add?
alexcrichton commented on issue #5037:
Thanks for reporting this! Digging in a bit here one aspect I believe is that we have quadratic behavior capturing the backtraces with interleavings of host and wasm frames, specifically each time Wasmtime returns with a trap back to wasm a backtrace is recaptured only to get thrown away later, meaning that with N sequences of host stack frames we'll capture a backtrace N times (hence the quadratic behavior). That may be one of the contributing factors to this and would probably be best to fix. @fitzgen would you be up for taking a look at this?
In the meantime @Stebalien would you be able to produce a more standalone example of what's going on? I made a simple reproduction myself but was unable to get more than 600 ish interleavings of host/wasm deep so I'm not sure how to get up to the multiple thousands you describe.
Stebalien commented on issue #5037:
I made a simple reproduction myself but was unable to get more than 600 ish interleavings of host/wasm deep so I'm not sure how to get up to the multiple thousands you describe.
Ah, we run wasmtime on a 64MiB stack to support cases like this. But I'll cobble together a reproducer.
fitzgen commented on issue #5037:
Looking into this.
fitzgen closed issue #5037:
When
wasm_backtrace
is enabled, wasmtime is 300x slower in a specific stack-heavy test-case that:
- Pushes slightly less than 2048 stack elements (~300 recursive rust calls).
- Calls back out into the host, which recurses back into a new instance of the same module (about 1k times).
- Eventually traps and rewinds through all 1k recursive calls.
While I'll admit that this is an unusual load, it can cause wasmtime to basically just sit there for ~20 seconds while bailing out, when it should take milliseconds at most.
Proposed resolution: Keep the option to disable backtraces. While _very_ useful for debugging, the ability to disable them is still pretty useful.
Test Case
It's not quite as simple as a single wasm file. My guess is that there's some form of non-linear cost to keeping track of stack traces.
Steps to Reproduce
I can provide a reduced test-case that would help, it just might take a bit of time.
Non-reduced test-case:
- Checkout https://github.com/filecoin-project/ref-fvm/pull/943
- Disable https://github.com/filecoin-project/ref-fvm/pull/943/files#diff-f6ca4a1c384bf9a1f911827c3974f3f2b25357d937ce1d529732c09404ed6c92R129
- In
testing/integration/tests
, runcargo test --release native_stack_overflow
.Expected Results
With backtraces disabled, it runs in 0.06 seconds.
Actual Results
With backtraces enabled, it runs in 20 seconds.
Versions and Environment
Wasmtime: 1.0.1
Operating system: Linux (5.19.14)
Architecture: x86_64 (i7-1185G7)
Extra Info
Anything else you'd like to add?
Last updated: Jan 24 2025 at 00:11 UTC