brettcannon opened issue #5152:
Test Case
test_marshal from Python's test suite.
Steps to Reproduce
Easiest to just look at a GitHub Actions run, e.g. https://github.com/tiran/cpython-wasm-test/actions/runs/3342891022/jobs/5535571299.
Expected Results
The test to at least complete.
Actual Results
Call stack exhaustion which doesn't occur under wasmtime 1.0.1.
https://github.com/tiran/cpython-wasm-test/actions/runs/3294946299/jobs/5432988265#step:7:408
Versions and Environment
Wasmtime version or commit: latest
Operating system: Linux
Architecture: x64
Extra Info
This started occurring on October 22: https://github.com/tiran/cpython-wasm-test/actions/workflows/wasi.yml. As mentioned above, wasmtime 1.0.1 passes without issue; everything was fine until whatever https://wasmtime.dev/install.sh installs changed between October 19th @ 22:04 and October 20th @ 22:13.
brettcannon labeled issue #5152:
brettcannon commented on issue #5152:
/cc @tiran
cfallin commented on issue #5152:
A few bits of info that would help, if you wouldn't mind investigating further on your end:
- What is this test case doing? Is it supposed to recurse deeply, but "not too deeply"? In other words, is the failure because we don't support a depth of recursion that we previously did, or because something else went wrong (correctness bug etc) that is causing infinite recursion now?
- If it's a quantitative (how much recursion) rather than qualitative (recursion at all) bug, could you try to "bisect the limit" a bit and play with the test case's recursion depth, to see how much worse it got? E.g., did the test slip in just under a stack limit before, and it is just over now? Or did we explode stack usage per frame by a large factor?
Nothing major changed in regalloc or codegen between 1.0 and 2.0 that would lead to a large expansion of stackframe size, at least off the top of my head, but if the above investigation shows that we somehow have a much lower limit, we can of course dig in and see what's going on.
Thanks!
jameysharp commented on issue #5152:
I haven't tried reproducing this locally, but I was curious and looked through the failing test's source code a bit. I'm guessing the specific failing test is test_recursion_limit.
That test expects to be able to recurse over a data structure nested about 2,000 levels deep, and expects marshal.c's w_object function to raise ValueError if the data structure is any deeper. Since that function is mutually recursive with w_complex_object, this corresponds to a C stack depth of about 4,000 frames, plus a few for setting up the test. And the stack trace recorded in CI does show these two functions alternating many times, so I think this is the correct proximate cause.
So I guess that makes this a "quantitative" recursion bug.
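For concreteness, the behavior that test checks can be sketched in a few lines of Python (a minimal sketch, assuming the 2,000-entry MAX_MARSHAL_STACK_DEPTH mentioned later in this thread; the exact boundary and error message may differ by CPython version):

import marshal

def nested_list(depth):
    # Build a list nested `depth` levels deep, similar in shape to what
    # the CPython test constructs.
    head = last = []
    for _ in range(depth):
        last.append([])
        last = last[-1]
    return head

# Just under the assumed 2,000-level limit: dumps should succeed.
marshal.dumps(nested_list(1998))

# Well past the limit: dumps is expected to raise ValueError instead of
# exhausting the C stack.
try:
    marshal.dumps(nested_list(2100))
except ValueError as exc:
    print("rejected as expected:", exc)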
alexcrichton commented on issue #5152:
I tried to download the last successful build's copy of python.wasm, but it has unfortunately expired and I could not acquire it. Instead I downloaded the 3.11 rc2 from here and, with this file:

import marshal

head = last = []
MAX_MARSHAL_STACK_DEPTH = 2000
for i in range(MAX_MARSHAL_STACK_DEPTH - 2):
    last.append([0])
    last = last[-1]

data = marshal.dumps(head)
new_head = marshal.loads(data)
this succeeded:
$ cargo run --release -- run --dir . --disable-cache -- python.wasm ./foo.py
for all of main, v1.0.0, v1.0.1, v2.0.0, and v2.0.1 on an x86_64 machine.
On an AArch64 machine, v1.0.0 and main both stack overflow. The main branch stack is 3111 wasm frames deep and v1.0.0 overflows at 3112 frames deep, so not all that different.
Basically, I am unfortunately unable to easily reproduce this, and at least on AArch64 I can see a small difference in stack usage but nothing meaningful.
I'd echo what @cfallin said and reiterate to @brettcannon that, if you're able, it would be very helpful to reduce this issue to, ideally, "here's a wasm file, a directory tree, and a command that fails." We can probably take it from there, but without that it's unlikely much can be done here.
alexcrichton commented on issue #5152:
Ok with some further testing and hammering I was able to get something. At 29c7de7 it takes 497712 bytes of stack to complete this test. At the next commit, 9715d91, it takes 528752 bytes of stack. The default allowed stack is 524288, which is why this is stack overflowing.
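For reference, here is the simple arithmetic behind those figures (a quick sanity check on the numbers above, not additional measurement):

before, after = 497_712, 528_752
default_stack = 524_288  # default max wasm stack, i.e. 512 KiB

print((after - before) / before)   # ~0.062: the roughly 6% increase described below
print(default_stack - before)      # 26576 bytes of headroom before the change
print(after - default_stack)       # 4464 bytes over the limit after the change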
The stack required increased by 6%. Of the two co-recursive functions, w_object's stack size stayed at 0x50 bytes, while w_complex_object's stack size increased from 0x80 to 0x90 bytes. This represents a 7.7% increase in stack size for these two functions combined.
Overall I think that should fully explain what's going on here. A small increase in stack usage from w_complex_object happened to hit the limit. Otherwise everything appears to be normal.
I'll leave it to @jameysharp to conclude whether https://github.com/bytecodealliance/wasmtime/pull/4966 should have increased the stack size here or not. I've extracted the single function to this wasm file: extract.wasm.gz, where I get:

$ ./before compile extract.wasm && objdump -S extract.cwasm | rg 'sub.*rsp'
  1a: 48 81 ec 80 00 00 00    sub    $0x80,%rsp
$ ./after compile extract.wasm && objdump -S extract.cwasm | rg 'sub.*rsp'
  1a: 48 81 ec 90 00 00 00    sub    $0x90,%rsp

locally on an x86_64 target.
@brettcannon, to solve this issue for you in the meantime, I believe there are a few options:
- Pin to 1.0.1
- Update the test to not recurse as much on wasi
- Wait for https://github.com/bytecodealliance/wasmtime/pull/5156 to land in 3.0.0 and pass a --max-wasm-stack option during testing to increase the default stack size (see the example invocation after this list).
We could alternatively consider increasing the default stack size in a platform-specific manner. I believe the current limit is relatively restrictive due to our testing on Windows, where the main thread's stack was smaller by default than on other platforms. Ideally, though, we would keep platform behavior consistent.
cfallin commented on issue #5152:
Thank you @alexcrichton for bisecting this!
I too am curious why exactly we use 16 extra bytes in the stackframe in this function. We can look into that a bit; however, I think the high-order bit here, and the much more important point, is to set expectations/norms around this sort of thing, so I'm going to go ahead and give some thoughts on that. (I don't mean to pick on this particular instance of the issue, @brettcannon -- this is just a useful "norm-setting opportunity", so I'll make a general point.)
In general, I want to make sure we don't (accidentally or otherwise) fall into a "no regressions ever" expectation. The size of a stackframe (number of spillslots) is implementation-defined, and a Wasm compiler is free to be as efficient or as inefficient as it likes. A test (such as the one linked above) that depends on the stack size fitting just under some implementation-defined limit, given implementation-defined stackframe sizes, is a really brittle test: it could break with any minor version update. In other words, I want to be very clear here first that Wasmtime/Cranelift are perfectly standards-compliant here, and so in a strict sense, this is not a bug. I would consider a "blow up by factor of N" regression a performance bug, but this is not that: it's one extra spillslot.
Of course, all else being equal, regressions are bad. Over time the compiler should improve on average, and we would never want to take a change that adds an extra 16 bytes to a stackframe with no other benefit. But that qualification is key: most things have tradeoffs. (In the case here, 4-6% faster compile times, according to the commit message.) The impact of a strict no-regressions policy is that one ends up climbing the hill to a local maximum, and one can never really change the design otherwise: the core algorithms/approach are ossified forever (or a new algorithm/approach needs to go to absolutely heroic lengths to match or exceed the old approach in every single benchmark, and is always "on call" to be "fixed" when new cases are discovered).
Investigations of small N% regressions like this one can be useful because they can uncover real problems, or opportunities to fix a case. I'll also defer to @jameysharp whether he wants to dig into this one. I wouldn't be surprised however if it ends up being something like "values processed in slightly different order and spillslot-reuse heuristic now stops searching one step too soon" -- that sort of thing can easily happen in a complex compiler.
I'd second @alexcrichton's suggestion to adjust either the recursion depth or the stack limit to widen the gap between the two a bit -- in other words, the most sustainable approach here is to not live right "on the edge" of the limit.
I don't want to discourage filing bugs for this sort of thing, as long as they have enough detail to be helpful; but I just want to make sure we're all on the same page re: acceptable outcomes. Happy to discuss this further if anyone would like!
alexcrichton commented on issue #5152:
I agree with everything you say, and I, too, do not wish to have a policy of zero regressions accepted. Personally I feel that this sequence of events for this issue is ok to play out, though. As an end-user you probably aren't that aware of how close to the limits you're running which means that if a release, like this one, bumps you over the limit it looks like a legitimate regression rather than a small amount more stack space. In that sense I very much want to still encourage these bugs being opened so we can investigate and figure out a conclusion.
On this particular bug, though, I personally see no issue in providing the --max-wasm-stack knob and considering it closed.
jameysharp commented on issue #5152:
I chatted about this offline with @cfallin and others.
I'm a little puzzled at how #4966 could have impacted stack frame size. In particular, Chris says regalloc2 does a value-merging pre-processing step that sounds to me like it should have made the input to register allocation be the same with or without the PR. Perhaps merging values earlier is enabling other Cranelift optimizations that lead to longer live-ranges?
It would certainly be nice to understand the root cause here. But I think @alexcrichton's --max-wasm-stack option is the best solution for making this particular CPython test reliable. And since it doesn't look like this is a sign of dramatically larger stack frames, I think it'll be okay if we don't get around to identifying the root cause.
So I'm closing this issue, as I believe it's resolved by #5156. @brettcannon, if that solution isn't sufficient for you, feel free to re-open with more details about what you need. Thank you for your report!
jameysharp closed issue #5152:
brettcannon commented on issue #5152:
Thanks everyone for the details and looking into this! Sorry I couldn't react faster to the request for more info as I have been a bit busy with other things. :sweat_smile:
As for the result, I'm totally happy with the answers. The issue was opened more to make sure we had not found a bug, as CPython is somewhat notorious for hitting edge cases of platforms. :grinning_face_with_smiling_eyes: I will lower the stack depth cap in CPython under WASI (which is the key purpose of the test: to know when we need a lower stack depth so that we can make sure the stack doesn't overflow and we can raise an exception instead).
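A hypothetical sketch of the kind of platform-specific cap being described here (the names and numbers are illustrative only, not CPython's actual change, which is tracked in the CPython issue linked in the next comment):

import sys

# Illustrative values only: pick a smaller nesting depth when running
# under WASI so the recursion test stays well inside the host's wasm
# stack limit.
DEFAULT_MAX_DEPTH = 2000   # value discussed earlier in this thread
WASI_MAX_DEPTH = 1000      # made-up, deliberately conservative cap

MAX_TEST_DEPTH = WASI_MAX_DEPTH if sys.platform == "wasi" else DEFAULT_MAX_DEPTH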
brettcannon commented on issue #5152:
I have opened https://github.com/python/cpython/issues/98925 to track fixing this on the CPython side.