Stream: git-wasmtime

Topic: wasmtime / PR #9093 wasmtime: Check stack limits only on ...


view this post on Zulip Wasmtime GitHub notifications bot (Aug 08 2024 at 19:40):

jameysharp opened PR #9093 from jameysharp:wasmtime-probe-stack to bytecodealliance:main:

Currently, we do an explicit check for stack overflow on entry to every WebAssembly function. But that costs some time, and is a significant performance hit for very short functions.

This commit instead switches Wasmtime to relying on guard pages at the end of the stack to catch stack overflow, so the MMU does this check for "free". This means we may allow deeper recursion in guest code than we did before.

To make this work, we need Wasmtime's signal handlers to recognize when a guest memory fault is in a stack guard page and report the appropriate stack-overflow trap code.

Note that we can't turn host-code signals into guest traps, so the signal handlers have to verify that the signal occurred in guest code.

When the guest calls host code (explicitly due to calling an imported host function, or implicitly due to a libcall inserted by Wasmtime or Cranelift), we also need to ensure that there is enough stack space available for the host code to not hit the guard pages. We do that by checking the stack limit that the embedder provided in the trampolines where we exit wasm.


I've been laid off from Fastly so I won't be pursuing this further, but I feel good about the work I put into it so far and want to leave it here in case anyone else wants to pick it up.

view this post on Zulip Wasmtime GitHub notifications bot (Dec 13 2024 at 17:53):

sunfishcode updated PR #9093.

view this post on Zulip Wasmtime GitHub notifications bot (Dec 13 2024 at 20:40):

sunfishcode updated PR #9093.

view this post on Zulip Wasmtime GitHub notifications bot (Dec 13 2024 at 20:40):

sunfishcode commented on PR #9093:

I've now rebased this and ported it to main. There are some test failures at the moment:

thread 'stack_overflow::host_always_has_some_stack' has overflowed its stack
fatal runtime error: stack overflow

which I'm looking into.

view this post on Zulip Wasmtime GitHub notifications bot (Dec 13 2024 at 21:44):

github-actions[bot] commented on PR #9093:

Subscribe to Label Action

cc @fitzgen

<details>
This issue or pull request has been labeled: "pulley", "wasmtime:api"

Thus the following users have been cc'd because of the following labels:

To subscribe or unsubscribe from this label, edit the <code>.github/subscribe-to-label.json</code> configuration file.

</details>

view this post on Zulip Wasmtime GitHub notifications bot (Dec 18 2024 at 22:43):

sunfishcode commented on PR #9093:

The stack overflow in the host appears to be because the stack-limit checks are in the trampolines; however, host libcalls don't use trampolines. The testcase disables sse4_1 and does f32.ceil, which causes it to become a host libcall, so it calls from guest to host without a trampoline, and therefore without checking the stack limit.

view this post on Zulip Wasmtime GitHub notifications bot (Dec 18 2024 at 23:04):

cfallin commented on PR #9093:

I remember talking about the libcall problem at the time with @jameysharp, but I have only vague recollections of the various answers we came up with. IIRC we did decide that recognizing the overflow by the fault address being in a guard page was more reliable than trying to deduce the fault reason from the PC, because we don't know the first instruction that might touch the guard page in the <1-page frame-size case. I see this PR does that with some logic checking whether the fault address is within a certain distance of SP. I remember us realizing that this might also (should also?) work for libcalls, because natively compiled code is also compatible with a guard-page scheme. So I guess I'm curious: what's the reason that this condition isn't triggering in this case?

(Apologies if I've lost too much context -- it's been a long time and I skimmed this PR again but there are many subtleties!)

view this post on Zulip Wasmtime GitHub notifications bot (Dec 18 2024 at 23:08):

fitzgen commented on PR #9093:

Also, it seems like libcalls should probably do stack checks as well, since I would be hesitant to try to recover from runtime code hitting the guard page, so it might make sense to make the libcalls use trampolines like imports do.

view this post on Zulip Wasmtime GitHub notifications bot (Dec 18 2024 at 23:34):

alexcrichton commented on PR #9093:

If it helps, I've got an old wasmtime branch which removed the stack-limit checks from functions by generating trampolines for all libcalls. I never pushed it over the finish line in terms of polish and testing, though.

view this post on Zulip Wasmtime GitHub notifications bot (Dec 19 2024 at 18:26):

jameysharp commented on PR #9093:

I've lost the context I had on this.

I believe Wasmtime libcalls work correctly in this PR, because they use Wasmtime's trampolines, which this PR makes do stack limit checks. It's only the Cranelift libcalls that still don't work.

There was some reason why we can only safely use the guard page if we see that the faulting instruction is in Cranelift-compiled code. I think it's because our only recourse in the signal handler is to fully unwind the stack, and we can't trigger Rust panic unwinding, let alone any libc cleanup, so there's no telling what invariants are broken if we just unwind the stack in native code. But the specific way we use Cranelift is always safe to just drop everything at any time.

I remember at one point intending to add support to Cranelift for inserting a stack limit check immediately before any libcall, or inserting it in the prologue like today but only if some libcalls appear somewhere in the function. I don't remember if that was the last plan I had in mind or if something had superseded it.

I definitely carefully considered Alex's old branch. I only found out about it after landing on the idea of checking the stack pointer in the signal handler, and it was reassuring to see we'd both landed in the same place there.

I was not comfortable with the specific mechanism in that branch for getting stack limit checks into Cranelift libcalls.

The end result was calling two extra Cranelift functions plus a Rust function before the real libcall could run, and that felt like a lot of trouble to me.

I hope this helps!

view this post on Zulip Wasmtime GitHub notifications bot (Dec 20 2024 at 01:36):

sunfishcode updated PR #9093.

view this post on Zulip Wasmtime GitHub notifications bot (Dec 20 2024 at 01:57):

sunfishcode commented on PR #9093:

@jameysharp Thanks, that's useful context!

@alexcrichton Thanks for that branch. It's worth considering that approach; however, if possible, I'd like to try an approach where we just pass vmctx to the libcalls and use trampolines, so that they work the same way as other Wasmtime builtins, rather than having the libcall code do special things.

I've now added patches to experiment with making the calls to ceil/floor/etc. work like normal Wasmtime builtins, meaning they get a vmctx and a trampoline in the normal way. This involves handling them in the Wasm-to-Cranelift translator, which is a little annoying, but has the nice side effect of eliminating a whole separate relocation mechanism from Wasmtime, so I think it's an overall simplification. See the changes in crates/cranelift/src/translate/code_translator.rs for the place where this hooks in.

There's a test failure in host_segfault.rs. The test attempts to trigger a crash by deliberately misconfiguring the host, and with this patch, we interpret the crash as a Wasm trap, which causes the test to fail. I need to investigate more.

There are lots of Pulley failures still. I haven't yet looked at how all this interacts with Pulley.

And the patch is a bit rough still. I need to tidy up all the loose ends around x86_pshufb and __m128i and things, and some bits of unnecessary relocation code left behind, and probably other things too.

view this post on Zulip Wasmtime GitHub notifications bot (Dec 20 2024 at 16:46):

alexcrichton commented on PR #9093:

This involves handling them in the Wasm-to-Cranelift translator, which is a little annoying, but has the nice side effect of eliminating a whole separate relocation mechanism from Wasmtime, so I think it's an overall simplification

Oh I really like that strategy, much better than the hack I had :+1:

There are lots of Pulley failures still. I haven't yet looked at how all this interacts with Pulley.

Feel free to ignore any pulley failures where possible, but if that's causing issues I can help take a deeper look too!

view this post on Zulip Wasmtime GitHub notifications bot (Dec 20 2024 at 22:48):

sunfishcode updated PR #9093.

view this post on Zulip Wasmtime GitHub notifications bot (Dec 20 2024 at 22:59):

sunfishcode updated PR #9093.

view this post on Zulip Wasmtime GitHub notifications bot (Dec 20 2024 at 23:37):

sunfishcode updated PR #9093.

view this post on Zulip Wasmtime GitHub notifications bot (Dec 21 2024 at 01:00):

sunfishcode commented on PR #9093:

The patch is now tidied up more. I still need to fix the host_segfault.rs test, and there's a todo to figure out the stack pointer on s390x, but I think it's otherwise in pretty good shape.

@alexcrichton The CI shows a lot of pulley failures; any help you could provide with those would be appreciated!


Last updated: Dec 23 2024 at 12:05 UTC