bnjbvr opened PR #2723 from mach-ports-continued
to main
:
This is #2632 rebased and continued. Many thanks to @alexcrichton for providing a lot of help with the CFI directives in the custom assembly stub; this really helped simplify the design of frame unwinding.
This is sufficient to make mach ports work on Mac M1, with the slight exception that the system's libunwind on this machine doesn't correctly interpret the CFI directives generated by the assembly stub. As a consequence, the stack trace information for wasm call frames is missing. Retrieving the source of another libunwind implementation, in Apple's LLVM fork or from the opensource.apple.com website, building it and linking it against wasmtime is sufficient to get it working as expected. In addition to this, some experiments showed that the system libunwind's source code doesn't match what's on Apple's repository: this was inferred from looking at an abort error message's line number, and seeing it didn't match the line number for the same message in local sources. As a matter of fact, this is impossible to debug as is. Yet it seemed better to have something that worked, even incompletely. In the future, either we can wait for a new functioning version of OSX libunwind, or we could consider shipping our own, as a small C/C++ dependency.
This is also blocked on https://github.com/fitzgen/mach/pull/64 to be landed and released first.
There's a bit more work to do to fully support Mac M1: I'm seeing WASI failures when trying to run the full test suite, to be handled in a second time.
bnjbvr updated PR #2723 from mach-ports-continued
to main
.
bnjbvr closed without merge PR #2723.
bnjbvr reopened PR #2723 from mach-ports-continued
to main
.
bnjbvr updated PR #2723 from mach-ports-continued
to main
.
bnjbvr updated PR #2723 from mach-ports-continued
to main
.
bnjbvr updated PR #2723 from mach-ports-continued
to main
.
alexcrichton submitted PR Review.
alexcrichton submitted PR Review.
alexcrichton created PR Review Comment:
Could this be moved to the bottom of the loop since we'll only possibly hit this case after a resumption from a suspend?
alexcrichton created PR Review Comment:
You know, thinking about it, we could probably store these values in volatile registers like x3/4, and the unwinding information could restore the information from those registers rather than from the stack possibly. That may avoid needing a stack allocation here at all.
(or we could store them in callee-save registers which are more likely to be restored when unwinding in native code)
alexcrichton created PR Review Comment:
eh I wouldn't really say that this function has a signature because it's not natively callable, but this could perhaps describe the input register expectations.
alexcrichton created PR Review Comment:
Does overwriting
fp
need to happen? I think that the unwinding from the assembly stub doesn't restore this overwritten value as well?
alexcrichton created PR Review Comment:
(that may also remove the need to call
lazy_per_thread_init
manually)
alexcrichton created PR Review Comment:
I think with thread-switching support this map will need to be updated when async wasm crosses thread. I think in
TlsRestore::take
we'd remove something from this map and inTlsRestore::restore
we'd re-insert back into the map.
cfallin submitted PR Review.
cfallin created PR Review Comment:
"strictly different from LR" instead?
cfallin submitted PR Review.
cfallin created PR Review Comment:
Can we link the SpiderMonkey source here (e.g. via Searchfox)?
cfallin created PR Review Comment:
I think it's necessary because there is no saved FP in the unwinding info for the stub (?) -- the 16-byte pushed frame is the pair (x3, LR). But then, why
SP + 16
rather than the actual priorFP
value?
cfallin created PR Review Comment:
The overall flow here was slightly unclear to me at first -- it might be good to add a figure summarizing how the stack is manipulated (here or where the frame is created in traphandlers.rs, doesn't matter much).
bnjbvr created PR Review Comment:
Indeed! What I did was subtle: the
raw::replace
function was also registering the new CallThreadState for this thread, so it was done by theraw::replace
calls inTlsRestore::take
andTlsRestore::restore
. The missing piece was to call the lazy per thread init in restore, as we might have crossed a thread boundary there. Commented to this effect. It's much simpler, thanks.
bnjbvr submitted PR Review.
bnjbvr submitted PR Review.
bnjbvr created PR Review Comment:
Good idea, done!
bnjbvr submitted PR Review.
bnjbvr created PR Review Comment:
Alex is right, we were not restoring its value; it appears to work equally well without it, so I've removed it.
bnjbvr submitted PR Review.
bnjbvr created PR Review Comment:
I've tried this, and it doesn't work; apparently libunwind reaaaally wants to restore the register value from the CFA. Tried both
cfa_register x3, x3
orcfa_same_value x3
to indicate the value hadn't changed, but this wasn't apparently sufficient. Oh well.Any concern about doing it this way? I think it should be fine, even in case of stack overflow, because we should have some ballast before the actual stack overflow guard page, etc.
bnjbvr submitted PR Review.
bnjbvr created PR Review Comment:
btw, it was only necessary to save pc, not LR: in the cranelift generated prologue of wasm functions, LR is saved, with an accompanying cfi directive, so there's no need to save it ourselves.
bnjbvr updated PR #2723 from mach-ports-continued
to main
.
alexcrichton submitted PR Review.
alexcrichton created PR Review Comment:
Nah no concerns, just wondering if we could be super clever. Now's probably not the time to be clever though!
alexcrichton submitted PR Review.
alexcrichton created PR Review Comment:
This feels somewhat iffy to me that we rely on Cranelift to restore
lr
to unwind since thisblr
is otherwise overwriting the value of the wasm frame'slr
. In the future if we optimize cranelift to not savelr
to the stack for leaf functions, for example, we'd have to update this?I figured it was easy enough to push
lr
to the stack and have a CFI directive for how to restore it?
bnjbvr submitted PR Review.
bnjbvr created PR Review Comment:
As discussed on zulip, turns out we can even:
- set
lr
from the faulting pc- jump to unwind directly
- burn the assembly stub with :fire:
I tried doing something like this a week or so ago, but it wasn't working (probably because of the intertwined return-address signing issue!). Good riddance!
bnjbvr updated PR #2723 from mach-ports-continued
to main
.
bnjbvr updated PR #2723 from mach-ports-continued
to main
.
bnjbvr updated PR #2723 from mach-ports-continued
to main
.
alexcrichton merged PR #2723.
Last updated: Nov 22 2024 at 16:03 UTC