@fitzgen (he/him), with the regalloc update (PR #2034), the enable-gc-tests-on-aarch64 PR almost passes, except for gc_during_gc_from_many_table_gets; the debug spew seems to indicate that it's not finding the stack canary
I'm thinking this may be an issue with the way that stack frames work on aarch64; in particular, the FP tuple may be at the bottom of an activation record rather than the top (i.e., we don't have the equivalent of push rbp / mov rbp, rsp at the top of every function like x86 does) so we may not actually step over the canary
thoughts? it's totally possible I'm missing the intended functionality here
Here's some debug spew (with my local addition for the backtrace loop):
[2020-07-16T17:08:59Z DEBUG wasmtime_runtime::externref] start GC
[2020-07-16T17:08:59Z DEBUG wasmtime_runtime::externref] stack_canary = 4001586718
[2020-07-16T17:08:59Z DEBUG wasmtime_runtime::externref] backtrace: pc 400027b2e0 sp 4001586190
[2020-07-16T17:08:59Z DEBUG wasmtime_runtime::externref] backtrace: pc 40002763c0 sp 40015861f0
[2020-07-16T17:08:59Z DEBUG wasmtime_runtime::externref] backtrace: pc 4000275c34 sp 40015862c0
[2020-07-16T17:08:59Z DEBUG wasmtime_runtime::externref] backtrace: pc 400026a214 sp 40015862f0
[2020-07-16T17:08:59Z DEBUG wasmtime_runtime::externref] backtrace: pc 40cc61e134 sp 4001586320
[2020-07-16T17:08:59Z WARN wasmtime_runtime::externref] did not find stack canary; skipping GC sweep
[2020-07-16T17:08:59Z DEBUG wasmtime_runtime::externref] end GC
context: the point of the stack canary is to ensure that we've walked every Wasm frame. If we didn't walk every Wasm frame, then when we sweep, we might reclaim objects that are still in use, leading to use-after-free. If we don't observe the stack canary, then we skip the sweep to avoid these issues
it's expected that (if all the frame management is working right) whenever we're in Wasm code, we should reach the canary, right?
yes, this is a workaround for .NET always emitting Windows stack unwind info, even on Linux/macOS; libunwind doesn't know how to handle that info, so it fails to keep walking
so if someone calls into wasm, which calls into .NET, and then a GC is triggered, we miss all the frames older than the .NET frames
I see, OK
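A minimal sketch of the canary check described above, using the values from the debug spew (read as hex). This is not the actual wasmtime_runtime::externref code: the Frame struct, walked_past_canary, frames_from_log, and LOG_CANARY are hypothetical names for illustration, and it assumes the stack grows down so older frames have higher addresses.

```rust
/// One entry from the backtrace walk (hypothetical type for this sketch).
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct Frame {
    pub pc: u64,
    pub sp: u64,
}

/// Canary address taken from the "stack_canary = 4001586718" log line.
pub const LOG_CANARY: u64 = 0x4001586718;

/// The five frames printed in the log, youngest first.
pub fn frames_from_log() -> Vec<Frame> {
    vec![
        Frame { pc: 0x400027b2e0, sp: 0x4001586190 },
        Frame { pc: 0x40002763c0, sp: 0x40015861f0 },
        Frame { pc: 0x4000275c34, sp: 0x40015862c0 },
        Frame { pc: 0x400026a214, sp: 0x40015862f0 },
        Frame { pc: 0x40cc61e134, sp: 0x4001586320 },
    ]
}

/// Returns true if the walk stepped over `canary`, i.e. the canary address
/// lies between the stack pointers of two consecutive frames.
/// Assumption: the stack grows down, so `frames` has increasing `sp`.
pub fn walked_past_canary(frames: &[Frame], canary: u64) -> bool {
    frames
        .windows(2)
        .any(|w| w[0].sp <= canary && canary <= w[1].sp)
}
```

With the frames from the log, the highest sp reached is 4001586320, which is still below the canary at 4001586718, so walked_past_canary returns false and the sweep would be skipped. That matches the "did not find stack canary" warning: the walk stopped before reaching the frame that holds the canary.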
Chris Fallin said:
I'm thinking this may be an issue with the way that stack frames work on aarch64; in particular, the FP tuple may be at the bottom of an activation record rather than the top (i.e., we don't have the equivalent of push rbp / mov rbp, rsp at the top of every function like x86 does) so we may not actually step over the canary
can you explain a little more here? I'm not sure I follow
So basically, as far as I understand, backtrace should walk the stack by following the linked list from fp (the frame pointer register); each entry has fp at offset 0 and lr (return address link register) at offset 8
On aarch64, the stack management is somewhat flexible
So some compilers (e.g. I just disassembled /bin/ls on our aarch64 box and found an example) allocate stack storage above this record
On x86 it'd be the equivalent of sub rsp, $MY_LOCALS_SIZE / push rbp / mov rbp, rsp
So if one is hoping that the two fp values (which I assume is what backtrace returns for the "stack pointer"?) land on either side of some local stack storage, this might not happen if the very topmost frame is where it lives
(What's not clear to me is whether the canary is in the very topmost frame: the unittest framework should call the #[test] function from somewhere else; but maybe the Rust code is compiled without frame pointers?)
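To make the frame-record model above concrete, here is a rough sketch of following the fp linked list, with the caller's fp at offset 0 and lr at offset 8. It runs over a simulated memory map rather than a real stack, and the names Memory, example_memory, and walk_frame_records are made up for this illustration. It only shows the fp-chain model being described, not how the backtrace crate itself walks the stack.

```rust
use std::collections::HashMap;

/// Simulated memory for the sketch: address -> 64-bit word.
pub type Memory = HashMap<u64, u64>;

/// Three made-up frame records; the oldest one has a null saved fp.
pub fn example_memory() -> Memory {
    let mut mem = Memory::new();
    // (record address, saved fp at offset 0, lr at offset 8)
    for &(addr, saved_fp, lr) in &[
        (0x1000u64, 0x1040u64, 0xaaau64),
        (0x1040, 0x10a0, 0xbbb),
        (0x10a0, 0x0, 0xccc),
    ] {
        mem.insert(addr, saved_fp);
        mem.insert(addr + 8, lr);
    }
    mem
}

/// Follow frame records starting at `fp`, youngest first.
/// Returns (record address, return address) pairs. Stops at a null fp,
/// an unreadable record, a saved fp that does not move to a higher
/// address (the stack is assumed to grow down), or after `max_frames`.
pub fn walk_frame_records(mem: &Memory, mut fp: u64, max_frames: usize) -> Vec<(u64, u64)> {
    let mut out = Vec::new();
    while fp != 0 && out.len() < max_frames {
        let (saved_fp, lr) = match (mem.get(&fp), mem.get(&fp.wrapping_add(8))) {
            (Some(&f), Some(&l)) => (f, l),
            _ => break, // record not readable: stop the walk
        };
        out.push((fp, lr));
        if saved_fp <= fp {
            break; // null or non-increasing link: end of chain
        }
        fp = saved_fp;
    }
    out
}
```

Note that the walk only ever sees the addresses of the frame records themselves. Where each function's locals sit relative to its record (above or below) is not visible from the chain, which is the concern raised here about whether consecutive values end up on either side of the canary.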
the canary is not in the topmost frame, but in the youngest frame before we enter the oldest wasm frame
Hmm, OK, so perhaps the problem is just that the backtrace walk stops prematurely
there could be older frames than the frame containing the canary, but they would all be non-wasm frames
Chris Fallin said:
Hmm, OK, so perhaps the problem is just that the backtrace walk stops prematurely
Yeah, sounds like this to me
to double check: on aarch64 the stack grows down, right?
The smoke-test passes now so it seems at least some of the time, this works
Yes, that's right
ok, good because basically all of this code assumes that :)
@Chris Fallin afaik stack walking doesn't work on aarch64 b/c we don't emit unwind tables?
I think all the other trap-related tests that rely on unwinding are ignored as well
Oh! Does backtrace want those too?
OK, that's a big chunk of not-happening-today then :-)
yeah lol that's ok
but yeah backtrace relies on libunwind which relies on dwarf unwind tables
and we only emit those on x86_64 right now
backtrace wraps libunwind
which relies on DWARF/ehidx (or whatever it's called)
for the old backend
OK cool, well, more reason to get to those soon. In any case, I'm happy we're not throwing random segfaults
Thanks!
/me is updating ubuntu; offline for a while
Last updated: Dec 23 2024 at 14:03 UTC