wasmtime reftypes on aarch64 · wasmtime

@fitzgen (he/him), with the regalloc update (PR #2034), the enable-gc-tests-on-aarch64 PR almost passes, except for gc_during_gc_from_many_table_gets; the debug spew seems to indicate that it's not finding the stack canary

Chris Fallin (Jul 16 2020 at 17:15):

I'm thinking this may be an issue with the way stack frames work on aarch64: in particular, the saved FP/LR pair (the frame record) may be at the bottom of an activation record rather than the top (i.e., we don't have the equivalent of x86's push rbp / mov rbp, rsp at the top of every function), so the walk may not actually step over the canary

Chris Fallin (Jul 16 2020 at 17:15):

Chris Fallin (Jul 16 2020 at 17:17):

[2020-07-16T17:08:59Z DEBUG wasmtime_runtime::externref] start GC
[2020-07-16T17:08:59Z DEBUG wasmtime_runtime::externref] stack_canary = 4001586718
[2020-07-16T17:08:59Z DEBUG wasmtime_runtime::externref] backtrace: pc 400027b2e0 sp 4001586190
[2020-07-16T17:08:59Z DEBUG wasmtime_runtime::externref] backtrace: pc 40002763c0 sp 40015861f0
[2020-07-16T17:08:59Z DEBUG wasmtime_runtime::externref] backtrace: pc 4000275c34 sp 40015862c0
[2020-07-16T17:08:59Z DEBUG wasmtime_runtime::externref] backtrace: pc 400026a214 sp 40015862f0
[2020-07-16T17:08:59Z DEBUG wasmtime_runtime::externref] backtrace: pc 40cc61e134 sp 4001586320
[2020-07-16T17:08:59Z WARN  wasmtime_runtime::externref] did not find stack canary; skipping GC sweep
[2020-07-16T17:08:59Z DEBUG wasmtime_runtime::externref] end GC

fitzgen (he/him) (Jul 16 2020 at 17:19):

context: the point of the stack canary is to ensure that we've walked every Wasm frame. If we didn't walk every Wasm frame, a sweep might reclaim objects that an unwalked frame still references, leading to use-after-free. So if we don't observe the stack canary, we skip the sweep to avoid these issues
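A minimal Rust sketch of the canary rule, under stated assumptions: the stack walk reports one stack pointer per frame, youngest first; the stack grows downward; and the canary counts as "stepped over" when its address lies between the sp values of two consecutive frames. GcAction, walked_over_canary, and sweep_decision are hypothetical names for illustration, not wasmtime's actual API.

```rust
/// What the collector does after walking the stack (hypothetical names).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum GcAction {
    Sweep,
    SkipSweep,
}

/// Did the walk step over the canary? `sps` are the stack pointers of the
/// walked frames, youngest first. Assumes the stack grows down, so older
/// frames have higher addresses, and that the canary is "found" when it lies
/// between the sp values of two consecutive frames.
pub fn walked_over_canary(sps: &[u64], canary: u64) -> bool {
    sps.windows(2).any(|w| w[0] <= canary && canary <= w[1])
}

/// Only sweep if every Wasm frame was provably walked. If the walk stopped
/// early, an unwalked frame may still hold a reference, and sweeping could
/// free an object that is still in use (use-after-free).
pub fn sweep_decision(sps: &[u64], canary: u64) -> GcAction {
    if walked_over_canary(sps, canary) {
        GcAction::Sweep
    } else {
        GcAction::SkipSweep
    }
}
```

Feeding in the sp values from the log above (read as hex, which digits like the "f" in 40015861f0 suggest) with stack_canary = 4001586718 yields SkipSweep: the canary sits above every walked sp, so under these assumptions the walk ended before reaching the canary's frame, consistent with the "did not find stack canary" warning.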

Chris Fallin (Jul 16 2020 at 17:19):

it's expected that (if all the frame management is working right) whenever we're in Wasm code, we should reach the canary, right?

fitzgen (he/him) (Jul 16 2020 at 17:20):

yes. The canary is a workaround for .NET always emitting Windows stack-unwind info, even on Linux/macOS: libunwind doesn't know how to handle that info, so it stops walking the stack there

fitzgen (he/him) (Jul 16 2020 at 17:21):

so if someone calls into Wasm, which calls into .NET, and then a GC is triggered, we miss all the frames older than the .NET frames, including the Wasm frames that made the call

Chris Fallin (Jul 16 2020 at 17:21):

fitzgen (he/him) (Jul 16 2020 at 17:22):

Chris Fallin (Jul 16 2020 at 17:23):

So basically, as far as I understand, backtrace should walk the stack by following the linked list of frame records starting from fp (the frame pointer register, x29); each record has the caller's saved fp at offset 0 and lr (the link register, i.e., the return address) at offset 8
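The frame-record walk described here can be sketched in Rust against simulated memory (a HashMap from address to 8-byte word) instead of raw pointer reads, so it runs without unsafe. The record layout (saved fp at [fp + 0], saved lr at [fp + 8]) is as stated in the message; Memory, memory_with_records, walk_frame_records, and the stop conditions are illustrative assumptions, not how the backtrace crate or libunwind actually implement the walk.

```rust
use std::collections::HashMap;

/// Simulated stack memory: address -> 8-byte word. A stand-in for real
/// pointer reads, so the sketch runs without `unsafe`.
pub type Memory = HashMap<u64, u64>;

/// Build simulated memory from (record_addr, saved_fp, saved_lr) triples,
/// placing saved_fp at [record_addr + 0] and saved_lr at [record_addr + 8].
pub fn memory_with_records(records: &[(u64, u64, u64)]) -> Memory {
    let mut mem = Memory::new();
    for &(fp, saved_fp, saved_lr) in records {
        mem.insert(fp, saved_fp);
        mem.insert(fp + 8, saved_lr);
    }
    mem
}

/// Follow the frame-record linked list starting at `start_fp`, returning
/// (record address, saved lr) for each record visited, youngest first.
pub fn walk_frame_records(mem: &Memory, start_fp: u64, max_frames: usize) -> Vec<(u64, u64)> {
    let mut frames = Vec::new();
    let mut fp = start_fp;
    while fp != 0 && frames.len() < max_frames {
        // Read both words of the record; stop if either is unreadable.
        let lr_slot = fp.checked_add(8).and_then(|addr| mem.get(&addr));
        let (next_fp, lr) = match (mem.get(&fp), lr_slot) {
            (Some(&next_fp), Some(&lr)) => (next_fp, lr),
            _ => break,
        };
        frames.push((fp, lr));
        // Callers' records live at higher addresses (the stack grows down), so a
        // null or non-increasing link ends the chain and also prevents cycles.
        if next_fp <= fp {
            break;
        }
        fp = next_fp;
    }
    frames
}
```

A chain of three records at 0x1000, 0x1040, and 0x1100 (the last with a null saved fp) yields three (fp, lr) pairs; the lr values are what a symbolizer or stack-map lookup would consume.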

Chris Fallin (Jul 16 2020 at 17:23):

Chris Fallin (Jul 16 2020 at 17:24):

So some compilers (e.g., I just disassembled /bin/ls on our aarch64 box and found an example) allocate stack storage above this record, i.e., at higher addresses, between the record and the caller's frame

Chris Fallin (Jul 16 2020 at 17:24):

On x86 it'd be the equivalent of sub rsp, $MY_LOCALS_SIZE / push rbp / mov rbp, rsp

Chris Fallin (Jul 16 2020 at 17:25):

So if one is hoping that two consecutive fp values (which I assume are what backtrace returns as the "stack pointer"?) land on either side of some local stack storage, that might not happen if the storage lives in the very topmost frame
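To make the layout point concrete, here is a hypothetical Rust helper that, given frame-record addresses from an fp walk (youngest first), reports which frame's locals contain a given address under the two prologue shapes discussed: the usual x86 push rbp / mov rbp, rsp / sub rsp, N, versus allocating locals first and placing the record below them. It assumes locals sit directly against the record and ignores spill, argument, and padding areas; FrameLayout and owning_frame are made-up names, and the AArch64 instruction sequence in the comment is an assumed example.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum FrameLayout {
    /// x86-style prologue (push rbp; mov rbp, rsp; sub rsp, N):
    /// a frame's locals sit just below its own frame record.
    RecordAboveLocals,
    /// Locals allocated first, record pushed below them (assumed to look
    /// roughly like sub sp, sp, #N; stp x29, x30, [sp, #-16]!; mov x29, sp):
    /// a frame's locals sit just above its own frame record.
    RecordBelowLocals,
}

/// Given frame-record addresses from an fp walk (youngest first, increasing
/// because the stack grows down), return the index of the frame whose locals
/// contain `addr`, or None if `addr` is not bracketed by two walked records.
pub fn owning_frame(fps: &[u64], addr: u64, layout: FrameLayout) -> Option<usize> {
    // Find the pair of consecutive records that brackets the address.
    let j = fps.windows(2).position(|w| w[0] < addr && addr < w[1])?;
    Some(match layout {
        // Locals below a record belong to the older frame of the pair.
        FrameLayout::RecordAboveLocals => j + 1,
        // Locals above a record belong to the younger frame of the pair.
        FrameLayout::RecordBelowLocals => j,
    })
}
```

With records at 0x100, 0x200, and 0x300, address 0x180 is attributed to frame 1 under the x86-style layout but to frame 0 under the locals-above-record layout. Under these simplified assumptions, the youngest frame's locals (x86-style) or the oldest walked frame's locals (locals-above-record) are never bracketed by two walked records at all.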

Chris Fallin (Jul 16 2020 at 17:25):

(What's not clear to me is whether the canary is in the very topmost frame: the unit-test framework should call the #[test] function from somewhere else. But maybe the Rust code is compiled without frame pointers?)

fitzgen (he/him) (Jul 16 2020 at 17:26):

the canary is not in the topmost frame, but in the youngest frame before we enter the oldest wasm frame

Chris Fallin (Jul 16 2020 at 17:26):

Hmm, OK, so perhaps the problem is just that the backtrace walk stops prematurely

fitzgen (he/him) (Jul 16 2020 at 17:27):

there could be older frames than the frame containing the canary, but they would all be non-wasm frames
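Because every frame older than the canary's frame is a host (non-Wasm) frame, a walker can stop as soon as it steps over the canary. A hypothetical Rust sketch over simulated frames, assuming the stack grows downward and that the canary lies between the sp values of two consecutive walked frames; Frame and walk_until_canary are illustrative names, not wasmtime's or the backtrace crate's API.

```rust
/// A simulated stack frame (hypothetical): program counter and stack pointer.
#[derive(Debug, Clone, Copy)]
pub struct Frame {
    pub pc: u64,
    pub sp: u64,
}

/// Visit frames youngest-first, calling `visit` on each, and stop as soon as
/// the walk has stepped over `canary`. Returns whether the canary was found.
pub fn walk_until_canary(frames: &[Frame], canary: u64, mut visit: impl FnMut(&Frame)) -> bool {
    let mut last_sp: Option<u64> = None;
    for frame in frames {
        if let Some(prev) = last_sp {
            if prev <= canary && canary <= frame.sp {
                // The canary lies in the previous (already visited) frame; this
                // frame and every older one are host frames with no Wasm
                // references to inspect, so the walk can end here.
                return true;
            }
        }
        visit(frame);
        last_sp = Some(frame.sp);
    }
    // Ran out of frames without stepping over the canary.
    false
}
```

With frames at sp 0x100, 0x200, 0x300, 0x400 and a canary at 0x250, only the first two frames are visited before the walk stops; with the canary above every sp, all frames are visited and the function returns false, which in the scheme described here means the sweep is skipped.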

fitzgen (he/him) (Jul 16 2020 at 17:27):