Stream: wasmtime

Topic: wasmtime reftypes on aarch64


Chris Fallin (Jul 16 2020 at 17:14):

@fitzgen (he/him), with the regalloc update (PR #2034), the enable-gc-tests-on-aarch64 PR almost passes, except for gc_during_gc_from_many_table_gets; the debug spew seems to indicate that it's not finding the stack canary

Chris Fallin (Jul 16 2020 at 17:15):

I'm thinking this may be an issue with the way stack frames work on aarch64; in particular, the FP/LR pair may be at the bottom of an activation record rather than the top (i.e., we don't have the equivalent of push rbp / mov rbp, rsp at the top of every function like x86 does), so we may not actually step over the canary

Chris Fallin (Jul 16 2020 at 17:15):

thoughts? it's totally possible I'm missing the intended functionality here

Chris Fallin (Jul 16 2020 at 17:17):

Here's some debug spew (with my local addition for the backtrace loop):

```
[2020-07-16T17:08:59Z DEBUG wasmtime_runtime::externref] start GC
[2020-07-16T17:08:59Z DEBUG wasmtime_runtime::externref] stack_canary = 4001586718
[2020-07-16T17:08:59Z DEBUG wasmtime_runtime::externref] backtrace: pc 400027b2e0 sp 4001586190
[2020-07-16T17:08:59Z DEBUG wasmtime_runtime::externref] backtrace: pc 40002763c0 sp 40015861f0
[2020-07-16T17:08:59Z DEBUG wasmtime_runtime::externref] backtrace: pc 4000275c34 sp 40015862c0
[2020-07-16T17:08:59Z DEBUG wasmtime_runtime::externref] backtrace: pc 400026a214 sp 40015862f0
[2020-07-16T17:08:59Z DEBUG wasmtime_runtime::externref] backtrace: pc 40cc61e134 sp 4001586320
[2020-07-16T17:08:59Z WARN  wasmtime_runtime::externref] did not find stack canary; skipping GC sweep
[2020-07-16T17:08:59Z DEBUG wasmtime_runtime::externref] end GC
```

fitzgen (he/him) (Jul 16 2020 at 17:19):

context: the point of the stack canary is to ensure that we've walked every Wasm frame. If we didn't walk every Wasm frame, then when we sweep, we might reclaim objects that are still in use, leading to use-after-free. If we don't observe the stack canary, then we skip the sweep to avoid these issues
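A minimal sketch of the canary-gated sweep described above, assuming a simplified model in which the unwinder hands back a list of frames, youngest first, each tagged as Wasm, host, or the host frame holding the canary. The Frame and GcOutcome types and the gc function are hypothetical illustrations, not wasmtime's API.

```rust
// Hypothetical model of a canary-gated GC sweep; names are illustrative,
// not wasmtime's actual API.

/// What the unwinder reported for one frame, youngest first.
enum Frame {
    /// A Wasm frame whose live references must be marked.
    Wasm,
    /// A host (non-Wasm) frame with no GC roots.
    Host,
    /// The youngest host frame before the oldest Wasm frame; holds the canary.
    HostWithCanary,
}

#[derive(Debug, PartialEq)]
enum GcOutcome {
    /// The canary was reached, so every Wasm frame was scanned; sweeping is safe.
    Swept { wasm_frames_scanned: usize },
    /// The walk ended early; sweeping could free objects still in use.
    SkippedSweep,
}

fn gc(walked: &[Frame]) -> GcOutcome {
    let mut wasm_frames_scanned = 0;
    for frame in walked {
        match frame {
            // A real collector would mark this frame's live references here.
            Frame::Wasm => wasm_frames_scanned += 1,
            Frame::Host => {}
            // All Wasm frames are younger than the canary frame, so reaching
            // it proves none were missed.
            Frame::HostWithCanary => return GcOutcome::Swept { wasm_frames_scanned },
        }
    }
    // Never saw the canary: be conservative and skip the sweep.
    GcOutcome::SkippedSweep
}

fn main() {
    let complete = [Frame::Wasm, Frame::Host, Frame::Wasm, Frame::HostWithCanary, Frame::Host];
    let truncated = [Frame::Wasm, Frame::Host];
    println!("{:?}", gc(&complete)); // Swept { wasm_frames_scanned: 2 }
    println!("{:?}", gc(&truncated)); // SkippedSweep
}
```

The truncated list models an unwinder that gives up partway (for example, at frames whose unwind info it cannot interpret): the GC still marks what it saw, but declines to sweep.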

Chris Fallin (Jul 16 2020 at 17:19):

it's expected that (if all the frame management is working right) whenever we're in Wasm code, we should reach the canary, right?

fitzgen (he/him) (Jul 16 2020 at 17:20):

yes; the canary is a workaround for .NET always emitting Windows stack unwind info, even on Linux/macOS. libunwind doesn't know how to handle that info, so it fails to keep walking

fitzgen (he/him) (Jul 16 2020 at 17:21):

so if someone calls into wasm, which calls into .NET, and then a GC is triggered, we miss all the frames older than the .NET frames

Chris Fallin (Jul 16 2020 at 17:21):

I see, OK

fitzgen (he/him) (Jul 16 2020 at 17:22):

Chris Fallin said:

I'm thinking this may be an issue with the way stack frames work on aarch64; in particular, the FP/LR pair may be at the bottom of an activation record rather than the top (i.e., we don't have the equivalent of push rbp / mov rbp, rsp at the top of every function like x86 does), so we may not actually step over the canary

can you explain a little more here? I'm not sure I follow

Chris Fallin (Jul 16 2020 at 17:23):

So basically, as far as I understand, backtrace should walk the stack by following the linked list that starts at fp (the frame pointer register); each entry holds the caller's saved fp at offset 0 and lr (the link register, i.e. the return address) at offset 8
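A minimal sketch of that frame-pointer walk, assuming a simulated memory (a map from address to 8-byte word) instead of a real stack; walk_frame_records is a hypothetical helper, and a real unwinder would read process memory and may rely on unwind tables rather than the fp chain.

```rust
use std::collections::HashMap;

/// Follows a chain of aarch64-style frame records in simulated memory.
/// Each record at address `fp` stores the caller's fp at `fp + 0` and the
/// saved lr (return address) at `fp + 8`. Returns (fp, return address) pairs,
/// youngest first.
fn walk_frame_records(mem: &HashMap<u64, u64>, mut fp: u64) -> Vec<(u64, u64)> {
    let mut frames = Vec::new();
    while fp != 0 {
        let caller_fp = match mem.get(&fp) {
            Some(&v) => v,
            None => break, // unreadable record: stop walking
        };
        let ret_addr = match mem.get(&(fp + 8)) {
            Some(&v) => v,
            None => break,
        };
        frames.push((fp, ret_addr));
        // The stack grows down, so an older frame's record must sit at a
        // higher address; anything else means the chain is corrupt.
        if caller_fp != 0 && caller_fp <= fp {
            break;
        }
        fp = caller_fp;
    }
    frames
}

fn main() {
    let mut mem: HashMap<u64, u64> = HashMap::new();
    // Three frame records, youngest at the lowest address; a caller fp of 0 ends the chain.
    mem.insert(0x1000, 0x1040);
    mem.insert(0x1008, 0xaaaa);
    mem.insert(0x1040, 0x1100);
    mem.insert(0x1048, 0xbbbb);
    mem.insert(0x1100, 0);
    mem.insert(0x1108, 0xcccc);
    for (fp, lr) in walk_frame_records(&mem, 0x1000) {
        println!("fp {:#x} returns to {:#x}", fp, lr);
    }
}
```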

Chris Fallin (Jul 16 2020 at 17:23):

On aarch64, stack frame layout is somewhat flexible: the ABI doesn't fix where the frame record sits within a function's frame

Chris Fallin (Jul 16 2020 at 17:24):

So some compilers (e.g. I just disassembled /bin/ls on our aarch64 box and found an example) allocate stack storage above this record

Chris Fallin (Jul 16 2020 at 17:24):

On x86 it'd be the equivalent of sub rsp, $MY_LOCALS_SIZE / push rbp / mov rbp, rsp

Chris Fallin (Jul 16 2020 at 17:25):

So if one is relying on two consecutive fp values (which I assume are what backtrace reports as the "stack pointer"?) landing on either side of some local stack storage, that might not hold if the storage lives in the very topmost frame

Chris Fallin (Jul 16 2020 at 17:25):

(What's not clear to me is whether the canary is in the very topmost frame: the unittest framework should call the #[test] function from somewhere else; but maybe the Rust code is compiled without frame pointers?)

fitzgen (he/him) (Jul 16 2020 at 17:26):

the canary is not in the topmost frame, but in the youngest frame before we enter the oldest wasm frame

Chris Fallin (Jul 16 2020 at 17:26):

Hmm, OK, so perhaps the problem is just that the backtrace walk stops prematurely

fitzgen (he/him) (Jul 16 2020 at 17:27):

there could be older frames than the frame containing the canary, but they would all be non-wasm frames

fitzgen (he/him) (Jul 16 2020 at 17:27):

Chris Fallin said:

Hmm, OK, so perhaps the problem is just that the backtrace walk stops prematurely

Yeah, sounds like this to me
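The "either side" condition can be checked against the debug log posted earlier. A minimal sketch, assuming the logged canary and sp values are hexadecimal without a 0x prefix (several contain hex digits such as f), and assuming a walk counts as having reached the canary when two consecutive sp values bracket its address; walk_reached_canary is a hypothetical helper, not wasmtime's code. Every sp the walk reported is below the canary (the oldest, 0x4001586320, is 0x3f8 bytes short of 0x4001586718), which fits the diagnosis that the walk stopped before reaching the canary's frame.

```rust
/// Returns true if some pair of consecutive sp values (youngest first)
/// brackets the canary address, i.e. the walk stepped over the frame holding it.
fn walk_reached_canary(sps: &[u64], canary: u64) -> bool {
    sps.windows(2).any(|w| w[0] <= canary && canary <= w[1])
}

fn main() {
    // Values copied from the debug log, parsed as hex (assumption: no 0x prefix in the log).
    let canary = u64::from_str_radix("4001586718", 16).unwrap();
    let sps: Vec<u64> = ["4001586190", "40015861f0", "40015862c0", "40015862f0", "4001586320"]
        .iter()
        .map(|s| u64::from_str_radix(s, 16).unwrap())
        .collect();
    println!("oldest sp walked: {:#x}", sps.last().unwrap());
    println!("reached canary: {}", walk_reached_canary(&sps, canary));
}
```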

fitzgen (he/him) (Jul 16 2020 at 17:27):

to double check: on aarch64 the stack grows down, right?

Chris Fallin (Jul 16 2020 at 17:28):

The smoke test passes now, so it seems this works at least some of the time

Chris Fallin (Jul 16 2020 at 17:28):

Yes, that's right

fitzgen (he/him) (Jul 16 2020 at 17:28):

ok, good because basically all of this code assumes that :)

Alex Crichton (Jul 16 2020 at 17:28):

@Chris Fallin afaik stack walking doesn't work on aarch64 b/c we don't emit unwind tables?

Alex Crichton (Jul 16 2020 at 17:28):

I think all the other trap-related tests that rely on unwinding are ignored as well

Chris Fallin (Jul 16 2020 at 17:28):

Oh! Does backtrace want those too?

Chris Fallin (Jul 16 2020 at 17:29):

OK, that's a big chunk of not-happening-today then :-)

Alex Crichton (Jul 16 2020 at 17:29):

yeah lol that's ok

Alex Crichton (Jul 16 2020 at 17:29):

but yeah backtrace relies on libunwind, which relies on DWARF unwind tables

Alex Crichton (Jul 16 2020 at 17:29):

and we only emit those on x86_64 right now

fitzgen (he/him) (Jul 16 2020 at 17:29):

backtrace wraps libunwind, which relies on DWARF/ehidx (or whatever it's called)

Alex Crichton (Jul 16 2020 at 17:29):

for the old backend

Chris Fallin (Jul 16 2020 at 17:29):

OK cool, well, more reason to get to those soon. In any case, I'm happy we're not throwing random segfaults

Chris Fallin (Jul 16 2020 at 17:29):

Thanks!

fitzgen (he/him) (Jul 16 2020 at 17:31):

/me is updating Ubuntu; offline for a while


Last updated: Nov 22 2024 at 16:03 UTC