Stream: git-wasmtime

Topic: wasmtime / PR #1573 Implement stack limit checks for AArch64


view this post on Zulip Wasmtime GitHub notifications bot (Apr 21 2020 at 23:39):

alexcrichton opened PR #1573 from aarch64-stack-limit to master:

This commit implements the stack limit checks in cranelift for the
AArch64 backend. This gets the stack_limit argument purpose as well as
a function's global stack_limit directive working for the AArch64
backend. I've tested this locally on some hardware and in an emulator
and it looks to be working for basic tests, but I've never really done
AArch64 before so some scrutiny on the instructions would be most
welcome!

I also suspect that these stack limit checks aren't quite optimal, but they relatively closely match the x86 backend (which also isn't optimal), so for now I'm largely hoping for correctness as opposed to speed.

view this post on Zulip Wasmtime GitHub notifications bot (Apr 21 2020 at 23:39):

alexcrichton edited PR #1573 from aarch64-stack-limit to master:

This commit implements the stack limit checks in cranelift for the
AArch64 backend. This gets the stack_limit argument purpose as well as
a function's global stack_limit directive working for the AArch64
backend. I've tested this locally on some hardware and in an emulator
and it looks to be working for basic tests, but I've never really done
AArch64 before so some scrutiny on the instructions would be most
welcome!

I also suspect that these stack limit checks aren't quite optimal, but they relatively closely match the x86 backend (which also isn't optimal), so for now I'm largely hoping for correctness as opposed to speed.

Closes #1569

view this post on Zulip Wasmtime GitHub notifications bot (Apr 21 2020 at 23:40):

alexcrichton updated PR #1573 from aarch64-stack-limit to master:

This commit implements the stack limit checks in cranelift for the
AArch64 backend. This gets the stack_limit argument purpose as well as
a function's global stack_limit directive working for the AArch64
backend. I've tested this locally on some hardware and in an emulator
and it looks to be working for basic tests, but I've never really done
AArch64 before so some scrutiny on the instructions would be most
welcome!

I also suspect that these stack limit checks aren't quite optimal, but they relatively closely match the x86 backend (which also isn't optimal), so for now I'm largely hoping for correctness as opposed to speed.

Closes #1569

view this post on Zulip Wasmtime GitHub notifications bot (Apr 22 2020 at 10:12):

bnjbvr requested cfallin for a review on PR #1573.

view this post on Zulip Wasmtime GitHub notifications bot (Apr 22 2020 at 14:07):

bjorn3 created PR Review Comment:

        // documentation in x86/abi.rs for why this is here. The general idea is

view this post on Zulip Wasmtime GitHub notifications bot (Apr 22 2020 at 14:07):

bjorn3 submitted PR Review.

view this post on Zulip Wasmtime GitHub notifications bot (Apr 22 2020 at 19:03):

cfallin submitted PR Review.

view this post on Zulip Wasmtime GitHub notifications bot (Apr 22 2020 at 19:03):

cfallin submitted PR Review.

view this post on Zulip Wasmtime GitHub notifications bot (Apr 22 2020 at 19:03):

cfallin created PR Review Comment:

Hmm. I'm not wild about having this GlobalValue expansion implementation local to the stack-limit code, but upon looking at aarch64/lower.rs, it seems that in all other cases the legalization pass transforms actual GlobalValue ops into core ops as here.

Perhaps we could at least move generate_gv to aarch64/lower.rs (passing in the vmctx reg as an arg)? This could eventually be used if we shift from the current encodings-based legalization framework to something simpler/more purpose-built as well.

view this post on Zulip Wasmtime GitHub notifications bot (Apr 22 2020 at 19:03):

cfallin created PR Review Comment:

Factor this pattern out (also occurs above) -- take f and sig, and return reg for vmctx plus Vec<Inst>?

To allocate a new temp as an into_reg to hold stack_limit, we'll need to finagle the plumbing a bit. Vreg temps are allocated by the LowerCtx, which we've kept separate from the ABI traits, and using this will be tricky because we can't have generic methods on the ABIBody trait. The ordering is slightly tricky too: the ABIBody is created before lowering starts, so that's too early to alloc a temp, but gen_prologue happens late (we lower instructions in reverse order) so we can't do it there either.

Suggestion: perhaps add an init method on the ABIBody trait that takes a "just in case" temp? (vregs are basically free, aside from creating sparsity in the ID space, so this is fine.) Then this could set stack_limit for use by gen_prologue and any calls.

view this post on Zulip Wasmtime GitHub notifications bot (Apr 22 2020 at 19:03):

cfallin created PR Review Comment:

Eek, I'm not sure if it's safe to use spilltmp here -- or at least it's not clear it will interact well with anything we might write in the future. The basic invariant is that spilltmp should only be used within a sequence of instructions generated as one unit (the original use was to compute large stackframe offsets for regalloc spills/reloads, hence the name). We should probably take an into_reg as an argument of generate_gv here, and then by implication, to gen_stack_limit too.

(I should've added more documentation about this, sorry -- if you wouldn't mind adding a comment on spilltmp_reg() to this effect, that'd be very helpful, thanks!)

view this post on Zulip Wasmtime GitHub notifications bot (Apr 22 2020 at 19:03):

cfallin created PR Review Comment:

Are we worried about >4K struct offsets? If we're making this into a more generic thing anyway, we could handle the overflow case here too -- basically just use Inst::load_constant(rd, value) giving a Vec<Inst> (and we can use the spilltemp reg for the constant).

view this post on Zulip Wasmtime GitHub notifications bot (Apr 22 2020 at 19:03):

cfallin created PR Review Comment:

This little register dance hopefully goes away if we use a proper vreg for stack_limit...

view this post on Zulip Wasmtime GitHub notifications bot (Apr 22 2020 at 19:03):

cfallin created PR Review Comment:

Actually, one more thought: after reading through the rest of the patch, it seems that stack_limit is not used beyond the prologue. Given this, I think it would be fine to use spilltmp within gen_prologue itself, and then carefully validate that we don't keep it live across anything else that could implicitly use it. This includes any virtual registers at all: the register allocator is liable to pop in and do a surprise spill with a big offset somewhere, wiping out spilltmp.

view this post on Zulip Wasmtime GitHub notifications bot (Apr 22 2020 at 19:03):

cfallin created PR Review Comment:

This should have a test case in the emission tests (test_aarch64_binemit() below), with the golden machine code validated as the comment notes (using real aarch64 assembler). Thanks!

view this post on Zulip Wasmtime GitHub notifications bot (Apr 22 2020 at 21:12):

alexcrichton submitted PR Review.

view this post on Zulip Wasmtime GitHub notifications bot (Apr 22 2020 at 21:12):

alexcrichton created PR Review Comment:

Oh nice, Inst::load_constant works great! While I suspect that >4K offsets aren't really that common it seems easy enough to support, so I've gone ahead and added it in.

view this post on Zulip Wasmtime GitHub notifications bot (Apr 22 2020 at 21:15):

alexcrichton submitted PR Review.

view this post on Zulip Wasmtime GitHub notifications bot (Apr 22 2020 at 21:15):

alexcrichton created PR Review Comment:

So FWIW this is definitely an "unfortunate" special context. This is "duplicated" with the x86 backend as well. The reason for this is that prologue/epilogue happens so late (way after legalization and even after register allocation) that it's difficult to share code. The hope is that the number of global values we'd need to process wouldn't be that large and/or it'd be easy to extend them over time.

In that sense I suspect that the global value legalization pass is unlikely to go away (it's architecture independent too!). That being said I'm happy to move this code around. Given that do you think it'd still have a better home in lower.rs?

(I can definitely do a better job about documenting all this too)

view this post on Zulip Wasmtime GitHub notifications bot (Apr 22 2020 at 22:55):

cfallin submitted PR Review.

view this post on Zulip Wasmtime GitHub notifications bot (Apr 22 2020 at 22:55):

cfallin created PR Review Comment:

Hmm, OK. I still think that we should move this to lower.rs, but other refactoring and cleanup can come later -- don't want to hold this up for it!

view this post on Zulip Wasmtime GitHub notifications bot (Apr 24 2020 at 19:35):

alexcrichton updated PR #1573 from aarch64-stack-limit to master:

This commit implements the stack limit checks in cranelift for the
AArch64 backend. This gets the stack_limit argument purpose as well as
a function's global stack_limit directive working for the AArch64
backend. I've tested this locally on some hardware and in an emulator
and it looks to be working for basic tests, but I've never really done
AArch64 before so some scrutiny on the instructions would be most
welcome!

I also suspect that these stack limit checks aren't quite optimal, but they relatively closely match the x86 backend (which also isn't optimal), so for now I'm largely hoping for correctness as opposed to speed.

Closes #1569

view this post on Zulip Wasmtime GitHub notifications bot (Apr 24 2020 at 19:51):

cfallin submitted PR Review.

view this post on Zulip Wasmtime GitHub notifications bot (Apr 24 2020 at 19:51):

cfallin created PR Review Comment:

nit: s/caler/caller/

view this post on Zulip Wasmtime GitHub notifications bot (Apr 24 2020 at 19:51):

cfallin submitted PR Review.

view this post on Zulip Wasmtime GitHub notifications bot (Apr 24 2020 at 20:01):

alexcrichton updated PR #1573 from aarch64-stack-limit to master:

This commit implements the stack limit checks in cranelift for the
AArch64 backend. This gets the stack_limit argument purpose as well as
a function's global stack_limit directive working for the AArch64
backend. I've tested this locally on some hardware and in an emulator
and it looks to be working for basic tests, but I've never really done
AArch64 before so some scrutiny on the instructions would be most
welcome!

I also suspect that these stack limit checks aren't quite optimal, but they relatively closely match the x86 backend (which also isn't optimal), so for now I'm largely hoping for correctness as opposed to speed.

Closes #1569

view this post on Zulip Wasmtime GitHub notifications bot (Apr 24 2020 at 20:01):

alexcrichton merged PR #1573.


Last updated: Nov 22 2024 at 17:03 UTC