wasmtime / PR #7947 winch: Move the stack overflow check ... · git-wasmtime

Stream: git-wasmtime

Topic: wasmtime / PR #7947 winch: Move the stack overflow check ...

Wasmtime GitHub notifications bot (Feb 15 2024 at 20:19):

elliottt requested wasmtime-compiler-reviewers for a review on PR #7947.

Wasmtime GitHub notifications bot (Feb 15 2024 at 20:19):

elliottt opened PR #7947 from elliottt:trevor/winch-stack-check-in-prologue to bytecodealliance:main:

Move the stack check to the function prologue in winch generated code by adding the final stack maximum to the stack min bound before comparing against the current SP.

This new approach is more accurate than the previous implementation that would only check the size of locals, but does require that we patch the function during MacroAssembler::finalize, as we don't know the stack high water mark until then. Enabling the patching required the addition of the start_patchable/end_patchable api to MachBuffer, which gives a way to name sections of the code buffer (with some restrictions) that can be edited later. We use this then to modify an add-with-immediate instruction to add the used stack space to the lower bound, when finalizing the function.

Wasmtime GitHub notifications bot (Feb 15 2024 at 20:19):

elliottt requested cfallin for a review on PR #7947.

Wasmtime GitHub notifications bot (Feb 15 2024 at 20:20):

elliottt requested saulecabrera for a review on PR #7947.

Wasmtime GitHub notifications bot (Feb 15 2024 at 20:24):

elliottt updated PR #7947.

Wasmtime GitHub notifications bot (Feb 15 2024 at 20:25):

elliottt edited PR #7947:

Move the stack check to the function prologue in winch generated code by adding the final stack maximum to the stack min bound before comparing against the current SP.

This new approach is more accurate than the previous implementation that would only check the size of locals, but does require that we patch the function during MacroAssembler::finalize, as we don't know the stack high water mark until then. Enabling the patching required the addition of the start_patchable/end_patchable api to MachBuffer, which gives a way to name sections of the code buffer (with some restrictions) that can be edited later. We use this then to modify an add-with-immediate instruction to add the used stack space to the stack lower bound from vmctx, when finalizing the function.

Wasmtime GitHub notifications bot (Feb 15 2024 at 20:56):

cfallin submitted PR review:

Overall approach seems fine; some thoughts on the x64 machine-code patching specifically:

Wasmtime GitHub notifications bot (Feb 15 2024 at 20:56):

cfallin submitted PR review:

Overall approach seems fine; some thoughts on the x64 machine-code patching specifically:

Wasmtime GitHub notifications bot (Feb 15 2024 at 20:56):

cfallin created PR review comment:

Can we add a bit more doc here? "Represents an editable region that is currently being emitted in the MachBuffer. When closed, this becomes a PatchRegion and allows editing the machine code post-emission." or something like that?

Wasmtime GitHub notifications bot (Feb 15 2024 at 20:56):

cfallin created PR review comment:

Likewise here; note what it's for.

Wasmtime GitHub notifications bot (Feb 15 2024 at 20:56):

cfallin created PR review comment:

This and the asserts on opcode bytes below feel a little fragile. For example if we change the way immediates are emitted in the future, this might break (we're tying masm's lowering strategy to a specific assumption about its output here). Also the REX byte is pinned by an assert, and might change if the regalloc picks a register in the other half of the 16-reg bank (!). Perhaps what we could do instead is something like:

Have a PatchableValueLoad abstraction (either an MInst case or a little struct here that does its own encoding, to nail down the length-invariance); ensure that it always produces machine code of the same length. ::new(value: u32, reg: Reg), ::update_value(value: u32) and ::emit_to_slice(&mut [u8]) or something like that.

Use it here to initially put a placeholder in; we can avoid using a magic value and magic knowledge about immediate encoding then. Perhaps we can even write a wrapper ::emit_to_buffer(&mut MachBuffer) -> PatchRegion.

Later, patch using this; perhaps with ::emit_to_region(&mut MachBuffer, PatchRegion).

Thoughts?

Wasmtime GitHub notifications bot (Feb 15 2024 at 20:56):

cfallin created PR review comment:

I think branches are the only case actually; and at the MachBuffer layer all that matters are branches that we tell the MachBuffer about, via add_cond_branch and add_uncond_branch. So I think we can make this more concrete by saying "It must not introduce any MachBuffer-visible branches (i.e., you must not call add_cond_branch or add_uncond_branch between start_patchable and end_patchable)."

Wasmtime GitHub notifications bot (Feb 15 2024 at 20:56):

cfallin created PR review comment:

Could we add this assert in add_cond_branch and add_uncond_branch as well?

Wasmtime GitHub notifications bot (Feb 15 2024 at 21:05):

elliottt commented on PR #7947:

The tests are failing due to the addition of stack checking to one of the trampolines (as the vmctx isn't setup yet in one case). I might roll those back so that this can be debugged separately.

Wasmtime GitHub notifications bot (Feb 15 2024 at 21:06):

elliottt updated PR #7947.

Wasmtime GitHub notifications bot (Feb 15 2024 at 23:44):

github-actions[bot] commented on PR #7947:

Subscribe to Label Action

cc @saulecabrera

<details>
This issue or pull request has been labeled: "cranelift", "cranelift:area:machinst", "winch"

Thus the following users have been cc'd because of the following labels:

saulecabrera: winch

To subscribe or unsubscribe from this label, edit the <code>.github/subscribe-to-label.json</code> configuration file.

Learn more.
</details>

Wasmtime GitHub notifications bot (Feb 16 2024 at 05:53):

elliottt submitted PR review.

Wasmtime GitHub notifications bot (Feb 16 2024 at 05:53):

elliottt created PR review comment:

I agree that it's fragile as-is, and something a bit more robust would be good here.

Adding a special PatchableValueLoad type sounds good to me, as long as you don't mind the more specific "add patched immediate to register" behavior. I wanted to avoid using an additional register here, so that the sequence of loading the stack lower bound and adding the immediate for the final stack value could be done with only the scratch register.

I'll experiment with adding a PatchableAddRegImm struct in the x64 backend in cranelift, as that would make it easy to reuse logic from the existing emit implemetation for add.

Wasmtime GitHub notifications bot (Feb 16 2024 at 06:06):

elliottt updated PR #7947.

Wasmtime GitHub notifications bot (Feb 16 2024 at 06:28):

elliottt updated PR #7947.

Wasmtime GitHub notifications bot (Feb 16 2024 at 06:35):

elliottt updated PR #7947.

Wasmtime GitHub notifications bot (Feb 19 2024 at 15:58):

saulecabrera submitted PR review:

The Winch pieces look good to me. Left a small question and a comment. I'm going to defer the final approval to Chris, given that I'm not as familiar with the MachBuffer implementation.

Wasmtime GitHub notifications bot (Feb 19 2024 at 15:58):

saulecabrera submitted PR review:

The Winch pieces look good to me. Left a small question and a comment. I'm going to defer the final approval to Chris, given that I'm not as familiar with the MachBuffer implementation.

Wasmtime GitHub notifications bot (Feb 19 2024 at 15:58):

saulecabrera created PR review comment:

    /// Regions where the addition of the used stack must be fixed up.

Wasmtime GitHub notifications bot (Feb 19 2024 at 15:58):

saulecabrera created PR review comment:

Could we add a comment here on why we're using .max? The reason why I bring this up is that we currently don't have a way to track the "canonical stack pointer offset" across the compiler, which is what we'd probably want to use here instead (that's what .max is doing here if I'm understanding correctly). We currently use the sp_offset as the single source of truth, but lately I've been thinking that it might be a good idea to change that (but not a blocker for this change!). This is mostly relevant in control flow, where we have to soft-reset the stack pointer in some situations.

Wasmtime GitHub notifications bot (Feb 19 2024 at 15:58):

saulecabrera created PR review comment:

One clarification question here: why the 2? Unless I'm missing something, I'm only able to spot a single push (via add_stack_max).

Wasmtime GitHub notifications bot (Feb 19 2024 at 17:34):

elliottt submitted PR review.

Wasmtime GitHub notifications bot (Feb 19 2024 at 17:34):

elliottt created PR review comment:

This could definitely be 1 or even an optional. Given that we could possibly call check_stack multiple times, I wanted to make sure that we would still generate valid code. Perhaps the right solution is an Option<PatchRegion> and an assert?

Wasmtime GitHub notifications bot (Feb 19 2024 at 17:39):

elliottt submitted PR review.

Wasmtime GitHub notifications bot (Feb 19 2024 at 17:39):

elliottt created PR review comment:

The intent was to track the largest stack offset throughout code generation, so that we had an upper bound for the stack space needed. As push and reserve_stack both use increment_sp, this felt like a pretty good proxy for tracking the maximum seen stack allocation, but maybe I'm missing something here?

Wasmtime GitHub notifications bot (Feb 19 2024 at 18:07):

saulecabrera submitted PR review.

Wasmtime GitHub notifications bot (Feb 19 2024 at 18:07):

saulecabrera created PR review comment:

this felt like a pretty good proxy for tracking the maximum seen stack allocation

I think it is! I was just thinking that maybe it makes sense to have a comment stating that we do this (.max) given that we currently don't explicitly track the maximum amount of stack generated anywhere in the compiler yet, which I think we could as an optimization.

Wasmtime GitHub notifications bot (Feb 19 2024 at 18:09):

saulecabrera submitted PR review.

Wasmtime GitHub notifications bot (Feb 19 2024 at 18:09):

saulecabrera created PR review comment:

Oh that makes sense, thanks for explaining. Yeah perhaps we could try going with Option<PatchRegion> + assert.

Wasmtime GitHub notifications bot (Feb 19 2024 at 20:45):

elliottt submitted PR review.

Wasmtime GitHub notifications bot (Feb 19 2024 at 20:45):

elliottt created PR review comment:

Ah! I'll add a comment to clarify the intent here. I can also put some more documentation into the sp_max field itself :+1:

Wasmtime GitHub notifications bot (Feb 20 2024 at 17:40):

elliottt updated PR #7947.

Wasmtime GitHub notifications bot (Feb 20 2024 at 20:19):

elliottt updated PR #7947.

Wasmtime GitHub notifications bot (Feb 20 2024 at 20:19):

elliottt requested cfallin for a review on PR #7947.

Wasmtime GitHub notifications bot (Feb 20 2024 at 20:22):

elliottt submitted PR review.

Wasmtime GitHub notifications bot (Feb 20 2024 at 20:22):

elliottt created PR review comment:

Would you mind taking a look at the changes in x64/asm.rs? I had to copy some details over from cranelift's emit module for rex encoding, but I believe that this approach should be much more resilient: it doesn't assume anything about how cranelift would encode an instruction; it records the offset into the patch region where the immediate starts, so it should also be resilient to local changes.

The downside is that it brought a lot of details into the asm module, but I think that's somewhat unavoidable at this point.

Wasmtime GitHub notifications bot (Feb 20 2024 at 20:22):

elliottt submitted PR review.

Wasmtime GitHub notifications bot (Feb 20 2024 at 20:22):

elliottt created PR review comment:

I added some more comments to this point in increment_sp, and to the definition of sp_max, would you mind taking another look?

Wasmtime GitHub notifications bot (Feb 20 2024 at 20:25):

elliottt updated PR #7947.

Wasmtime GitHub notifications bot (Feb 20 2024 at 20:25):

elliottt submitted PR review.

Wasmtime GitHub notifications bot (Feb 20 2024 at 20:25):

elliottt created PR review comment:

I've updated the language on this comment, what do you think?

Wasmtime GitHub notifications bot (Feb 20 2024 at 20:27):

cfallin submitted PR review:

Looks great, thanks! Just one code-reuse nit below, otherwise good to go.

Wasmtime GitHub notifications bot (Feb 20 2024 at 20:27):

cfallin submitted PR review:

Looks great, thanks! Just one code-reuse nit below, otherwise good to go.

Wasmtime GitHub notifications bot (Feb 20 2024 at 20:27):

cfallin created PR review comment:

Can we use the version of this struct and of encode_modrm below in cranelift_codegen::isa::x64::encoding? It looks like most of that path is already pub; only the RexFlags and encode_modrm items themselves are not. (It's a bit suboptimal to export over the public interface, but then we already reserve the right via semver to change at each release and this is the sort of thing I'd have no reservations refactoring later)

Wasmtime GitHub notifications bot (Feb 20 2024 at 20:31):

saulecabrera submitted PR review.

Wasmtime GitHub notifications bot (Feb 20 2024 at 20:31):

saulecabrera created PR review comment:

    /// shrink as space is reserved and freed during compilation), but once all instructions have

Wasmtime GitHub notifications bot (Feb 20 2024 at 20:32):

saulecabrera submitted PR review.

Wasmtime GitHub notifications bot (Feb 20 2024 at 20:32):

saulecabrera created PR review comment:

Looks great, thanks!

Wasmtime GitHub notifications bot (Feb 20 2024 at 22:31):

elliottt updated PR #7947.

Wasmtime GitHub notifications bot (Feb 20 2024 at 22:31):

elliottt updated PR #7947.

Wasmtime GitHub notifications bot (Feb 20 2024 at 22:32):

elliottt submitted PR review.

Wasmtime GitHub notifications bot (Feb 20 2024 at 22:32):

elliottt created PR review comment:

Yep! My original worry was needing to compute the offset for what was written to the MachBuffer, but that's done easily enough with MachBuffer::cur_offset. It's much nicer not duplicating this!

Wasmtime GitHub notifications bot (Feb 20 2024 at 22:32):

elliottt submitted PR review.

Wasmtime GitHub notifications bot (Feb 20 2024 at 22:32):

elliottt created PR review comment:

Amazing spelling mistake, thank you for catching that!

Wasmtime GitHub notifications bot (Feb 20 2024 at 22:33):

elliottt updated PR #7947.

Wasmtime GitHub notifications bot (Feb 20 2024 at 23:17):

elliottt merged PR #7947.

Last updated: Apr 17 2025 at 00:13 UTC