elliottt requested wasmtime-compiler-reviewers for a review on PR #7947.
elliottt opened PR #7947 from elliottt:trevor/winch-stack-check-in-prologue
to bytecodealliance:main
:
Move the stack check to the function prologue in winch generated code by adding the final stack maximum to the stack min bound before comparing against the current SP.
This new approach is more accurate than the previous implementation that would only check the size of
locals
, but does require that we patch the function duringMacroAssembler::finalize
, as we don't know the stack high water mark until then. Enabling the patching required the addition of thestart_patchable
/end_patchable
api toMachBuffer
, which gives a way to name sections of the code buffer (with some restrictions) that can be edited later. We use this then to modify an add-with-immediate instruction to add the used stack space to the lower bound, when finalizing the function.<!--
Please make sure you include the following information:
If this work has been discussed elsewhere, please include a link to that
conversation. If it was discussed in an issue, just mention "issue #...".Explain why this change is needed. If the details are in an issue already,
this can be brief.Our development process is documented in the Wasmtime book:
https://docs.wasmtime.dev/contributing-development-process.htmlPlease ensure all communication follows the code of conduct:
https://github.com/bytecodealliance/wasmtime/blob/main/CODE_OF_CONDUCT.md
-->
elliottt requested cfallin for a review on PR #7947.
elliottt requested saulecabrera for a review on PR #7947.
elliottt updated PR #7947.
elliottt edited PR #7947:
Move the stack check to the function prologue in winch generated code by adding the final stack maximum to the stack min bound before comparing against the current SP.
This new approach is more accurate than the previous implementation that would only check the size of
locals
, but does require that we patch the function duringMacroAssembler::finalize
, as we don't know the stack high water mark until then. Enabling the patching required the addition of thestart_patchable
/end_patchable
api toMachBuffer
, which gives a way to name sections of the code buffer (with some restrictions) that can be edited later. We use this then to modify an add-with-immediate instruction to add the used stack space to the stack lower bound fromvmctx
, when finalizing the function.<!--
Please make sure you include the following information:
If this work has been discussed elsewhere, please include a link to that
conversation. If it was discussed in an issue, just mention "issue #...".Explain why this change is needed. If the details are in an issue already,
this can be brief.Our development process is documented in the Wasmtime book:
https://docs.wasmtime.dev/contributing-development-process.htmlPlease ensure all communication follows the code of conduct:
https://github.com/bytecodealliance/wasmtime/blob/main/CODE_OF_CONDUCT.md
-->
cfallin submitted PR review:
Overall approach seems fine; some thoughts on the x64 machine-code patching specifically:
cfallin submitted PR review:
Overall approach seems fine; some thoughts on the x64 machine-code patching specifically:
cfallin created PR review comment:
Can we add a bit more doc here? "Represents an editable region that is currently being emitted in the
MachBuffer
. When closed, this becomes aPatchRegion
and allows editing the machine code post-emission." or something like that?
cfallin created PR review comment:
Likewise here; note what it's for.
cfallin created PR review comment:
This and the asserts on opcode bytes below feel a little fragile. For example if we change the way immediates are emitted in the future, this might break (we're tying masm's lowering strategy to a specific assumption about its output here). Also the REX byte is pinned by an assert, and might change if the regalloc picks a register in the other half of the 16-reg bank (!). Perhaps what we could do instead is something like:
- Have a
PatchableValueLoad
abstraction (either anMInst
case or a little struct here that does its own encoding, to nail down the length-invariance); ensure that it always produces machine code of the same length.::new(value: u32, reg: Reg)
,::update_value(value: u32)
and::emit_to_slice(&mut [u8])
or something like that.- Use it here to initially put a placeholder in; we can avoid using a magic value and magic knowledge about immediate encoding then. Perhaps we can even write a wrapper
::emit_to_buffer(&mut MachBuffer) -> PatchRegion
.- Later, patch using this; perhaps with
::emit_to_region(&mut MachBuffer, PatchRegion)
.Thoughts?
cfallin created PR review comment:
I think branches are the only case actually; and at the
MachBuffer
layer all that matters are branches that we tell theMachBuffer
about, viaadd_cond_branch
andadd_uncond_branch
. So I think we can make this more concrete by saying "It must not introduce any MachBuffer-visible branches (i.e., you must not calladd_cond_branch
oradd_uncond_branch
betweenstart_patchable
andend_patchable
)."
cfallin created PR review comment:
Could we add this assert in
add_cond_branch
andadd_uncond_branch
as well?
elliottt commented on PR #7947:
The tests are failing due to the addition of stack checking to one of the trampolines (as the vmctx isn't setup yet in one case). I might roll those back so that this can be debugged separately.
elliottt updated PR #7947.
github-actions[bot] commented on PR #7947:
Subscribe to Label Action
cc @saulecabrera
<details>
This issue or pull request has been labeled: "cranelift", "cranelift:area:machinst", "winch"Thus the following users have been cc'd because of the following labels:
- saulecabrera: winch
To subscribe or unsubscribe from this label, edit the <code>.github/subscribe-to-label.json</code> configuration file.
Learn more.
</details>
elliottt submitted PR review.
elliottt created PR review comment:
I agree that it's fragile as-is, and something a bit more robust would be good here.
Adding a special
PatchableValueLoad
type sounds good to me, as long as you don't mind the more specific "add patched immediate to register" behavior. I wanted to avoid using an additional register here, so that the sequence of loading the stack lower bound and adding the immediate for the final stack value could be done with only the scratch register.I'll experiment with adding a
PatchableAddRegImm
struct in the x64 backend in cranelift, as that would make it easy to reuse logic from the existing emit implemetation for add.
elliottt updated PR #7947.
elliottt updated PR #7947.
elliottt updated PR #7947.
saulecabrera submitted PR review:
The Winch pieces look good to me. Left a small question and a comment. I'm going to defer the final approval to Chris, given that I'm not as familiar with the
MachBuffer
implementation.
saulecabrera submitted PR review:
The Winch pieces look good to me. Left a small question and a comment. I'm going to defer the final approval to Chris, given that I'm not as familiar with the
MachBuffer
implementation.
saulecabrera created PR review comment:
/// Regions where the addition of the used stack must be fixed up.
saulecabrera created PR review comment:
Could we add a comment here on why we're using
.max
? The reason why I bring this up is that we currently don't have a way to track the "canonical stack pointer offset" across the compiler, which is what we'd probably want to use here instead (that's what.max
is doing here if I'm understanding correctly). We currently use thesp_offset
as the single source of truth, but lately I've been thinking that it might be a good idea to change that (but not a blocker for this change!). This is mostly relevant in control flow, where we have to soft-reset the stack pointer in some situations.
saulecabrera created PR review comment:
One clarification question here: why the
2
? Unless I'm missing something, I'm only able to spot a singlepush
(viaadd_stack_max
).
elliottt submitted PR review.
elliottt created PR review comment:
This could definitely be
1
or even an optional. Given that we could possibly callcheck_stack
multiple times, I wanted to make sure that we would still generate valid code. Perhaps the right solution is anOption<PatchRegion>
and an assert?
elliottt submitted PR review.
elliottt created PR review comment:
The intent was to track the largest stack offset throughout code generation, so that we had an upper bound for the stack space needed. As push and reserve_stack both use
increment_sp
, this felt like a pretty good proxy for tracking the maximum seen stack allocation, but maybe I'm missing something here?
saulecabrera submitted PR review.
saulecabrera created PR review comment:
this felt like a pretty good proxy for tracking the maximum seen stack allocation
I think it is! I was just thinking that maybe it makes sense to have a comment stating that we do this (
.max
) given that we currently don't explicitly track the maximum amount of stack generated anywhere in the compiler yet, which I think we could as an optimization.
saulecabrera submitted PR review.
saulecabrera created PR review comment:
Oh that makes sense, thanks for explaining. Yeah perhaps we could try going with
Option<PatchRegion>
+ assert.
elliottt submitted PR review.
elliottt created PR review comment:
Ah! I'll add a comment to clarify the intent here. I can also put some more documentation into the
sp_max
field itself :+1:
elliottt updated PR #7947.
elliottt updated PR #7947.
elliottt requested cfallin for a review on PR #7947.
elliottt submitted PR review.
elliottt created PR review comment:
Would you mind taking a look at the changes in x64/asm.rs? I had to copy some details over from cranelift's emit module for rex encoding, but I believe that this approach should be much more resilient: it doesn't assume anything about how cranelift would encode an instruction; it records the offset into the patch region where the immediate starts, so it should also be resilient to local changes.
The downside is that it brought a lot of details into the asm module, but I think that's somewhat unavoidable at this point.
elliottt submitted PR review.
elliottt created PR review comment:
I added some more comments to this point in
increment_sp
, and to the definition ofsp_max
, would you mind taking another look?
elliottt updated PR #7947.
elliottt submitted PR review.
elliottt created PR review comment:
I've updated the language on this comment, what do you think?
cfallin submitted PR review:
Looks great, thanks! Just one code-reuse nit below, otherwise good to go.
cfallin submitted PR review:
Looks great, thanks! Just one code-reuse nit below, otherwise good to go.
cfallin created PR review comment:
Can we use the version of this struct and of
encode_modrm
below incranelift_codegen::isa::x64::encoding
? It looks like most of that path is alreadypub
; only theRexFlags
andencode_modrm
items themselves are not. (It's a bit suboptimal to export over the public interface, but then we already reserve the right via semver to change at each release and this is the sort of thing I'd have no reservations refactoring later)
saulecabrera submitted PR review.
saulecabrera created PR review comment:
/// shrink as space is reserved and freed during compilation), but once all instructions have
saulecabrera submitted PR review.
saulecabrera created PR review comment:
Looks great, thanks!
elliottt updated PR #7947.
elliottt updated PR #7947.
elliottt submitted PR review.
elliottt created PR review comment:
Yep! My original worry was needing to compute the offset for what was written to the
MachBuffer
, but that's done easily enough withMachBuffer::cur_offset
. It's much nicer not duplicating this!
elliottt submitted PR review.
elliottt created PR review comment:
Amazing spelling mistake, thank you for catching that!
elliottt updated PR #7947.
elliottt merged PR #7947.
Last updated: Jan 24 2025 at 00:11 UTC