wasmtime / PR #4747 cranelift: Add inline stack probing f... · git-wasmtime

Stream: git-wasmtime

Topic: wasmtime / PR #4747 cranelift: Add inline stack probing f...

Wasmtime GitHub notifications bot (Aug 21 2022 at 10:52):

afonso360 opened PR #4747 from inline-stackprobing to main:

:wave: Hey,

This PR adds inline stack probing for the x64 backend. Opening as a draft since there are still some pending questions.

This is feature gated behind a enable_inline_probestack and also requires the user to enable enable_probestack.

Currently we have two implementations, If we only need to probe a few stack pages, we unroll the probes. If we need more we fall back to using a loop.

Closes: #2299
cc: @bjorn3 @cfallin

Wasmtime GitHub notifications bot (Aug 21 2022 at 10:52):

afonso360 submitted PR review.

Wasmtime GitHub notifications bot (Aug 21 2022 at 10:52):

afonso360 created PR review comment:

I'm not too happy about this, and did this mostly as a proof of concept to see if it fixes the issues we were having in https://github.com/bjorn3/rustc_codegen_cranelift/issues/1261#issuecomment-1219307630 .

I think the real solution here is allowing us to emit labels and use those with the regular jump instruction.

gen_prologue curently does two things:

Initializes part of the Callee struct (calculates stack sizes)

Generates the prologue instructions

These instructions aren't emitted immediatley, instead we call this early on in vcode emission to initialize the Callee struct, and the only later on emit the previously generated instructions.

https://github.com/bytecodealliance/wasmtime/blob/d620705a323e3da59bd90473b4e627c8502b1255/cranelift/codegen/src/machinst/vcode.rs#L790-L793

I think we can change this slightly to allow label emission in gen_prologue by splitting it into two functions.

One that precomputes the stack sizes and initializes the Callee struct, that is called early on (where the current gen_prologue currently is).

And a second one that directly emits the prologue instructions. This way we can emit them and at the same time pass a function to generate a label at the current emit point.

Are there issues with emitting labels so late in the pipeline? And does this sound like a reasonable refactor?

Wasmtime GitHub notifications bot (Aug 21 2022 at 10:53):

afonso360 submitted PR review.

Wasmtime GitHub notifications bot (Aug 21 2022 at 10:53):

afonso360 created PR review comment:

I'm not too happy about this, and did this mostly as a proof of concept to see if it fixes the issues we were having in https://github.com/bjorn3/rustc_codegen_cranelift/issues/1261#issuecomment-1219307630 .

I think the real solution here is allowing us to emit labels and use those with the regular jump instruction.

gen_prologue curently does two things:

Initializes part of the Callee struct (calculates stack sizes)

Generates the prologue instructions

These instructions aren't emitted immediatley, instead we call this early on in vcode emission to initialize the Callee struct, and the only later on emit the previously generated instructions.

https://github.com/bytecodealliance/wasmtime/blob/d620705a323e3da59bd90473b4e627c8502b1255/cranelift/codegen/src/machinst/vcode.rs#L790-L793

I think we can change this slightly to allow label emission in gen_prologue by splitting it into two functions.

One that precomputes the stack sizes and initializes the Callee struct, that is called early on (where the current gen_prologue currently is).

And a second one that directly emits the prologue instructions. This way we can emit them and at the same time pass a function to generate a label at the current emit point.

Are there issues with emitting labels so late in the pipeline? And does this sound like a reasonable refactor?

Wasmtime GitHub notifications bot (Aug 21 2022 at 10:53):

afonso360 deleted PR review comment.

Wasmtime GitHub notifications bot (Aug 21 2022 at 11:02):

bjorn3 submitted PR review.

Wasmtime GitHub notifications bot (Aug 21 2022 at 11:02):

bjorn3 submitted PR review.

Wasmtime GitHub notifications bot (Aug 21 2022 at 11:02):

bjorn3 created PR review comment:

You can only have either outlined or inlined stack probes, not both at the same time. Maybe make it an enum?

Wasmtime GitHub notifications bot (Aug 21 2022 at 13:17):

afonso360 submitted PR review.

Wasmtime GitHub notifications bot (Aug 21 2022 at 13:17):

afonso360 created PR review comment:

Yeah, that sounds like a good idea.

Wasmtime GitHub notifications bot (Aug 21 2022 at 13:17):

afonso360 edited PR review comment.

Wasmtime GitHub notifications bot (Aug 22 2022 at 20:51):

cfallin submitted PR review.

Wasmtime GitHub notifications bot (Aug 22 2022 at 20:51):

cfallin created PR review comment:

@afonso360 in principle we could allow for control flow to be inserted this late in the pipeline, but in practice it's a fairly intrusive refactor as we have to somehow communicate out-of-band the label and reference to it (or hack it with an explicit offset as here, but this definitely is not the right answer IMHO as it bakes in a dependence on encoded instruction sizes determined elsewhere in the code, in a way that could lead to a CVE if it later breaks).

I think the better answer here is to actually have a psuedo-MachInst that represents "stack-probe loop" and emits whatever is appropriate. We already have control-flow-inside-an-instruction for cases like TrapIf; this works well enough and maintains the abstraction that an instruction is a single-in, single-out thing (ignoring CLIF-level branches for now), which the rest of the backend assumes. Does that make sense?

Wasmtime GitHub notifications bot (Aug 22 2022 at 21:08):

afonso360 created PR review comment:

Yeah that does sound reasonable, and a lot easier that performing that refactor. Thanks!

Wasmtime GitHub notifications bot (Aug 22 2022 at 21:08):

afonso360 submitted PR review.

Wasmtime GitHub notifications bot (Aug 23 2022 at 10:19):

afonso360 updated PR #4747 from inline-stackprobing to main.

Wasmtime GitHub notifications bot (Aug 23 2022 at 10:37):

afonso360 submitted PR review.

Wasmtime GitHub notifications bot (Aug 23 2022 at 10:37):

afonso360 created PR review comment:

I've added a stackprobe_strategy flag, and while right now its just inline and outline I think in the future we can experiment with adding different strategies to mix the two, such as allowing both and in OptLevel::Size emitting the unroll inline or the call, whichever is smallest.

Wasmtime GitHub notifications bot (Aug 23 2022 at 10:38):

bjorn3 created PR review comment:

Looks like you haven't pushed it yet.

Wasmtime GitHub notifications bot (Aug 23 2022 at 10:38):

bjorn3 submitted PR review.

Wasmtime GitHub notifications bot (Aug 23 2022 at 10:40):

afonso360 edited PR review comment.

Wasmtime GitHub notifications bot (Aug 23 2022 at 10:41):

afonso360 updated PR #4747 from inline-stackprobing to main.

Wasmtime GitHub notifications bot (Aug 23 2022 at 10:42):

afonso360 submitted PR review.

Wasmtime GitHub notifications bot (Aug 23 2022 at 10:42):

afonso360 created PR review comment:

Oops, fixed

Wasmtime GitHub notifications bot (Aug 23 2022 at 10:42):

afonso360 edited PR review comment.

Wasmtime GitHub notifications bot (Aug 23 2022 at 10:42):

afonso360 updated PR #4747 from inline-stackprobing to main.

Wasmtime GitHub notifications bot (Aug 23 2022 at 10:42):

afonso360 updated PR #4747 from inline-stackprobing to main.

Wasmtime GitHub notifications bot (Aug 23 2022 at 14:36):

afonso360 has marked PR #4747 as ready for review.

Wasmtime GitHub notifications bot (Sep 01 2022 at 08:58):

afonso360 updated PR #4747 from inline-stackprobing to main.

Wasmtime GitHub notifications bot (Sep 01 2022 at 20:56):

jameysharp submitted PR review.

Wasmtime GitHub notifications bot (Sep 01 2022 at 20:56):

jameysharp submitted PR review.

Wasmtime GitHub notifications bot (Sep 01 2022 at 20:56):

jameysharp created PR review comment:

This comment is no longer true, is it? I think you can delete it.

Wasmtime GitHub notifications bot (Sep 01 2022 at 20:56):

jameysharp created PR review comment:

Minor edits I just noticed:

            // The inline stack probe loop has 3 phases:
            //
            // We generate the "guard area" register which is essentially the frame_size aligned to
            // guard_size. We copy the stack pointer and subtract the guard area from it. This
            // gets us a register that we can use to compare when looping.
            //
            // After that we emit the loop. Essentially we just adjust the stack pointer one guard_size'd
            // distance at a time and then touch the stack by writing anything to it. We use the previously
            // created "guard area" register to know when to stop looping.

Wasmtime GitHub notifications bot (Sep 01 2022 at 20:56):

jameysharp created PR review comment:

Why do just a few runtests need stack-probing disabled?

Wasmtime GitHub notifications bot (Sep 01 2022 at 21:14):

afonso360 updated PR #4747 from inline-stackprobing to main.

Wasmtime GitHub notifications bot (Sep 01 2022 at 21:18):

afonso360 submitted PR review.

Wasmtime GitHub notifications bot (Sep 01 2022 at 21:18):

afonso360 created PR review comment:

These tests have stack slots above the threshold where we emit a stack probe. The default strategy is to outline the stack probing to a external function call, but since we never provide an implementation for that anywhere the JIT can't link these tests, so they fail.

I manually disabled stack probing for these which reflects the old behaviour of force disabling it everywhere.

Wasmtime GitHub notifications bot (Sep 01 2022 at 21:20):

afonso360 edited PR review comment.

Wasmtime GitHub notifications bot (Sep 01 2022 at 21:27):

jameysharp has enabled auto merge for PR #4747.

Wasmtime GitHub notifications bot (Sep 01 2022 at 22:32):

jameysharp merged PR #4747.

Last updated: Apr 17 2025 at 06:04 UTC