Stream: git-wasmtime

Topic: wasmtime / issue #4466 Out-of-band fuel metering


view this post on Zulip Wasmtime GitHub notifications bot (Jul 19 2022 at 18:00):

github-actions[bot] commented on issue #4466:

Subscribe to Label Action

cc @peterhuene

<details>
This issue or pull request has been labeled: "wasmtime:api", "wasmtime:config"

Thus the following users have been cc'd because of the following labels:

To subscribe or unsubscribe from this label, edit the <code>.github/subscribe-to-label.json</code> configuration file.

Learn more.
</details>

view this post on Zulip Wasmtime GitHub notifications bot (Jul 19 2022 at 18:01):

github-actions[bot] commented on issue #4466:

Label Messager: wasmtime:config

It looks like you are changing Wasmtime's configuration options. Make sure to
complete this check list:

[fuzzing-config]: https://github.com/bytecodealliance/wasmtime/blob/ca0e8d0a1d8cefc0496dba2f77a670571d8fdcab/crates/fuzzing/src/generators.rs#L182-L194
[fuzzing-docs]: https://docs.wasmtime.dev/contributing-fuzzing.html


<details>

To modify this label's message, edit the <code>.github/label-messager/wasmtime-config.md</code> file.

To add new label messages or remove existing label messages, edit the
<code>.github/label-messager.json</code> configuration file.

Learn more.

</details>

view this post on Zulip Wasmtime GitHub notifications bot (Jul 22 2022 at 16:26):

pepyakin commented on issue #4466:

Thank you for taking a look!

Yes, @alexcrichton, this PR is very raw right, way much less that I prefer right now. Sorry about that. I was eager to create it because of two reasons:

  1. I wanted to check if the general approach is what we want to,
  2. the actual trigger, was that I discovered Nick's PR #4431.

Specifically, to give the context for this question https://github.com/bytecodealliance/wasmtime/pull/4431#discussion_r924804070. I was scared that it made this PR impossible or very hard to implement.

Right now, I am rebasing this PR onto Nick's and trying to figure out what to do with the trampolines. The best I came up with so far is under the spoiler below. I hope my programming license won't be revoked.

<details>

struct VMRuntimeLimits {
    // ...

    /// A flag that indicates if the out-of-band fuel metering is enabled.
    pub outband_fuel: UnsafeCell<bool>,

    /// host-to-wasm return address.
    pub orig_entry_trampoline_return_address: UnsafeCell<usize>,

    /// wasm-to-host return address.
    pub orig_exit_trampoline_return_address: UnsafeCell<usize>,
}

// ...

asm_func!(
    "host_to_wasm_trampoline",
    "
        .cfi_startproc simple
        .cfi_def_cfa_offset 0

        // Load the pointer to `VMRuntimeLimits` in `r10`.
        mov r10, 8[rsi]

        // Check to see if this is a core `VMContext` (MAGIC == 'core').
        cmp DWORD PTR [rdi], 0x65726f63

        // Store the last Wasm SP into the `last_wasm_entry_sp` in the limits, if this
        // was core Wasm, otherwise store an invalid sentinal value.
        mov r11, -1
        cmove r11, rsp
        mov 40[r10], r11

        // Check if the out-of-band fuel metering is enabled. If it is not enabled, skip to the
        // tail call directly.
        cmp QWORD PTR 48[r10], 0x00
        je host_to_wasm_trampoline__perform_tail_call

        // OK, here we know that the out-of-band fuel metering is enabled.
        //
        // Dump the `fuel_consumed` into the fuel register.
        mov r15, 8[r10]

        // Next, we have to do some shenanigans. Below we call into the wasm with a tail call, to
        // not mess with the stack layout. But at the same time, after returning from wasm, we
        // have to our fuel register r15 back into `fuel_consumed` of `VMRuntimeLimits`. To do that
        // we stash the original return address (which is currently located at `[rsp]`) that points
        // back to the caller from the host side and change it with the continuation defined
        // below. The continuation will dump the r15 back into the memory and unstash the original
        // return address and jump at it.

        mov r11, [rsp]
        mov 56[r10], r11
        lea r11, [rip+host_to_wasm_trampoline__cont]
        mov [rsp], r11

        host_to_wasm_trampoline__perform_tail_call:

        // Tail call to the callee function pointer in the vmctx.
        jmp 16[rsi]

        host_to_wasm_trampoline__cont:

        // Here we're in a inconvenient situation, where the callee's frame is popped and all the
        // arguments are gone. We have to improvise our way out of this situation.

        // Open a temporary stack frame.
        //
        // The frame layout:
        //
        // rsp +  0: The original return address. Written by `wasmtime_outband_fuel_on_entry`.
        // rsp +  8: r15
        // rsp + 16: rax
        // rsp + 24: padding
        // rsp + 32: xmm0
        sub rsp, 48
        mov 8[rsp], r15
        mov 16[rsp], rax
        movdqu XMMWORD PTR 32[rsp], xmm0

        // Finally, move the current rsp into the first argument.
        mov rdi, rsp
        call wasmtime_outband_fuel_on_entry

        // Load the original return address into r10
        mov r10, 0[rsp]

        // Restore the initial values of rax and xmm0.
        mov rax, 16[rsp]
        movdqu xmm0, XMMWORD PTR 32[rsp]

        // Release the allocated stack area.
        add rsp, 48

        // Jump at the original return address.
        jmp r10

        .cfi_endproc
    ",
);


/// Dumps fuel into the `VMRuntimeLimits->fuel_consumed` and unstashes the original return address
/// for the trampoline.
///
/// Only used if out-of-band fuel is used.
#[no_mangle]
pub unsafe extern "C" fn wasmtime_outband_fuel_on_entry(sp: *mut usize) {
    use crate::traphandlers::tls;
    tls::with(|info| {
        let info = match info {
            Some(info) => info,
            None => {
                // Entering trampoline must set the TLS.
                std::hint::unreachable_unchecked();
            }
        };

        #[repr(C)]
        struct StackArgs {
            return_address: usize,
            fuel: i64,
        }

        let args = sp as *mut StackArgs;

        (*args).return_address = *(*info.runtime_limits())
            .orig_entry_trampoline_return_address
            .get();
        *(*info.runtime_limits()).fuel_consumed.get() = (*args).fuel;

        if (*args).fuel > 0 {
            // Fuel ran out and we bail through unwinding.
            crate::traphandlers::raise_lib_trap(wasmtime_environ::TrapCode::Interrupt);
        }
    });
}

</details>

This seems to be working and I hope I am not missing anything. I am in the process of replicating the same construction for the wasm-to-host and libcall trampolines.

You can see there it should address the concerns you expressed wrt not checking the fuel. To be clear, the trampoline generation in compiler.rs will also be removed, since the common variadic asm trampoline (what's a better name to distingiush it from the ones generated in compiler.rs?) takes care of that.

I ack the other notes. I hope the next push will feature not as raw code as this one. Also will run the benchmarks on it.

view this post on Zulip Wasmtime GitHub notifications bot (Jul 22 2022 at 19:30):

alexcrichton commented on issue #4466:

Oh no worries! I mostly want to make sure I'm clear on expectations and such, it's totally fine to push up work-in-progress PRs.

Also wow that is some gnarly assembly. I... can see how it might work though! I suspect one day we're going to need to severly improve our trampoline story, even with "just maintain frame pointers" it was already complicated enough... In any case though not a blocker for this work, just something for us to ponder.

view this post on Zulip Wasmtime GitHub notifications bot (Jul 25 2022 at 15:14):

alexcrichton commented on issue #4466:

One thing I just thought of I want to note before I forget about, right now "is this a wasm pc" is pretty crude where it just checks if the instruction pointer is in the code section of a loaded module (this is preexisting in Wasmtime) but I think that we'll want to enhance that because if an interrupt happens during an entry trampoline we can't guarantee that the fuel register is configured yet, so we'll want to make sure that the "is this a wasm pc" check may exclude trampolines.

view this post on Zulip Wasmtime GitHub notifications bot (Jul 25 2022 at 15:36):

pepyakin commented on issue #4466:

Oh, yes, definitely. On top of that, the query should be a bit more smarter. It should only return true if pc is in a wasm module that defines r15 as fuel, not just any module. That should come.

I've been battling with the trampolines for quite some time now, and it does not seem good.

The problem is I am not sure how to make the system unwinder work through this trampoline. So at first, I was getting the SIGABRT from the unwinder since it couldn't find the CFA. I fixed that by replacing cfi_def_cfa_offset 0 with .cfi_def_cfa rsp, 0. That kind of worked, but then the unwinder started getting stuck in the infinite loop.

I fixed that by slapping .cfi_undefined 16 before the tail call. That worked but, expectedly, now the unwinder can't get past the trampolines, and the stack traces (e.g. created by anyhow) are chopped.

We need to annotate where we get the original return address to fix that. The return address is stored behind TLS. There is no way, we can come up with a DWARF expression that fetches it from the VMRuntimeLimits.

view this post on Zulip Wasmtime GitHub notifications bot (Jul 25 2022 at 15:36):

pepyakin edited a comment on issue #4466:

Oh, yes, definitely. On top of that, the query should be a bit more smarter. It should only return true if pc is in a wasm module that defines r15 as fuel, not just any module. That should come later.

I've been battling with the trampolines for quite some time now, and it does not seem good.

The problem is I am not sure how to make the system unwinder work through this trampoline. So at first, I was getting the SIGABRT from the unwinder since it couldn't find the CFA. I fixed that by replacing cfi_def_cfa_offset 0 with .cfi_def_cfa rsp, 0. That kind of worked, but then the unwinder started getting stuck in the infinite loop.

I fixed that by slapping .cfi_undefined 16 before the tail call. That worked but, expectedly, now the unwinder can't get past the trampolines, and the stack traces (e.g. created by anyhow) are chopped.

We need to annotate where we get the original return address to fix that. The return address is stored behind TLS. There is no way, we can come up with a DWARF expression that fetches it from the VMRuntimeLimits.

view this post on Zulip Wasmtime GitHub notifications bot (Jul 25 2022 at 16:02):

pepyakin commented on issue #4466:

Ah, regarding the fuel register being initialized, I think it should already work. Here we ask for the DefinedFunctionIndex which is available only for functions defined by a wasm module and trampolines should be different.

view this post on Zulip Wasmtime GitHub notifications bot (Jul 25 2022 at 16:02):

pepyakin edited a comment on issue #4466:

Ah, regarding the fuel register being initialized, I think it should already work. Here we ask for the DefinedFunctionIndex which is available only for functions defined by a wasm module. If pc falls on a trampoline that would return None.

view this post on Zulip Wasmtime GitHub notifications bot (Jul 25 2022 at 21:43):

alexcrichton commented on issue #4466:

I'm pretty sure our dwarf unwind information is not "async correct" where every single ip in a function guarantees a correct backtrace. With the fast-stack-walking branch and fp-based unwinds that may get better though.

view this post on Zulip Wasmtime GitHub notifications bot (Jul 26 2022 at 09:58):

pepyakin commented on issue #4466:

I'm pretty sure our dwarf unwind information is not "async correct" where every single ip in a function guarantees a correct backtrace. With the fast-stack-walking branch and fp-based unwinds that may get better though.

Oh, sure, I remember that. It should be fine though since I was not planning on capturing the backtraces in the fuel interrupts, and I still keep in mind the landing pads idea.

The problem I am dealing with now is not about async traps. I should've been clearer on this. That SIGABRT would happen during capturing the stack trace iff the trampoline on the stack, and it went through the return address swizzling of the continuation.

I came up with a standalone minimized version of the problem which can be found under the spoiler:

<details>

// extern crate backtrace;

use std::arch;
use std::mem;

arch::global_asm!(
    "
    .p2align 4
    .global trampoline
    .type trampoline, @function

    trampoline:

    .cfi_startproc simple
    .cfi_def_cfa rsp, 0

    // move RDI, the original function, into r11
    mov r11, rdi

    // the return address pushed by the `call` instruction calling this trampoline
    // is located at *rsp. Move the value to rdi.
    mov rdi, [rsp]

    // Then, replace the return value with the continuation at the label `cont`.
    lea r10, cont[rip]
    mov [rsp], r10

    //
    // Important!!
    //
    // Mark the register 16, aka the return address, as undefined. That stops the unwinder here.
    // If that was not here, the unwinder would fall into an infinite loop.
    //
    // An unfortunate side-effect is the stack traces are also chopped here.
    .cfi_undefined 16

    // Finally, perform the jump.
    jmp r11

    cont:

    jmp rax

    .cfi_endproc

    .size trampoline, .-trampoline
    "
);

extern "C" {
    fn trampoline();
}

type Func = extern "C" fn(*const (), usize) -> *const ();

#[inline]
unsafe fn prepare_trampoline(dest: Func) -> (*const (), Func) {
    let dest = dest as *const Func as *const ();
    let func = mem::transmute_copy(&(trampoline as usize));

    (dest, func)
}

fn main() {
    unsafe {
        let (dest, f) = prepare_trampoline(host_func);
        f(dest, 48);
    }
}

extern "C" fn host_func(rdi: *const (), rsi: usize) -> *const () {
    println!("{}", rsi);
    let mut count = 0;
    backtrace::trace(|frame| {
        let ip = frame.ip();
        println!("{:X}", ip as usize);

        if count == 100 {
            // likely we stuck in a loop. Break into the debugger.
            unsafe {
                arch::asm!("ud2");
            }
        }
        count += 1;

        true
    });

    // we don't have such luxury in the real thing.
    rdi
}

</details>

If you remove the .cfi_undefined 16 directive, the unwinder would get stuck in an infinite loop.
If you remove the .cfi_def_cfa rsp, 0 directive, or even change to .cfi_def_cfa_offset 0 (Does this require attention in https://github.com/bytecodealliance/wasmtime/pull/4431?) found in the original version, then you would get SIGABRT:

* thread #1, name = 'cfi-dwarf', stop reason = signal SIGABRT
  * frame #0: 0x00007ffff7e3abaa libc.so.6`raise + 202
    frame #1: 0x00007ffff7e25523 libc.so.6`abort + 255
    frame #2: 0x00007ffff7dd5b40 libgcc_s.so.1`uw_update_context_1 + 1008
    frame #3: 0x00007ffff7dd5cd1 libgcc_s.so.1`uw_update_context + 17
    frame #4: 0x00007ffff7dd68bd libgcc_s.so.1`_Unwind_Backtrace + 93
    frame #5: 0x000055555555cfa1 cfi-dwarf`backtrace::backtrace::trace_unsynchronized::hf1601be64f8e51c8 [inlined] backtrace::backtrace::libunwind::trace::h80afc20a98e10b83(cb=&mut dyn core::ops::function::FnMut<(&backtrace::backtrace::Frame), Output=bool> @ 0x00007fffffff9d58) at libunwind.rs:93:5
    frame #6: 0x000055555555cf8c cfi-dwarf`backtrace::backtrace::trace_unsynchronized::hf1601be64f8e51c8(cb={closure_env#0} @ 0x00007fffffff9d40) at mod.rs:66:5
    frame #7: 0x000055555555d038 cfi-dwarf`backtrace::backtrace::trace::h4ba2f0e531a2363e(cb={closure_env#0} @ 0x00007fffffff9da0) at mod.rs:53:14
    frame #8: 0x000055555555d1d3 cfi-dwarf`cfi_dwarf::host_func::h89c8978f2a6c15a6(rdi=0x000055555555d148, rsi=48) at main.rs:74:5

view this post on Zulip Wasmtime GitHub notifications bot (Jul 26 2022 at 16:14):

alexcrichton commented on issue #4466:

Oh right yeah with the trampoline you gisted above I was wondering how backtrace info would work out and it seems like it doesn't. I don't know how to solve that myself, the trampoline there is quite gnarly.

view this post on Zulip Wasmtime GitHub notifications bot (Jul 26 2022 at 16:41):

pepyakin commented on issue #4466:

Oh right yeah with the trampoline you gisted above I was wondering how backtrace info would work out and it seems like it doesn't. I don't know how to solve that myself, the trampoline there is quite gnarly.

If wasmtime-flavoured ABIs enshrined the caller vmctx and guaranteed it so that it is still usable after the callee returns, it would've made things simpler: no need for reaching to tls for the original return address. The dwarf expression would require a few dereferences but otherwise seems to be doable.

If you (or anyone from BA) think that's acceptable, I can try to research this implementation strategy.

Otherwise, I think I ran out of ideas how to make this work.

view this post on Zulip Wasmtime GitHub notifications bot (Jul 26 2022 at 19:59):

alexcrichton commented on issue #4466:

Yeah unfortunately I don't have a great answer myself on that. I feel like with this and the stacks PR our trampoline story is pretty lacking, but I don't know how best to rectify it. Short of that it may be that a pinned register won't work at all until we figure out a better story for trampolines. Ideally all the pinned register/fuel stuff would be part of cranelift-compiled trampolines and we wouldn't have to tinker with global_asm! blobs at all. We're definitely not in a place to easily be able to do that right now though.


Last updated: Dec 23 2024 at 12:05 UTC