Stream: git-wasmtime

Topic: wasmtime / Issue #2018 Jitdump support sometimes doesn't ...


Wasmtime GitHub notifications bot (Jul 14 2020 at 15:46):

alexcrichton opened Issue #2018:

I noticed recently that jitdump support seems broken on master when used through the wasmtime CLI. I'm pretty perplexed by this, especially because bisection pointed to #1565 as the cause. I think, though, that #1565 is a red herring and this has always been a bug. Given this fib.rs:

fn main() {
    let n = 42;
    println!("fib({}) = {}", n, fib(n));
}

fn fib(n: u32) -> u32 {
    if n <= 2 {
        1
    } else {
        fib(n - 1) + fib(n - 2)
    }
}

along with this embedding:

use wasmtime::*;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let mut config = Config::new();
    config.profiler(ProfilingStrategy::JitDump)?;
    let engine = Engine::new(&config);
    let store = Store::new(&engine);
    let module = Module::from_file(&engine, "./fib.wasm")?;
    let instance = Instance::new(&store, &module, &[])?;
    let start = instance.get_func("main").unwrap();
    let start = start.get2::<i32, i32, i32>()?;
    let f = if false { // changeme
        // Native trampoline: the host closure is wrapped directly.
        Func::wrap(&store, move || {
            start(0, 0)?;
            Ok(())
        })
    } else {
        // Runtime-generated trampoline for the untyped host closure.
        let ty = FuncType::new(Box::new([]), Box::new([]));
        Func::new(&store, ty, move |_, _, _| {
            start(0, 0)?;
            Ok(())
        })
    };
    f.call(&[]).unwrap();

    Ok(())
}

When this is run as:

$ rustc --target wasm32-unknown-unknown fib.rs -O
$ cargo build && perf record -k mono ./target/debug/wasmtime
$ perf inject --jit --input perf.data --output perf.jit.data
$ perf report --input perf.jit.data

this does not attribute time to fib::fib; instead, lots of raw JIT addresses show up:

# Overhead  Command   Shared Object       Symbol                                                                                                                                                                                     >
# ........  ........  ..................  ...........................................................................................................................................................................................>
#
     2.02%  wasmtime  [JIT] tid 142064    [.] 0x00007f6d16b4c418
     1.73%  wasmtime  libc-2.31.so        [.] __memmove_avx_unaligned_erms
     1.34%  wasmtime  [JIT] tid 142064    [.] 0x00007f6d16b4c422
     1.32%  wasmtime  [JIT] tid 142064    [.] 0x00007f6d16b4c499
     1.29%  wasmtime  wasmtime            [.] crossbeam_epoch::internal::Local::pin
     1.20%  wasmtime  wasmtime            [.] core::ptr::swap_nonoverlapping_one
     1.09%  wasmtime  wasmtime            [.] core::sync::atomic::atomic_load
     1.05%  wasmtime  [JIT] tid 142064    [.] 0x00007f6d16b4c479
     1.03%  wasmtime  [JIT] tid 142064    [.] 0x00007f6d16b4c428

If, however, the line marked // changeme is switched to true, then you get the profile as expected:

# Overhead  Command   Shared Object       Symbol
# ........  ........  ..................  ...........................................................................................................................................................
#
    25.77%  wasmtime  jitted-142410-4.so  [.] fib::fib
     1.80%  wasmtime  libc-2.31.so        [.] __memmove_avx_unaligned_erms
     1.14%  wasmtime  wasmtime            [.] crossbeam_epoch::internal::Local::pin
     0.91%  wasmtime  wasmtime            [.] core::sync::atomic::atomic_load
     0.91%  wasmtime  wasmtime            [.] core::sync::atomic::atomic_compare_exchange

I'm honestly not entirely sure what's going on, but I don't think our jitdump integration is what's broken. The *.dump file emitted looks reasonable, perf inject appears to work, and *.so files pop up that all contain functions with the expected names. I'm not really sure what's happening here.

@jlb6740, do you perhaps know more about perf that could help debug this? It'd be great to know why perf fails to recognize the function in one case but not the other.

The only real difference here seems to be what's on the stack: whether the trampolines are native (Func::wrap) or runtime-generated (Func::new). We don't tell jitdump about the trampolines generated by Func::new, but even after adding that it didn't seem to fix anything. Additionally, I think we have unwind info registered for everything, though I'm not even sure why perf would want to unwind, since we're not collecting stacks, just samples.

Overall I'm at a loss myself and I'm not sure what to do next. I'm hoping, @jlb6740, that you've got ideas about what might be going on :(

Wasmtime GitHub notifications bot (Jul 17 2020 at 18:40):

jlb6740 assigned Issue #2018.

Wasmtime GitHub notifications bot (Jul 18 2020 at 01:40):

jlb6740 unassigned Issue #2018 (assigned to jlb6740).

Wasmtime GitHub notifications bot (Jul 18 2020 at 03:29):

jlb6740 commented on Issue #2018:

@alexcrichton Sorry for taking so long to chime in. Yes, you're right, the only difference does look to be how the func is created. I have not run jitdump with an embedded instantiation like this, but I will try to trace through it to see what is going wrong. If the difference between wrap and new is really just when the trampoline is created, then I'd guess that shouldn't have any bearing. I'm also not sure about the impact of #1565, but I will investigate.

Wasmtime GitHub notifications bot (Jul 20 2020 at 14:56):

alexcrichton commented on Issue #2018:

Oh no worries! What I gisted above was basically a reduction of the CLI to a smaller thing to poke around with. I did some investigation over the weekend, though, and I don't think that this is our bug. I believe that this behavior is a "feature" of perf's support for JIT code.

I managed to reduce this code even more to something that was pretty bizarre. That gist sort of simulates what we're doing in Func::new, which is to map some memory as read/write and then remap it as read/execute. I get symbols correctly in that gist if the mmap size is 4096, but it fails for anything larger than that (like 8192).
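
For concreteness, a minimal sketch of that reduction (using the libc crate; this is my approximation of the pattern being described, not the actual gist):

use std::ptr;

fn main() {
    // 8192 is the size that reportedly loses symbols; 4096 works.
    const SIZE: usize = 8192;
    unsafe {
        // Map an anonymous region as read/write, as if we were about to
        // copy freshly generated trampoline code into it.
        let mem = libc::mmap(
            ptr::null_mut(),
            SIZE,
            libc::PROT_READ | libc::PROT_WRITE,
            libc::MAP_PRIVATE | libc::MAP_ANONYMOUS,
            -1,
            0,
        );
        assert_ne!(mem, libc::MAP_FAILED);

        // ... copy jitted code into `mem` here ...

        // Flip the same region to read/execute, mirroring what Func::new's
        // runtime-generated trampoline memory goes through.
        let rc = libc::mprotect(mem, SIZE, libc::PROT_READ | libc::PROT_EXEC);
        assert_eq!(rc, 0);
    }
}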

Digging into perf itself I found that, for the bad profiles where symbols didn't appear correctly, perf was actually finding and loading all the jitted-*.so executables emitted by perf inject. The only gotcha was that after everything emitted by perf inject had been loaded, there was an "mmap2" event which claimed to map over all our previously emitted JIT code. AFAIK what's happening under the hood is that perf inject injects events into the data stream, feigning loading all the little *.so files into the address space. The perf report tool then recognizes all of this and knows which executables (synthetic files created by perf inject) are mapped into the address space and where.

The problem is that this extra "mmap2" event claims to span the entire area of JIT code we just emitted. So what perf report sees is a careful construction of the address space with a bunch of jitted-*.so files, which then all gets blown away by the call to mprotect, because perf thinks new code was mapped over all the previous code. At that point perf report has no way to connect this "new" region to files on disk, so it presents no symbols.

I did my best to figure out why perf thinks there's an mmap2 event for such a large region of the address space when the mprotect clearly only covers a small amount. In the debug logs I saw perf think the region was hundreds of kilobytes large when the size passed to mprotect was only 8k. Digging in, though, it appears that perf, at runtime, will dynamically attempt to guess the size of mprotect and similar calls by parsing /proc/self/maps, which, AFAIK, lists all the regions of a process's address space along with where they're mapped and with what permissions. To see what was happening with the address space, I compared snapshots of /proc/self/maps at each step.
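
Capturing such snapshots needs nothing more than reading /proc/self/maps at each point of interest; a minimal, hypothetical helper along these lines would do (the m1/m2/m3 file names mirror the diff headers below, but exactly how the snapshots were taken is my assumption):

use std::fs;

// Write a copy of /proc/self/maps to the given file so that consecutive
// snapshots can be compared with `diff`, e.g. dump_maps("m1") before mmap,
// dump_maps("m2") after mmap, dump_maps("m3") after mprotect.
fn dump_maps(name: &str) {
    let maps = fs::read_to_string("/proc/self/maps").expect("read /proc/self/maps");
    fs::write(name, maps).expect("write snapshot");
}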

First, just after the call to mmap in the example program above, the process's mappings changed to look like:

--- m1  2020-07-19 10:25:13.379913654 -0700
+++ m2  2020-07-19 10:25:16.219914330 -0700
@@ -118,6 +118,7 @@
 7f6cdcaa6000-7f6cdcca6000 rw-p 00000000 00:00 0
 7f6cdcca6000-7f6cdcca7000 ---p 00000000 00:00 0
 7f6cdcca7000-7f6cdcea7000 rw-p 00000000 00:00 0
+7f6cdd147000-7f6cdd149000 rw-p 00000000 00:00 0
 7f6cdd149000-7f6cdd161000 r-xp 00000000 00:00 0
 7f6cdd161000-7f6cdd162000 ---p 00000000 00:00 0
 7f6cdd162000-7f6cdd164000 rw-p 00000000 00:00 0

After the mprotect call it looks like:

--- m2  2020-07-19 10:25:16.219914330 -0700
+++ m3  2020-07-19 10:25:18.543914882 -0700
@@ -118,8 +118,7 @@
 7f6cdcaa6000-7f6cdcca6000 rw-p 00000000 00:00 0
 7f6cdcca6000-7f6cdcca7000 ---p 00000000 00:00 0
 7f6cdcca7000-7f6cdcea7000 rw-p 00000000 00:00 0
-7f6cdd147000-7f6cdd149000 rw-p 00000000 00:00 0
-7f6cdd149000-7f6cdd161000 r-xp 00000000 00:00 0
+7f6cdd147000-7f6cdd161000 r-xp 00000000 00:00 0
 7f6cdd161000-7f6cdd162000 ---p 00000000 00:00 0
 7f6cdd162000-7f6cdd164000 rw-p 00000000 00:00 0
 7f6cdd164000-7f6cdd165000 ---p 00000000 00:00 0

which I think starts to explain things. It just so happens that our trampoline JIT memory sits just before the rest of the module's JIT memory, so when we map them all with the same permissions, /proc/self/maps coalesces everything into one block. This makes perf report think that the small mprotect actually affected a very large region, including our JIT code.

The tl;dr, I think, is basically:

Overall, there's no bug in Wasmtime from what I understand. I'm not even sure this is a bug in perf; it sort of seems like it's just how it works. In any case, I think we may still be able to fix this. One thing we could do is, when --jitdump is enabled, always allocate JIT code with an extra empty page at the end. That way /proc/self/maps would never be able to coalesce anything, and everything would remain disjoint regions. I believe that would avoid this issue, since perf report would then see mmap2 events for all the disjoint regions.
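
A rough sketch of that idea (illustrative only, not the actual Wasmtime change; the function name, page-size handling, and libc usage are assumptions):

use std::ptr;

const PAGE_SIZE: usize = 4096;

// Allocate room for `code_len` bytes of JIT code plus one trailing
// inaccessible page. The PROT_NONE page separates this region from
// whatever ends up adjacent in the address space, so the executable code
// mapping stays its own entry in /proc/self/maps and the synthesized
// mmap2 events stay confined to the memory that was actually remapped.
fn alloc_jit_region(code_len: usize) -> *mut u8 {
    let code_pages = (code_len + PAGE_SIZE - 1) / PAGE_SIZE;
    let total = (code_pages + 1) * PAGE_SIZE;
    unsafe {
        let base = libc::mmap(
            ptr::null_mut(),
            total,
            libc::PROT_READ | libc::PROT_WRITE,
            libc::MAP_PRIVATE | libc::MAP_ANONYMOUS,
            -1,
            0,
        );
        assert_ne!(base, libc::MAP_FAILED);

        // Make the final page inaccessible so it can never share
        // permissions with the code pages before it or whatever follows.
        let guard = (base as *mut u8).add(code_pages * PAGE_SIZE);
        assert_eq!(
            libc::mprotect(guard as *mut _, PAGE_SIZE, libc::PROT_NONE),
            0
        );

        base as *mut u8
    }
}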

Wasmtime GitHub notifications bot (Jul 20 2020 at 16:55):

jlb6740 commented on Issue #2018:

@alexcrichton excellent. Hi Alex, very good analysis. What you describe sounds like a good explanation, and I would like to go through your steps and see for myself if I can duplicate your findings. Also, maybe I'm off base, but I am wondering about the other record, JIT_CODE_MOVE, and whether it may be of help here. Specifically, if we describe the trampoline and jitted code with the JIT_CODE_LOAD record (are we including the trampoline properly?), then should we be using JIT_CODE_MOVE when the mprotect call happens?
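
For reference while digging into that: below is my reading of the relevant record layouts from the kernel's jitdump specification (tools/perf/Documentation/jitdump-specification.txt); treat it as a sketch to double-check against the spec rather than as authoritative.

// Common prefix on every jitdump record.
#[repr(C)]
struct RecordHeader {
    id: u32,         // record type, e.g. 0 = JIT_CODE_LOAD, 1 = JIT_CODE_MOVE
    total_size: u32, // size of the whole record, header included
    timestamp: u64,
}

// JIT_CODE_LOAD: announces a newly generated function. The fixed fields
// are followed by the NUL-terminated function name and the code bytes.
#[repr(C)]
struct CodeLoad {
    header: RecordHeader,
    pid: u32,
    tid: u32,
    vma: u64,        // virtual address the code executes at
    code_addr: u64,  // address the code bytes were written to
    code_size: u64,
    code_index: u64, // unique, increasing id for this function
}

// JIT_CODE_MOVE: tells perf that previously loaded code now lives at a
// new address; code_index must match the original JIT_CODE_LOAD.
#[repr(C)]
struct CodeMove {
    header: RecordHeader,
    pid: u32,
    tid: u32,
    vma: u64,
    old_code_addr: u64,
    new_code_addr: u64,
    code_size: u64,
    code_index: u64,
}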

