alexcrichton opened Issue #2018:
I noticed recently that jitdump support seems broken on master when used through the wasmtime CLI. I'm pretty perplexed by this, especially because bisection pointed to #1565 as the cause. I think, though, that #1565 is a red herring and this has always been a bug. Given this `fib.rs`:

```rust
fn main() {
    let n = 42;
    println!("fib({}) = {}", n, fib(n));
}

fn fib(n: u32) -> u32 {
    if n <= 2 {
        1
    } else {
        fib(n - 1) + fib(n - 2)
    }
}
```
along with this embedding:
```rust
use wasmtime::*;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let mut config = Config::new();
    config.profiler(ProfilingStrategy::JitDump)?;
    let engine = Engine::new(&config);
    let store = Store::new(&engine);
    let module = Module::from_file(&engine, "./fib.wasm")?;
    let instance = Instance::new(&store, &module, &[])?;
    let start = instance.get_func("main").unwrap();
    let start = start.get2::<i32, i32, i32>()?;
    let f = if false { // changeme
        Func::wrap(&store, move || {
            start(0, 0)?;
            Ok(())
        })
    } else {
        let ty = FuncType::new(Box::new([]), Box::new([]));
        Func::new(&store, ty, move |_, _, _| {
            start(0, 0)?;
            Ok(())
        })
    };
    f.call(&[]).unwrap();
    Ok(())
}
```
When this is run as:
```
$ rustc --target wasm32-unknown-unknown fib.rs -O
$ cargo build && perf record -k mono ./target/debug/wasmtime
$ perf inject --jit --input perf.data --output perf.jit.data
$ perf report --input perf.jit.data
```
this will not attribute time to `fib::fib`, but instead lots of JIT addresses show up:

```
# Overhead  Command   Shared Object     Symbol
# ........  ........  ................  ........................................
#
     2.02%  wasmtime  [JIT] tid 142064  [.] 0x00007f6d16b4c418
     1.73%  wasmtime  libc-2.31.so      [.] __memmove_avx_unaligned_erms
     1.34%  wasmtime  [JIT] tid 142064  [.] 0x00007f6d16b4c422
     1.32%  wasmtime  [JIT] tid 142064  [.] 0x00007f6d16b4c499
     1.29%  wasmtime  wasmtime          [.] crossbeam_epoch::internal::Local::pin
     1.20%  wasmtime  wasmtime          [.] core::ptr::swap_nonoverlapping_one
     1.09%  wasmtime  wasmtime          [.] core::sync::atomic::atomic_load
     1.05%  wasmtime  [JIT] tid 142064  [.] 0x00007f6d16b4c479
     1.03%  wasmtime  [JIT] tid 142064  [.] 0x00007f6d16b4c428
```
If, however, the line with `// changeme` is switched to `true`, then you'll get the profile as expected:

```
# Overhead  Command   Shared Object       Symbol
# ........  ........  ..................  ........................................
#
    25.77%  wasmtime  jitted-142410-4.so  [.] fib::fib
     1.80%  wasmtime  libc-2.31.so        [.] __memmove_avx_unaligned_erms
     1.14%  wasmtime  wasmtime            [.] crossbeam_epoch::internal::Local::pin
     0.91%  wasmtime  wasmtime            [.] core::sync::atomic::atomic_load
     0.91%  wasmtime  wasmtime            [.] core::sync::atomic::atomic_compare_exchange
```
I'm honestly not entirely sure what's going on. I don't think that our jitdump integration is what's broken: the `*.dump` file emitted looks reasonable, `perf inject` appears to work, and `*.so` files pop up that all have functions in them with the expected names. I'm not really sure what's happening here.

@jlb6740 do you know more about `perf`, perhaps, to help debug? It'd be great to know why `perf` fails to recognize the function in one case but not the other.

The only real difference here seems to be what's on the stack: whether it's a native trampoline (`Func::wrap`) or a runtime-generated trampoline (`Func::new`). We don't tell jitdump about the trampolines generated by `Func::new`, but doing so didn't seem to fix anything either. Additionally, I think we have unwind info registered for everything, but I'm not even sure why perf would want to unwind, since we're not collecting stacks, just samples.

Overall I'm at a loss myself and I'm not sure what to do next. I'm hoping @jlb6740 you've got ideas of what might be going on :(
jlb6740 assigned Issue #2018.
jlb6740 unassigned Issue #2018.
jlb6740 commented on Issue #2018:
@alexcrichton Sorry for taking so long to chime in. Yes, you're right, the only difference does look like how the func is created. I haven't run jitdump with an embedded instantiation in this way, but I'll try to trace through it and see what's going wrong. If the difference between `wrap` and `new` is really just when the trampoline is created, then I'd guess it should have no bearing. I'm also not sure of the impact of #1565, but I'll investigate.
alexcrichton commented on Issue #2018:
Oh, no worries! What I gisted above was basically a reduction of the CLI to something smaller to poke around with. I did some investigation over the weekend, though, and I don't think this is our bug. I believe this behavior is a "feature" of `perf`'s support for JIT code.

I managed to reduce this code even further to something pretty bizarre. That gist roughly simulates what we're doing in `Func::new`: map some memory as read/write and then remap it as read/execute. In that gist I get symbols correctly if the mmap size is 4096, but it fails for anything larger than that (like 8192).
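For reference, here's a minimal sketch of the pattern that gist exercises; this is illustrative only (it assumes the `libc` crate and omits the jitdump side entirely), not the actual gist:

```rust
// Sketch of the mmap-then-mprotect pattern described above (requires the
// `libc` crate); the real gist also emits jitdump records for the region.
use std::ptr;

fn main() {
    // Per the description above, sizes larger than one page (e.g. 8192)
    // reproduced the bad profile, while 4096 did not.
    let size = 8192usize;
    unsafe {
        // Reserve read/write memory to hold "JIT" code.
        let ptr = libc::mmap(
            ptr::null_mut(),
            size,
            libc::PROT_READ | libc::PROT_WRITE,
            libc::MAP_PRIVATE | libc::MAP_ANONYMOUS,
            -1,
            0,
        );
        assert_ne!(ptr, libc::MAP_FAILED);

        // ... copy generated machine code into `ptr` here ...

        // Flip the whole region to read/execute before calling into it.
        let rc = libc::mprotect(ptr, size, libc::PROT_READ | libc::PROT_EXEC);
        assert_eq!(rc, 0);
    }
}
```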
Digging into `perf` itself, I found that for the bad profiles where symbols didn't appear correctly, `perf` was actually finding and loading all the `jitted-*.so` executables emitted by `perf inject`. The only gotcha was that after it loaded everything emitted by `perf inject`, there was an "mmap2" event claiming that something mapped over all of our previously emitted JIT code. AFAIK what's happening under the hood is that `perf inject` injects events into the data stream, feigning loading all the little `*.so` files into the address space. The `perf report` tool then recognizes all of this and knows which executables (synthetic files created by `perf inject`) are mapped into the address space and where.

The problem is that this extra "mmap2" event claims to span the entire area of JIT code we just emitted. So what `perf report` sees is a careful construction of the address space with a bunch of `jitted-*.so` files, which then all gets blown away by the call to `mprotect`, which it thinks mapped new code over all of the previous code. `perf report` has no connection from this "new" region to files on disk, so it presents no symbols.
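One way to see those synthesized mapping events, assuming a `perf` build that supports `--show-mmap-events`, is to dump them from the injected data file:

```
$ perf script --input perf.jit.data --show-mmap-events | grep MMAP2
```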
I did my best to figure out why perf thinks there's an mmap2 event for such a large region of the address space when the `mprotect` clearly isn't that large. In the debug logs I was seeing perf think it was hundreds of kilobytes when the size passed to `mprotect` was only 8k. Digging in, though, it appears that `perf`, at runtime, dynamically tries to guess the size of `mprotect` and similar calls by parsing `/proc/self/maps`, which AFAIK lists all the regions of a process's address space, where they're mapped, and with what permissions. Digging into what was happening with the address space, this is what I got.

First, just after the call to `mmap` in the example program above, the process's mappings changed to look like:

```diff
--- m1  2020-07-19 10:25:13.379913654 -0700
+++ m2  2020-07-19 10:25:16.219914330 -0700
@@ -118,6 +118,7 @@
 7f6cdcaa6000-7f6cdcca6000 rw-p 00000000 00:00 0
 7f6cdcca6000-7f6cdcca7000 ---p 00000000 00:00 0
 7f6cdcca7000-7f6cdcea7000 rw-p 00000000 00:00 0
+7f6cdd147000-7f6cdd149000 rw-p 00000000 00:00 0
 7f6cdd149000-7f6cdd161000 r-xp 00000000 00:00 0
 7f6cdd161000-7f6cdd162000 ---p 00000000 00:00 0
 7f6cdd162000-7f6cdd164000 rw-p 00000000 00:00 0
```
After the `mprotect` call it looks like:

```diff
--- m2  2020-07-19 10:25:16.219914330 -0700
+++ m3  2020-07-19 10:25:18.543914882 -0700
@@ -118,8 +118,7 @@
 7f6cdcaa6000-7f6cdcca6000 rw-p 00000000 00:00 0
 7f6cdcca6000-7f6cdcca7000 ---p 00000000 00:00 0
 7f6cdcca7000-7f6cdcea7000 rw-p 00000000 00:00 0
-7f6cdd147000-7f6cdd149000 rw-p 00000000 00:00 0
-7f6cdd149000-7f6cdd161000 r-xp 00000000 00:00 0
+7f6cdd147000-7f6cdd161000 r-xp 00000000 00:00 0
 7f6cdd161000-7f6cdd162000 ---p 00000000 00:00 0
 7f6cdd162000-7f6cdd164000 rw-p 00000000 00:00 0
 7f6cdd164000-7f6cdd165000 ---p 00000000 00:00 0
```
which I think starts to explain things. It just so happens that our trampoline JIT memory sits just before the rest of the module's JIT memory, so when we map them all with the same permissions, `/proc/self/maps` coalesces everything into one block. This makes `perf report` think that the small `mprotect` actually affected a very large region, including our JIT code.
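The diffs above came from snapshotting `/proc/self/maps` around each call; a minimal sketch of how such snapshots could be captured from inside the process (the helper name and file names here are made up) is:

```rust
use std::fs;

/// Hypothetical helper: copy the current `/proc/self/maps` contents to `path`
/// so successive snapshots (e.g. `m1`, `m2`, `m3`) can be compared with `diff -u`.
fn snapshot_maps(path: &str) -> std::io::Result<()> {
    let maps = fs::read_to_string("/proc/self/maps")?;
    fs::write(path, maps)
}

// Usage idea: snapshot_maps("m1")?; /* mmap */ snapshot_maps("m2")?; /* mprotect */ snapshot_maps("m3")?;
```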
The tl;dr, I think, is basically:

- `perf report` uses `/proc/self/maps` to figure out the length of a modification to the address space when something like `mmap` or `mprotect` happens.
- This can cause larger-than-actual lengths to be reported for an operation if regions of memory are coalesced in how they're listed in `/proc/self/maps`.
- This can then cause `perf report` to think that JIT code was overwritten when in fact it was not, which makes `perf report` forget about all the information loaded by `perf inject`.
Overall there's no bug in Wasmtime from what I understand. I'm not even sure this is a bug in `perf`; it sort of seems like it's just how it works. In any case, I think we may still be able to fix this. One thing we could do is, when `--jitdump` is enabled, always allocate JIT code with an extra empty page at the end. That way `/proc/self/maps` would never be able to coalesce anything and everything would stay disjoint regions. This, I believe, means we wouldn't hit this issue, since `perf report` would see mmap2 events for all disjoint regions.
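A rough sketch of that idea, reading "an extra empty page" as a trailing `PROT_NONE` page (this is a hypothetical helper assuming 4 KiB pages and the `libc` crate, not the actual Wasmtime allocator):

```rust
use std::ptr;

// Sketch: allocate `size` bytes of RW memory for JIT code plus one trailing
// PROT_NONE guard page, so that even after the code pages are flipped to
// PROT_READ | PROT_EXEC, /proc/self/maps cannot coalesce this region with a
// neighboring mapping that has the same permissions.
unsafe fn alloc_jit_region(size: usize) -> *mut u8 {
    let page = 4096; // assume 4 KiB pages for this sketch
    let total = (size + page - 1) / page * page + page;
    let ptr = libc::mmap(
        ptr::null_mut(),
        total,
        libc::PROT_READ | libc::PROT_WRITE,
        libc::MAP_PRIVATE | libc::MAP_ANONYMOUS,
        -1,
        0,
    );
    assert_ne!(ptr, libc::MAP_FAILED);
    // Make the last page inaccessible; it stays PROT_NONE for the lifetime of
    // the region and keeps the executable pages from merging with neighbors.
    let guard = (ptr as *mut u8).add(total - page);
    assert_eq!(libc::mprotect(guard as *mut _, page, libc::PROT_NONE), 0);
    ptr as *mut u8
}
```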
jlb6740 commented on Issue #2018:
@alexcrichton Excellent. Hi Alex, very good analysis. What you describe sounds like a good explanation, and I'd like to go through your steps and see if I can duplicate your findings. Also, maybe I'm off base, but I'm wondering about the other record, JIT_CODE_MOVE, and whether it may be of help here. Specifically, if we describe the trampoline and jitted code with the JIT_CODE_LOAD record (are we including the trampoline properly??), then should we be using JIT_CODE_MOVE when the `mprotect` call happens?
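For reference, my reading of the jitdump specification is that a JIT_CODE_MOVE record body carries roughly the fields sketched below; it describes code that has been relocated to a new address. The field names and order here should be verified against `tools/perf/Documentation/jitdump-specification.txt` before relying on them:

```rust
// Approximate layout of a jitdump JIT_CODE_MOVE record body (each record is
// preceded by a common header with an id, total_size, and timestamp). This is
// my reading of the spec, not a verified definition.
#[repr(C)]
struct JitCodeMove {
    pid: u32,            // process id that generated the jitted code
    tid: u32,            // thread id that generated the jitted code
    vma: u64,            // new virtual address of the code
    old_code_addr: u64,  // address the code was previously loaded at
    new_code_addr: u64,  // address the code has been moved to
    code_size: u64,      // size in bytes of the jitted code
    code_index: u64,     // index of the original JIT_CODE_LOAD record
}
```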