jlb6740 commented on Issue #1017:
@alexcrichton Hi, my understand of what may be going on here is evolving. It seems clear that using the rust allocator (calls malloc eventually) instead of mmap breaks jitdump support, but I have not pinpointed a way for confirmation to understand mitigations that don't require reverting to mmap. Here https://lore.kernel.org/patchwork/cover/622238/ and here https://lore.kernel.org/patchwork/patch/622240/ it appears it is implied that perf is using mmap records in the perf.dat file to help determine the injection points for the jitted code captured in the jitdump file. That makes sense but when doing something like: "sudo perf script --input perf.data --show-mmap-events | grep MMAP" to print mmaps recorded, data files both before and after the allocator change appear consistent. I am convinced thought the issue lies with perf inject not being able to figure out how to inject the jitted code. I'll do more digging.
@yurydelendik @alexcrichton In the meantime note ittapi for vtune support has been rebased and is not hampered by the allocator change and so https://github.com/bytecodealliance/wasmtime/pull/819 is unaffected.
jlb6740 commented on Issue #1017:
Note also, to reproduce:
git checkout -b test_with_mmap c8ab1e293e8457c61e34a74baad6649f446dbdc7
git revert -n b15b5cd
git cherry-pick f322555a33e4b63fedc6682cd47c0241dff65e28
... compile with jitdump feature flag and follow instructions in jitdump.rsvs
git checkout -b test_with_malloc c8ab1e293e8457c61e34a74baad6649f446dbdc7
git cherry-pick f322555a33e4b63fedc6682cd47c0241dff65e28
... compile with jitdump feature flag and follow instructions in jitdump.rs
alexcrichton commented on Issue #1017:
Ah sorry for the delay, but I'm trying to reproduce this locally but I seem to be having difficulty.
Using this script (and a custom-built
perf
from the latest kernel branch):set -ex cargo build --release --features jitdump rustc fib.rs --target wasm32-wasi -Cdebuginfo=0 $HOME/code/linux/tools/perf/perf record -k 1 -e instructions:u target/release/wasmtime -g --jitdump fib.wasm $HOME/code/linux/tools/perf/perf inject -v -j -i perf.data -o perf.jit.data exec $HOME/code/linux/tools/perf/perf report -i perf.jit.data -F+period,srclineand
fib.rs
looking like:fn main() { let n = 40; println!("fib({}) = {}", n, fib(n)); } fn fib(n: u32) -> u32 { if n <= 2 { 1 } else { fib(n - 1) + fib(n - 2) } }I get:
$ sh run.sh + cargo build --release --features jitdump Finished release [optimized] target(s) in 0.09s + rustc fib.rs --target wasm32-wasi -Cdebuginfo=0 + /home/alex/code/linux/tools/perf/perf record -k 1 -e instructions:u target/release/wasmtime -g --jitdump fib.wasm fib(1) = 1 [ perf record: Woken up 1 times to write data ] [ perf record: Captured and wrote 0.017 MB perf.data (381 samples) ] + /home/alex/code/linux/tools/perf/perf inject -v -j -i perf.data -o perf.jit.data build id event received for [kernel.kallsyms]: 8081b1e9482d2e82a84b1fdbadd94e2f03abc94e build id event received for /home/alex/code/wasmtime/target/release/wasmtime: b1d159cff0963d77ad19d66a2d17318c17b9db7e build id event received for /lib/x86_64-linux-gnu/ld-2.27.so: 64df1b961228382fe18684249ed800ab1dceaad4 build id event received for [vdso]: d94bf3cbb1070beb0ee71a51bbb2b4b3a26d353e build id event received for /lib/x86_64-linux-gnu/libpthread-2.27.so: 28c6aade70b2d40d1f0f3d0a1a0cad1ab816448f build id event received for /lib/x86_64-linux-gnu/libgcc_s.so.1: b6a35ea137b91d88bc9f0a2dbf04000b0db8c874 build id event received for /lib/x86_64-linux-gnu/libc-2.27.so: b417c0ba7cc5cf06d1d1bed6652cedb9253c60d0 jit marker found: /home/alex/code/wasmtime/jit-21004.dump injecting: /home/alex/code/wasmtime/jit-21004.dumpEven after I revert the malloc change I still ge tthe same error, so I'm not sure what it is I'm doing wrong?
jlb6740 commented on Issue #1017:
@alexcrichton Hi, thanks for trying this! Did you try using sudo? Without sudo perf and related tools appear to be limited. There is a setting, perf_event_paranoid that I read should help, but I just always use sudo while doing any perf command.
jlb6740 edited a comment on Issue #1017:
@alexcrichton Hi, thanks for trying this! Did you try using sudo? Without sudo perf and related tools appear to be limited. There is a setting, perf_event_paranoid that I read should help, but I just always use sudo while doing any perf command. Also ... what exact error are you seeing. I think it maybe got cut off?
jlb6740 commented on Issue #1017:
@alexcrichton Also separately, are your HEAD set to c8ab1e2. I ask because when I compile the fib.rs file and try to run, it will hang if I use the -g option ... even when not dumping a jitdump file.
alexcrichton commented on Issue #1017:
Ah yeah it's true I didn't use that same HEAD, I used master of wasmtime yesterday. I believe the hang was resolved by https://github.com/bytecodealliance/wasmtime/pull/1228 so I wanted a build that included that.
I do seem to be getting further with
sudo
, thanks for the tip! I thought that paranoid thing was already set and meant I didn't need sudo, but alas!With that though I'm still getting confusing results, nothing really looks like it's symbolicated. Going off the master branch, and the HEAD you're using, both have confusingly verbose perf reports before/after the revert of the malloc/mmap commit for me locally.
Can you gist what it's expected to see from
perf
?
jlb6740 commented on Issue #1017:
Hi @alexcrichton, the difference manifest in perf report and perf annotate and also perf script on the injected file where you can see the jitted.so file is correctly associated with instruction samples when working correctly. In the screen shots below you can see the difference. The left is when malloc is used to create the instance versus the right when mmap is used. I don't feel comfortable not understanding why perf needs this region to be mmaped notes from the posted conversations about the patches imply that is the case. During perf inject you can see perf have trouble accessing an map file in temp that should be associated with the pid of the process and it not have trouble on the mmapped version, but things aren't crystal clear. I am going through the perf kernel code now to see if I can understand better.
perf report (left is using the malloc patch, right is with it removed)
![image](https://user-images.githubusercontent.com/45402135/76563624-02d1cd80-6465-11ea-8cea-883a262f559c.png)perf report with drill down into annotation
![image](https://user-images.githubusercontent.com/45402135/76563649-0c5b3580-6465-11ea-8ee7-2b84e9aa818d.png)
alexcrichton commented on Issue #1017:
Ok I've been poking around this a lot more and I've managed to get some degree of success. I think there's a lot going on under the hood and there's a huge number of places things could go wrong, and I don't think malloc/mmap of an instance is one of them.
My main discovery I've found so far is that if you remove the dwarf info from the wasm module input it appears to work. In my fib example
perf report
yields:Samples: 1K of event 'instructions:u', Event count (approx.): 4226064351 Overhead Command Shared Object Symbol 99.89% wasmtime jitted-30178-0.so [.] wasm_module::_ZN3fib3fib17hedcc9d2af68c6e00E 0.01% wasmtime libc-2.27.so [.] __memmove_avx_unaligned_erms 0.01% wasmtime ld-2.27.so [.] _dl_relocate_object 0.01% wasmtime wasmtime [.] <std::collections::hash::map::DefaultHasher as core::hash::Hasher>::writewhich is basically what I would want to see (modulo some demangling ideally). If I annotate the
fib
function I get:Samples: 1K of event 'instructions:u', 4000 Hz, Event count (approx.): 4226064351 wasm_module::_ZN3fib3fib17hedcc9d2af68c6e00E /home/alex/code/wasmtime/wut/jitted-30178-0.so [Percent: local period] Percent│ │ │ │ Disassembly of section .text: │ │ 0000000000000040 <wasm_module::_ZN3fib3fib17hedcc9d2af68c6e00E>: │ wasm_module::_ZN3fib3fib17hedcc9d2af68c6e00E(): 5.17 │ rex push %rbp 1.50 │ mov %rsp,%rbp 4.63 │ push %r12 2.24 │ push %r13 2.06 │ push %r14 3.28 │ push %r15 2.52 │ sub $0x20,%rsp 2.42 │ mov %rdi,0x10(%rsp) 5.88 │ rex mov %edx,0x1c(%rsp) 3.07 │ rex mov $0x1,%eax 1.51 │ rex mov %eax,0x18(%rsp) 2.47 │ mov %edx,%r15d 2.37 │ cmp $0x3,%r15d │ rex jb e4 <wasm_module::_ZN3fib3fib17hedcc9d2af68c6e00E+0xa4> 1.73 │ movabs $0x7fe11e9b0000,%rax 1.85 │ mov %rax,0x8(%rsp) 3.56 │ mov 0x1c(%rsp),%r15d 2.49 │ add $0xffffffff,%r15d 3.07 │ mov 0x10(%rsp),%r14 1.12 │ mov 0x8(%rsp),%r13 3.19 │ mov %r14,%r12 1.81 │ mov %r12,%rdi 2.21 │ mov %r14,%rsi 1.46 │ mov %r15d,%edx 2.69 │ → callq *%r13 2.75 │ mov 0x18(%rsp),%r15d 3.11 │ add %r15d,%eax 1.95 │ rex mov %eax,0x18(%rsp) 3.40 │ mov 0x1c(%rsp),%r15d 1.75 │ add $0xfffffffe,%r15d 2.34 │ mov %r15d,0x1c(%rsp) 1.32 │ cmp $0x2,%r15d │ rex ja 8b <wasm_module::_ZN3fib3fib17hedcc9d2af68c6e00E+0x4b> ...basically things actually look pretty accurate here.
That was generated on the master branch with a few modifications:
- I manually removed dwarf info from the input file
- I manually removed emission of dwarf-related entries into the jitdump file
I suspect that there's probably bugs in the handling of dwarf information? Either that or I suspect there's bugs in how there's tons of dwarf in a normal wasm file from rustc which is related to libstd, but most of libstd isn't present in the wasm file itself. (and maybe there's dwarf lacking for the functions generated in the file?)
After reviewing the code a bit I think that we may want to refactor this quite a bit? One improvement I can think of is that the
--jitdump
flag shouldn't require-g
, and without-g
we should still emit a code load entry per-function instead of one per-module. We've got the metadata to know where all the functions are and we at least have the name section. I think the way the code should be structured is that we always emit one code load entry per function in a wasm module, and then optionally, if available, we emit extra debuginfo entries as well.I haven't really looked too too much into the debuginfo processing itself, but I think that the general output of the jitdump module will be improved if debuginfo is only augmenting existing records rather than driving the emission of records. My current suspicion is that the debuginfo augmentation is perhaps buggy right now which leads
perf inject
to run amok when parsing the dump file, leading to corrupted annotations and misinformation.I'm not really sure why it worked before or after the mmap/malloc change. I read over the patches you linked and I don't think that it has anything to do with the mmap/malloc change that we made. We made a change of how an
Instance
is allocated but it doesn't really have anything to do with JIT code or anything like that. The best I could surmise is thatmmap
is just required for the dump file to happen at runtime, soperf inject
knows how to find the dump file at all. It didn't even look, though, likemmap
-business was required for the code itself.
jlb6740 commented on Issue #1017:
This is very interest! Thanks for poking around here. note, I am not using -g in the experiments above so I'd like to reproduce what you've done since that shouldn't be an issue for me either. In general, the -g is not required. It is only an option to take advantage of dwarf included in the wasm to include line information in the annotation. If dwarf isn't included the jitdump should still be generated without issue, but perhaps there is something more there to look at even when the flag isn't used.
What steps can I use to reproduce your findings? BTW .. I have a habit of using C .. my fib wasm file is actually compiled from C using Clang and includes dwarf debug but I should also try the version that doesn't. I should also try your Rust version ... but that didn't run on my HEAD due to the bug you pointed out.
"The best I could surmise is that mmap is just required for the dump file to happen at runtime, so perf inject knows how to find the dump file at all. It didn't even look, though, like mmap-business was required for the code itself."
What are you thinking here? The jitdump file needs to mmap'd during running so that it is injected into the original perf file. Ok, I think I see. Yes, I think you are correct. Perhaps using mmap to create this jitdump file is the only extra mmap requirement and the problem indeed is something else. This is good. Let me try your steps to reproduce. Was your experiment done with HEAD on a rust generated wasm file with no debug information?
alexcrichton commented on Issue #1017:
Oh sure yeah, lemme write some stuff down. So as I was reading all this I started to do a bit of the refactoring I was mentioning, and I've made a commit but it doesn't work at all with debuginfo. Without debuginfo it should work though.
In any case, I've been using
-g
this whole time because of this comment but actually wasmtime from master works for me without-g
. Using the master branch I see:99.90% wasmtime jitted-25162-0.so [.] wasm_module
. I think the only issue is that the entire wasm module is lumped into one function instead of being split apart (but my commit fixes that).In that sense I don't actually need any changes to get
perf
working without-g
, and only with-g
do things start going awry.
jlb6740 commented on Issue #1017:
Think I can close this. Based on discussion here the code has been refactored: https://github.com/bytecodealliance/wasmtime/pull/360/ and is working for most examples (though not 100%). Also it is thought that #948 simply highlighted a bug and profiling support should work fine with the change.
jlb6740 closed Issue #1017:
@yurydelendik @alexcrichton
Hi ... an initial patch that provides support for Perf's jitdump specification merged recently (https://github.com/bytecodealliance/wasmtime/pull/360/) and another JIT supporting patch based on ittapi is in review (https://github.com/bytecodealliance/wasmtime/pull/819). I recently noticed though that the jitdump is suddenly no longer resolving properly when doing a perf report. I've traced the issue to a patch submitted right before merging the perf jitdump patch that apparently my last manual end-to-end tests weren't rebased against before it merged. It is the patch https://github.com/bytecodealliance/wasmtime/pull/948 which does away with mmap in favor a mechanism that uses the alloc crate which I believe uses malloc when creating instances in memory. There are no rust errors and perf report -v does not show any errors, but I think this is fatal to perf jit support (and maybe any jit support of wasmtime) because perf is trying to mmap that jitted memory region and afaik you can't mmap a malloc'd region. This theory may be incomplete, but certainly doing a "git revert -n b15b5cd05" resolves the issue. Not sure the best way forward here.
Also this brings up a gap in testing where I am not sure how to automatically test the breakage of external tools such as the issue here.
jlb6740 edited a comment on Issue #1017:
Think I can close this. Based on discussion here the code has been refactored: (#1310) and is working for most examples (though not 100%). Also it is thought that (#948) simply highlighted a bug and profiling support should work fine with the change.
Last updated: Jan 24 2025 at 00:11 UTC