https://github.com/fitzgen/winliner
https://crates.io/crates/winliner
https://docs.rs/winliner/latest/winliner/
Winliner speculatively inlines indirect calls in WebAssembly, based on observed information from a previous profiling phase. This is a form of profile-guided optimization that we affectionately call winlining.
First, Winliner inserts instrumentation to observe the actual target callee of every indirect call site in your Wasm program. Next, you run the instrumented program for a while, building up a profile. Finally, you invoke Winliner again, this time providing it with the recorded profile, and it optimizes your Wasm program based on the behavior observed in that profile.
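To make the three phases concrete, here is a minimal Rust sketch of the idea. It is not Winliner's API or its generated code: the function table, `call_instrumented`, and `call_winlined` are all hypothetical names made up for illustration, and the sketch assumes the table is never mutated after the profile is taken.

```rust
// Hypothetical illustration only: none of these names come from Winliner.

/// Stand-in for a Wasm function table: an indirect call picks its callee by index.
type Table = Vec<fn(i32) -> i32>;

fn double(x: i32) -> i32 {
    x * 2
}

fn negate(x: i32) -> i32 {
    -x
}

fn make_table() -> Table {
    vec![double as fn(i32) -> i32, negate]
}

/// Phase 1 and 2: an instrumented call site records which table index it
/// actually called, then performs the normal indirect call.
fn call_instrumented(table: &[fn(i32) -> i32], counts: &mut [u64], index: usize, arg: i32) -> i32 {
    counts[index] += 1;
    table[index](arg)
}

/// Phase 3: the optimized call site. We assume the profile showed index 0
/// (`double`) to be the hot callee, so its body is inlined behind a guard.
fn call_winlined(table: &[fn(i32) -> i32], index: usize, arg: i32) -> i32 {
    if index == 0 {
        // Speculation succeeded: inlined body of `double`, no indirect call.
        arg * 2
    } else {
        // Speculation failed: fall back to the original indirect call.
        table[index](arg)
    }
}
```

The guard keeps the optimized call site equivalent to the original for every index; only the speculated case skips the indirect call.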
Similar to Wizer, the wins you get from this tool are going to be very dependent on your exact workload. How bottlenecked are you on indirect calls? And how monomorphic are those indirect calls in practice?
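One rough way to think about "how monomorphic" a call site is: given per-callee call counts for that site, what share of calls went to the single hottest callee? The helper below is a hypothetical sketch of that calculation, not something Winliner exposes, and the count layout (one counter per callee index) is an assumption.

```rust
/// Hypothetical helper: for one call site, returns the index of the most
/// frequently observed callee and the fraction of all calls that went to it.
/// Returns `None` if the site was never called.
fn hottest_target(counts: &[u64]) -> Option<(usize, f64)> {
    let total: u64 = counts.iter().sum();
    if total == 0 {
        return None;
    }
    let (index, &max) = counts.iter().enumerate().max_by_key(|&(_, &c)| c)?;
    Some((index, max as f64 / total as f64))
}
```

A share near 1.0 means the site is effectively monomorphic and a good candidate for speculation; a share spread evenly across many callees means a guard would miss most of the time.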
Some very very very preliminary results for spidermonkey.wasm show octane's richards benchmark (which happens to be the only benchmark I have on hand) going from 141 to 232 (higher is better) for a speed up of 1.65x!
So, give Winliner a try and let me know if you run into any bugs or see any fun results!
Some very very very preliminary results for spidermonkey.wasm show octane's richards benchmark (which happens to be the only benchmark I have on hand) going from 141 to 232 (higher is better) for a speed up of 1.65x!
Important clarification for folks: that's on a test case that has ICs (inline caches) added/generated by some (not yet released) tooling with PBL and weval; the speedup comes from inlining the ICs. In other words, the test case was full of code that was just ready to be inlined. Unfortunately, this speedup isn't as likely to come from arbitrary inlining on general programs (though we hypothesize C++ vtables might give some opportunity elsewhere!).
Last updated: Nov 22 2024 at 16:03 UTC