What is the current best practice to measure and report WebAssembly runtime performance (outside the browser)? One of the most recent publications in this field is this: https://github.com/wabench. I currently have the following concerns:
For the first concern, I am thinking that I could measure the time for an "empty" main program and subtract that from the other measurements. For the second, I guess I could add a repetition loop inside main and disregard the first 100 of 200 iterations (or however many it takes to warm up the JIT). This method, with the timing done in the code itself, would also resolve the first concern.
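Both ideas can be sketched together: time each iteration in-process, drop a warm-up prefix, and subtract the median of an empty loop body as a baseline for timer and loop overhead. This is a minimal Python illustration under stated assumptions: `steady_state_times` and `net_median` are hypothetical helper names, the 200/100 split simply mirrors the numbers in the question, and in a real Wasm guest the clock would come from whatever the guest language exposes (e.g. via WASI) rather than `time.perf_counter`.

```python
import statistics
import time
from typing import Callable, List


def steady_state_times(work: Callable[[], None], iterations: int = 200,
                       warmup: int = 100) -> List[float]:
    """Time `work` once per iteration and keep only post-warm-up samples (seconds)."""
    if not 0 <= warmup < iterations:
        raise ValueError("warmup must satisfy 0 <= warmup < iterations")
    samples: List[float] = []
    for _ in range(iterations):
        start = time.perf_counter()
        work()
        samples.append(time.perf_counter() - start)
    # Drop the first `warmup` samples, e.g. while a tiered JIT is still optimizing.
    return samples[warmup:]


def net_median(work: Callable[[], None], iterations: int = 200,
               warmup: int = 100) -> float:
    """Median steady-state time of `work` minus that of an empty body.

    The empty-body baseline approximates the per-iteration cost of the timer
    calls and the loop itself; the result is clamped at zero because noise can
    make the difference slightly negative for very small workloads.
    """
    measured = statistics.median(steady_state_times(work, iterations, warmup))
    baseline = statistics.median(steady_state_times(lambda: None, iterations, warmup))
    return max(measured - baseline, 0.0)


if __name__ == "__main__":
    def workload() -> None:
        sum(i * i for i in range(10_000))

    print(f"{net_median(workload) * 1e6:.1f} us per iteration (steady state)")
```

Taking the median rather than the mean keeps a few outlier iterations (GC pauses, preemption) from dominating the result.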
Finally, would it be meaningful for the Bytecode Alliance to consider setting up an approved runtime performance benchmark suite? I would be interested in helping out with this work, if so. Besides wabench, there is also libsodium, which has been used to benchmark Wasm runtimes as well: https://00f.net/2023/01/04/webassembly-benchmark-2023/
We use the sightglass benchmark suite to track Wasmtime/Cranelift performance for our own needs but actively discourage benchmark wars
Thanks a lot, it's very helpful.
However, if I were to compare two runtimes for use outside the browser, do you have any comments on my concerns?
fitzgen (he/him) said:
We use the sightglass benchmark suite to track Wasmtime/Cranelift performance for our own needs but actively discourage benchmark wars
- https://github.com/bytecodealliance/sightglass
- Details and aspirations here: https://github.com/bytecodealliance/rfcs/blob/main/accepted/benchmark-suite.md
- In particular see the "non-goals" section
Your specific concerns are addressed by Sightglass: runtime setup/teardown is handled by an explicit separation of phases, so Sightglass measures only the actual phases (compilation, instantiation, execution) and nothing else; JIT warm-up effects are N/A in Wasmtime, as we compile the whole module on load with our only tier (Cranelift in optimizing mode); and we avoid excessive IO in benchmarks, which could otherwise become the bottleneck instead of CPU time.
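The phase separation described above can be sketched as a harness that starts and stops the clock around each phase individually, so anything the host does before compilation or after execution is never counted. This is an illustrative Python sketch, not Sightglass's actual code; `time_phases` is a hypothetical name, and the three callables are stand-ins for whatever compile, instantiate, and run entry points a given engine exposes.

```python
import time
from typing import Any, Callable, Dict


def time_phases(compile_fn: Callable[[bytes], Any],
                instantiate_fn: Callable[[Any], Any],
                execute_fn: Callable[[Any], Any],
                wasm_bytes: bytes) -> Dict[str, float]:
    """Return wall-clock seconds spent in each phase, measured separately."""
    timings: Dict[str, float] = {}

    start = time.perf_counter()
    module = compile_fn(wasm_bytes)          # e.g. validate + compile to native code
    timings["compilation"] = time.perf_counter() - start

    start = time.perf_counter()
    instance = instantiate_fn(module)        # e.g. allocate memories, link imports
    timings["instantiation"] = time.perf_counter() - start

    start = time.perf_counter()
    execute_fn(instance)                     # run the benchmark's entry point
    timings["execution"] = time.perf_counter() - start
    return timings


if __name__ == "__main__":
    # Stand-in callables; a real harness would call into an engine here.
    result = time_phases(
        compile_fn=lambda b: ("module", len(b)),
        instantiate_fn=lambda m: ("instance", m),
        execute_fn=lambda i: time.sleep(0.01),
        wasm_bytes=b"\0asm\x01\x00\x00\x00",
    )
    for phase, seconds in result.items():
        print(f"{phase:>13}: {seconds * 1e3:.3f} ms")
```

With an engine that compiles the whole module up front, as described for Wasmtime above, the execution figure then contains no compilation work.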
I'll add a bit of detail to @fitzgen (he/him)'s point about "benchmark wars": we recognize folks will want to compare engines, and I actually think that's a very useful data point (of course one wants to know whether an engine is fast enough, or which one is most efficient, when choosing what to use), but the main concern is simply to avoid incentivizing a "performance at any cost" culture. This comes in particular from the browser world, where a single-minded focus on microbenchmark performance led JS engines astray with overcomplex optimizations; and in the case of the web it turns out that most code is cold and load/start time matters a lot more. The analogous guidance in the Wasm engine world might be: it's important to know what the whole system's performance characteristics and requirements are, and to consciously choose a design point that fits those. E.g. Cranelift is less optimizing than LLVM, but it's far simpler, and we're working on novel ways to formally verify it, which is feasible because of its simplicity.
All of that said, though, we are interested (or at least I am) in relative performance measurements insofar as they can point out where we can do better. If you find anything surprising, please do let us know!
From the corporate world I can add: we only benchmark the solution, then drill down into bottlenecks. In distributed work, it's the solution speed that matters, not the pure speed of a single component. That said, we've benchmarked several runtimes including wasmtime in high-performance solutions and.... the runtimes are not the throughput problem.
A question about Sightglass: can the CPU cycles it reports be translated into real time by dividing them by the CPU clock frequency (assuming it is locked)? Is time spent waiting for IO to complete and in system calls included in the CPU cycles?
can the CPU cycles it reports be translated into real time by dividing them by the CPU clock frequency (assuming it is locked)?
It can be translated into task clock time. Task clock time is the sum of the time the process spent running on each CPU core (excluding time spent waiting for IO or sleeping). Wall clock time is the elapsed time from the start of the process until the end. It includes any time spent waiting on IO or sleeping, and a second during which multiple CPU cores are busy counts only once, whereas task clock time counts one second for each core that was running the process.
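The conversion and the task-clock versus wall-clock distinction can be illustrated with a short Python sketch. It assumes a single fixed (locked) clock frequency; `cycles_to_seconds` and `wall_and_cpu` are hypothetical helpers, and Python's `time.process_time()` (CPU time of the current process) stands in for task clock time as an analogy, not as what Sightglass itself reads.

```python
import time
from typing import Callable, Tuple


def cycles_to_seconds(cycles: int, freq_hz: float) -> float:
    """Cycles / frequency = task-clock seconds; valid only at a fixed frequency."""
    return cycles / freq_hz


def wall_and_cpu(work: Callable[[], None]) -> Tuple[float, float]:
    """Return (wall-clock seconds, CPU seconds) spent running `work`."""
    wall_start = time.perf_counter()
    cpu_start = time.process_time()
    work()
    return time.perf_counter() - wall_start, time.process_time() - cpu_start


if __name__ == "__main__":
    # 6e9 cycles at a locked 3 GHz correspond to 2 s of task-clock time.
    print(cycles_to_seconds(6_000_000_000, 3.0e9))  # 2.0

    # Sleeping advances the wall clock but uses (almost) no CPU time.
    wall, cpu = wall_and_cpu(lambda: time.sleep(0.2))
    print(f"sleep: wall={wall:.3f}s cpu={cpu:.3f}s")
```

For a multi-threaded program the CPU figure can exceed the wall-clock figure, because each busy core contributes its own second.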
Is time spent waiting for IO to complete and system calls included in the CPU cycles?
No for IO, yes for system calls: cpu-cycles includes cycles spent in system calls, while cpu-cycles:u only counts user-space cycles. Sightglass uses cpu-cycles, not cpu-cycles:u.
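With `perf` directly, requesting both events, e.g. `perf stat -e cpu-cycles,cpu-cycles:u <command>`, shows how much of the total is kernel time. The same user-versus-kernel split can be observed from inside a process; the Python sketch below uses `os.times()`, which reports user and system CPU time separately. It is an analogy for the event distinction, not a cycle counter, and `user_and_system`, `syscall_heavy`, and `compute_heavy` are hypothetical names.

```python
import os
from typing import Callable, Tuple


def user_and_system(work: Callable[[], None]) -> Tuple[float, float]:
    """Return (user CPU seconds, system CPU seconds) consumed by `work`."""
    before = os.times()
    work()
    after = os.times()
    return after.user - before.user, after.system - before.system


def syscall_heavy(n: int = 200_000) -> None:
    # Each os.stat() call enters the kernel, so part of this loop is system time.
    for _ in range(n):
        os.stat(".")


def compute_heavy(n: int = 2_000_000) -> None:
    # Pure arithmetic stays in user space.
    sum(i * i for i in range(n))


if __name__ == "__main__":
    for name, fn in [("syscalls", syscall_heavy), ("compute", compute_heavy)]:
        user, system = user_and_system(fn)
        print(f"{name:>8}: user={user:.2f}s system={system:.2f}s")
```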
@wsta, you might be interested in the other measurement mechanisms in Sightglass as well: `cycles` is the default since it is the most portable, but on Linux you may want `perf-counters`, which uses `perf` to get different rows of data. There are others listed if you run `sightglass benchmark --help`, and more can be added if needed.
Thanks to you both!
Ralph said:
From the corporate world I can add: we only benchmark the solution, then drill down into bottlenecks. In distributed work, it's the solution speed that matters, not the pure speed of a single component. That said, we've benchmarked several runtimes including wasmtime in high-performance solutions and.... the runtimes are not the throughput problem.
I'm joining this conversation late, and thanks @Chris Fallin for pointing me here.
Yes, the overall solution is important; however, when choosing a runtime implementation, knowing the performance characteristics of, and differences between, the engines is important. That is a comparison we did. That's not to say one engine is "better" than another, only that some engines are designed for different environments and cope with differing workloads in different ways. After all, we have both Wasmtime and WAMR.
It is important that our internal engineering teams understand the characteristics of each.
Reposting here, after a great redirect
Hi folks. We've created a framework for building and testing multiple non-web WASM engines; it gathers information on jitter and latency, and compares various algorithms across the different engines and against native speed too. We initially shared our results at the WASM Research event last year. It's taken a while (longer than expected), but we've got the necessary internal approval to share this framework and the tests as an Apache 2.0-licensed open source contribution....
We're hoping this would help toward the wider community goal of having a shared set of tests and a framework to run them. We originally created it as a home-grown way to get repeatable, cross-engine performance tests, so we could better understand the timing, jitter, and potential real-time use cases for WASM and the various runtimes. The test framework accounts for the different engines and all the possible ways in which they can be configured; for WAMR, for example, it covers AoT, JIT, Fast Interpreter, and Standard Interpreter.
What I think is of value here is that the framework includes all the scaffolding and code for downloading the latest branches of various non-web runtimes, building each of them several times with multiple configuration options, and then running a set of tests. Each test is run multiple times; as a guide, a single test run gathers over 50k data points.
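Turning many repeated data points per test into latency and jitter figures can be sketched as below. This is an assumption about what such a summary might contain, not the framework's actual method: `summarize` is a hypothetical helper, and "jitter" is taken here as the max-min spread of the samples, which is only one of several common definitions (standard deviation is reported too).

```python
import statistics
from typing import Dict, Sequence


def summarize(samples_us: Sequence[float]) -> Dict[str, float]:
    """Summarise repeated latency samples given in microseconds."""
    if len(samples_us) < 2:
        raise ValueError("need at least two samples")
    ordered = sorted(samples_us)
    # n=100 yields 99 cut points: index 0 is the 1st percentile, index 98 the 99th.
    percentiles = statistics.quantiles(ordered, n=100, method="inclusive")
    return {
        "min": ordered[0],
        "median": statistics.median(ordered),
        "p99": percentiles[98],
        "max": ordered[-1],
        "stdev": statistics.stdev(ordered),
        "jitter": ordered[-1] - ordered[0],  # max - min spread
    }


if __name__ == "__main__":
    samples = [102.0, 98.5, 101.2, 99.9, 150.3, 100.4, 97.8, 100.1]
    for key, value in summarize(samples).items():
        print(f"{key:>6}: {value:.2f}")
```

Reporting tail percentiles alongside the median matters for real-time use cases, where an occasional slow run can matter more than the average.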
Now, the tests themselves are not fixed; they can of course be changed. But our initial thought was that the framework and the scaffolding it provides might be useful if there was a desire to create cross-engine performance tooling...
I see Sightglass exists, and to be fair we created this framework in isolation, so I'm not sure whether any of it would be of interest. Of course, no offence taken if it isn't.
Are there any other efforts within the alliance to create this type of cross-engine performance analysis? -- Is this something we feel we'd need?
As I post this, I'm playing catch-up and reading @fitzgen (he/him)'s benchmark-suite link...
more information is always good. I have never, ever met humans that didn't use the requirement for component benchmarking as an excuse to ignore the solution's performance. I include myself in that space. It's a human thing, generally. But I love more info, too.
@Chris Woods, I'd be interested to talk to you about your framework, and I can probably compare and contrast it with Sightglass more quickly since I've worked on that project. I will say that Sightglass has needed the ability to add more Wasm engines for a while now, but some of that work has been stuck due to other work (I started but did not finish a V8 integration here, and @Yury Delendik had a SpiderMonkey integration that has not been upstreamed). So I think your work is in the "might be helpful!" category. Do you want to message me privately to set up time to talk?
Hi @Andrew Brown, thanks for the ping, yes would love to chat. I'll drop you a message to set up a time for a chat!
@Chris Woods do you have a link to your benchmark framework?
@Mats Brorsson Hey Mats, I'm currently caught in a catch-22: the internal company process wants to know where I will share it. Happy to host a 20-30 min meeting to give an overview of the framework, if that would be of interest?
Definitely, I'd like my PhD student to join as well. Which time zone are you in? We're on CET.
Last updated: Jan 24 2025 at 00:11 UTC