ShuyaoJiang opened issue #7732:
Summary
Hi, I ran the attached case (a C program compiled to Wasm by Emscripten) in different Wasm runtimes and found abnormal performance in Wasmtime compared with the other runtimes. The execution time of this case (the interval from the start to the end of the command that runs the Wasm bytecode) on different runtimes is as follows:
- Wasmtime: 136ms
- WasmEdge (AOT): 16ms
- WAMR (AOT): 11ms
We found that in most other test cases, Wasmtime achieves performance similar to the other two runtimes (within 1-2x). In this case, however, Wasmtime is about 10x slower than the other two runtimes.
Emscripten
- emcc (Emscripten gcc/clang-like replacement + linker emulating GNU ld) 3.1.49 (04a0cad4d4c9f5d62876821274d78cd0a52427af)
clang version 18.0.0 (https://github.com/llvm/llvm-project 269685545e439ad050b67740533c59f965cae955)
Target: wasm32-unknown-emscripten
Thread model: posix
Wasm Runtime Version
- Wasmtime: cli 15.0.0
- WasmEdge: 0.13.5
- WAMR: 1.2.3
Hardware & OS
- CPU: Intel(R) Xeon(R) E5-2686 v4 CPU @ 2.30GHz
- Memory: 16GB
- OS: Ubuntu 20.04.6 LTS
Additional details
The attached source program is synthesized from a Csmith seed and a code snippet from another program. The inserted code snippet is on lines 2370-2375 of the source program; it is a loop accessing a pointer to a constant (the code snippet is attached below). So, we think that this abnormal performance may be caused by some improper handling when accessing such pointers to constants. Could you please check this situation? Thank you!

    for (print_hash_value = 0; (print_hash_value > 16); ++print_hash_value)
    { /* block id: 394 */
        if ((*g_1123))
            break;
    }
fitzgen commented on issue #7732:
Thanks for the bug report.
Similar to the other issue you filed, can you separate timing of the compilation and execution phases and let us know whether execution is still 10x slower than others?
Thanks!
alexcrichton commented on issue #7732:
I've taken a look at this locally (thanks for the report!) and I've used this script
<details>
<summary><code>run.mjs</code></summary>

    import { readFile } from 'node:fs/promises';
    import { WASI } from 'wasi';
    import { argv } from 'node:process';

    const wasi = new WASI({
      version: 'preview1',
      args: argv.slice(2),
    });

    const m = process.argv[2];
    const wasm = await WebAssembly.compile(await readFile(m));
    const obj = wasi.getImportObject();
    obj.env = {
      memory: new WebAssembly.Memory({shared: true, initial: 8000, maximum: 16000}),
    };
    obj.wasi = {
      'thread-spawn': function() {
        throw new Error('wasi thread spawn');
      },
    };
    const instance = await WebAssembly.instantiate(wasm, obj);
    const initial = performance.now();
    wasi.start(instance);
    console.log(performance.now() - initial);

</details>
to compare Wasmtime and node (as those were the easiest comparisons I could get on hand).
I added a little instrumentation to wasmtime run and measured Node executing this function in 1.2ms and Wasmtime in 400us. Given that, I don't think this is related to runtime performance, so I would also be interested in what @fitzgen asked: whether the timing for Wasmtime included the compilation of the Wasm module to native code.
Now, that being said, if Cranelift takes 100ms to compile something that LLVM compiles in 10ms, that's also a bug, so I'm still interested in this!
Last updated: Dec 23 2024 at 12:05 UTC