Stream: general

Topic: memory overhead and latency


view this post on Zulip Audacious Tux (Nov 26 2021 at 08:13):

hi... can someone shed some light on these queries?

  1. What is the lowest runtime overhead of just the Wasm VM and runtime for dummy code, say for C, excluding the initial page size?
  2. What is the lowest startup latency for dummy code, for C?
  3. How low do you foresee the overhead getting in the next couple of years?

view this post on Zulip bjorn3 (Nov 26 2021 at 13:48):

See https://github.com/bytecodealliance/wasmtime/pull/3319#issuecomment-916236710 for benchmarks of the overhead of calling between wasm code and the host.

This commit is an alternative to #3298 which achieves effectively the same goal of optimizing the Func::call API as well as its C API sibling of wasmtime_func_call. The strategy taken here is diffe...
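For orientation, here is a minimal sketch of measuring host-to-wasm call overhead through the Wasmtime C API; the trivial module, the export name do_nothing, and the iteration count are illustrative assumptions, not the harness used in the linked benchmarks, and error handling is elided.

#include <stdio.h>
#include <string.h>
#include <time.h>
#include <wasmtime.h>

int main(void) {
    wasm_engine_t *engine = wasm_engine_new();
    wasmtime_store_t *store = wasmtime_store_new(engine, NULL, NULL);
    wasmtime_context_t *context = wasmtime_store_context(store);

    /* A trivial module exporting a no-op function. */
    const char *wat = "(module (func (export \"do_nothing\")))";
    wasm_byte_vec_t wasm;
    wasmtime_wat2wasm(wat, strlen(wat), &wasm);

    wasmtime_module_t *module = NULL;
    wasmtime_module_new(engine, (const uint8_t *)wasm.data, wasm.size, &module);
    wasm_byte_vec_delete(&wasm);

    wasm_trap_t *trap = NULL;
    wasmtime_instance_t instance;
    wasmtime_instance_new(context, module, NULL, 0, &instance, &trap);

    wasmtime_extern_t item;
    wasmtime_instance_export_get(context, &instance, "do_nothing",
                                 strlen("do_nothing"), &item);

    /* Time a batch of host->wasm calls and report the per-call average. */
    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (int i = 0; i < 1000000; i++)
        wasmtime_func_call(context, &item.of.func, NULL, 0, NULL, 0, &trap);
    clock_gettime(CLOCK_MONOTONIC, &t1);

    double ns = (t1.tv_sec - t0.tv_sec) * 1e9 + (double)(t1.tv_nsec - t0.tv_nsec);
    printf("avg host->wasm call: %.1f ns\n", ns / 1e6);

    wasmtime_module_delete(module);
    wasmtime_store_delete(store);
    wasm_engine_delete(engine);
    return 0;
}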

view this post on Zulip bjorn3 (Nov 26 2021 at 13:49):

For reducing the startup time you can precompile the wasm using wasmtime compile.
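As a rough illustration (exact flags can vary between Wasmtime versions), precompiling once and then running the precompiled artifact looks like:

wasmtime compile app.wasm -o app.cwasm        # ahead-of-time compile to a .cwasm artifact
wasmtime run --allow-precompiled app.cwasm    # later runs skip compilation at startup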

view this post on Zulip Chris Woods (Nov 30 2021 at 18:57):

We've done some WAMR profiling. Memory-wise, we got WAMR + Zephyr RTOS running on an STM microcontroller with 340 KB of RAM; the runtime appeared to take roughly ~120 KB of RAM, and the Wasm applications we were running (C based) used between 3 KB and 5 KB of RAM each.

We've done some more extensive testing recently on real-time performance, and I'd be happy to share what we have.

view this post on Zulip Wang Xin (Dec 01 2021 at 01:36):

@Chris Woods In our test with WAMR, we can manage to run a hello-world wasm module under 5KB RAM (given runtime binary directly run from flash). We can follow up on the details.
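For reference, one common way to bound WAMR's runtime RAM use in such an embedding is to hand it a fixed global heap pool at init time. A minimal sketch follows; the 64 KB pool size is an illustrative assumption, not the configuration used in these tests.

/* Sketch: initialize WAMR with a fixed, statically allocated heap pool so
 * the runtime's RAM footprint is bounded up front (pool size illustrative). */
#include <stdbool.h>
#include <string.h>
#include "wasm_export.h"

static char global_heap_buf[64 * 1024];

bool init_runtime(void) {
    RuntimeInitArgs init_args;
    memset(&init_args, 0, sizeof(RuntimeInitArgs));

    /* All runtime allocations come from this pool instead of the system allocator. */
    init_args.mem_alloc_type = Alloc_With_Pool;
    init_args.mem_alloc_option.pool.heap_buf = global_heap_buf;
    init_args.mem_alloc_option.pool.heap_size = sizeof(global_heap_buf);

    return wasm_runtime_full_init(&init_args);
}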

view this post on Zulip Chris Woods (Dec 01 2021 at 12:23):

@Wang Xin That's awesome. I believe, in our tests, we struggled with an STM with less RAM; at the time we just assumed that was due to the RTOS + runtime overhead. I don't think we measured the runtime-only overhead.

view this post on Zulip Wenyong Huang (Dec 02 2021 at 01:16):

@Chris Woods Have you referred to the following document to build the wasm app so as to reduce the memory usage?
https://github.com/bytecodealliance/wasm-micro-runtime/blob/main/doc/build_wasm_app.md#2-how-to-reduce-the-footprint
Normally in an IoT system, as file I/O operations aren't needed, we can build the wasm app in '-nostdlib' mode and run it with WAMR's built-in libc. For example, use the following command to build the wasm app:
/opt/wasi-sdk/bin/clang -O3 -z stack-size=4096 -Wl,--initial-memory=65536 \
-o test.wasm main.c \
-Wl,--export=__main_argc_argv -Wl,--export=main \
-Wl,--export=__heap_base,--export=__data_end \
-nostdlib -Wl,--no-entry -Wl,--strip-all -Wl,--allow-undefined
Here '-z stack-size=4096' specifies the auxiliary stack size (the auxiliary stack is part of linear memory); you can increase it if it is not enough.
And for iwasm, you can specify the heap size and stack size with "iwasm --heap-size=n --stack-size=n".
Or specify them when calling wasm_runtime_instantiate:
wasm_module_inst_t
wasm_runtime_instantiate(const wasm_module_t module,
                         uint32_t stack_size, uint32_t heap_size,
                         char *error_buf, uint32_t error_buf_size);
By default they are both 16KB, but for some simple wasm apps we can decrease them; for example, 4KB to 8KB might be enough.
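To make this concrete, here is a minimal embedding sketch that passes reduced sizes to wasm_runtime_instantiate; the 4 KB values, the error-buffer size, and the helper name are illustrative assumptions.

/* Sketch: load a module and instantiate it with reduced stack/heap sizes
 * (4 KB each here is illustrative; tune per application). */
#include <stdio.h>
#include "wasm_export.h"

wasm_module_inst_t instantiate_small(uint8_t *wasm_buf, uint32_t wasm_size) {
    char error_buf[128];

    wasm_module_t module =
        wasm_runtime_load(wasm_buf, wasm_size, error_buf, sizeof(error_buf));
    if (!module) {
        printf("load failed: %s\n", error_buf);
        return NULL;
    }

    /* 4 KB wasm stack, 4 KB module instance heap inside linear memory. */
    wasm_module_inst_t inst =
        wasm_runtime_instantiate(module, 4 * 1024, 4 * 1024,
                                 error_buf, sizeof(error_buf));
    if (!inst)
        printf("instantiate failed: %s\n", error_buf);
    return inst;
}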

WAMR also provides a memory profiling feature; please refer to the following link for more details:
https://github.com/bytecodealliance/wasm-micro-runtime/blob/main/doc/build_wamr.md#enable-memory-profiling-experiment


view this post on Zulip Ajay Chhokra (Dec 02 2021 at 17:54):

We (Chris Woods and I) are evaluating the real-time capabilities of WAMR. We have performed some experiments on x86 and Arm machines (Raspberry Pis) to measure latencies and jitter. A summary of the preliminary results follows:

1) Startup latency

-> Experiments performed on a ZBook with an Intel Core i7

-> Two benchmarking scenarios are considered (https://gist.github.com/chhokrad/91b2ffbbcb8e07e81e4877d7df8619c8). In the first scenario, the timestamp is taken before and after calling a WebAssembly function (void do_nothing()). The WebAssembly binary is generated by compiling the C code (https://gist.github.com/chhokrad/7d2206d625132919b80f95810f7cf50c). The second scenario is similar to the first, except the second timestamp is taken by a native call from within the wasm (see the function void do_nothing_with_native() in https://gist.github.com/chhokrad/7d2206d625132919b80f95810f7cf50c). A timing sketch illustrating this kind of measurement follows the results below.

-> Results, benchmark scenario 1: Median: 100.0 nanoseconds, Average: 129.8 nanoseconds, Standard Deviation: 318.5 nanoseconds

-> Results, benchmark scenario 2: Median: 100.0 nanoseconds, Average: 52.1 nanoseconds, Standard Deviation: 100.9 nanoseconds
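For reference, here is a minimal sketch of the kind of timing harness scenario 1 describes, built on WAMR's embedding API; the exec-env stack size and the absence of error handling are illustrative assumptions, and this is not the exact benchmark code from the gists.

/* Sketch: timestamp before and after calling an exported wasm function via WAMR. */
#include <time.h>
#include "wasm_export.h"

static long long elapsed_ns(struct timespec a, struct timespec b) {
    return (b.tv_sec - a.tv_sec) * 1000000000LL + (b.tv_nsec - a.tv_nsec);
}

long long time_do_nothing(wasm_module_inst_t inst) {
    /* Execution environment with an 8 KB stack (illustrative size). */
    wasm_exec_env_t env = wasm_runtime_create_exec_env(inst, 8 * 1024);
    wasm_function_inst_t fn =
        wasm_runtime_lookup_function(inst, "do_nothing", NULL);

    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    wasm_runtime_call_wasm(env, fn, 0, NULL);   /* void do_nothing() */
    clock_gettime(CLOCK_MONOTONIC, &t1);

    wasm_runtime_destroy_exec_env(env);
    return elapsed_ns(t0, t1);
}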

2) GPIO access latency

-> Experiments performed on a Raspberry Pi 4B with wiringPi 2.5.2

-> The benchmark application replicates the signal from an input pin to an output pin. The signal applied to the input pin of the Raspberry Pi is a square wave of fixed frequency (100 Hz) generated by an external source. The difference between the timestamps of the rising/trailing edges of the waveform at the two pins is taken as the hardware I/O latency (a sketch of exposing such a GPIO call to the wasm app follows the observations below).

-> A total of 60 experiments were performed with varying scheduling priorities (non-real-time, 50, 70, 90), and each experiment lasts 0.3 seconds, generating 64 data points. (The sampling period used for data acquisition is 32 nanoseconds.)

-> Results: Median: 224 nanoseconds, Average: 225 nanoseconds, Standard Deviation: 82.5 nanoseconds

-> Observations: 1) Scheduling priority did not have much effect on the latency. 2) On average, the wasm version incurs 19% more latency than executing the application natively.
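For context, a minimal sketch of how a GPIO write could be exposed to the wasm app as a WAMR native symbol is below; the wiringPi calls are real, but the symbol name, signature string, and registered module name are illustrative assumptions, not the benchmark's actual code.

/* Sketch: expose a GPIO write to wasm as a native symbol via WAMR.
 * Assumes wiringPiSetup() has already been called by the host. */
#include <stdbool.h>
#include <wiringPi.h>
#include "wasm_export.h"

/* Native implementation called from wasm: drive an output pin. */
static void gpio_write_wrapper(wasm_exec_env_t exec_env, int32_t pin, int32_t value) {
    digitalWrite(pin, value ? HIGH : LOW);
}

static NativeSymbol native_symbols[] = {
    /* "(ii)" = two i32 parameters, no result, in WAMR's signature notation. */
    { "gpio_write", gpio_write_wrapper, "(ii)", NULL },
};

bool register_gpio_natives(void) {
    return wasm_runtime_register_natives("env", native_symbols,
                                         sizeof(native_symbols) / sizeof(NativeSymbol));
}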


view this post on Zulip Chris Woods (Dec 06 2021 at 17:36):

Wenyong Huang said:

Chris Woods Have you referred to the following document to build the wasm app so as to reduce the memory usage?
https://github.com/bytecodealliance/wasm-micro-runtime/blob/main/doc/build_wasm_app.md#2-how-to-reduce-the-footprint
[...]

That is awesome! Thank you so much for the details!


Last updated: Dec 23 2024 at 12:05 UTC