hi... can someone shed some light on these queries?
See https://github.com/bytecodealliance/wasmtime/pull/3319#issuecomment-916236710 for benchmarks of the overhead of calling between wasm code and the host.
To reduce the startup time, you can precompile the wasm module using wasmtime compile.
We've done some WAMR profiling. Memory-wise, we got WAMR + Zephyr RTOS running on an STM microcontroller with 340 KB of RAM. The runtime appeared to take roughly 120 KB of RAM, and the Wasm applications we were running (C-based) took between 3 KB and 5 KB of RAM.
We've done some more extensive testing recently on real-time performance, and I'd be happy to share what we have.
@Chris Woods In our test with WAMR, we managed to run a hello-world wasm module in under 5 KB of RAM (given that the runtime binary runs directly from flash). We can follow up on the details.
@Wang Xin That's awesome. I believe, in our tests, we struggled with an STM with less RAM. At the time we just assumed that was due to the RTOS + runtime overhead. I don't think we measured the runtime-only overhead.
@Chris Woods Have you referred to the following document to build the wasm app so as to reduce the memory usage?
Normally in an IoT system, file I/O isn't needed, so we can build the wasm app in '-nostdlib' mode and run it with WAMR's built-in libc. For example, use the following command to build the wasm app:
/opt/wasi-sdk/bin/clang -O3 -z stack-size=4096 -Wl,--initial-memory=65536 \
-o test.wasm main.c \
-Wl,--export=__main_argc_argv -Wl,--export=main \
-Wl,--export=__heap_base,--export=__data_end \
-nostdlib -Wl,--no-entry -Wl,--strip-all -Wl,--allow-undefined
Here '-z stack-size=4096' specifies the auxiliary stack size (the auxiliary stack is part of linear memory); you can increase it if it is not enough.
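To make the sizing above a bit more concrete, here is a minimal sketch in C for picking a value for --initial-memory. It assumes the linear memory only needs to hold the app's static data plus the auxiliary stack, and it relies on the WebAssembly page size being 64 KiB (wasm-ld expects --initial-memory to be a multiple of it). The function name and parameters are made up for illustration and are not part of WAMR or wasi-sdk.

```c
#include <assert.h>
#include <stdint.h>

/* WebAssembly linear memory is sized in 64 KiB pages. */
#define WASM_PAGE_SIZE 65536u

/*
 * Hypothetical helper: smallest page-aligned size (in bytes) that can hold
 * the static data and the auxiliary stack. Always returns at least one page.
 */
uint64_t initial_memory_bytes(uint32_t static_data_bytes,
                              uint32_t aux_stack_bytes)
{
    uint64_t needed = (uint64_t)static_data_bytes + aux_stack_bytes;
    uint64_t pages = (needed + WASM_PAGE_SIZE - 1) / WASM_PAGE_SIZE;

    if (pages == 0)
        pages = 1;

    /* Sanity check: the rounded-up size must cover what was requested. */
    assert(pages * WASM_PAGE_SIZE >= needed);
    return pages * WASM_PAGE_SIZE;
}
```

With a 4096-byte auxiliary stack and a few KB of static data this gives 65536, which matches the single page used in the command above.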
And for iwasm, you can specify the heap size and stack size, with "iwasm --heap-size=n --stack-size=n".
Or specify them when calling wasm_runtime_instantiate:
wasm_module_inst_t
wasm_runtime_instantiate(const wasm_module_t module,
                         uint32_t stack_size, uint32_t heap_size,
                         char *error_buf, uint32_t error_buf_size);
By default they are both 16 KB, but for some simple wasm apps we can decrease them; for example, 4 KB to 8 KB might be enough.
WAMR also provides a memory profiling feature; please refer to the following link for more details:
We (Chris Woods and I) are evaluating the real-time capabilities of WAMR. We have performed some experiments on x86 and ARM machines (Raspberry Pis) to measure latencies and jitter. A summary of the preliminary results follows:
1) Start up latency
-> Experiments performed on a ZBook with an Intel Core i7
-> Two benchmarking scenarios are considered (https://gist.github.com/chhokrad/91b2ffbbcb8e07e81e4877d7df8619c8). In the first scenario, a timestamp is taken before and after calling a WebAssembly function (void do_nothing()). The WebAssembly binary is generated by compiling the C code (https://gist.github.com/chhokrad/7d2206d625132919b80f95810f7cf50c). The second scenario is similar to the first, except that the second timestamp is taken by a native call from within the wasm code (see function void do_nothing_with_native() https://gist.github.com/chhokrad/7d2206d625132919b80f95810f7cf50c).
-> Results: Benchmark scenario 1 --> Median: 100.0 nanoseconds, Average: 129.8 nanoseconds, Standard Deviation: 318.5 nanoseconds
-> Results: Benchmark scenario 2 --> Median: 100.0 nanoseconds, Average: 52.1 nanoseconds, Standard Deviation: 100.9 nanoseconds
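For anyone who wants to reproduce this kind of measurement, here is a rough sketch in C of how scenario-1-style samples could be collected and summarised on the host side. This is not the code from the gists above: do_nothing here is a native stand-in for the exported wasm function, all helper names are hypothetical, and it assumes a POSIX system that provides clock_gettime.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <time.h>

/* Native stand-in for the wasm function under test (assumption). */
void do_nothing(void) {}

/* Monotonic clock reading in nanoseconds. */
static uint64_t now_ns(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}

/* Time n individual calls of fn, storing one sample per call. */
void measure_call_latency_ns(void (*fn)(void), uint64_t *samples, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        uint64_t t0 = now_ns();
        fn();
        samples[i] = now_ns() - t0;
    }
}

static int cmp_u64(const void *a, const void *b)
{
    uint64_t x = *(const uint64_t *)a, y = *(const uint64_t *)b;
    return (x > y) - (x < y);
}

/* Median of the samples; sorts the array in place. */
double median_ns(uint64_t *samples, size_t n)
{
    assert(n > 0);
    qsort(samples, n, sizeof *samples, cmp_u64);
    if (n % 2)
        return (double)samples[n / 2];
    return ((double)samples[n / 2 - 1] + (double)samples[n / 2]) / 2.0;
}

/* Arithmetic mean of the samples. */
double mean_ns(const uint64_t *samples, size_t n)
{
    assert(n > 0);
    double sum = 0.0;
    for (size_t i = 0; i < n; i++)
        sum += (double)samples[i];
    return sum / (double)n;
}

/* Population variance in ns^2; its square root is the standard deviation. */
double variance_ns2(const uint64_t *samples, size_t n)
{
    double mean = mean_ns(samples, n);
    double acc = 0.0;
    for (size_t i = 0; i < n; i++) {
        double d = (double)samples[i] - mean;
        acc += d * d;
    }
    return acc / (double)n;
}
```

A caller would fill an array with measure_call_latency_ns(do_nothing, samples, n) and then pass it to the three summary helpers. Keep in mind that the samples can never be finer than the resolution of the clock being used.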
2) GPIO access latency
-> Experiments performed on a Raspberry Pi 4B with wiringPi 2.5.2
-> The benchmark application replicates the signal from an input pin to an output pin. The signal at the input pin of the Raspberry Pi is a square wave of fixed frequency (100 Hz) generated by an external source. The difference between the timestamps of the rising/falling edges of the waveform at the two pins is considered to be the hardware IO latency.
-> A total of 60 experiments were performed with varying scheduling priorities (non-real-time, 50, 70, 90), and each experiment lasted 0.3 seconds, generating 64 data points. (The sampling interval used for data acquisition is 32 nanoseconds.)
-> Results: Median: 224 nanoseconds, Average: 225 nanoseconds, Standard Deviation: 82.5 nanoseconds
--> Observations: 1) Scheduling priority did not have much effect on the latency. 2) On average, wasm incurs 19% more latency than executing the application natively.
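To illustrate how the per-edge latency and the relative overhead can be derived from the captured traces, here is a small sketch in C. It is not the analysis code used for the numbers above. It assumes the two pins were captured as arrays of 0/1 samples taken every 32 ns, and that the output always settles before the next input edge (reasonable at 100 Hz, where half a period is 5 ms). All function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One sample every 32 ns, as stated in the experiment description. */
#define SAMPLE_PERIOD_NS 32u

/*
 * For every edge (rising or falling) on the input trace, find the first
 * sample at which the output trace has reached the new input level and
 * record the delay in nanoseconds. Returns the number of latencies written.
 */
size_t edge_latencies_ns(const uint8_t *in, const uint8_t *out, size_t n,
                         uint64_t *lat_ns, size_t max_lat)
{
    size_t count = 0;

    assert(in != NULL && out != NULL && lat_ns != NULL);
    for (size_t i = 1; i < n && count < max_lat; i++) {
        if (in[i] == in[i - 1])
            continue; /* no edge on the input at sample i */
        for (size_t j = i; j < n; j++) {
            if (out[j] == in[i]) {
                lat_ns[count++] = (uint64_t)(j - i) * SAMPLE_PERIOD_NS;
                break;
            }
        }
    }
    return count;
}

/* Extra latency of the wasm run relative to the native run, in percent. */
double overhead_percent(double wasm_ns, double native_ns)
{
    assert(native_ns > 0.0);
    return 100.0 * (wasm_ns - native_ns) / native_ns;
}
```

The latencies from each run can then be summarised (median, average, standard deviation), and overhead_percent applied to the wasm and native averages gives a figure comparable to the 19% quoted above.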
That is awesome! Thank you so much for the details!
Last updated: Feb 28 2025 at 02:27 UTC