Stream: general

Topic: High Overhead for Wasm Import Call Despite AOT Compilation


Nicholas Renner (Jul 29 2025 at 02:20):

Hi everyone,

I’m currently benchmarking a host function modeled after the __imported_wasi_snapshot_preview1 template and observing ~2 µs per import call from Wasm, which feels excessively slow. Meanwhile, a Wasmtime PR claims this overhead should be closer to 10 ns: https://github.com/bytecodealliance/wasmtime/pull/6262. Has anyone seen this level of performance in practice?

We are ahead-of-time compiling the Wasm module and would appreciate any insight into what might cause this kind of discrepancy or how import call overhead is typically minimized when using Wasmtime with __imported_wasi_snapshot_preview1-style imports. Thanks!


Pat Hickey (Jul 29 2025 at 04:51):

can you share something where we can reproduce your measurements?
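A reproduction mostly comes down to timing many calls and dividing by the count. The harness below is a minimal sketch of that idea: `fake_import` is a hypothetical native stand-in (not the real `lind-syscall` import), and `now_ns` and `ns_per_call` are illustrative names. In an actual reproduction the stub would be replaced by the Wasm import and the loop would run inside the guest. At the numbers in the first post (about 2 µs measured against roughly 10 ns expected), the gap is about 200×, so even a rough harness like this should make it visible.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical native stand-in for the import under test. In a real
 * reproduction this would be the Wasm import (e.g. lind-syscall) called
 * from inside the guest; here it is a plain function so the harness
 * builds and runs anywhere. */
int fake_import(unsigned int callnumber, unsigned long long arg1) {
    return (int)(callnumber + (unsigned int)(arg1 & 1u));
}

/* Monotonic clock reading in nanoseconds. */
uint64_t now_ns(void) {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}

/* Call fn `iters` times and return the mean wall-clock cost per call in
 * nanoseconds. The volatile sink keeps the loop from being optimized
 * away, but an optimizing compiler may still inline a native stub, so
 * treat native-only numbers as a lower bound. */
double ns_per_call(int (*fn)(unsigned int, unsigned long long),
                   unsigned long iters) {
    assert(iters > 0);
    volatile unsigned int sink = 0;
    uint64_t start = now_ns();
    for (unsigned long i = 0; i < iters; i++) {
        sink += (unsigned int)fn(1u, (unsigned long long)i);
    }
    uint64_t end = now_ns();
    (void)sink;
    return (double)(end - start) / (double)iters;
}
```

For example, `ns_per_call(fake_import, 10000000ul)` gives the mean nanoseconds per call over ten million iterations; sharing a harness of this shape together with the Wasm module makes the measurement reproducible.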

Alex Crichton (Jul 29 2025 at 14:48):

Are you using WAMR, not Wasmtime? Wasmtime doesn't have any mention of __imported_*, but a search of the WAMR repository suggests it's there. If so, you might want to raise this with the WAMR developers.

Nicholas Renner (Jul 29 2025 at 17:32):

@Alex: To clarify — we’re using WASI-SDK with Wasmtime, not WAMR. The __imported_wasi_snapshot_preview1_* naming came from WASI-SDK’s import declarations, and we modeled our own import on that pattern. For example:

int __imported_wasi_snapshot_preview1_lind_syscall(
    unsigned int callnumber,
    unsigned long long callname,
    unsigned long long arg1,
    unsigned long long arg2,
    unsigned long long arg3,
    unsigned long long arg4,
    unsigned long long arg5,
    unsigned long long arg6)
  __attribute__((__import_module__("lind"),
                 __import_name__("lind-syscall")));

I forgot to mention that we're calling from C code; perhaps that adds to the overhead?

If there's a good example of how to do this from C in a Wasmtime-specific way, I'd love to be pointed to it. @Pat Hickey I can try that out and report back on the overhead I see.

Thanks for your help all!

Pat Hickey (Jul 29 2025 at 17:33):

yes, using wasmtime's C API adds overhead compared to the Rust API, and the results described in the linked PR won't apply to use of the C API

Pat Hickey (Jul 29 2025 at 17:39):

I still don't really understand what you mean by the wasi-sdk import declarations style. wasi-sdk contains wasi-libc, which describes to guest wasm how to call wasm import functions. On the host side, those import functions are implemented by using wasmtime's linker to resolve each import to native code. The C API to wasmtime's linker has the prototypes given in linker.h (https://docs.wasmtime.dev/c-api/linker_8h_source.html). Nobody has taken the time to write comprehensive docs for the C API, so you're going to have to read the source code (e.g. https://github.com/bytecodealliance/wasmtime/blob/main/crates/c-api/src/func.rs) and then the docs of the wasmtime Rust APIs it calls (e.g. https://docs.rs/wasmtime/latest/wasmtime/struct.Func.html#method.call_unchecked).
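The linker's role here is to match each import, identified by its module and name (for the declaration earlier in the thread, module `lind` and name `lind-syscall`), to a host definition. The toy table below only illustrates that idea; it is not how Wasmtime implements its linker, and `host_fn`, `host_def`, `lind_syscall_host`, and `resolve_import` are hypothetical names. In Wasmtime, definitions are registered through functions such as `wasmtime_linker_define_func` or its `_unchecked` variant, and the matching happens when the module is instantiated, so a lookup like this is not on the per-call path.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical host function type: every argument passed as a 64-bit slot. */
typedef int32_t (*host_fn)(const uint64_t *args, size_t nargs);

/* One entry in a toy "linker": (module, name) -> host function. */
struct host_def {
    const char *module;
    const char *name;
    host_fn fn;
};

/* Hypothetical host implementation; toy behavior echoes the first argument. */
int32_t lind_syscall_host(const uint64_t *args, size_t nargs) {
    return nargs > 0 ? (int32_t)args[0] : -1;
}

static const struct host_def defs[] = {
    {"lind", "lind-syscall", lind_syscall_host},
};

/* Resolve an import by its (module, name) pair. Returns NULL when no
 * definition matches, which a real runtime reports as a link error. */
host_fn resolve_import(const char *module, const char *name) {
    assert(module != NULL && name != NULL);
    for (size_t i = 0; i < sizeof defs / sizeof defs[0]; i++) {
        if (strcmp(defs[i].module, module) == 0 &&
            strcmp(defs[i].name, name) == 0) {
            return defs[i].fn;
        }
    }
    return NULL;
}
```

Both halves of the pair matter: the same name under a different module (for example `wasi_snapshot_preview1`) does not resolve to this definition.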


Pat Hickey (Jul 29 2025 at 17:41):

in particular, if you need lower overhead on func calls, then you need to use the unchecked variants. Those are tricky to use correctly in C, because the type system can't do any work for you, but they should get you just about the same performance as the typed APIs get you in Rust (where the type system provides assurances that the unchecked call is correct)
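To make the trade-off concrete, the sketch below contrasts a checked path, where every argument carries a type tag that is verified on each call, with an unchecked path that reads raw untagged values and trusts the caller. All types and functions here (`tagged_val`, `raw_val`, `add_checked`, `add_unchecked`) are hypothetical; Wasmtime's C API has its own types for this (such as `wasmtime_val_t` and `wasmtime_val_raw_t`), which are not reproduced here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical value kinds for a two-type toy signature. */
enum val_kind { KIND_I32, KIND_I64 };

/* Checked representation: each value carries its own type tag. */
struct tagged_val {
    enum val_kind kind;
    union { int32_t i32; int64_t i64; } of;
};

/* Raw representation: no tag, so the caller must know the types. */
union raw_val {
    int32_t i32;
    int64_t i64;
};

/* Checked path for a (i32, i64) -> i64 function: inspect every tag on
 * every call. Returns 0 on success, -1 on an arity or type mismatch. */
int add_checked(const struct tagged_val *args, size_t nargs, int64_t *out) {
    assert(args != NULL && out != NULL);
    if (nargs != 2 || args[0].kind != KIND_I32 || args[1].kind != KIND_I64)
        return -1;
    *out = (int64_t)args[0].of.i32 + args[1].of.i64;
    return 0;
}

/* Unchecked path for the same signature: read the slots directly. It
 * skips the per-call checks, but a wrong layout is a silent bug here,
 * not a reported error. */
int64_t add_unchecked(const union raw_val *args) {
    return (int64_t)args[0].i32 + args[1].i64;
}
```

The per-call tag checks are the work the unchecked variant avoids; in Rust, typed APIs can skip them safely because the compiler already proved the types match.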

Nicholas Renner (Jul 29 2025 at 17:55):

Okay, this is super helpful, thank you. Sorry about my weird nomenclature; I'm kind of fumbling around here, but I don't think that matters.

I'll read through the links you sent and try out a minimal example.

Pat Hickey (Jul 29 2025 at 17:59):

that's fine, I'm being particular because the details here are subtle and may matter.

Alex Crichton (Jul 29 2025 at 18:01):

For the lowest overhead in the C API, you'll want to use the *_unchecked APIs, such as wasmtime_linker_define_func_unchecked.
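As a rough picture of what an unchecked host callback has to do, the sketch below handles the `lind-syscall` import declared earlier in the thread by reading positional slots from one shared buffer and writing the result back into slot 0. This assumes the shared arguments-and-results buffer convention used by Wasmtime's unchecked functions; `raw_slot` and `lind_syscall_unchecked` are hypothetical names, and the signature is simplified (no Wasmtime environment, caller, or trap parameters), so it cannot be passed to `wasmtime_linker_define_func_unchecked` as written.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical raw value slot. It plays the role that wasmtime_val_raw_t
 * plays in Wasmtime's C API, but it is NOT that type. */
typedef union {
    int32_t i32;
    int64_t i64;
} raw_slot;

/* Hypothetical unchecked handler for the "lind"/"lind-syscall" import:
 *   (i32 callnumber, i64 callname, i64 arg1, ..., i64 arg6) -> i32
 * Arguments arrive in one shared buffer and the single i32 result is
 * written back into slot 0, so the buffer needs max(8 params, 1 result)
 * = 8 slots. Nothing here checks types: the slot layout must match the
 * declared signature exactly, which is the caller's responsibility. */
void lind_syscall_unchecked(raw_slot *args_and_results, size_t len) {
    assert(args_and_results != NULL && len >= 8);
    uint32_t callnumber = (uint32_t)args_and_results[0].i32;
    /* Slot 1 (callname) is ignored by this toy handler. */
    uint64_t sum = 0;
    /* Toy behavior: add arg1..arg6 (slots 2..7) to the call number. */
    for (size_t i = 2; i < 8; i++) {
        sum += (uint64_t)args_and_results[i].i64;
    }
    /* The result overwrites the argument slot it shares. */
    args_and_results[0].i32 = (int32_t)(callnumber + (uint32_t)sum);
}
```

Because arguments and results share storage, a handler must read every argument it needs before writing any result.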

Nicholas Renner (Jul 31 2025 at 19:19):

Hi all,

Thank you for all your help and for pointing me in the right direction. Once I understood how the linker works, I realized that the trampoline costs close to what's reported, and the overhead I was seeing came from marshalling a somewhat large data struct. Sorry for my confusion, and thanks again!
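The thread does not say how the marshalling cost was addressed. One common pattern for Wasm host calls, sketched here purely as an illustration, is to leave the struct in guest linear memory, pass only its offset, and have the host bounds-check that offset before reading just the fields it needs. Linear memory is simulated with a plain byte array, and `struct request`, `guest_store_request`, and `host_read_field` are hypothetical. The struct uses only fixed-width fields because wasm32 is ILP32: pointers and `long` are 32 bits in the guest, so a struct containing them would have a different layout on a 64-bit host.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define FIELD_COUNT 32

/* Hypothetical "somewhat large" request; the real struct is not shown
 * in the thread. Fixed-width fields keep guest and host layouts equal. */
struct request {
    uint64_t fields[FIELD_COUNT];
};

/* Simulated guest linear memory (one 64 KiB Wasm page). In a real host
 * the base pointer would come from the instance's exported memory. */
static uint8_t linear_memory[64 * 1024];

/* Guest side: store the struct at `offset` in linear memory.
 * Returns 0 on success, -1 if it would not fit. */
int guest_store_request(const struct request *req, uint32_t offset) {
    assert(req != NULL);
    if ((size_t)offset > sizeof linear_memory - sizeof *req)
        return -1;
    memcpy(linear_memory + offset, req, sizeof *req);
    return 0;
}

/* Host side: given only the guest offset, validate it and copy out a
 * single field instead of the whole struct. memcpy sidesteps alignment
 * issues a raw pointer cast could cause. Returns 0 on success, -1 if
 * the offset or index is out of range (guest input is never trusted). */
int host_read_field(uint32_t offset, size_t index, uint64_t *out) {
    assert(out != NULL);
    if (index >= FIELD_COUNT ||
        (size_t)offset > sizeof linear_memory - sizeof(struct request))
        return -1;
    memcpy(out,
           linear_memory + offset + offsetof(struct request, fields) +
               index * sizeof(uint64_t),
           sizeof *out);
    return 0;
}
```

The import then carries a single i32 offset rather than many marshalled values, and the host pays only for the fields it actually reads.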


Last updated: Dec 06 2025 at 05:03 UTC