I'm missing wasmtime_tls_{get,set} when linking for thumbv7em-none-eabi. Am I supposed to implement them like the min-platform example? Here's the code: https://github.com/google/wasefire/compare/main...ia0:wasefire:pulley (single core where wasmtime is not accessed from interrupt handlers)
Context: I'm trying to use Pulley on embedded platforms after the recent progress on https://github.com/bytecodealliance/wasmtime/issues/7311
Follow-up question: When I implement those functions myself, I get a panic at src/runtime/vm/mmap_vec.rs:72 ("Allocation of MmapVec storage failed") while trying to allocate 264912 bytes, which is too much. Is there a way to control this particular allocation to be below 128K?
Thanks!
For the last question, it seems that it's the pulley bytecode which is huge (264920 bytes versus 7771 for the wasm module, i.e. 34x bigger). It seems the pulley bytecode is a full ELF file. Is there a way to have a binary format on par with WASM bytecode? The pulley module will be written to flash, which is also a limited resource. I'm also surprised it needs to be loaded to RAM at runtime. Does the pulley interpreter need to modify it? Can't it read it from the provided read-only slice?
(seems like you can't edit your messages, or I fail to find how) The code was merged (and branch deleted), so here it is now: https://github.com/google/wasefire/pull/753
Am I supposed to implement them like the min-platform example?
Yes, there's some more documentation here on that but the tl;dr is that you need to implement this header file (and that's released per-version of Wasmtime)
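For the single-core case described above (Wasmtime never entered from interrupt handlers), the two TLS hooks can be sketched roughly like this in Rust. This is only a sketch: it assumes the header for your Wasmtime version declares them as `uint8_t *wasmtime_tls_get(void)` and `void wasmtime_tls_set(uint8_t *ptr)`, so check the header you actually build against. The static name `WASMTIME_TLS` is made up for illustration, and `#[unsafe(no_mangle)]` needs Rust 1.82 or newer.

```rust
use core::ptr;
use core::sync::atomic::{AtomicPtr, Ordering};

// Hypothetical name: one global slot stands in for "thread-local" storage.
// Assumption: single core, and Wasmtime is never called from an interrupt
// handler, so there is only ever one logical thread using this slot.
static WASMTIME_TLS: AtomicPtr<u8> = AtomicPtr::new(ptr::null_mut());

// Assumed signature: uint8_t *wasmtime_tls_get(void);
// Returns whatever was last stored, or null if nothing was stored yet.
#[unsafe(no_mangle)]
pub extern "C" fn wasmtime_tls_get() -> *mut u8 {
    WASMTIME_TLS.load(Ordering::Relaxed)
}

// Assumed signature: void wasmtime_tls_set(uint8_t *ptr);
// Stores the opaque pointer handed over by the runtime.
#[unsafe(no_mangle)]
pub extern "C" fn wasmtime_tls_set(ptr: *mut u8) {
    WASMTIME_TLS.store(ptr, Ordering::Relaxed);
}
```

An atomic pointer is used instead of `static mut` only to avoid unsafe code. On a multi-core target or with preemption this would need a real per-thread slot instead.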
Is there a way to control this particular allocation to be below 128K?
Yes and no. If the wasm module itself is asking for more than 128K of memory, for example if its initial linear memory is 3+ pages, then there's nothing that can be done from the embedder about that. You'll instead need to build the wasm module differently such that it requires 2 or fewer pages. In the future the custom-page-sizes proposal should help with this but that's not integrated into toolchains yet (although I think it will be soon-ish).
Otherwise you can also explore various configuration options such as Config::memory_reservation_for_growth where the defaults may not be suitable for your embedding (for example you might want to set that to zero)
it seems that it's the pulley bytecode which is huge
My guess is that a large part of this is padding with zeros. We emit object files that are suitable for mmap-ing to virtual memory to assist with copy-on-write initialization. For Pulley we don't know the target platform so we conservatively assume the target page size is 64k (which corresponds to some arm64 platforms) which greatly increases the size of the ELF output file. This naturally doesn't make sense in an embedded context, however, and such padding shouldn't be present at all. Basically one big optimization here is going to be plumbing the config knob for "CoW is disabled, don't align things".
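To get a feel for how much page alignment alone can inflate a small image, here is a rough sketch. The section sizes are invented for illustration and are not the real layout of a cwasm file. It only shows the arithmetic of starting every section on an alignment boundary.

```rust
/// Round `offset` up to the next multiple of `align` (`align` must be a power of two).
fn align_up(offset: usize, align: usize) -> usize {
    (offset + align - 1) & !(align - 1)
}

/// Size of an image in which every section starts on an `align` boundary.
/// The zero padding before each section counts toward the total.
fn image_size(section_sizes: &[usize], align: usize) -> usize {
    let mut offset = 0;
    for &size in section_sizes {
        offset = align_up(offset, align) + size;
    }
    offset
}
```

With four hypothetical sections of 4000, 2000, 1000 and 500 bytes, `image_size(&[4000, 2000, 1000, 500], 64 * 1024)` gives 197108 bytes, while the same sections with no alignment (`align = 1`) take 7500 bytes.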
Otherwise Pulley doesn't modify bytecode, so it's possible to read it all from a read-only slice stored in ROM. The missing piece here is that internally Wasmtime needs to expose an API to read this from an external location rather than trying to copy it around.
Basically I think the issues you're seeing here should be fixable but will likely require modifications to Wasmtime. We'd be more than happy to help guide such changes and review, too!
those would be cool wasmtime advances, frankly.....
Thanks for the answers!
The module has (memory $0 1) and never grows, so that's not the issue. I used memory_reservation(64*1024) and memory_reservation_for_growth(0) to confirm and I still get the 262K allocation, which I really think is copying the ELF to RAM.
Ah ok yeah 262k tracks with the cwasm you're loading so that makes sense. Agreed that's probably what's happening here. And no worries on contributing, I'll take some time soon to file issues for these improvements regardless.
For binary size there are notes here on various Rust compile flags you can use to build a minimally-sized binary. It's one where we've tried to optimize for size/dependencies in Wasmtime but we got to a point where we can't reasonably push it further without something concrete to work towards. If you're able to have a standalone "example embedding" that would be extremely helpful for us to have something to target (e.g. "this crate" should compile down to something smaller than XX kilobytes or something like that).
Also for binary size even ballpark numbers would be super helpful. For example if you're aiming for 10k vs aiming for 1M we don't have many existing users with constraints like that so we've just been shooting in the dark historically
Actually for binary size, I forgot to subtract the embedded ELF, so using Wasmtime is only 120KiB bigger (making the final binary 2 to 3 times bigger depending on features) instead of the 380KiB with ELF. So it's not as bad as I initially thought (in particular because I expect the perf to be around 100x if not more). To give an order of magnitude for numbers:
Oh wow those are fantastic numbers, thank you for those!
When I've looked at Wasmtime historically Module::deserialize was a huge portion of the size, specifically the serde deserialization and validation that happens of the loaded module/Config. My guess is that it wouldn't be too too hard to optimize all that and cut it down to a much smaller size, probably shaving off ~20k at least from the base. That being said I haven't done any sort of rigorous testing in the past.
I realize that this may be a bit of a stretch, but if you're able to describe what your embedding does (or even better have a fork/project that can be built) and/or describe what the wasm is doing (or even better share a sample wasm) that'd be awesome. I'd love to take some time into the future myself to dig in and see what can't be improved (in addition to the cwasm improvements above)
Ok I've filed https://github.com/bytecodealliance/wasmtime/issues/10244 and https://github.com/bytecodealliance/wasmtime/issues/10245 for follow-ups on this
Thanks, I've added some background in https://github.com/google/wasefire/issues/458 and answered your last questions there. The next update from my side might take some time (on vacations with kid).
I've opened https://github.com/bytecodealliance/wasmtime/pull/10285 for #10244 and plan to take a look at #10245, will update if it looks like I'll be landing an impl for that as well.
Last updated: Feb 27 2025 at 22:03 UTC