wasi-preview1-component-adapter currently results in a 78K wasm file. While pretty small already, it would be nice to get it even smaller. Some (relatively) easy wins may be:
- In trapping_unwrap, replacing unreachable!() with a direct trap instruction.
- Replacing unreachable!() inside exports like sock_recv with calls to a single function which prints to stderr using raw syscalls followed by a direct trap, to skip the panicking and formatting machinery and to deduplicate everything else.
cc @Yoshua Wuyts to increase the gap between native and wasip2 for your TCP echo server even more.
I've been thinking about adapter size recently; having a standard approach to trapping with some output without std::fmt would be really helpful.
Would it make sense to have e.g. a wasi-nostd-helpers crate or something?
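To make the idea of output without std::fmt concrete, here is a minimal sketch of the kind of helper such a crate might contain. The function name u32_to_decimal and its signature are hypothetical and not taken from the adapter; it only illustrates rendering a number as decimal ASCII without pulling in the formatting machinery. A real adapter would additionally have to make sure the indexing cannot introduce bounds-check panics.

```rust
/// Hypothetical helper (not the adapter's API): writes `n` as decimal ASCII
/// into the tail of `buf` and returns the used part, without `core::fmt`.
/// A `u32` needs at most 10 decimal digits, so the buffer always suffices.
pub fn u32_to_decimal(mut n: u32, buf: &mut [u8; 10]) -> &[u8] {
    let mut start = buf.len();
    loop {
        start -= 1;
        // Emit the least significant digit, filling the buffer backwards.
        buf[start] = b'0' + (n % 10) as u8;
        n /= 10;
        if n == 0 {
            break;
        }
    }
    &buf[start..]
}
```

The returned byte slice could then be handed directly to whatever raw stderr write call the adapter uses.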
Re using bulk-memory: RUSTFLAGS="-Ctarget-feature=+bulk-memory" saves 1481 bytes total (79409 -> 77928).
I think these would all be quite reasonable to implement. Even bulk-memory is so widespread nowadays that I don't think there's any particular reason to leave it turned off.
Replacing the unreachable!() in trapping_unwrap with core::arch::wasm32::unreachable() reduces the size by another 2925 bytes (77928 -> 75003).
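A rough sketch of that change, for illustration only. The trait shape below is an assumption rather than a copy of the adapter's code, and the non-wasm trap() is a stand-in added so the sketch also builds on native targets.

```rust
/// On wasm32, emit the `unreachable` instruction directly instead of
/// going through any message-printing or panic path.
#[cfg(target_arch = "wasm32")]
fn trap() -> ! {
    core::arch::wasm32::unreachable()
}

/// Native stand-in (an assumption for this sketch) so it builds off-target.
#[cfg(not(target_arch = "wasm32"))]
fn trap() -> ! {
    std::process::abort()
}

/// Assumed shape of a trapping unwrap helper.
pub trait TrappingUnwrap<T> {
    fn trapping_unwrap(self) -> T;
}

impl<T> TrappingUnwrap<T> for Option<T> {
    fn trapping_unwrap(self) -> T {
        match self {
            Some(value) => value,
            None => trap(),
        }
    }
}

impl<T, E> TrappingUnwrap<T> for Result<T, E> {
    fn trapping_unwrap(self) -> T {
        match self {
            Ok(value) => value,
            Err(_) => trap(),
        }
    }
}
```

The trade-off is that a failed unwrap no longer reports where it happened; it just traps.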
Enabling LTO saves another 2247 bytes (75003 -> 72756).
I just noticed that unreachable!() is already not the libstd version, but one provided by the component adapter itself, so all the wins for the unreachable!() replacement change are likely just caused by skipping the pretty failure message using eprint!().
Most of the unreachable!() macro can probably be moved to a new function to deduplicate the code between call sites.
We could still have pretty output without std I think since wasi stderr doesn't require a lock?
Also the eprint!("unreachable executed at adapter line "); crate::macros::eprint_u32(line!()); can be replaced with eprint!(concat!("unreachable executed at adapter line ", line!())) to remove the eprint_u32 function.
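The two ideas above (one shared failure function, plus a message built at compile time with concat! and line!()) could be combined roughly as follows. This is a sketch under assumptions: the names fail, unreachable_message!, adapter_unreachable! and expect_some are hypothetical, and the body of fail uses std as a native stand-in where the real adapter would use its own raw stderr write followed by a trap. It also ignores the adapter's constraints around what ends up in the data section.

```rust
use std::io::Write;

/// Builds the whole message at compile time, so no runtime integer
/// formatting is needed. `line!()` resolves to the invocation site.
macro_rules! unreachable_message {
    () => {
        concat!("unreachable executed at adapter line ", line!(), "\n")
    };
}

/// Each call site only passes a string to the shared function.
macro_rules! adapter_unreachable {
    () => {
        fail(unreachable_message!())
    };
}

/// Single shared failure path. `#[cold]` and `#[inline(never)]` ask the
/// compiler to keep this code out of every call site.
#[cold]
#[inline(never)]
fn fail(msg: &'static str) -> ! {
    // Native stand-in: the adapter would write to WASI stderr directly.
    let _ = std::io::stderr().write_all(msg.as_bytes());
    std::process::abort()
}

/// Example call site (hypothetical).
pub fn expect_some(value: Option<u32>) -> u32 {
    match value {
        Some(v) => v,
        None => adapter_unreachable!(),
    }
}
```

With this layout every call site shrinks to a pointer, a length, and a call, while the printing logic exists once.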
Lann Martin said:
We could still have pretty output without std I think since wasi stderr doesn't require a lock?
Turns out that is exactly what is done already. It is just that most of this code is duplicated at each unreachable!() call site.
Oh sure enough...I didn't scroll up :sweat_smile:
Just bulk-memory + LTO saves 4573 bytes.
@bjorn3 out of curiosity: does this save anything on the base binary too - or just on the adapter?
beware, many of the macro (and other) shenanigans in the adapter are done the not obvious or idiomatic way in order to dance around llvm creating anything that ends up in the data section
One thing I'll note as well is that the adapter is automatically GC'd, e.g. it exports every single wasip1 function but most modules don't import all of them. The wit-component adapter process will remove all exports that aren't needed and then GC the wasm module itself, so the full size of the adapter is unlikely to go into a final component. Either that or if the importing component reduces its imports as well that's a way to shrink the adapter.
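As a toy model of that GC step (not wit-component's actual implementation), one can treat the adapter as a call graph and keep only what is reachable from the exports the main module really imports. The function live_functions and the helper names used in the example are made up for illustration.

```rust
use std::collections::{HashMap, HashSet};

/// Toy reachability pass: `calls` maps each function to the functions it
/// calls; `needed_exports` are the exports the main module imports.
/// Everything not returned here could be dropped from the adapter.
pub fn live_functions<'a>(
    calls: &HashMap<&'a str, Vec<&'a str>>,
    needed_exports: &[&'a str],
) -> HashSet<&'a str> {
    let mut live = HashSet::new();
    let mut worklist: Vec<&'a str> = needed_exports.to_vec();
    while let Some(function) = worklist.pop() {
        // `insert` returns false if we already visited this function.
        if live.insert(function) {
            if let Some(callees) = calls.get(function) {
                worklist.extend(callees.iter().copied());
            }
        }
    }
    live
}
```

For example, if a module imports only fd_write, then sock_recv and anything reachable solely from it would not survive into the final component.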
One thing that might be worth testing is that LLVM is known to produce pretty suboptimal binaries size-wise and running through wasm-opt can probably shave off 30-40%
bjorn3 said:
cc Yoshua Wuyts to increase the gap between native and wasip2 for your TCP echo server even more.
by the way, for context on this - here are the numbers I found the other day
The results: async-std comes in at half a Megabyte for the echo server. WASI 0.2 comes in at just 100 Kilobytes. And in even better news: it currently still uses a WASI 0.1 adapter that weighs 80 Kilobytes. WASI binaries are small.
if you want to optimize the adapter down to 0, there is the remaining work in wasi-libc to use p2 for filesystem and everything else. the p2 support in there is right now limited to sockets
that would additionally unlock using rust, c, c++ to target the new single-module representation of a component that luke has been presenting
which saves even more bytes by not encoding any of the component type information
Opened https://github.com/bytecodealliance/wasmtime/pull/8858 for LTO + bulk-memory
Managed to save another 21154 bytes (75029 -> 53875) by changing the unreachable!() and assert!() macros. This is without losing any information that may be useful for debugging, but with a slight tweak to the assertion failure message from "unreachable executed at adapter line ...: assertion failure" to "assertion failed at adapter line ...", which is slightly shorter. This tweak is only a small part of the saved bytes, but I figured I'd still make it.
wow that's a huge reduction of 30%?!
Yeah!
https://github.com/bytecodealliance/wasmtime/pull/8859
Got another easy 1.6k win, will push in a bit.
Pushed
awesome!
For reference removing all unreachable and assertion message printing brings the size down to 47162 bytes, which is only 5050 bytes less than with wasmtime#8859. IMHO winning 5050 bytes is not enough to justify making it harder to find the root cause when someone reports an issue caused by the adapter.
Pat Hickey said:
that would additionally unlock using rust, c, c++ to target the new single-module representation of a component that luke has been presenting
I hadn't heard of that before and I'm intrigued - could you please point me to an explanation of the proposal? Thanks :)
So that would work in the browser with a shim similar to browser_wasi_shim without requiring a wasm component parser like jco?
presumably
That would be great!
Somewhat relatedly, is there any plan to write a tool going from a wasm component to a (possibly multi-memory) core wasm module with the same kind of custom section as would be used for building a wasm component from a core wasm module?
that could be done, yes. i think when we last discussed it in early 2023, the reason it wasn't pursued was that multi memory hadn't shipped in enough places to make it worthwhile
it's still behind a flag in chrome
at any rate, since jco provides something equivalent implemented in JS rather than multi-memory wasm, i'd consider it just an implementation detail that jco could use to optimize binary size
Last updated: Nov 22 2024 at 17:03 UTC