I know I can use wasmtime compile -o expr2.cwasm expr2.wasm, which generates native machine code. However, the result is not a standalone binary that I can just run; it seems to still require some kind of runtime or other infrastructure to actually run.
How can I create a standalone native binary?
We use WASM as one of the backends in the LFortran/LPython compilers. And we wrote our own WASM->x64 binary generator that generates standalone binaries, here it is: https://github.com/lcompilers/lpython/blob/5b51c3cf8879bdd8f3c567ed38262e6b9ea55126/src/libasr/codegen/wasm_to_x64.cpp. It works great, for the subset that we currently support, and it generates a native binary that you just run.
However, instead of us maintaining such a WASM to binary compiler, is there some tool that can do this that we could just use and collaborate on?
I can at least confirm that Wasmtime doesn't support this feature. (it'd be neat to add though!) I'll let others chime in though if they know of tools to do this.
It would be nice if you could deserialize a module/component by mmaping from a text section.
Yeah, I was just going to suggest using include_bytes!("path/to/cwasm") in a Rust program that embeds Wasmtime. You could do that today, but an mmap capability would make it more efficient.
Ideally, there would be a way to do it without requiring a Rust compiler or linker to be installed, but I don't know of any such tool. Certainly would be a fun thing to build, though.
if you want to take this to the limit I think it would be possible to create a very stripped down Wasmtime build which supports purely a single instance being created. We could probably use ELF sections to reserve space for linear memory and everything. In that sense it'd be theoretically possible to have a tiny libwasmtime-exe.a (or whatever) which is linked with a native linker to the *.cwasm we produce and then there's a whole bunch of static calls between everything.
That all being said, it would still require a native linker and support like wasmtime-wasi, so there's still a mess of runtime bits involved, and producing this artifact would be somewhat nontrivial. It's probably much better to instead have a program with wasmtime-the-crate and the cwasm already mapped in, in theory.
For our compiler, we need to create native binaries. We are happy to create a standalone library out of our WASM->x64 backend, I think it doesn't depend on anything, which would allow us to collaborate on it with you. We still need to add Apple M1 support, right now we only support 64bit x86.
In the meantime, to get started, is there some example in Rust that uses the include_bytes!("path/to/cwasm") trick? My understanding is that it would include the native code (cwasm) and then somehow run it via the wasmtime rust library? So if I compile it with cargo/rust, it will create a standalone binary? That would indeed be a big step forward to what we need, and we might be able to take it and iterate further on it.
If you have a Rust compiler on-hand it's possible to construct a project that includes the *.cwasm, excludes Cranelift, and includes Wasmtime, and runs the wasm binary. My guess though is what you're looking for is probably a precompiled *.a which is linked with the *.cwasm or similar to create an executable, and that hasn't been done yet.
My guess though is what you're looking for is probably a precompiled *.a which is linked with the *.cwasm or similar to create an executable, and that hasn't been done yet.
Yes, that's what I am looking for ultimately, but because it has not been done yet, the first approach is a good start:
If you have a Rust compiler on-hand it's possible to construct a project that includes the *.cwasm, excludes Cranelift, and includes Wasmtime, and runs the wasm binary.
Is there some example that does this?
I just started putting an example together. I have a meeting now, but I should be able to post a GitHub link in a couple of hours.
@Joel Dice awesome, thank you, much appreciated!
A thought: ELF allows for the concept of an "interpreter"; in the usual Linux use-case, a dynamically-linked binary names /lib64/ld-linux.so or whatever (the dynamic linker) as its interpreter, and the kernel leaves all the processing of the actual program ELF to that. (One can actually do /lib64/ld-linux.so my-program, i.e. invoke the dynamic linker as a command, and this will work.) Could we make the interpreter field in the .cwasm (which is actually an ELF) be wasmtime itself, and make that work somehow?
Another variant of the idea could be to have a "CLI stub" that we prepend to the cwasm, and depend on a libwasmtime.so
oh man that'd be slick if you could ./foo.cwasm
I don't think you can use a binary as its own .interp... :thinking:
You could point at a system e.g. /lib/wasmtime-ld.so, but if you have that control you're probably better off using binfmt_misc
(on linux anyway)
ah, no, "binary as its own interpreter" isn't the suggestion, rather the .interp section in the .cwasm file
@Ondřej Čertík here's the example I threw together: https://github.com/dicej/cwasm-standalone. It uses Wasmtime 13's snapshot of WASI Preview 2, based on the component model. You could alternatively do the same thing using WASI Preview 1 with a cwasm based on a core module if you preferred. Happy to answer any questions about it.
Right but what would you put in the .cwasm's .interp for wasmtime?
@Lann Martin the interp field in the .cwasm would be /usr/bin/wasmtime or similar (perhaps libwasmtime.so if it has to be a shared object, though I think PIE executables are also technically shared objects these days)
One annoying thing about using .interp is that it would mean you don't get to use the usual ld-linux*.so, which means you can't easily dynamically link to any other libraries.
An alternative would be to just make .cwasm files depend on libwasmtime.so as a normal dynamic library.
It sounds like the two major things we'd need to do that are (i) have a main() stub that we prepend, basically @Joel Dice 's thing above; and (ii) emit PLT relocs for libcalls?
@Joel Dice beautiful, thank you! That's exactly what I wanted. I tried it, it works. I then tried my own WASM file, so far I didn't succeed, here is exactly what I tried: https://github.com/dicej/cwasm-standalone/issues/1, if you see what I am doing wrong, let me know.
I assume the g2.wasm in your example is a core module rather than a component (i.e. a file conforming to https://github.com/WebAssembly/component-model)? They both use the .wasm file extension, but the latter is a superset of the former. My example involves pre-compiling a component at build time, and the runtime code expects a cwasm file containing a pre-compiled component. If it gets a core module, it won't accept it. If you change the runtime code to use the Wasmtime module API instead of the component API, you can make it run a core module instead.
When I get a chance, I can create a branch that uses WASI Preview 1 and a core module instead of a component.
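For reference, the two kinds of .wasm file discussed above can be told apart from their first eight bytes. The following is only a minimal Rust sketch, not code from cwasm-standalone: the names WasmKind and classify_wasm are hypothetical, and the component preamble layout (a 2-byte version followed by a 2-byte "layer" field set to 1) is assumed from the component-model binary format draft, which may still change.

```rust
/// Kind of WebAssembly binary, judged only from its 8-byte preamble.
/// (Hypothetical helper type for illustration.)
#[derive(Debug, PartialEq, Eq)]
pub enum WasmKind {
    CoreModule,
    Component,
    Unknown,
}

/// Classify a `.wasm` file by looking at its preamble only.
pub fn classify_wasm(bytes: &[u8]) -> WasmKind {
    // Every WebAssembly binary starts with the magic bytes "\0asm".
    if bytes.len() < 8 || !bytes.starts_with(b"\0asm") {
        return WasmKind::Unknown;
    }
    // A core module follows the magic with the 4-byte version 1, which reads
    // as (version = 1, layer = 0) when split into two little-endian u16s.
    // Assumption: a component sets the layer field to 1.
    let version = u16::from_le_bytes([bytes[4], bytes[5]]);
    let layer = u16::from_le_bytes([bytes[6], bytes[7]]);
    match (version, layer) {
        (1, 0) => WasmKind::CoreModule,
        (_, 1) => WasmKind::Component,
        _ => WasmKind::Unknown,
    }
}
```

A check like this could run before choosing between the module API and the component API, so that a mismatch is reported up front rather than as a deserialization failure.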
@Joel Dice here is our code: https://github.com/dicej/cwasm-standalone/issues/1#issuecomment-1738103929
yeah, that's a core module
I am not well-versed in WASM, we just read the spec, produced the binary ourselves and then used wasmtime and nodejs in our tests, and we use WASI preview1. So we are using "core module" it seems. What should we be using?
Core module is fine -- it's just that I've been working on WASI Preview 2 and the component model lately, so that's what I defaulted to. I'll post a link to a version that uses core modules.
Very good, thank you.
@Joel Dice we use our custom WASM backend in order to have quick compilation and runtime at https://dev.lfortran.org/ and https://dev.lpython.org/. That would not be possible with using LLVM (too big and slow).
Separately we also want a direct x86 (and arm) binary generation, not using LLVM, for Debug builds which require very fast compilation. So we thought why not use our WASM backend and just write the WASM->binary part? So we did exactly that, so far it seems very simple and maintainable.
Are you saying that the "component" is a subset of the "core module"? So it seems we should be using a "component"? For our use case (see my last comment), our main goal is to piggyback (and be compatible) with the state-of-the-art and all the WASM tooling, that way we only have to maintain what is specific to our compilers, the rest we can maintain collaboratively with the WASM community, which seems like a win-win.
Components are a superset of core modules. Some relatively easy reading here: https://github.com/WebAssembly/component-model/tree/main/design/high-level
(less-easy reading elsewhere in that repo :smile:)
First: yes, I've read about LFortran and LPython and I think they're super exciting projects. I'm especially eager to see SciPy, OpenBLAS, etc. supported.
The component model can be considered a superset of the core WebAssembly specification. There's a tool called wit-component (part of wasm-tools) that will automatically convert from a WASI Preview 1 core module to a WASI Preview 2 component. For the time being, it's totally fine if your toolchain only targets WASI Preview 1, since you (or your users) can always use wit-component to turn that Preview 1 module into a Preview 2 component. Eventually, you'll want to consider targeting Preview 2 and the component model directly, but it's not released yet, so no hurry :)
Ah, so let me install wit-component, and try it!
If you have Rust, you can cargo install wasm-tools
When using wit-component, you'll need to specify which WASI adapter to use. You'll want to use https://github.com/bytecodealliance/wasmtime/releases/download/v13.0.0/wasi_snapshot_preview1.command.wasm in order to make the component compatible with Wasmtime 13, which is what my example above uses.
Since Preview 2 hasn't stabilized yet, it's still changing from one release to the next, so the adapter needs to match the host implementation in Wasmtime.
More introductory docs, for reference: https://component-model.bytecodealliance.org/
On it!
(Yes, we are making excellent progress with compiling SciPy, can now fully compile several of the Fortran modules via our LLVM backend, and all SciPy tests pass.)
Ok, I have .cargo/bin/wasm-tools but not .cargo/bin/wit-component
ok, figured it out: https://crates.io/crates/wit-component, wasm-tools component new core.wasm -o component.wasm
wit-component gives me an error message: https://github.com/dicej/cwasm-standalone/issues/1#issuecomment-1738151203
That's what I was saying about the adapter above. Download that wasi_snapshot_preview1.command.wasm file I linked to and pass it as an --adapt argument to wasm-tools component new.
I'm almost finished with the module version of cwasm-standalone, BTW.
The conversion worked, thanks!
The modules branch uses WASI Preview 1 modules: https://github.com/dicej/cwasm-standalone/tree/modules
Let me try it. I can't figure out how to compile the component to cwasm: https://github.com/dicej/cwasm-standalone/issues/1#issuecomment-1738177031
You'll probably need a build of Wasmtime with the "component-model" feature enabled (e.g. cargo install wasmtime-cli --features component-model) in order to compile components to cwasm files.
I did.
hmm, maybe the CLI doesn't support it yet
Regarding your modules branch, here are my results: https://github.com/dicej/cwasm-standalone/issues/2
ah, that should be easy to address. An exit status of zero should just be ignored. I'll take a look
The wasm module works in wasmtime, so it should be good.
Yeah, I just pushed a change. Please try the latest on the modules branch.
checking...
It works!!
$ cargo run --release
Compiling cwasm-standalone v0.1.0 (/Users/ondrej/repos/cwasm-standalone)
Finished release [optimized] target(s) in 4.27s
Running `/Users/ondrej/repos/cwasm-standalone/target/release/cwasm-standalone`
25
Thank you so much for this @Joel Dice ! I think this is solid:
$ otool -L /Users/ondrej/repos/cwasm-standalone/target/release/cwasm-standalone
/Users/ondrej/repos/cwasm-standalone/target/release/cwasm-standalone:
/usr/lib/libiconv.2.dylib (compatibility version 7.0.0, current version 7.0.0)
/usr/lib/libSystem.B.dylib (compatibility version 1.0.0, current version 1311.0.0)
and
$ ll -h /Users/ondrej/repos/cwasm-standalone/target/release/cwasm-standalone
-rwxr-xr-x 1 ondrej staff 4.6M Sep 27 16:42 /Users/ondrej/repos/cwasm-standalone/target/release/cwasm-standalone
I am on Apple M1 Max.
could try using strip to get it a bit smaller, maybe
I think this is the first time that we created a binary out of LFortran on M1 without LLVM.
And without our C backend.
After strip:
$ ll -h /Users/ondrej/repos/cwasm-standalone/target/release/cwasm-standalone
-rwxr-xr-x 1 ondrej staff 3.3M Sep 27 16:45 /Users/ondrej/repos/cwasm-standalone/target/release/cwasm-standalone
@Joel Dice one more question --- for Debug builds that I just want as fast as possible to compile (but it's ok if it runs slower), this technology seems perfect. But for Release builds, where I want maximum performance at runtime --- can WASM, once the tooling is fully developed, match the performance of LLVM?
Someone on the Cranelift team, e.g. @Chris Fallin would be a better person to ask. I imagine "it depends" :smile:
(I guess this cwasm-standalone is being compiled via rust, so it uses LLVM. However, the main code itself is just an ARM binary that wasmtime compiled from WASM, without any LLVM involved.)
right, it's the cwasm performance you care about, presumably
For Debug builds I care about the performance of generating this small /Users/ondrej/repos/cwasm-standalone/target/release/cwasm-standalone binary.
Right now it's slow, since rustc is involved, but that's only the first step (the initial prototype).
This is nice, I think we can start testing with this at our CI.
What is wasmtime doing in this cwasm "driver"? Is it just supplying the WASI imports?
And providing some "main" stub?
Cranelift won't perform optimizations at this time such as loop unrolling, vectorization, or inlining, so it's expected that llvm binaries will generally be faster.
Currently a lot of wasm binaries are produced by llvm, so cranelift takes already-optimized wasm and generates good code from it, and there we aim to be on the same level as llvm (somewhat). Definitely cranelift aims to match the optimizing tiers of peer jits like spidermonkey and v8
@Alex Crichton awesome. LFortran will eventually be able to produce optimized WASM binary. We are implementing unrolling, vectorization and inlining right now.
So I think with cranelift this might be quite competitive!
can WASM, once the tooling is fully developed, match the performance of LLVM?
Within some margin, yeah -- there are two separate sources of the perf gap:
@Chris Fallin thanks! Why not optionally turn off sandboxing when performance is required?
The latter can be improved, the former will always be there as "the cost of sandboxing"; I've seen it estimated that the fundamental cost of that is something like 10-30%, as a ceiling to perf
What is wasmtime doing in this cwasm "driver"? Is it just supplying the WASI imports?
Wasmtime is setting up the WASI environment and doing (hopefully minimal) preparation of the cwasm bytes for execution, setting up linear memory, and calling into the guest. It's doing more work than a normal executable since it's sandboxing the guest, limiting its access to the host.
We are planning to have two Release modes: ReleaseSafe, which would correspond to WASM+sandboxing, and ReleaseFast, where all bounds checks are turned off.
@Ondřej Čertík because that... wouldn't be Wasm anymore. It's not that there are separate checks we can turn off, it's that the basic building blocks require it: for example, Wasm provides a separate heap, with offsets starting at 0, and the cost we pay is adding the base pointer. Once we do that, the bounds-checking is "free" (virtual memory)
I see, right.
So sandboxing is in some sense fundamental to WASM.
yes, exactly
So WASM is ideal for our ReleaseSafe mode.
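To make the point about linear memory concrete: guest addresses are offsets starting at 0 into a separate heap, and every access is confined to that heap. The following is a toy Rust sketch under stated assumptions, not how Wasmtime implements it. The LinearMemory type and its methods are hypothetical, and the bounds check is written as an explicit software check, whereas the discussion above notes that a real runtime can get it "for free" from virtual memory.

```rust
/// A toy linear memory: guest addresses are offsets from 0 into `bytes`.
/// (Hypothetical type for illustration only.)
pub struct LinearMemory {
    bytes: Vec<u8>,
}

impl LinearMemory {
    /// Create a zero-filled memory of `size` bytes.
    pub fn new(size: usize) -> Self {
        Self { bytes: vec![0; size] }
    }

    /// Store a little-endian u32 at a guest offset, refusing any access
    /// that would fall outside the memory.
    pub fn store_u32(&mut self, offset: u32, value: u32) -> Result<(), &'static str> {
        let start = offset as usize;
        let end = start.checked_add(4).ok_or("out of bounds")?;
        // Indexing relative to the start of `bytes` plays the role of
        // "adding the base pointer"; `get_mut` is the bounds check.
        let slot = self.bytes.get_mut(start..end).ok_or("out of bounds")?;
        slot.copy_from_slice(&value.to_le_bytes());
        Ok(())
    }

    /// Load a little-endian u32 from a guest offset with the same check.
    pub fn load_u32(&self, offset: u32) -> Result<u32, &'static str> {
        let start = offset as usize;
        let end = start.checked_add(4).ok_or("out of bounds")?;
        let slot = self.bytes.get(start..end).ok_or("out of bounds")?;
        let mut buf = [0u8; 4];
        buf.copy_from_slice(slot);
        Ok(u32::from_le_bytes(buf))
    }
}
```

The guest can only ever name offsets, never host pointers, which is why the sandbox cannot simply be switched off without changing what the code means.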
Joel Dice said:
What is wasmtime doing in this cwasm "driver"? Is it just supplying the WASI imports?
Wasmtime is setting up the WASI environment and doing (hopefully minimal) preparation of the cwasm bytes for execution, setting up linear memory, and calling into the guest. It's doing more work than a normal executable since it's sandboxing the guest, limiting its access to the host.
How hard would it be to do exactly this work, but without having to call rustc all the time? So that LFortran can do this quickly?
I think that's what Alex and Chris were talking about above, e.g. adding stuff to the cwasm to make it a position-independent executable. Again, I'm probably not the one to ask, since they know more about that stuff than I.
We do that in fact today, via our custom WASM->x64 backend. But I am hoping to find some solution that we can be compatible with the current WASM tooling.
For example, I like that cwasm that wasmtime produces --- for example LFortran could produce it directly (for speed reasons), but we can still use wasmtime to check our work.
Then we can collaborate on the final stub, the current cwasm-standalone that you just created, but in some ways that doesn't require constant rustc recompilation.
If you're able to accept a precompiled *.a source file with source you control it should be not too hard to whip up an equivalent to what Joel has
We currently can't accept .a yet, but we will eventually.
Where your build process would create a cwasm, postprocess it slightly, and then invoke the native linker
For the fast compilation mode we use our own linker.
But I think this can be all solved.
Step by step. :)
As long as we are compatible at each step.
I think we will focus on cwasm and see if we can be compatible with wasmtime, and for now we can use the rustc workaround to produce the final executable.
I'm imagining that the easiest thing would basically be to precompile what Joel already wrote where the final step is taking the *.cwasm and shoving it in at link-time somehow. Unsure how well that could fit into your linking process
Yes, exactly.
Essentially doing the linking of Joel's code and our generated binary ourselves.
Effectively we just need to control the include_bytes macro ourselves, without invoking rustc. At the binary level once everything is compiled.
I am imagining that the wasmtime rust library just needs a pointer to it, loaded in memory?
So if we can create some C interface that accepts this pointer, that might be all that is needed. We'll use rustc to create such a "C" library and then LFortran just links everything together, no rustc involved.
Do you think that's possible?
What format would this C library have? A .a file? A .so file? Something else?
i.e. what format would LFortran expect?
I think .a would be the best for static linking, .so would also work. Right now we generate a static binary out of WASM, but I think we have to extend it anyway to be able to link .a and .so libraries.
I've seen it estimated that the fundamental cost of that is something like 10-30%, as a ceiling to perf
FWIW I think that might be too high an estimate. While I can't find a source for it right now, IIRC RLBox as used in Firefox has a single-digit % overhead
Ondřej Čertík said:
So sandboxing is in some sense fundamental to WASM.
Yeah, I'd emphasize this a bit. It simply isn't WebAssembly if it doesn't have the sandbox (though there are other requirements). So you can do this! And it could use the bytecode and the tooling you've discussed. It's possible. But you couldn't portray that as being "webassembly" -- no sandbox. It's really just native code processed through the spec in some sense, or it sure seems that way.... Still might be what you want to do technically, can't be sure.
Ondřej Čertík said:
I think .a would be the best for static linking, .so would also work. Right now we generate a static binary out of WASM, but I think we have to extend it anyway to be able to link .a and .so libraries.
Sure, if a .a works then we could definitely produce a libcwasm-standalone.a that provides a C function which accepts a pointer to a cwasm (plus any environment variables, directory mounts, etc.) and runs it. Should be quite easy. Does LFortran have the ability to link .a files into a final executable now, or is that something you're planning to add eventually?
@Joel Dice perfect, let's go the .a route. We use the system linker with our default LLVM backend, and we allow linking 3rd party libraries. Our WASM backend just produces the core module, and our WASM->x64 backend currently creates the executable directly, no linking. However, we need the ability to link 3rd party code anyway, so we'll have to implement it.
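As a rough illustration of the "C function which accepts a pointer to a cwasm" idea, the Rust side of such a library might expose an entry point shaped like the one below. This is a sketch under stated assumptions, not the code from cwasm-standalone: the names cwasm_standalone_run and run_precompiled and the error codes are hypothetical, and run_precompiled is a placeholder that only checks for the ELF magic (a cwasm being an ELF, as noted earlier) where a real build would hand the bytes to Wasmtime.

```rust
use std::os::raw::c_int;
use std::slice;

/// Hypothetical C-callable entry point. When built into a static library it
/// would also need `#[no_mangle]` (spelled `#[unsafe(no_mangle)]` on the 2024
/// edition) so the linker sees the unmangled name; the attribute is left off
/// here so the sketch compiles on any edition.
///
/// Returns 0 on success, -1 for a null or empty buffer, -2 if the bytes do
/// not look like a cwasm. These codes are made up for illustration.
///
/// # Safety
/// `cwasm` must point to `len` readable bytes that stay valid for the call.
pub unsafe extern "C" fn cwasm_standalone_run(cwasm: *const u8, len: usize) -> c_int {
    if cwasm.is_null() || len == 0 {
        return -1;
    }
    // SAFETY: the caller guarantees `cwasm` points to `len` readable bytes.
    let bytes = unsafe { slice::from_raw_parts(cwasm, len) };
    run_precompiled(bytes)
}

/// Placeholder for the real work: a full implementation would deserialize
/// the precompiled module with Wasmtime, set up WASI, and call the guest.
fn run_precompiled(bytes: &[u8]) -> c_int {
    if bytes.starts_with(b"\x7fELF") {
        0
    } else {
        -2
    }
}
```

With an interface of this shape, the compiler driver only has to place the cwasm bytes somewhere in the final image and pass their address and length, so rustc is needed once to build the library rather than on every compile.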
@Ralph we are considering at least three modes:
Right now we only have two modes, Debug and ReleaseFast, and we use LLVM for both.
WASM, with sandboxing, seems like a great fit for both Debug and ReleaseSafe modes. Based on the discussion above, it might do ok for ReleaseFast as well, but LLVM might end up being a better fit there.
@Joel Dice here I created a proof of concept of how to create the .a library: https://github.com/dicej/cwasm-standalone/pull/3, it works!! If you have suggestions on how to improve it, please let me know.
Question: does it make sense for LFortran to generate .cwasm directly? Is the interface (the conventions for how to call this binary) relatively stable, or does it change with every wasmtime release? Is it documented somewhere?
If it changes too much, then perhaps we should collaborate at the .wasm level, and we'll maintain our own binary generation, and we can check against the cwasm-standalone above to ensure correctness, as well as against just wasmtime guest.wasm.
.cwasm only works with a single wasmtime version. It has a header with the wasmtime version, and the exact ABI between compiled wasm modules and the runtime changes all the time.
Yes it would not make sense for LFortran to generate *.cwasm directly, you'll want to generate *.wasm if you can
@bjorn3, @Alex Crichton thank you. In that case, it seems the best way forward for us is to keep producing the .wasm files as we do today, and then we'll maintain three paths forward:
.wasm binary (and native binary) + all dependencies all work together well with regards to the standard WASM ecosystem.
wasmtime a.wasm; this is the most common/standard usage in WASM, ensuring that it works well.
We also provide a JS file that makes exactly the same wasm binary run in a browser, which we also want.
wasm2c (WABT) can produce files that can be built standalone-ish; at least you will get a native object file. It has a runtime which you'd need to link against.
I saw a video showing that wasmer can take a wasm file and make a "native" executable out of it (like an .exe) by bundling the wasmer runtime into the executable, so that you don't need to have wasmer installed to run it.
Unfortunately, I cannot seem to find the video anymore or find any documentation on this.
Last updated: Dec 23 2024 at 13:07 UTC