Hello!
I am interested in loading and running, from Java, a WebAssembly module created with the `wit-bindgen` project. I was looking for Java bindings for this but could not find any such project, or at least any equivalent one.
As mentioned on wasmtime.dev, there are 2 options: the `wasmtime-java` from kawamuray or the one from bluejekyll (and I already thank them both for their work). Being able to run a `wit-bindgen` WASM module would allow us (and me :smile: ) to have a single WASM module loaded by different languages (Python, Java, JavaScript, C, etc.).
I took the `wasmtime-java` option from bluejekyll (thank you @Benjamin Fry): it is a WIP project that will surely interest a lot of people. I tried to run a very basic `wit-bindgen` WASM module and encountered 2 problems (these problems have been raised with Benjamin, and we agreed to post them here to find some help):
The WASM module exports these 3 functions (through the Rust bindings generated with `wit-bindgen`):
fn play_with_bytes(input: Vec<u8>) -> Vec<u8>;
fn play_with_result(input: Vec<u8>) -> Result<Vec<u8>, String>;
fn play_with_struct(abc: my::Abc) -> my::Abc;
First, with `wasmtime-java`, the WASM file needs to export the 2 functions `__alloc_bytes` and `__dealloc_bytes`. A quick workaround is to add those 2 functions to the WASM source code. Of course it would be better to use the existing `wit-bindgen` allocation functions, but at least we can continue the tests.
Second, for a function such as `fn my_function(input: Vec<u8>) -> Vec<u8>`, I cannot find a way to load the return value. When debugging I can "see" the expected vector in the WASM memory, but I don't understand how to find the correct offset of this result in memory (and the related result size in bytes). The offset returned by the function call (which is correct) does not match the actual offset of the result. In parallel, I ran the same WASM module from Python (with the Python bindings generated by `wit-bindgen`) and compared the results: the output is the same in both cases, but shifting the memory to get the correct pointer to the expected output does not work with `wasmtime-java`.
The following link is a very small fork of `wasmtime-java` that adds new tests with a `wit-bindgen` WASM module: https://github.com/Manuthor/wasmtime-java/tree/feature/add_ref_return_value
Any help will be greatly appreciated! :smile:
For some background: I started the `wasmtime-java` project a while back and have been working on it pretty much solo. I have gotten a lot of help here, but I'm sure I've made some decisions that are not compliant or are not best practice. I've recently updated the project to the current `wasmtime` version, but there are some issues with the handling of returned complex types (like Vec/Arrays). I've only just started looking at the `wit` ABI spec, but I think this will have a nice effect of normalizing a lot of this support.
A question I have regarding `wit-bindgen` and `wasm`: is there a best-practices spec for working with complex types in what is generated for those interfaces?
The intention with `wit-bindgen` is that there would be a `wasmtime` embedding in Java, but it wouldn't actually concern itself with anything interface-types related. For example the `wasmtime-java` project wouldn't worry about vectors, ABI, etc, or anything like that. It would instead be a pure binding of simply the C API in a Java-like fashion that feels idiomatic in Java.
On top of this hypothetical `wasmtime-java` library the `wit-bindgen` crate would then generate `*.java` files (omg it's been so long since I wrote java, that's the source extension, right?) which would use `wasmtime-java` and then use the interface types definitions of ABIs and such to translate back and forth between java-native representations and wasm representations. For example (if I recall the syntax correctly) a `Vec<u8>` in the wasm module would be `byte[]` in Java and the generated code by `wit-bindgen` would do the translation
there's lots of details about how the precise ABI maps out (allocation functions, where to store return values, etc), but that's all theoretically handled by the code-generation-functionalities of `wit-bindgen` (which currently doesn't have a Java mode or implementation)
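To make the `Vec<u8>` to `byte[]` translation concrete, here is a minimal Java sketch of what such generated glue might do. It is not actual `wit-bindgen` output (the thread notes there is no Java generator): a `ByteBuffer` stands in for the module's linear memory, and `alloc` is a hypothetical stand-in for whatever allocation function the module exports.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;
import java.util.function.IntUnaryOperator;

/** Sketch only: a ByteBuffer stands in for the wasm module's linear memory. */
final class LowerLiftSketch {
    /** Copies a Java byte[] into guest memory and returns {ptr, len}. */
    static int[] lower(ByteBuffer memory, IntUnaryOperator alloc, byte[] data) {
        int ptr = alloc.applyAsInt(data.length); // hypothetical guest allocator
        ByteBuffer view = memory.duplicate();    // shares storage, own position
        view.position(ptr);
        view.put(data);
        return new int[] { ptr, data.length };
    }

    /** Copies len bytes starting at ptr out of guest memory into a new byte[]. */
    static byte[] lift(ByteBuffer memory, int ptr, int len) {
        byte[] out = new byte[len];
        ByteBuffer view = memory.duplicate();
        view.position(ptr);
        view.get(out);
        return out;
    }

    public static void main(String[] args) {
        ByteBuffer memory = ByteBuffer.allocate(64);
        int[] next = { 8 }; // toy bump allocator, for illustration only
        IntUnaryOperator alloc = size -> { int p = next[0]; next[0] += size; return p; };

        int[] ptrLen = lower(memory, alloc, new byte[] { 1, 2, 3 });
        System.out.println(Arrays.toString(lift(memory, ptrLen[0], ptrLen[1])));
    }
}
```

In a real embedding the buffer would be the instance's exported memory and the allocator would be a call into the module; both are simulated here so the sketch runs on its own.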
Basically all that is to say that in the same way that `wasmtime-py` knows nothing about interface types, yet `wit-bindgen python` generates Python code, would be how `wasmtime-java` would work. The `wasmtime-java` library itself would have nothing interface-types-specific (yet) and then the generated code by `wit-bindgen` would leverage `wasmtime-java` and do the translation.
Thanks, @Alex Crichton. I think the confusing thing I have right now is return by-ref or multi-value return data (and subsequent ownership). I think the confusing thing I've always had with the idea that `wasmtime-java` knows nothing about interface types is that it absolutely cares at the FFI boundary for complex types. For example, in the `wasmtime-java` code, in order to support passing slices back to Java, there's a complex pointer type that mimics a Rust slice passed by reference into the WASM, and then copied into the JVM as a `byte[]`, and the return by ref from WASM is then freed (via `__dealloc_bytes`). This has always felt wrong to me, and implied I've done something wrong, and it's the part of the ABI that I get confused in.
From `wasmtime-java`'s perspective there's not really any such thing as `byte[]` because wasm has no notion of byte slices. In that sense `wasmtime-java` knows nothing about either passing or returning byte slices in wasm functions
Once `wit-bindgen` comes into the picture it uses the canonical ABI for interface types to define how this happens, and the canonical ABI is defined in terms of core wasm features, such as multiple return values at the wasm level and things like that
so `wasmtime-java` needs to support calling arbitrary wasm functions with arbitrary values (in addition to modifying wasm memory)
but it doesn't need to support things like `byte[]`
in a sense `wasmtime-java` is so simple that it won't be too useful to anyone, but it's a building block to build something more advanced if that makes sense
(in the same way that `wasmtime-py` and the `wasmtime` crate are quite "primitive")
Maybe I can share an example of the function call from/to Java to help clarify my point of concern, because I think we're not quite but almost talking past each other, here's a call from Java into the wasmtime runtime and into a WASM module with slices and returned data by ref: https://github.com/bluejekyll/wasmtime-java/blob/main/src/test/java/net/bluejekyll/wasmtime/tests/SliceTests.java#L115 then here is the other side of that in a Rust function with the C ABI for the support in WASM: https://github.com/bluejekyll/wasmtime-java/blob/main/tests/slices/src/lib.rs#L39-L43
ah yeah so these are details that wouldn't be implemented by `wasmtime-java`
so for example the Rust crate has no way of doing this
"wouldn't" be, I mean, they kinda are now :smile:
from Rust you can call a wasm function, but there's no concept of a byte slice in wasm, so you can't call wasm with a byte slice (or wasm can't return to you a byte slice)
well my point is that if you're baking in an ABI like this that's basically a design choice, and it's not what interface types is intending to do
there basically is no standard answer to the question "how do I get a slice of bytes from wasm" if all you can use is core wasm
once you involve interface types that question can be answered, however, but it's a distinct answer (and layered as such)
Am I asking the wrong question then? Maybe I should re-ask a question I had long ago... what is the best practice for passing a byte array into WASM and, vice versa, returning one? Or better yet, what's the best practice for accessing that data if not by passing arrays?
in a sense this is kind of the wrong question, kind of isn't. Like this is a perfectly valid thing to ask since it's the first thing any embedded wasm module wants to do. The tl;dr is that this is a struggle and this can't be done in a standard way so everyone is left to their own devices. The longer answer is "interface types will solve this", which is what `wit-bindgen` is. When `wit-bindgen` gets a Java implementation it will not rely on what you have today to implement `byte[]` but instead it will do its own thing, similar to what all other `wit-bindgen` generation modes are doing
To answer "what's the best practice" it's "there is none"
which isn't a great answer, hence the focus on work on interface types
If I might jump in here, in a shared-nothing model, which I understood interface types to be, does this imply copying the `byte[]`?
this sounds a little bit like semantics though... i.e. whether `wit-bindgen` generates the interop between `byte[]` and the target function vs. `wasmtime-java` doing that, right? i.e. something needs to translate between those points; right now I do a bunch of reflection in Java to build the bridging code between the two envs...
yeah it's true that precisely where implements what doesn't really matter, I don't mean to say one particular way is correct vs another
one of the focuses of interface types is cross-language interop, which means we can't focus really all that much on one particular binding, and so far I've been trying to do things that work everywhere
and @Scott Waye that's correct, for interface types the `byte[]` type would be copied in/out of wasm, there's no concept of a "mutable slice buffer"
@Scott Waye for correctness reasons in the JVM there is `byte[]`, which must be copied to/from the WASM memory. There is `ByteBuffer`, which can theoretically reference data directly in WASM, but I haven't figured out a good ownership model for that, so I stopped trying to support it.
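The ownership concern can be illustrated with plain `java.nio`, with no wasm runtime involved. In this sketch a heap `ByteBuffer` merely plays the role of linear memory (an assumption for illustration): a copied `byte[]` keeps its contents, while a sliced view changes as soon as the underlying memory is overwritten, for example after the guest frees and reuses it.

```java
import java.nio.ByteBuffer;

/** Sketch only: contrasts copying out of memory with viewing it in place. */
final class CopyVsViewSketch {
    /** Copy: the returned array is independent of later writes to memory. */
    static byte[] copyOut(ByteBuffer memory, int ptr, int len) {
        byte[] out = new byte[len];
        ByteBuffer dup = memory.duplicate();
        dup.position(ptr);
        dup.get(out);
        return out;
    }

    /** View: the returned buffer shares storage with memory. */
    static ByteBuffer viewOf(ByteBuffer memory, int ptr, int len) {
        ByteBuffer dup = memory.duplicate();
        dup.position(ptr);
        dup.limit(ptr + len);
        return dup.slice(); // index 0 of the slice is memory[ptr]
    }

    public static void main(String[] args) {
        ByteBuffer memory = ByteBuffer.allocate(16);
        memory.put(4, (byte) 7);

        byte[] copy = copyOut(memory, 4, 1);
        ByteBuffer view = viewOf(memory, 4, 1);

        memory.put(4, (byte) 9); // simulates the guest reusing that region

        System.out.println(copy[0]);     // still the old value
        System.out.println(view.get(0)); // reflects the overwrite
    }
}
```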
It might be useless, and I expect it is, but I did start a c# wit-bindgen to see how hard it would be. Java I think has a lot of the same concerns. I did some of the string bit and left it at that as it looked solvable with just "some hours" more work. https://github.com/yowl/witx-bindgen/blob/csharp/crates/gen-csharp/src/lib.rs
Right, @Alex Crichton, and I still owe you a beer :smile:. But I guess my higher level question is, if we don't do this in `wasmtime-java` then we'll do it in the generated code from `wit-bindgen`, which sounds like it will be something similar to what we're already doing. :smile: And maybe I can work on the `wit-bindgen` Java outputs, but I'd still (I think) need to support the calling convention in WASM.
that's true yeah and w/e shape of things you're doing is the exact same shape of what wit-bindgen would otherwise generate (aka the canonical abi), it's probably just a few minor differences in the details
Calling-convention-wise what `wit-bindgen` (and interface types and/or the canonical ABI) needs is "call this wasm with these types and get these types as a result" where temporarily you know that the number of returned values is either 0 or 1
plus the ability to read/write bytes in memory
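As a rough illustration of those two primitives working together, the sketch below assumes a function whose single `i32` result points at a "return area" holding a little-endian pointer and length for a byte list. That layout is an assumption for illustration, not something confirmed in this thread; the offsets of the two fields depend on the `wit-bindgen` version, so they are parameters here, and a `ByteBuffer` again stands in for linear memory.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

/** Sketch only: decodes a byte list from an assumed (ptr, len) return area. */
final class ReturnAreaSketch {
    /**
     * retArea is the single i32 the call returned. ptrOffset and lenOffset say
     * where the pointer and the length sit inside that area (assumed layout).
     */
    static byte[] readList(ByteBuffer memory, int retArea, int ptrOffset, int lenOffset) {
        // wasm linear memory is little-endian
        ByteBuffer le = memory.duplicate().order(ByteOrder.LITTLE_ENDIAN);
        int ptr = le.getInt(retArea + ptrOffset);
        int len = le.getInt(retArea + lenOffset);
        byte[] out = new byte[len];
        le.position(ptr);
        le.get(out); // copy the payload out of "linear memory"
        return out;
    }

    public static void main(String[] args) {
        ByteBuffer memory = ByteBuffer.allocate(64).order(ByteOrder.LITTLE_ENDIAN);
        memory.put(32, (byte) 10).put(33, (byte) 20).put(34, (byte) 30); // payload
        memory.putInt(0, 32); // return area: pointer to the payload
        memory.putInt(4, 3);  // return area: payload length
        byte[] result = readList(memory, 0, 0, 4);
        System.out.println(result.length);
    }
}
```

If the value returned by a call is treated as the address of the payload itself rather than of such an area, the bytes read will not match, which may be one explanation for an offset that looks correct but does not point at the expected data.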
@Scott Waye, I think one thing you're lucky with in C# is that it has decent native type and FFI support with C; Java and JNI is bleh.
@Alex Crichton, yeah, I think I'm grokking you. I'm guessing there is still something that I'll need in Java though to allocate bytes for return by ref values, etc... right? Today I do that in `wasmtime-java` when doing the function calls into WASM. So there needs to be some amount of support in `wasmtime-java` for that functionality, right? And after return, freeing the associated memory?
I'll take a look at implementing the Java bindings for `wit-bindgen` and see what needs to happen there.
you might want to look at the stuff generated by `wit-bindgen` today, e.g. the `wasmtime-py` python bindings
that may help show what wit-bindgen does and what's expected of the binding library right now
Yeah, I'll do that. I only perused it at a high-level so far. Thanks for all your answers and help (again).
@Benjamin Fry You _might_ find https://medium.com/@scottwaye/experiments-in-c-and-webassembly-interface-types-b7a3a85ce966 interesting, it's how I spiked C# with the component ABI to get an idea about what would be required for wit-bindgen
Thanks, @Scott Waye , I'll look at that.
btw, @Manuthor , give me a moment to follow Alex's suggestions and see about working with the wit and generating some Java bindings, though, it might take a while.
Thank you @Alex Crichton , @Benjamin Fry and @Scott Waye for all your insights. Let me know Benjamin how to help you.
@Alex Crichton , I think I see the direction this is going in with your Python impl. It appears that you've exposed a huge number of primitives to the Python environment for working with Memory/Store and other constructs in the wasmtime runtime, in order to allocate and deallocate on the WASM heap, etc. I was trying to avoid exposing the entire wasmtime surface area in this MVP, but looking at your code here it makes me realize that might not be possible. I still don't feel like trying to work with the C API in Java is the correct direction, because JNI is a little funky, but I at least better understand the air gap you have in mind between interface types and the wasmtime engine. So thank you for your answers to the questions yesterday.
@Manuthor, I think there is a lot of work to be done to start exposing all the layers from Wasmtime needed by the Interface Types implementations. Might mean a bunch of refactoring to make this work correctly.
@Benjamin Fry ah ok makes sense! I must admit I've never bound a C library in Java before so I'm quite ignorant as to the difficulties and the nuances there as well
what works for python may not work well for java for sure
@Alex Crichton , I've been reviewing the details in the wit-bindgen output. I've found the rust generated code to be the most useful for understanding wit-bindgen. First, I really like the use of a static for the return pointer, I think this removes the need for the allocation I'm doing in the Java for the trampoline between wasm and the JVM. But I'm confused about something... and that's the difference in the way these bindings are generated in Rust for the export vs. the import.
Example, for the wit `list-return: function() -> list<u32>`, the generated import Rust C FFI is `fn wit_import(_: i32);` which I export (return by ref) while the generated export C FFI is `unsafe extern "C" fn __wit_bindgen_list_return() -> i32`, which is returning the pointer directly from the fn.
Either works, but I don't quite understand the calling convention in that case? Am I understanding this correctly?
Benjamin Fry said:
Example, for the wit `list-return: function() -> list<u32>`, the generated import Rust C FFI is `fn wit_import(_: i32);` which I export (return by ref) while the generated export C FFI is `unsafe extern "C" fn __wit_bindgen_list_return() -> i32`, which is returning the pointer directly from the fn.
"...which I export..." => ...which I did expect...
@Benjamin Fry oh that's where the import and export ABI of the same-signatured-function isn't the same
there's always going to be an adapter between the two anyway which will translate the ABI as well
Alex Crichton said:
Benjamin Fry oh that's where the import and export ABI of the same-signatured-function isn't the same
gah, hit enter too soon. Doesn't that mean that there isn't a consistent FFI calling convention for these types?
I guess I would expect the signature to be the same for the FFI so that all the language wrappers would work in a consistent way, regardless of import or export...
For example, with the current ABI, doesn't this imply that if there are two Rust WASM modules, one with a wit-bindgen export and the other with the same wit-bindgen import, then those two modules wouldn't be able to be linked at runtime? This is where I'm currently confused.
Sorry to bother you, but where can I find a simple "hello world" Cranelift example that just builds a simple (adder?) function, JIT-compiles it and calls it? The simplest example I could find is the "toy" one, which has an entire programming language and a lot of extra stuff. It's really hard to get started on Cranelift right now.
That's correct, yeah, the ABI for each type differs depending on whether it's used in an import or an export. For most types it's the same but some slightly differ. The function signatures won't line up exactly
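The two shapes quoted earlier (`fn wit_import(_: i32)` versus `fn __wit_bindgen_list_return() -> i32`) can be mimicked in Java to show that they carry the same information in different ways. This is a sketch under assumptions: the interface names are hypothetical, the (ptr, len) pair is assumed to sit at offsets 0 and 4 of the result area, and a `ByteBuffer` stands in for linear memory.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.Arrays;

/** Sketch only: the same list result delivered through two call shapes. */
final class ImportExportShapeSketch {
    /** Export-like shape: no parameters, returns a pointer to a result area. */
    interface ExportShape { int call(); }

    /** Import-like shape: the caller supplies the result area, nothing is returned. */
    interface ImportShape { void call(int retPtr); }

    static void writePair(ByteBuffer memory, int at, int ptr, int len) {
        ByteBuffer le = memory.duplicate().order(ByteOrder.LITTLE_ENDIAN);
        le.putInt(at, ptr);
        le.putInt(at + 4, len); // assumed layout: ptr at +0, len at +4
    }

    static int[] readPair(ByteBuffer memory, int at) {
        ByteBuffer le = memory.duplicate().order(ByteOrder.LITTLE_ENDIAN);
        return new int[] { le.getInt(at), le.getInt(at + 4) };
    }

    static int[] callExport(ByteBuffer memory, ExportShape f) {
        return readPair(memory, f.call()); // callee chose where the results live
    }

    static int[] callImport(ByteBuffer memory, ImportShape f, int retPtr) {
        f.call(retPtr);                    // caller chose where the results live
        return readPair(memory, retPtr);
    }

    public static void main(String[] args) {
        ByteBuffer memory = ByteBuffer.allocate(64);
        ExportShape exported = () -> { writePair(memory, 48, 16, 3); return 48; };
        ImportShape imported = retPtr -> writePair(memory, retPtr, 16, 3);

        int[] a = callExport(memory, exported);
        int[] b = callImport(memory, imported, 32);
        System.out.println(Arrays.equals(a, b)); // both shapes yield the same (ptr, len)
    }
}
```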
@Victor Maia I think you may want to start a new "topic" in the #cranelift channel for your question?
My bad, I never used Zulip. Will do. Can I delete the previous message?
I'm not actually sure! But no worries!
Alex Crichton said:
That's correct, yeah, the ABI for each type differs depending on whether it's used in an import or an export. For most types it's the same but some slightly differ. The function signatures won't line up exactly
I'm definitely missing something here as I can't figure out how this won't be a problem. But I'll roll with it as best I can :smile:
imports and exports are never glued directly to each other, there is always an adapter in between (which will eventually be customizable via adapter functions in interface types)
there is always an adapter in between
Is there an example of the adapter that I can see?
And will that need to be handled by the host language? or do we anticipate the adapter being built into wasmtime?
And thank you, I've been trying to read all the docs, it's just been hard to keep up with everything.
There's a proof-of-concept "linker" (really just a wasm adapter generator) implementation here, but I fully expect Wasmtime to eventually support generating canonical adapters for components or perhaps other tooling that can be used to compose together different components ahead-of-time
whereas `wasmlink` generates a new wasm module with the adapter glue code inside it, the wasmtime bindings generator can be used to implement a custom host that generates similar "adapter glue" (at runtime) that sits between the host and the module (may be easier to see with the online demo than the wasm generated by `wasmlink`)
if it helps to clarify, adapter implementation is outside the scope of the language-specific wit bindings generators; they pass pointers and get back pointers into their own linear memory and it's their job to translate from/to the canonical ABI to/from whatever the implementation language uses for type representation. the adapter that ultimately sits between the caller and callee of an interface function is responsible for marshaling data (when necessary) between the linear memories, as well as ensuring parameters and results properly conform to the canonical ABI (e.g. a handle to a resource is valid prior to invoking the callee).
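The marshaling step described here can be sketched as a copy between two separate memories. This is not how `wasmlink` or Wasmtime implement adapters; it is a toy Java model in which each `ByteBuffer` stands for one module's linear memory and `calleeAlloc` is a hypothetical stand-in for the callee's allocation function.

```java
import java.nio.ByteBuffer;
import java.util.function.IntUnaryOperator;

/** Sketch only: an adapter-like copy of a byte list between two memories. */
final class AdapterCopySketch {
    /** Copies len bytes at ptr in callerMem into calleeMem; returns {newPtr, len}. */
    static int[] copyList(ByteBuffer callerMem, int ptr, int len,
                          ByteBuffer calleeMem, IntUnaryOperator calleeAlloc) {
        int dst = calleeAlloc.applyAsInt(len); // space owned by the callee

        ByteBuffer src = callerMem.duplicate();
        src.position(ptr);
        src.limit(ptr + len);                  // exactly the list's bytes

        ByteBuffer out = calleeMem.duplicate();
        out.position(dst);
        out.put(src);                          // shared-nothing: data is copied
        return new int[] { dst, len };
    }

    public static void main(String[] args) {
        ByteBuffer callerMem = ByteBuffer.allocate(32);
        ByteBuffer calleeMem = ByteBuffer.allocate(32);
        callerMem.put(4, (byte) 1).put(5, (byte) 2);

        int[] ptrLen = copyList(callerMem, 4, 2, calleeMem, size -> 16);
        System.out.println(calleeMem.get(ptrLen[0]) + " " + calleeMem.get(ptrLen[0] + 1));
    }
}
```

The callee only ever sees pointers into its own memory, which is why the language-specific bindings do not need to know about the other side's layout.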
Thank you. I think this is the missing piece in my understanding.
This is nowhere near ready, but I've got at least got imports started. I have to do a lot of conversion. https://github.com/bluejekyll/wit-bindgen/tree/wasmtime-java
@Manuthor , thanks for sharing this. I've gotten pinged by some other folks with interest here as well. I've directed them to come back into this conversation to see if we can get more support here.
@Benjamin Fry , good idea! I must admit, due to WASM performance (6x slower than a native ELF binary in my particular case) I had to change my implementation and finally use FFI/JNA in Java. But still, having WASM working natively in Java would be awesome.
Last updated: Jan 24 2025 at 00:11 UTC