Stream: general

Topic: [Java/JNI] Load/Run WASM module created with wit-bindgen


view this post on Zulip Manuthor (Jan 04 2022 at 20:39):

Hello!

I am interested in loading and running, from Java, a WebAssembly module created with the wit-bindgen project. I was looking for Java bindings for this, but I could not find any such project, or at least any equivalent one.

As mentioned on wasmtime.dev, there are 2 options: the wasmtime-java from kawamuray or the one from bluejekyll (and I thank them both for their work). Being able to run a wit-bindgen WASM module would allow us (and me :smile: ) to have a single WASM module loaded by different languages (Python, Java, JavaScript, C, etc.).

I chose wasmtime-java from bluejekyll (thank you @Benjamin Fry): it is a WIP project that could surely interest a lot of people. I tried to run a very basic wit-bindgen WASM module and encountered 2 problems (these problems have been raised with Benjamin, and we agreed to post them here to find some help):

The WASM module exports these 3 functions (through the Rust bindings generated with wit-bindgen):

    fn play_with_bytes(input: Vec<u8>) -> Vec<u8>;
    fn play_with_result(input: Vec<u8>) -> Result<Vec<u8>, String>;
    fn play_with_struct(abc: my::Abc) -> my::Abc;

The following link is a small fork of wasmtime-java that adds new tests with a wit-bindgen WASM module: https://github.com/Manuthor/wasmtime-java/tree/feature/add_ref_return_value.
Any help will be greatly appreciated! :smile:

view this post on Zulip Benjamin Fry (Jan 04 2022 at 20:46):

For some background: I started the wasmtime-java project a while back and have been working on it pretty much solo. I have gotten a lot of help here, but I'm sure I've made some decisions that are not compliant or are not best practice. I've recently updated the project to the current wasmtime version, but there are some issues with regard to the handling of returned complex types (like Vec/Arrays). I've only just started looking at the wit ABI spec, but I think this will have a nice effect of normalizing a lot of this support.

view this post on Zulip Benjamin Fry (Jan 04 2022 at 20:48):

A question I have with regard to wit-bindgen and wasm: is there a best-practices spec for working with complex types in what is generated for those interfaces?

view this post on Zulip Alex Crichton (Jan 04 2022 at 20:56):

The intention with wit-bindgen is that there would be a wasmtime embedding in Java, but it wouldn't actually concern itself with anything interface-types related. For example the wasmtime-java project wouldn't worry about vectors, abi, etc, or anything like that. It would instead be a pure binding of simply the C API in a Java-like fashion that feels idiomatic in Java.

On top of this hypothetical wasmtime-java library the wit-bindgen crate would then generate *.java files (omg it's been so long since I wrote java, that's the source extension, right?) which would use wasmtime-java and then use the interface types definitions of ABIs and such to translate back and forth between java-native representations and wasm representations. For example (if I recall the syntax correctly) a Vec<u8> in the wasm module would be byte[] in Java and the generated code by wit-bindgen would do the translation

view this post on Zulip Alex Crichton (Jan 04 2022 at 20:56):

there's lots of details about how the precise ABI maps out (allocation functions, where to store return values, etc), but that's all theoretically handled by the code-generation-functionalities of wit-bindgen (which currently doesn't have a Java mode or implementation)

view this post on Zulip Alex Crichton (Jan 04 2022 at 20:58):

Basically all that is to say that in the same way that wasmtime-py knows nothing about interface types, yet wit-bindgen python generates Python code, would be how wasmtime-java would work. The wasmtime-java library itself would have nothing interface-types-specific (yet) and then the generated code by wit-bindgen would leverage wasmtime-java and do the translation.

view this post on Zulip Benjamin Fry (Jan 04 2022 at 21:06):

Thanks, @Alex Crichton. I think what confuses me right now is return by-ref or multi-value return data (and subsequent ownership). What has always confused me about the idea that wasmtime-java knows nothing about interface types is that it absolutely cares at the FFI boundary for complex types. For example, in the wasmtime-java code, in order to support passing slices back to Java, there's a complex pointer type that mimics a Rust slice passed by reference into the WASM, and then copied into the JVM as a byte[], and the return by ref from WASM is then freed (via the __dealloc_bytes). This has always felt wrong to me, and implied I've done something wrong, and it's the part of the ABI that I get confused in.

view this post on Zulip Alex Crichton (Jan 04 2022 at 21:07):

From wasmtime-java's perspective there's not really any such thing as byte[] because wasm has no notion of byte slices. In that sense wasmtime-java knows nothing about either passing or returning byte slices in wasm functions

view this post on Zulip Alex Crichton (Jan 04 2022 at 21:08):

Once wit-bindgen comes into the picture it uses the canonical abi for interface types to define how this happens, and the canonical abi is defined in terms of core wasm features, such as multiple return values at the wasm level and things like that

view this post on Zulip Alex Crichton (Jan 04 2022 at 21:08):

so wasmtime-java needs to support calling arbitrary wasm functions with arbitrary values (in addition to modifying wasm memory)

view this post on Zulip Alex Crichton (Jan 04 2022 at 21:08):

but it doesn't need to support things like byte[]

view this post on Zulip Alex Crichton (Jan 04 2022 at 21:09):

in a sense wasmtime-java is so simple that it won't be too useful to anyone, but it's a building block to build something more advanced if that makes sense

view this post on Zulip Alex Crichton (Jan 04 2022 at 21:09):

(in the same way that wasmtime-py and the wasmtime crate are quite "primitive")
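
To make the layering described above concrete, here is a minimal Java sketch. Everything in it is an assumption made for illustration: `CoreInstance` is a hypothetical primitive surface (it is not the wasmtime-java API), `FakeModule` is a plain-Java stand-in for a wasm module so the sketch runs without a runtime, and the export names `alloc`, `free`, and `reverse`, as well as the (ptr, len) return layout at offsets 0 and 4, are invented. The point is only that the primitive layer deals in integers and raw memory, while the byte[] translation lives in separate glue code of the kind a generator could emit.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

/** Hypothetical primitive surface of a binding library: core wasm values and raw memory only. */
interface CoreInstance {
    /** Calls an export with i32 arguments and returns its i32 results (zero or one of them). */
    int[] call(String export, int... args);

    /** The instance's linear memory. */
    ByteBuffer memory();
}

/** Plain-Java stand-in for a wasm module, so the sketch runs without a wasm runtime. */
final class FakeModule implements CoreInstance {
    private final ByteBuffer mem = ByteBuffer.allocate(1024).order(ByteOrder.LITTLE_ENDIAN);
    private int next = 16; // bump allocator; bytes 0..15 are reserved as a return area

    @Override public ByteBuffer memory() { return mem; }

    @Override public int[] call(String export, int... args) {
        switch (export) {
            case "alloc": { // alloc(len) -> ptr
                int ptr = next;
                next += args[0];
                return new int[] { ptr };
            }
            case "free": // free(ptr, len): this toy bump allocator never reuses memory
                return new int[0];
            case "reverse": { // reverse(ptr, len) -> address of a (ptr, len) pair
                int src = args[0];
                int len = args[1];
                int dst = call("alloc", len)[0];
                for (int i = 0; i < len; i++) {
                    mem.put(dst + i, mem.get(src + len - 1 - i));
                }
                mem.putInt(0, dst); // assumed layout: pointer at offset 0
                mem.putInt(4, len); // assumed layout: length at offset 4
                return new int[] { 0 };
            }
            default:
                throw new IllegalArgumentException("unknown export: " + export);
        }
    }
}

/** Sketch of generated glue: it owns the byte[] <-> (ptr, len) translation. */
final class GeneratedBindings {
    private final CoreInstance instance;

    GeneratedBindings(CoreInstance instance) { this.instance = instance; }

    byte[] reverse(byte[] input) {
        ByteBuffer mem = instance.memory();
        // Lower: copy the Java array into linear memory.
        int ptr = instance.call("alloc", input.length)[0];
        for (int i = 0; i < input.length; i++) {
            mem.put(ptr + i, input[i]);
        }
        // Call using core wasm values only.
        int ret = instance.call("reverse", ptr, input.length)[0];
        // Lift: read the (ptr, len) pair and copy the bytes back out.
        int outPtr = mem.getInt(ret);
        int outLen = mem.getInt(ret + 4);
        byte[] out = new byte[outLen];
        for (int i = 0; i < outLen; i++) {
            out[i] = mem.get(outPtr + i);
        }
        // Release both guest-side buffers once the copy is done.
        instance.call("free", ptr, input.length);
        instance.call("free", outPtr, outLen);
        return out;
    }

    public static void main(String[] args) {
        GeneratedBindings bindings = new GeneratedBindings(new FakeModule());
        byte[] out = bindings.reverse(new byte[] {1, 2, 3});
        System.out.println(java.util.Arrays.toString(out)); // [3, 2, 1]
    }
}
```

In this split, `CoreInstance` never mentions byte[]; swapping the glue for a different ABI would not require touching it.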

view this post on Zulip Benjamin Fry (Jan 04 2022 at 21:15):

Maybe I can share an example of the function call from/to Java to help clarify my point of concern, because I think we're not quite but almost talking past each other, here's a call from Java into the wasmtime runtime and into a WASM module with slices and returned data by ref: https://github.com/bluejekyll/wasmtime-java/blob/main/src/test/java/net/bluejekyll/wasmtime/tests/SliceTests.java#L115 then here is the other side of that in a Rust function with the C ABI for the support in WASM: https://github.com/bluejekyll/wasmtime-java/blob/main/tests/slices/src/lib.rs#L39-L43

view this post on Zulip Alex Crichton (Jan 04 2022 at 21:16):

ah yeah so these are details that wouldn't be implemented by wasmtime-java

view this post on Zulip Alex Crichton (Jan 04 2022 at 21:16):

so for example the Rust crate has no way of doing this

view this post on Zulip Benjamin Fry (Jan 04 2022 at 21:17):

"wouldn't" be, I mean, they kinda are now :smile:

view this post on Zulip Alex Crichton (Jan 04 2022 at 21:17):

from Rust you can call a wasm function, but there's no concept of a byte slice in wasm, so you can't call wasm with a byte slice (or wasm can't return to you a byte slice)

view this post on Zulip Alex Crichton (Jan 04 2022 at 21:17):

well my point is that if you're baking in an ABI like this that's basically a design choice, and it's not what interface types is intending to do

view this post on Zulip Alex Crichton (Jan 04 2022 at 21:18):

there basically is no standard answer to the question "how do I get a slice of bytes from wasm" if all you can use is core wasm

view this post on Zulip Alex Crichton (Jan 04 2022 at 21:18):

once you involve interface types that question can be answered, however, but it's a distinct answer (and layered as such)

view this post on Zulip Benjamin Fry (Jan 04 2022 at 21:19):

Am I asking the wrong question then? Maybe I should re-ask a question I had long ago... what is the best practice for passing a byte array into WASM and, vice versa, returning one? Or better yet, what's the best practice for accessing that data if not by passing arrays?

view this post on Zulip Alex Crichton (Jan 04 2022 at 21:24):

in a sense this is kind of the wrong question, kind of isn't. Like this is a perfectly valid thing to ask since it's the first thing any embedded wasm module wants to do. The tl;dr is that this is a struggle and this can't be done in a standard way so everyone is left to their own devices. The longer answer is "interface types will solve this", which is what wit-bindgen is. When wit-bindgen gets a Java implementation it will not rely on what you have today to implement byte[] but instead it will do its own thing, similar to what all other wit-bindgen generation modes are doing

view this post on Zulip Alex Crichton (Jan 04 2022 at 21:25):

To answer "what's the best practice" it's "there is none"

view this post on Zulip Alex Crichton (Jan 04 2022 at 21:25):

which isn't a great answer, hence the focus on work on interface types

view this post on Zulip Scott Waye (Jan 04 2022 at 21:27):

If I might jump in here, in a shared nothing model, which I understood interface types to be, does this imply copying the byte[] ?

view this post on Zulip Benjamin Fry (Jan 04 2022 at 21:27):

this sounds a little bit like semantics though... i.e. if wit-bindgen generates interop between byte[] and the target function vs. wasmtime-java doing that, right? i.e. something needs to translate between those points, right now I do a bunch of reflection in Java to build the bridging code between the two envs...

view this post on Zulip Alex Crichton (Jan 04 2022 at 21:29):

yeah it's true that precisely where implements what doesn't really matter, I don't mean to say one particular way is correct vs another

view this post on Zulip Alex Crichton (Jan 04 2022 at 21:29):

one of the focuses of interface types is cross-language interop, which means we can't focus really all that much on one particular binding, and so far I've been trying to do things that work everywhere

view this post on Zulip Alex Crichton (Jan 04 2022 at 21:30):

and @Scott Waye that's correct, for interface types the byte[] type would be copied in/out of wasm, there's no concept of a "mutable slice buffer"

view this post on Zulip Benjamin Fry (Jan 04 2022 at 21:30):

@Scott Waye for correctness reasons in the JVM there is byte[] which must be copied to/from the WASM memory. There is ByteBuffer which can reference data directly in WASM theoretically, but I haven't figured out a good ownership model for that, so stopped trying to support it.
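
The difference between the two approaches mentioned above can be sketched in plain Java. This is only an illustration under assumptions: a heap `ByteBuffer` stands in for wasm linear memory, and the helper names `copyOut` and `viewOf` are hypothetical. A copied byte[] is a snapshot, while a `ByteBuffer` view aliases the underlying memory, which is where the ownership question comes from.

```java
import java.nio.ByteBuffer;

final class CopyVersusView {
    /** Copies len bytes out of linear memory; the result is independent of later writes. */
    static byte[] copyOut(ByteBuffer memory, int ptr, int len) {
        byte[] out = new byte[len];
        ByteBuffer dup = memory.duplicate(); // same bytes, independent position and limit
        dup.position(ptr);
        dup.get(out, 0, len);
        return out;
    }

    /** Returns a view that aliases linear memory; later writes to memory show through. */
    static ByteBuffer viewOf(ByteBuffer memory, int ptr, int len) {
        ByteBuffer dup = memory.duplicate();
        dup.position(ptr);
        dup.limit(ptr + len);
        return dup.slice();
    }

    public static void main(String[] args) {
        ByteBuffer memory = ByteBuffer.allocate(64); // stand-in for wasm linear memory
        memory.put(8, (byte) 42);

        byte[] copy = copyOut(memory, 8, 1);
        ByteBuffer view = viewOf(memory, 8, 1);

        memory.put(8, (byte) 7); // e.g. the guest later reuses this region

        System.out.println(copy[0]);     // 42
        System.out.println(view.get(0)); // 7
    }
}
```

With the view, the Java side has to know how long the guest keeps that region valid; with the copy, it does not.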

view this post on Zulip Scott Waye (Jan 04 2022 at 21:34):

It might be useless, and I expect it is, but I did start a c# wit-bindgen to see how hard it would be. Java I think has a lot of the same concerns. I did some of the string bit and left it at that as it looked solvable with just "some hours" more work. https://github.com/yowl/witx-bindgen/blob/csharp/crates/gen-csharp/src/lib.rs

view this post on Zulip Benjamin Fry (Jan 04 2022 at 21:34):

Right, @Alex Crichton, and I still owe you a beer :smile:. But I guess my higher level question is, if we don't do this in wasmtime-java then we'll do it in the generated code from wit-bindgen which sounds like will be something similar to what we're already doing. :smile: And maybe I can work on the wit-bindgen Java outputs, but I'd still (I think) need to support the calling convention in WASM.

view this post on Zulip Alex Crichton (Jan 04 2022 at 21:35):

that's true yeah and w/e shape of things you're doing is the exact same shape of what wit-bindgen would otherwise generate (aka the canonical abi), it's probably just a few minor differences in the details

view this post on Zulip Alex Crichton (Jan 04 2022 at 21:36):

Calling-convention-wise what wit-bindgen (and interface types and/or the canonical abi) need is "call this wasm with these types and get these types as a result" where temporarily you know that the number of returned values is either 0 or 1

view this post on Zulip Alex Crichton (Jan 04 2022 at 21:36):

plus the ability to read/write bytes in memory

view this post on Zulip Benjamin Fry (Jan 04 2022 at 21:38):

@Scott Waye , I think one way you're lucky with C# is that it has decent native type and FFI support with C; Java and JNI is bleh.

view this post on Zulip Benjamin Fry (Jan 04 2022 at 21:41):

@Alex Crichton, yeah, I think I'm grokking you. I'm guessing there is still something that I'll need in Java though to allocate bytes for return by ref values, etc... right? Today I do that in wasmtime-java when doing the function calls into WASM. So there needs to be some amount of support in wasmtime-java for that functionality, right? And after return, freeing the associated memory?

view this post on Zulip Benjamin Fry (Jan 04 2022 at 21:42):

I'll take a look at implementing the Java bindings to wit-bindgen for java and see what needs to happen there.

view this post on Zulip Alex Crichton (Jan 04 2022 at 21:42):

you might want to look at the stuff generated by wit-bindgen today, e.g. the wasmtime-py python bindings

view this post on Zulip Alex Crichton (Jan 04 2022 at 21:42):

that may help show what wit-bindgen does and what's expected of the binding library right now

view this post on Zulip Benjamin Fry (Jan 04 2022 at 21:43):

Yeah, I'll do that. I only perused it at a high-level so far. Thanks for all your answers and help (again).

view this post on Zulip Scott Waye (Jan 04 2022 at 21:44):

@Benjamin Fry You _might_ find https://medium.com/@scottwaye/experiments-in-c-and-webassembly-interface-types-b7a3a85ce966 interesting, it's how I spiked c# with the component ABI to get an idea about what would be required for wit-bindgen

view this post on Zulip Benjamin Fry (Jan 04 2022 at 21:44):

Thanks, @Scott Waye , I'll look at that.

view this post on Zulip Benjamin Fry (Jan 05 2022 at 03:05):

btw, @Manuthor , give me a moment to follow Alex's suggestions and see about working with the wit and generating some Java bindings, though, it might take a while.

view this post on Zulip Manuthor (Jan 05 2022 at 09:21):

Thank you @Alex Crichton , @Benjamin Fry and @Scott Waye for all your insights. Let me know Benjamin how to help you.

view this post on Zulip Benjamin Fry (Jan 05 2022 at 18:58):

@Alex Crichton , I think I see the direction this is going in with your Python impl. It appears that you've exposed a huge number of primitives to the Python environment for working with Memory/Store and other constructs in the wasmtime runtime, in order to allocate and deallocate on the WASM heap, etc. I was trying to avoid exposing the entire wasmtime surface area in this MVP, but looking at your code here it makes me realize that might not be possible. I still don't feel like trying to work with the C api in Java is the correct direction, because JNI is a little funky, but I at least better understand the air gap you have in mind between interface types and the wasmtime engine. So thank you for your answers to the questions yesterday.

view this post on Zulip Benjamin Fry (Jan 05 2022 at 18:59):

@Manuthor, I think there is a lot of work to be done to start exposing all the layers from Wasmtime needed by the Interface Types implementations. Might mean a bunch of refactoring to make this work correctly.

view this post on Zulip Alex Crichton (Jan 05 2022 at 19:05):

@Benjamin Fry ah ok makes sense! I must admit I've never bound a C library in Java before so I'm quite ignorant as to the difficulties and the nuances there as well

view this post on Zulip Alex Crichton (Jan 05 2022 at 19:05):

what works for python may not work well for java for sure

view this post on Zulip Benjamin Fry (Jan 06 2022 at 01:25):

@Alex Crichton , I've been reviewing the details in the wit-bindgen output. I've found the rust generated code to be the most useful for understanding wit-bindgen. First, I really like the use of a static for the return pointer, I think this removes the need for the allocation I'm doing in the Java for the trampoline between wasm and the JVM. But I'm confused about something... and that's the difference in the way these bindings are generated in Rust for the export vs. the import.

view this post on Zulip Benjamin Fry (Jan 06 2022 at 01:27):

Example, for the wit list-return: function() -> list<u32>, the generated import Rust C FFI is fn wit_import(_: i32); which I export (return by ref) while the generated export C FFI is unsafe extern "C" fn __wit_bindgen_list_return() -> i32, which is returning the pointer directly from the fn.

view this post on Zulip Benjamin Fry (Jan 06 2022 at 01:28):

Either works, but I don't quite understand the calling convention in that case? Am I understanding this correctly?
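
The two shapes described above for `list-return` can be compared from the side that reads the result. The sketch below is an illustration under stated assumptions, not the code wit-bindgen generates: a heap `ByteBuffer` stands in for linear memory, the (ptr, len) pair is assumed to sit at offsets 0 and 4 of the return area (the real offsets depend on the canonical ABI version in use), and `exportShaped`, `importShaped`, and `liftListU32` are hypothetical names. In one shape the callee returns the address of the return area; in the other the caller passes that address in. Reading the list back is the same either way.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

final class ListReturnShapes {
    private static final int STATIC_RET_AREA = 0; // hypothetical fixed return area
    private static final int LIST_DATA = 32;      // hypothetical location of the elements

    /** Reads a (ptr, len) pair at retArea and lifts the list it describes into an int[]. */
    static int[] liftListU32(ByteBuffer memory, int retArea) {
        // duplicate() does not inherit byte order, so set little-endian explicitly.
        ByteBuffer le = memory.duplicate().order(ByteOrder.LITTLE_ENDIAN);
        int ptr = le.getInt(retArea);     // assumed offset 0: address of the first element
        int len = le.getInt(retArea + 4); // assumed offset 4: number of u32 elements
        int[] out = new int[len];
        for (int i = 0; i < len; i++) {
            out[i] = le.getInt(ptr + 4 * i); // u32 bit pattern held in a Java int
        }
        return out;
    }

    /** Stand-in for the callee: stores three elements and a (ptr, len) pair at retArea. */
    private static void writeList(ByteBuffer memory, int retArea) {
        ByteBuffer le = memory.duplicate().order(ByteOrder.LITTLE_ENDIAN);
        int[] values = {10, 20, 30};
        for (int i = 0; i < values.length; i++) {
            le.putInt(LIST_DATA + 4 * i, values[i]);
        }
        le.putInt(retArea, LIST_DATA);
        le.putInt(retArea + 4, values.length);
    }

    /** Shape 1: no parameters; the function returns the address of a return area. */
    static int exportShaped(ByteBuffer memory) {
        writeList(memory, STATIC_RET_AREA);
        return STATIC_RET_AREA;
    }

    /** Shape 2: the caller passes the address where the results should be written. */
    static void importShaped(ByteBuffer memory, int retPtr) {
        writeList(memory, retPtr);
    }

    public static void main(String[] args) {
        ByteBuffer memory = ByteBuffer.allocate(128);
        int[] a = liftListU32(memory, exportShaped(memory));
        importShaped(memory, 64);
        int[] b = liftListU32(memory, 64);
        System.out.println(java.util.Arrays.equals(a, b)); // true
    }
}
```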

view this post on Zulip Benjamin Fry (Jan 06 2022 at 02:30):

Benjamin Fry said:

Example, for the wit list-return: function() -> list<u32>, the generated import Rust C FFI is fn wit_import(_: i32); which I export (return by ref) while the generated export C FFI is unsafe extern "C" fn __wit_bindgen_list_return() -> i32, which is returning the pointer directly from the fn.

"...which I export..." => ...which I did expect...

view this post on Zulip Alex Crichton (Jan 06 2022 at 15:28):

@Benjamin Fry oh that's where the import and export ABI of the same-signatured-function isn't the same

view this post on Zulip Alex Crichton (Jan 06 2022 at 15:28):

there's always going to be an adapter between the two anyway which will translate the ABI as well

view this post on Zulip Benjamin Fry (Jan 06 2022 at 17:11):

Alex Crichton said:

Benjamin Fry oh that's where the import and export ABI of the same-signatured-function isn't the same

view this post on Zulip Benjamin Fry (Jan 06 2022 at 17:12):

gah, hit enter too soon. Doesn't that mean that there isn't a consistent FFI calling convention for these types?

view this post on Zulip Benjamin Fry (Jan 06 2022 at 17:13):

I guess I would expect the signature to be the same for the FFI so that all the language wrappers would work in a consistent way, regardless of import or export...

view this post on Zulip Benjamin Fry (Jan 06 2022 at 17:56):

For example, with the current ABI, doesn't this imply that if there are two Rust WASM modules, one with a wit-bindgen export and the other with the same wit-bindgen import, then those two modules wouldn't be able to be linked at runtime? This is where I'm currently confused.

view this post on Zulip Victor Maia (Jan 06 2022 at 19:11):

Sorry for reaching out to you, but where can I find a simple "hello world" Cranelift example that just builds a simple (adder?) function, JIT compiles it, and calls it? The simplest example I could find is the "toy", which has an entire programming language and a lot of extra stuff. Really hard to get started on Cranelift right now.

view this post on Zulip Alex Crichton (Jan 06 2022 at 19:12):

That's correct, yeah, the ABI for each type differs depending on whether it's used in an import or an export. For most types it's the same but some slightly differ. The function signatures won't line up exactly

view this post on Zulip Alex Crichton (Jan 06 2022 at 19:13):

@Victor Maia I think you may want to start a new "topic" in the #cranelift channel for your question?

view this post on Zulip Victor Maia (Jan 06 2022 at 19:14):

My bad, I never used Zulip. Will do. Can I delete the previous message?

view this post on Zulip Alex Crichton (Jan 06 2022 at 19:14):

I'm not actually sure! But no worries!

view this post on Zulip Benjamin Fry (Jan 06 2022 at 20:40):

Alex Crichton said:

That's correct, yeah, the ABI for each type differs depending on whether it's used in an import or an export. For most types it's the same but some slightly differ. The function signatures won't line up exactly

I'm definitely missing something here as I can't figure out how this won't be a problem. But I'll roll with it as best I can :smile:

view this post on Zulip fitzgen (he/him) (Jan 06 2022 at 21:15):

imports and exports are never glued directly to each other, there is always an adapter in between (which will eventually be customizable via adapter functions in interface types)

view this post on Zulip Benjamin Fry (Jan 06 2022 at 22:45):

there is always an adapter in between

Is there an example of the adapter that I can see?

view this post on Zulip Benjamin Fry (Jan 06 2022 at 22:47):

And will that need to be handled by the host language? or do we anticipate the adapter being built into wasmtime?

view this post on Zulip Benjamin Fry (Jan 06 2022 at 22:48):

And thank you, I've been trying to read all the docs, it's just been hard to keep up with everything.

view this post on Zulip Peter Huene (Jan 06 2022 at 23:29):

There's a proof-of-concept "linker" (really just a wasm adapter generator) implementation here, but I fully expect Wasmtime to eventually support generating canonical adapters for components or perhaps other tooling that can be used to compose together different components ahead-of-time

A language binding generator for WebAssembly interface types - wit-bindgen/crates/wasmlink at main · bytecodealliance/wit-bindgen

view this post on Zulip Peter Huene (Jan 06 2022 at 23:50):

whereas wasmlink generates a new wasm module with the adapter glue code inside it, the wasmtime bindings generator can be used to implement a custom host that generates similar "adapter glue" (at runtime) that sits between the host and the module (may be easier to see with the online demo than the wasm generated by wasmlink)

view this post on Zulip Peter Huene (Jan 07 2022 at 00:08):

if it helps to clarify, adapter implementation is outside the scope of the language-specific wit bindings generators; they pass pointers and get back pointers into their own linear memory and it's their job to translate from/to the canonical ABI to/from whatever the implementation language uses for type representation. the adapter that ultimately sits between the caller and callee of an interface function is responsible for marshaling data (when necessary) between the linear memories, as well as ensuring parameters and results properly conform to the canonical ABI (e.g. a handle to a resource is valid prior to invoking the callee).
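
As a rough picture of the marshaling role described above, the following Java sketch copies a byte list from one component's linear memory into another's. It is a toy under stated assumptions: `Component`, its bump allocator `alloc`, and `marshalBytes` are hypothetical names, heap `ByteBuffer`s stand in for the two linear memories, and the bounds check is only a placeholder for whatever validation a real adapter performs.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

final class AdapterSketch {
    /** Toy component: a private linear memory plus a bump allocator standing in for its allocator. */
    static final class Component {
        final ByteBuffer memory = ByteBuffer.allocate(256).order(ByteOrder.LITTLE_ENDIAN);
        private int next = 8;

        int alloc(int len) {
            int ptr = next;
            next += len;
            return ptr;
        }
    }

    /**
     * Copies len bytes at srcPtr in the caller's memory into freshly allocated space
     * in the callee's memory and returns the callee-side pointer.
     */
    static int marshalBytes(Component caller, int srcPtr, int len, Component callee) {
        // Validate the caller's (ptr, len) pair before touching the callee.
        if (srcPtr < 0 || len < 0 || (long) srcPtr + len > caller.memory.capacity()) {
            throw new IndexOutOfBoundsException("list is outside the caller's memory");
        }
        int dstPtr = callee.alloc(len);
        for (int i = 0; i < len; i++) {
            callee.memory.put(dstPtr + i, caller.memory.get(srcPtr + i));
        }
        return dstPtr;
    }

    public static void main(String[] args) {
        Component caller = new Component();
        Component callee = new Component();
        int src = caller.alloc(3);
        caller.memory.put(src, (byte) 1).put(src + 1, (byte) 2).put(src + 2, (byte) 3);

        int dst = marshalBytes(caller, src, 3, callee);
        System.out.println(callee.memory.get(dst + 2)); // 3
    }
}
```

Neither side sees the other's memory here: the caller hands over a pointer into its own memory, and the callee receives a pointer into its own.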

view this post on Zulip Benjamin Fry (Jan 07 2022 at 00:18):

Thank you. I think this is the missing piece in my understanding.

view this post on Zulip Benjamin Fry (Jan 11 2022 at 02:01):

This is nowhere near ready, but I've at least got imports started. I have to do a lot of conversion. https://github.com/bluejekyll/wit-bindgen/tree/wasmtime-java

view this post on Zulip Benjamin Fry (May 17 2022 at 20:16):

@Manuthor , thanks for sharing this. I've gotten pinged by some other folks with interest here as well. I've directed them to come back into this conversation to see if we can get more support here.

view this post on Zulip Manuthor (May 19 2022 at 17:07):

@Benjamin Fry , good idea! I must admit, due to WASM performance (being 6x slower than a native ELF binary in my particular case) I had to change my implementation and finally use FFI/JNA in Java. But still, having WASM working natively in Java would be awesome.


Last updated: Dec 23 2024 at 12:05 UTC