Stream: general

Topic: Java with The Foreign Function and Memory (FFM) API


view this post on Zulip Benjamin Fry (Sep 04 2025 at 22:46):

I am finally coming back to experiment with getting Wasmtime running in Java with the now-stabilized FFM API and jextract. I'm currently running into an issue getting the hello.c C API demo working (in Java). It looks like I'm running into a poisoned RwLock in the TypeRegistry; clearing the poisoned value gets past the issue, but then it just runs into another panic afterwards in RegisteredType::register_singleton_rec_group. This is all happening during the call to wasmtime_func_new, matching the logic in hello.c. Anyone have any tips on how to track down the issue and why this lock is getting poisoned? (This is on the current release-36.0.0 branch.)

view this post on Zulip Pat Hickey (Sep 04 2025 at 22:57):

If a lock is poisoned it means the thread it was running on died somehow (unwound via an exception)

view this post on Zulip Pat Hickey (Sep 04 2025 at 22:59):

You can use the wasmtime c api from multiple threads safely, because internally wasmtime is thread safe, but if you unwind across wasmtime in one of your threads it's quite possible to break everything for all threads

view this post on Zulip Benjamin Fry (Sep 04 2025 at 23:00):

I'm not currently spawning any additional threads. Is there a background process in Wasmtime?

view this post on Zulip Pat Hickey (Sep 04 2025 at 23:01):

wasmtime will spawn threads in order to parallelize cranelift codegen of wasm

view this post on Zulip Pat Hickey (Sep 04 2025 at 23:01):

but it manages those itself

view this post on Zulip Pat Hickey (Sep 04 2025 at 23:02):

and if you're using wasmtime-wasi in there, it will start its own multi-threaded tokio to perform io

view this post on Zulip Benjamin Fry (Sep 04 2025 at 23:02):

yeah, this is literally just the hello demo, https://docs.wasmtime.dev/examples-c-hello-world.html, nothing fancy yet.

view this post on Zulip Benjamin Fry (Sep 04 2025 at 23:04):

I'll see if cutting out the junit test runner removes some of the potential issues.

view this post on Zulip Pat Hickey (Sep 04 2025 at 23:04):

sorry, I really don't have any other ideas. I don't know the jvm or FFM well enough to guess what might be getting you there.

view this post on Zulip Chris Fallin (Sep 04 2025 at 23:07):

If you're able to set an environment variable WASMTIME_LOG=trace, the trace-log output from Wasmtime might give some hints as to what's going wrong

view this post on Zulip Chris Fallin (Sep 04 2025 at 23:07):

(feel free to put that in a gist and link here -- I can't guarantee I'll see anything but someone might)

view this post on Zulip Benjamin Fry (Sep 04 2025 at 23:08):

ah, cool, I was wondering about that. I hacked up the code a bunch in wasmtime already to try and trace all of this in more detail.

view this post on Zulip Alex Crichton (Sep 04 2025 at 23:09):

Oh it'll be RUST_LOG for the c api

view this post on Zulip Alex Crichton (Sep 04 2025 at 23:09):

could you gist a backtrace as well?

view this post on Zulip Alex Crichton (Sep 04 2025 at 23:10):

if you're running into a poisoned lock that means that something panicked earlier and that's the interesting backtrace in theory

view this post on Zulip Alex Crichton (Sep 04 2025 at 23:10):

I'm not aware of anything which would cause this so it sounds like a bug
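
For a Java embedding, the environment variables discussed above have to be in place before the JVM starts, since a running JVM cannot change its own environment. A minimal sketch of one way to do that, assuming the test is relaunched as a child JVM; `WasmtimeJavaTest` and the classpath are placeholders, and `RUST_BACKTRACE=1` is the standard Rust setting for printing a backtrace on panic rather than something specific to Wasmtime:

```java
import java.util.List;
import java.util.Map;

public class TracedLaunchSketch {
    // Builds (but does not start) a child process with Rust-side diagnostics enabled.
    static ProcessBuilder tracedLaunch(List<String> command) {
        ProcessBuilder pb = new ProcessBuilder(command);
        Map<String, String> env = pb.environment();
        env.put("RUST_LOG", "trace");   // log filter named in the thread for the C API
        env.put("RUST_BACKTRACE", "1"); // ask the Rust runtime to print a backtrace on panic
        pb.inheritIO();                 // child's stderr (where panic messages go) reaches this console
        return pb;
    }

    public static void main(String[] args) throws Exception {
        // Hypothetical command line: the main class and classpath are placeholders.
        ProcessBuilder pb = tracedLaunch(List.of(
                "java", "--enable-native-access=ALL-UNNAMED",
                "-cp", "target/classes", "WasmtimeJavaTest"));
        int exit = pb.start().waitFor();
        System.out.println("child exited with " + exit);
    }
}
```

Exporting the same variables in the shell before running `java` or `mvn` has the same effect; the sketch only makes the dependency on launch-time environment explicit.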

view this post on Zulip Benjamin Fry (Sep 04 2025 at 23:14):

Here's the backtrace: https://gist.github.com/bluejekyll/ff28e00ed03c9a3fc689291be2a8fd0b

view this post on Zulip Benjamin Fry (Sep 04 2025 at 23:14):

let me get the full traced logs as well.

view this post on Zulip Alex Crichton (Sep 04 2025 at 23:14):

oh

view this post on Zulip Alex Crichton (Sep 04 2025 at 23:14):

no I got it

view this post on Zulip Alex Crichton (Sep 04 2025 at 23:14):

wait no nvmd

view this post on Zulip Alex Crichton (Sep 04 2025 at 23:15):

is this all you see, nothing else? There in theory should be some other backtrace b/c poisoning a lock should require a panic somewhere

view this post on Zulip Benjamin Fry (Sep 04 2025 at 23:24):

So far, the engine, store, and context are all created without errors. I can compile the wat -> wasm without error, so things are "working" up to that point. I had a theory that the func_ty I was creating for the C callback was bad, but that appears to be fine as well.

view this post on Zulip Benjamin Fry (Sep 04 2025 at 23:24):

I have the full logs now...

view this post on Zulip Alex Crichton (Sep 04 2025 at 23:25):

so you sort of got this far in a sense? (converted to Java of course)

view this post on Zulip Benjamin Fry (Sep 04 2025 at 23:27):

yeah, exactly.

view this post on Zulip Benjamin Fry (Sep 04 2025 at 23:30):

I added a comment to that gist that has the log output.

view this post on Zulip Alex Crichton (Sep 04 2025 at 23:30):

wow but really no other panicking backtrace?

view this post on Zulip Jacob Lifshay (Sep 04 2025 at 23:30):

you could try running wasmtime inside gdb and adding catch points for rust panics, that should show wherever the panic is happening

view this post on Zulip Benjamin Fry (Sep 04 2025 at 23:31):

running in gdb will be hard... I'm going to try and remove some of the maven and junit overhead to make it just a raw java execution. that will take me a minute...

view this post on Zulip Alex Crichton (Sep 04 2025 at 23:32):

how certain are you that the bindings are right? b/c this could also be random corruption of memory or something like that

view this post on Zulip Alex Crichton (Sep 04 2025 at 23:32):

e.g. you just happen to flip the "this is poisoned lock" bit

view this post on Zulip Alex Crichton (Sep 04 2025 at 23:32):

albeit implausible

view this post on Zulip Benjamin Fry (Sep 04 2025 at 23:33):

Oh, it totally could be corruption. I'm definitely new to this Java FFM API. I've done a lot of double checking, but definitely could be something there. I did try clearing the bit, but things are clearly in a bad state at that point.

view this post on Zulip Alex Crichton (Sep 04 2025 at 23:34):

a smoking gun for wasmtime would be a reproduction with just the C API (e.g. a C file repro)

view this post on Zulip Alex Crichton (Sep 04 2025 at 23:34):

but I realize that's probably difficult to create in this case

view this post on Zulip Jacob Lifshay (Sep 04 2025 at 23:34):

actually in gdb you'll want to use rbreak rust_panic according to https://github.com/rust-lang/rust/issues/21102#issuecomment-3080599300

Expected behavior: when I use gdb, gdb should catch the panic and I should be able to use bt to analyze the stack. when I use RUST_BACKTRACE=1 I should see source files and line numbers in the back...

view this post on Zulip Benjamin Fry (Sep 04 2025 at 23:35):

Yeah, I assume at this point that I'm screwing something up with Java, but I've not found that yet.

view this post on Zulip Alex Crichton (Sep 04 2025 at 23:36):

are you able to push up some code to glance over?

view this post on Zulip Alex Crichton (Sep 04 2025 at 23:36):

I also know nothing of java or ffm but something might stick out still

view this post on Zulip Benjamin Fry (Sep 04 2025 at 23:37):

I can post a gist of the demo code that I have in Java, if you want more I could push it...

view this post on Zulip Alex Crichton (Sep 04 2025 at 23:37):

yeah just w/e you got so far interacting with the C API

view this post on Zulip Benjamin Fry (Sep 04 2025 at 23:39):

Here's the impl: https://gist.github.com/bluejekyll/cd0f53f2c701ce6f879c305048b1da73

view this post on Zulip Benjamin Fry (Sep 04 2025 at 23:41):

I have my old JNI stuff lying around this codebase, I'd prefer not to push all of that until I get it cleaned up.

view this post on Zulip Alex Crichton (Sep 04 2025 at 23:44):

unsure if this would affect things but helloCallbackDesc doesn't look quite right

view this post on Zulip Alex Crichton (Sep 04 2025 at 23:44):

or is one of the first arguments the return value?

view this post on Zulip Alex Crichton (Sep 04 2025 at 23:44):

return type*

view this post on Zulip Benjamin Fry (Sep 04 2025 at 23:47):

Doesn't that match this C API? https://github.com/bytecodealliance/wasmtime/blob/1047b51183f5906ded5d82ec375f77e586485b5f/examples/hello.c#L19-L21

view this post on Zulip Benjamin Fry (Sep 04 2025 at 23:47):

I think the whole thing is crashing before that call though...

view this post on Zulip Alex Crichton (Sep 04 2025 at 23:48):

no I think I'm just confused, it looks like the first parameter is actually the return type, then it's all the param types

view this post on Zulip Alex Crichton (Sep 04 2025 at 23:48):

I thought it was just the param types but then the return type wouldn't otherwise be specified anywhere
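
For reference, `FunctionDescriptor.of` in the FFM API takes the return layout first and the parameter layouts after it (`FunctionDescriptor.ofVoid` is the variant for functions that return nothing). A minimal sketch of a descriptor for a callback shaped like `hello_callback` in hello.c, assuming Java 22+ and a 64-bit platform where `size_t` maps to `JAVA_LONG`; the class and method names are hypothetical and are not taken from the gist:

```java
import java.lang.foreign.FunctionDescriptor;
import java.lang.foreign.ValueLayout;

public class CallbackDescriptorSketch {
    // Mirrors the C shape of the callback:
    //   wasm_trap_t *cb(void *env, wasmtime_caller_t *caller,
    //                   const wasmtime_val_t *args, size_t nargs,
    //                   wasmtime_val_t *results, size_t nresults);
    static FunctionDescriptor helloCallbackDescriptor() {
        return FunctionDescriptor.of(
                ValueLayout.ADDRESS,    // return type: wasm_trap_t*
                ValueLayout.ADDRESS,    // env
                ValueLayout.ADDRESS,    // caller
                ValueLayout.ADDRESS,    // args
                ValueLayout.JAVA_LONG,  // nargs (size_t, assumed 64-bit)
                ValueLayout.ADDRESS,    // results
                ValueLayout.JAVA_LONG); // nresults (size_t, assumed 64-bit)
    }

    public static void main(String[] args) {
        FunctionDescriptor desc = helloCallbackDescriptor();
        // returnLayout() is an Optional; it is empty only for ofVoid descriptors.
        System.out.println("return layout present: " + desc.returnLayout().isPresent());
        System.out.println("parameter count: " + desc.argumentLayouts().size());
    }
}
```

Checking `returnLayout()` and `argumentLayouts()` like this is a cheap way to confirm a hand-written descriptor has the arity the C signature expects before it is handed to an upcall stub.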

view this post on Zulip Alex Crichton (Sep 04 2025 at 23:49):

ok wild and crazy guess: tls is super borked

view this post on Zulip Alex Crichton (Sep 04 2025 at 23:50):

IIRC panicking/poisoning goes through TLS infrastructure in the rust standard library and maybe something about that is super broken in this context

view this post on Zulip Alex Crichton (Sep 04 2025 at 23:50):

so, e.g., when the lock is originally unlocked it mistakenly thinks the thread is panicking because the implementation of TLS is broken

view this post on Zulip Alex Crichton (Sep 04 2025 at 23:51):

to confirm/deny this since it looks like you have a custom build of Wasmtime already you might be able to print this function's result in various places throughout wasmtime

view this post on Zulip Alex Crichton (Sep 04 2025 at 23:51):

that should always return false but if it prints true then something has gone wrong

view this post on Zulip Benjamin Fry (Sep 04 2025 at 23:51):

yeah, I can do that.

view this post on Zulip Benjamin Fry (Sep 04 2025 at 23:51):

give me a minute.

view this post on Zulip Alex Crichton (Sep 04 2025 at 23:52):

how is wasmtime linked? I presume it's not statically linked so is java dlopen'ing the libwasmtime.so somewhere?

view this post on Zulip Benjamin Fry (Sep 04 2025 at 23:54):

yeah, dylib.

view this post on Zulip Benjamin Fry (Sep 04 2025 at 23:57):

here's the hs_err log file from java that has a bunch of state captured, if you're interested... (which is a SEGFAULT that I thought was partially due to the panic and the poisoned lock as I dug deeper). https://gist.github.com/bluejekyll/760a232f39c651647552e095f2451e24

view this post on Zulip Benjamin Fry (Sep 04 2025 at 23:59):

that gets handled by Java's generated functions from the jextract tool.

view this post on Zulip Alex Crichton (Sep 05 2025 at 00:02):

ok I think that may still be explainable with broken tls

view this post on Zulip Alex Crichton (Sep 05 2025 at 00:02):

notably RegisteredType::new is on the stack which does rwlock things which hits tls

view this post on Zulip Alex Crichton (Sep 05 2025 at 00:06):

https://stackoverflow.com/questions/51116820/c-11-thread-local-and-foreign-threads seems semi-related but also not helpful

I would like to use C++ 11 thread_local, but our application embeds a JVM, and sometimes C++ methods are called from Java-created thread via JNI. This is essentially the same problem as if an exter...

view this post on Zulip Benjamin Fry (Sep 05 2025 at 00:07):

wow, good find, let me read through that.

view this post on Zulip Alex Crichton (Sep 05 2025 at 00:09):

apart from this though I'm all out of ideas :(

view this post on Zulip Benjamin Fry (Sep 05 2025 at 00:11):

Yeah, I'll keep digging. I think some of these hints have been good so far, and at least give me some things to experiment with.

view this post on Zulip David Lloyd (Sep 05 2025 at 13:00):

I'd be interested in hearing more about how/if the JVM is trashing the TLS context if you find out anything specific

view this post on Zulip Benjamin Fry (Sep 05 2025 at 22:48):

I found something about some potential issues with signals for traps, disabling signals_based_traps (which btw, is not exposed to the c-api) seems to have "helped". I'm now getting to a more consistent failure at a slightly different location. But this is progress.

view this post on Zulip Jacob Lifshay (Sep 05 2025 at 22:58):

that would make sense since I'd expect both the JVM and wasmtime use SIGSEGV or similar for catching illegal memory accesses (null references for Java)

view this post on Zulip Chris Fallin (Sep 05 2025 at 23:01):

Wasmtime does have logic to forward on to an already-registered signal handler (see here); so this isn't a slam-dunk obvious conflict, at least, though there could still be weird interactions of course.

view this post on Zulip Alex Crichton (Sep 05 2025 at 23:15):

ok I know I'm a broken record but wasmtime's signal handler accesses TLS, and if we assume that the JVM sort of randomly gets signals for GC and whatnot and/or for other threads, and if we assume that accessing TLS in Rust is an issue, then that would explain why a nondeterministic error with signal handling would be replaced by a deterministic error without signal handling. (but perhaps still point a smoking gun at tls...)

view this post on Zulip Benjamin Fry (Sep 05 2025 at 23:27):

Yeah, I'm continuing to try and track that down. But disabling the signal handling gives me a consistent failure scenario, whereas before it was hard to track down.

view this post on Zulip Benjamin Fry (Sep 05 2025 at 23:30):

Another thing I need to double check somehow is whether the Arena-based allocations in the Java layer are somehow not playing nice in Rust, like somehow having different layouts or something.

view this post on Zulip Pat Hickey (Sep 05 2025 at 23:31):

rust is going to be pulling in an allocator from libc

view this post on Zulip Pat Hickey (Sep 05 2025 at 23:34):

both your libc and the jvm are going to be implementing their allocators by asking the OS for pages through mmap, i wouldn't be too suspicious about that compared to the red flags around TLS
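
On the Arena question raised above, one property that can be checked purely from the Java side is lifetime, separately from which allocator backs the memory. A minimal sketch, assuming Java 22+, showing that a segment allocated from a confined `Arena` stops being valid once the arena is closed, while one from `Arena.global()` stays valid; whether this applies to the code in the gist is not established in the thread, and the class and method names are hypothetical:

```java
import java.lang.foreign.Arena;
import java.lang.foreign.MemorySegment;
import java.lang.foreign.ValueLayout;

public class ArenaLifetimeSketch {
    // Reports whether a segment is still valid after its confined arena has been closed.
    static boolean aliveAfterClose() {
        MemorySegment seg;
        try (Arena arena = Arena.ofConfined()) {
            seg = arena.allocate(ValueLayout.JAVA_LONG);
            seg.set(ValueLayout.JAVA_LONG, 0, 42L);
        } // arena closed here: the off-heap memory behind seg is released
        return seg.scope().isAlive();
    }

    // Segments from the global arena are never deallocated by the FFM API.
    static boolean globalStaysAlive() {
        MemorySegment seg = Arena.global().allocate(ValueLayout.JAVA_LONG);
        return seg.scope().isAlive();
    }

    public static void main(String[] args) {
        System.out.println("confined segment alive after close: " + aliveAfterClose());
        System.out.println("global segment alive: " + globalStaysAlive());
    }
}
```

Java itself refuses to touch a segment whose arena is closed, but native code that kept the raw address has no such guard, so any pointer handed to a native library needs an arena that outlives the library's use of it.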


Last updated: Dec 06 2025 at 05:03 UTC