I am finally coming back to experiment with getting Wasmtime running in Java with the now stabilized FFM API and jextract. I'm currently running into an issue with getting the hello.c C API demo working (in Java). It looks like I'm running into a poisoned RwLock in the TypeRegistry. Clearing the poisoned value gets past the issue, but then it just runs into another panic afterwards in RegisteredType::register_singleton_rec_group. This is all happening during the call to wasmtime_func_new, matching the logic in hello.c. Anyone have any tips on how to track down the issue and why this lock is getting poisoned? (this is in the current release-36.0.0 branch)
If a lock is poisoned it means the thread that was holding it died somehow (unwound via a panic)
You can use the wasmtime C API from multiple threads safely, because internally wasmtime is thread safe, but if you unwind across wasmtime in one of your threads it's quite possible to break everything for all threads
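For anyone unfamiliar with poisoning, here is a minimal sketch using only the Rust standard library. It is not Wasmtime's code: the helper name `poison_by_panicking` and the `Vec<u32>` payload are made up for illustration. A thread panics while holding the write guard, and every later access sees the lock as poisoned.

```rust
use std::sync::{Arc, RwLock};
use std::thread;

/// Panics on a helper thread while it holds the write guard, then
/// reports whether the lock is poisoned afterwards.
/// (Hypothetical helper, for illustration only.)
fn poison_by_panicking(lock: &Arc<RwLock<Vec<u32>>>) -> bool {
    let worker_lock = Arc::clone(lock);
    let result = thread::spawn(move || {
        let mut guard = worker_lock.write().unwrap();
        guard.push(1);
        // Unwinding drops `guard` while panicking, which sets the poison flag.
        panic!("simulated panic while holding the write guard");
    })
    .join();
    // `join` returns Err because the worker thread panicked.
    assert!(result.is_err());
    lock.is_poisoned()
}

fn main() {
    let lock = Arc::new(RwLock::new(Vec::new()));
    assert!(!lock.is_poisoned());

    let poisoned = poison_by_panicking(&lock);
    println!("poisoned after panic: {poisoned}");

    // Later readers get Err(PoisonError) instead of a guard.
    assert!(lock.read().is_err());
}
```

The point is that the poisoned error is only a symptom; the interesting event is the earlier panic that happened while the guard was held.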
I'm not currently spawning any additional threads. Is there a background process in Wasmtime?
wasmtime will spawn threads in order to parallelize cranelift codegen of wasm
but it manages those itself
and if you're using wasmtime-wasi in there, it will start its own multi-threaded tokio to perform io
yeah, this is literally just the hello demo, https://docs.wasmtime.dev/examples-c-hello-world.html, nothing fancy yet.
I'll see if cutting out the JUnit test runner removes some of the potential issues.
sorry, I really don't have any other ideas. I don't know the jvm or FFM well enough to guess what might be getting you there.
If you're able to set an environment variable WASMTIME_LOG=trace, the trace-log output from Wasmtime might give some hints as to what's going wrong
(feel free to put that in a gist and link here -- I can't guarantee I'll see anything but someone might)
ah, cool, I was wondering about that. I hacked up the code a bunch in wasmtime already to try and trace all of this in more detail.
Oh it'll be RUST_LOG for the c api
could you gist a backtrace as well?
if you're running into a poisoned lock that means that something panicked earlier and that's the interesting backtrace in theory
I'm not aware of anything which would cause this so it sounds like a bug
Here's the backtrace: https://gist.github.com/bluejekyll/ff28e00ed03c9a3fc689291be2a8fd0b
let me get the full traced logs as well.
oh
no I got it
wait no nvmd
is this all you see, nothing else? There in theory should be some other backtrace b/c poisoning a lock should require a panic somewhere
So far, the engine, store, and context all are created without errors. I can compile the wat -> wasm without error. so things are "working" up to that point. I had a theory that the func_ty I was creating for the C callback was bad, but that appears to be fine as well.
I have the full logs now...
so you sort of got this far in a sense? (converted to Java of course)
yeah, exactly.
I added a comment to that gist that has the log output.
wow but really no other panicking backtrace?
you could try running wasmtime inside gdb and adding catch points for rust panics, that should show wherever the panic is happening
running in gdb will be hard... I'm going to try and remove some of the maven and junit overhead to make it just a raw java execution. that will take me a minute...
how certain are you that the bindings are right? b/c this could also be random corruption of memory or something like that
e.g. you just happen to flip the "this is poisoned lock" bit
albeit implausible
Oh, it totally could be corruption. I'm definitely new to this Java FFM API. I've done a lot of double checking, but definitely could be something there. I did try clearing the bit, but things are clearly in a bad state at that point.
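A small sketch of why clearing the bit doesn't help much, assuming a plain `std::sync::RwLock` and Rust 1.77 or newer (where `RwLock::clear_poison` is stable). The `Registry` type and its `len == entries.len()` invariant are hypothetical, not Wasmtime's TypeRegistry. Clearing the flag makes the lock usable again, but whatever state was left half-updated stays that way.

```rust
use std::sync::{Arc, RwLock};
use std::thread;

/// Hypothetical registry whose invariant is `len == entries.len()`.
#[derive(Default)]
struct Registry {
    entries: Vec<u32>,
    len: usize,
}

/// Panics halfway through an update, clears the poison flag, and
/// returns whether the invariant still holds afterwards.
fn clear_poison_and_check(lock: &Arc<RwLock<Registry>>) -> bool {
    let worker_lock = Arc::clone(lock);
    let _ = thread::spawn(move || {
        let mut reg = worker_lock.write().unwrap();
        reg.entries.push(7);
        // The panic fires before `len` would have been updated.
        panic!("simulated panic in the middle of an update");
    })
    .join();

    // The lock is usable again, but the data is still half-updated.
    lock.clear_poison();
    let reg = lock.read().unwrap();
    reg.len == reg.entries.len()
}

fn main() {
    let lock = Arc::new(RwLock::new(Registry::default()));
    let intact = clear_poison_and_check(&lock);
    println!("poisoned: {}, invariant intact: {intact}", lock.is_poisoned());
}
```

If the flag was set by memory corruption or broken TLS rather than a real panic, the data might be fine, but clearing the flag still says nothing about why it was set.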
a smoking gun for wasmtime would be a reproduction with just the C API (e.g. a C file repro)
but I realize that's probably difficult to create in this case
actually in gdb you'll want to use rbreak rust_panic according to https://github.com/rust-lang/rust/issues/21102#issuecomment-3080599300
Yeah, I assume at this point that I'm screwing something up with Java, but I've not found that yet.
are you able to push up some code to glance over?
I also know nothing of java or ffm but something might stick out still
I can post a gist of the demo code that I have in Java, if you want more I could push it...
yeah just w/e you got so far interacting with the C API
Here's the impl: https://gist.github.com/bluejekyll/cd0f53f2c701ce6f879c305048b1da73
I have my old JNI stuff lying around this codebase, I'd prefer not to push all of that until I get it cleaned up.
unsure if this would affect things but helloCallbackDesc doesn't look quite right
or is one of the first arguments the return value?
return type*
Doesn't that match this C API? https://github.com/bytecodealliance/wasmtime/blob/1047b51183f5906ded5d82ec375f77e586485b5f/examples/hello.c#L19-L21
I think the whole thing is crashing before that call though...
no I think I'm just confused, it looks like the first parameter is actually the return type, then it's all the param types
I thought it was just the param types but then the return type wouldn't otherwise be specified anywhere
ok wild and crazy guess: tls is super borked
IIRC panicking/poisoning goes through TLS infrastructure in the rust standard library and maybe something about that is super broken in this context
so, e.g., when the lock is originally unlocked it mistakenly thinks the thread is panicking because the implementation of TLS is broken
to confirm/deny this since it looks like you have a custom build of Wasmtime already you might be able to print this function's result in various places throughout wasmtime
that should always return false but if it prints true then something has gone wrong
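The link to the function isn't preserved in this log. Assuming it is something like `std::thread::panicking()`, a tracing helper along these lines could be dropped into a custom build. `check_panicking` and `Probe` are hypothetical names; the sketch only shows what the flag normally reports in plain Rust, not what it does under the JVM.

```rust
use std::cell::Cell;
use std::panic;
use std::thread;

/// Hypothetical tracing helper: logs and returns the current thread's
/// panicking flag. Outside of unwinding this is expected to be false.
fn check_panicking(label: &str) -> bool {
    let flag = thread::panicking();
    eprintln!("[{label}] thread::panicking() = {flag}");
    flag
}

/// Records the flag from inside `Drop`, which also runs during unwinding.
struct Probe<'a>(&'a Cell<bool>);

impl Drop for Probe<'_> {
    fn drop(&mut self) {
        self.0.set(check_panicking("drop"));
    }
}

/// Returns the flag value observed while a panic is unwinding.
fn flag_during_unwind() -> bool {
    let seen = Cell::new(false);
    let result = panic::catch_unwind(panic::AssertUnwindSafe(|| {
        let _probe = Probe(&seen);
        panic!("simulated panic");
    }));
    assert!(result.is_err());
    seen.get()
}

fn main() {
    // Normal execution: the flag is false.
    assert!(!check_panicking("before"));
    // While unwinding: the flag is true.
    assert!(flag_during_unwind());
    // After the panic has been caught: false again.
    assert!(!check_panicking("after"));
}
```

If a call like `check_panicking("before wasmtime_func_new")` printed true on a thread that never panicked, that would point at the thread-local panic state rather than at the lock itself.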
yeah, I can do that.
give me a minute.
how is wasmtime linked? I presume it's not statically linked so is java dlopen'ing the libwasmtime.so somewhere?
yeah, dylib.
here's the hs_err log file from java that has a bunch of state captured, if you're interested... (which is a SEGFAULT that I thought was partially due to the panic and the poisoned lock as I dug deeper). https://gist.github.com/bluejekyll/760a232f39c651647552e095f2451e24
that gets handled by Java's generated functions from the jextract tool.
ok I think that may still be explainable with broken tls
notably RegisteredType::new is on the stack which does rwlock things which hits tls
https://stackoverflow.com/questions/51116820/c-11-thread-local-and-foreign-threads seems semi-related but also not helpful
wow, good find, let me read through that.
apart from this though I'm all out of ideas :(
Yeah, I'll keep digging. I think some of these hints have been good so far, and at least give me some things to experiment with.
I'd be interested in hearing more about how/if the JVM is trashing the TLS context if you find out anything specific
I found something about some potential issues with signals for traps, disabling signals_based_traps (which btw, is not exposed to the c-api) seems to have "helped". I'm now getting to a more consistent failure at a slightly different location. But this is progress.
that would make sense since I'd expect both the JVM and wasmtime use SIGSEGV or similar for catching illegal memory accesses (null references for Java)
Wasmtime does have logic to forward on to an already-registered signal handler (see here); so this isn't a slam-dunk obvious conflict, at least, though there could still be weird interactions of course.
ok I know I'm a broken record but wasmtime's signal handler accesses TLS, and if we assume that the JVM sort of randomly gets signals for GC and whatnot and/or for other threads, and if we assume that accessing TLS in Rust is an issue, then that would explain why a nondeterministic error with signal handling would be replaced by a deterministic error without signal handling. (but perhaps still point a smoking gun at tls...)
Yeah, I'm continuing to try and track that down. But disabling the signal handling gives me a consistent failure scenario, whereas before it was hard to track down.
Another thing I need to double check somehow is whether the Arena-based allocations in the Java layer are somehow not playing nice with Rust, like having different layouts or something.
rust is going to be pulling in an allocator from libc
both your libc and the jvm are going to be implementing their allocators by asking the OS for pages through mmap, I wouldn't be too suspicious about that compared to the red flags around TLS
Last updated: Dec 06 2025 at 05:03 UTC