Hi everybody, I recently got the opportunity to write a code generator for OCaml that emits wasm (https://github.com/remixlabs/wasicaml/). Besides announcing it, I'm wondering whether there is interest in the experience I gained along the way. E.g. I can already tell that GC-aware code limits the speed somewhat, and that I had to add workarounds because of the missing exception handling in wasm (and will probably continue to do so even when the proposals are implemented, because OCaml promises fast exceptions). And, of course, none of the engines of the bytecodealliance can run it yet. Where is the feedback channel for this?
Hi Gerd, thank you for sharing this here—quite exciting!
For both exception handling and GC support, I think it'd be quite valuable to provide feedback to the groups working on the relevant proposals. Perhaps by filing issues on the github repos for them.
Beyond that, it might even make sense to do a presentation at a WebAssembly Community Group meeting. Those happen every second Tuesday at 18h CEST. The previous link contains details on how to attend, and also on how to add something to the agenda.
But there's also another topic on which this could potentially be very interesting: we've just started conversations about doing differential fuzzing of Wasmtime against the spec interpreter: we'd run fuzz tests, compare both the outputs and the final memory state of the spec interpreter and Wasmtime, and treat any differences as probable bugs (in either implementation).
We're currently doing this with a Wasm interpreter, but testing against the spec interpreter would be extremely nice. The main issue is that the spec interpreter is written in OCaml, so we need to have a good way to integrate it, as discussed by @Andrew Brown just yesterday. And I wonder if your project may help with that? :smile:
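A minimal sketch of the differential idea described above, written against a hypothetical `Engine` trait rather than the real Wasmtime or spec-interpreter APIs. `Reference`, `UnderTest`, `Outcome` and `first_divergence` are made-up names, and the "engines" are stand-ins that sum the input bytes, one of them with a deliberate wrap-around bug:

```rust
/// Observable result of one test case: returned values plus final memory.
#[derive(Debug, Clone, PartialEq)]
struct Outcome {
    results: Vec<i64>,
    memory: Vec<u8>,
}

/// Hypothetical interface; a real harness would wrap the two implementations.
trait Engine {
    fn run(&self, input: &[u8]) -> Outcome;
}

/// Stand-in for the reference implementation.
struct Reference;

/// Stand-in for the engine under test (contains a deliberate bug).
struct UnderTest;

impl Engine for Reference {
    fn run(&self, input: &[u8]) -> Outcome {
        let sum: i64 = input.iter().map(|&b| b as i64).sum();
        Outcome { results: vec![sum], memory: input.to_vec() }
    }
}

impl Engine for UnderTest {
    fn run(&self, input: &[u8]) -> Outcome {
        // Bug: the sum wraps around at 8 bits.
        let sum = input.iter().fold(0u8, |acc, &b| acc.wrapping_add(b));
        Outcome { results: vec![sum as i64], memory: input.to_vec() }
    }
}

/// Returns the first input on which the two engines disagree, if any.
/// Both the results and the memory image take part in the comparison.
fn first_divergence<'a>(
    a: &dyn Engine,
    b: &dyn Engine,
    inputs: &'a [Vec<u8>],
) -> Option<&'a Vec<u8>> {
    inputs.iter().find(|input| a.run(input) != b.run(input))
}
```

A divergence only says that the two implementations disagree. Which of them is wrong still has to be decided by hand.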
This is indeed an interesting thought to explore. One concern I might have, though, in the specific context of fuzzing, is implementation diversity: if we are running the OCaml spec as a compiled Wasm module, executed with the same engine whose semantic behaviors we're testing, then there's a possibility we just have tautologies ("FP divide behavior in spec lowers to FP divide in OCaml lowers to Wasm FP divide, which is the same as Wasm FP divide in engine-under-test"). I admit this is somewhat unlikely -- I imagine the spec interpreter probably has explicit branches for e.g. FP corner cases? -- but the circularity of the definition does lessen confidence somewhat. It's still an interesting option IMHO if our fuzz-against-natively-run-interpreter option doesn't work out.
(This is not to distract from the awesomeness of an OCaml implementation on Wasm in general though! More thoughts in a followup comment)
@Gerd Stolpmann this is really exciting news! One particular question I had: you say that "none of the engines of the bytecodealliance can run it yet" -- which features do we need to implement to fix this? IMHO at least, real implementations that rely on features do add useful pressure & feedback. Or is there a bug? Either way, happy to discuss further
Also: we've been having some interesting discussions about what it would mean to support JavaScript executing quickly. You might have seen our SpiderMonkey port (collaboration with Igalia & Mozilla) but that's just an interpreter; in order to do better, in the long run, we will also probably need engine features like exceptions so that we don't have to do a "check every return and exit early" sort of hack. I'd love to hear your thoughts on which Wasm features would make life easier for you, and compare notes on this!
Regarding the fuzz testing: I guess the problem at the moment is really the exception mechanism missing in Wasmtime. I resorted to two alternative approaches: one is the "check every return and exit early" method you are mentioning, and the other is an emulation via JavaScript. The first method turned out to be quite fast, but it generates additional code. The second method is unfortunately needed because the OCaml runtime relies on long jumping for exceptions coming from C (e.g. for passing an I/O error back). It could turn out that the reference interpreter doesn't trigger any exceptions of the latter type, in which case you could run it with Wasmtime. I can start an experiment and check whether this is true.
Other than that, my observation was that the OCaml code runs in a fairly deterministic way. Occasionally I saw some strange shifts of all addresses by a small offset compared with the previous test run, but I expect to be able to find and eliminate the root cause of this.
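To illustrate the "check every return and exit early" method, here is a toy model in Rust. It is not what Wasicaml actually emits: `Vm`, `checked_div`, `percentage` and `percentage_or_minus_one` are hypothetical names, and the pending-exception slot stands in for whatever state the generated code really uses.

```rust
/// Tag identifying the only exception kind in this toy model.
const DIVISION_BY_ZERO: u32 = 1;

/// Per-instance state; in generated wasm this could be a global.
struct Vm {
    /// Tag of the exception that is currently propagating, if any.
    pending: Option<u32>,
}

/// "Raising" stores the tag and returns a dummy value.
fn checked_div(vm: &mut Vm, a: i64, b: i64) -> i64 {
    if b == 0 {
        vm.pending = Some(DIVISION_BY_ZERO);
        return 0;
    }
    a / b
}

/// An ordinary function: the call is followed by a check.
/// This check is the additional code generated per call site.
fn percentage(vm: &mut Vm, part: i64, whole: i64) -> i64 {
    let ratio = checked_div(vm, part * 100, whole);
    if vm.pending.is_some() {
        return 0; // exit early; the caller performs the same check
    }
    ratio.clamp(0, 100)
}

/// Plays the role of an exception handler that maps division by zero to -1.
fn percentage_or_minus_one(vm: &mut Vm, part: i64, whole: i64) -> i64 {
    let value = percentage(vm, part, whole);
    match vm.pending.take() {
        Some(DIVISION_BY_ZERO) => -1, // handled here, slot is now empty
        Some(other) => {
            vm.pending = Some(other); // not ours: keep propagating
            0
        }
        None => value,
    }
}
```

The cost is one branch after every call that might raise, which matches the observation above that the method is fast but grows the code.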
Another point is that you could run into the "tail call problem". OCaml normally supports tail calls anywhere. In Wasicaml, however, I only handled certain types of tail calls, including self-recursion, so as not to rely on the tail call extension. It looks like the "eval" function in exec.ml of the spec interpreter is fine in this respect.
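As a rough illustration of why self-recursive tail calls can be handled without the tail call extension: the call can be turned into a jump back to the top of the function. The sketch below shows the idea in Rust with the hypothetical functions `sum_to_recursive` and `sum_to_loop`. It is not taken from Wasicaml's code generator.

```rust
/// Source-level shape: the recursive call is the last thing the function does.
fn sum_to_recursive(n: u64, acc: u64) -> u64 {
    if n == 0 {
        acc
    } else {
        sum_to_recursive(n - 1, acc + n)
    }
}

/// Lowered shape: the parameters become mutable locals and the tail call
/// becomes "assign the new arguments, then jump to the loop head".
fn sum_to_loop(mut n: u64, mut acc: u64) -> u64 {
    loop {
        if n == 0 {
            return acc;
        }
        // Compute all new arguments before overwriting any parameter.
        let (next_n, next_acc) = (n - 1, acc + n);
        n = next_n;
        acc = next_acc;
    }
}
```

The loop version uses constant stack space regardless of `n`. A tail call to a different function cannot be rewritten this way within a single function body, which is where the tail call proposal would be needed.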
Interesting! Yes, on the JS JIT-or-AOT-into-Wasm-bytecode side, longjmp() for exception unwinding and support for tail calls (or really, not-properly-nested jumps between trampolines or e.g. inline-cache stubs) were the two major issues we saw in mapping to the Wasm model as well.
I had some interesting discussions with @Luke Wagner about this a few months ago; the general outcome was that we should eventually be able to rely on Wasm features for all of these (Wasm has a proper-tail-calls proposal too). At some point I'm personally interested in getting both PTC and EH support in Wasmtime (as it'll be useful for us too); hopefully at that point, it will be useful to experiment with as a target for your VM.
Anyway, echoing @Till Schneidereit above that I suspect the Wasm CG might be interested in hearing about your experiences, or at least comments on the relevant Wasm proposal repos (https://github.com/webassembly/exception-handling, https://github.com/WebAssembly/tail-call, maybe https://github.com/WebAssembly/stack-switching too) if there are any issues you foresee with their designs!
Regarding missing wasm features: number 1 is a fast exception mechanism. The minimum is throw plus a catch for one tag type (the more complicated constructions in the exception handling proposal are not needed). By "fast" I mean speed that comes close to longjmp, i.e. don't try to record stack frames or the like (or at least do it like a backtrace, where the work is only done after catching the exception, and only if requested).
There is then the issue of generating good code in the presence of a GC. The main problem is this: The GC must be able to traverse over the live values (for marking, and depending on the type of GC, also for relocating values). Usually the only place for that is the shadow stack. However, you also want to put values as often as possible into locals, to give the JIT engine more freedom. Within the body of a single function you can normally balance both requests out. The tricky thing is when you call sub functions. In wasicaml the values are then always written to the shadow stack, because the sub function could theoretically allocate memory, leading to GC activity. So you are essentially forced to pass values by shadow stack - and not only that, actually none of the locals is allowed to contain a GC-able value. This makes it almost impossible to optimize anything. - I don't really have a good suggestion how to improve this. (If you generate code for a real CPU there are no locals - registers are actually globals - allowing you to deal with the problem when it arises.)
Regarding GC pointers in locals: would it be possible for your compiler to reason about safepoints such that it can manipulate pointers/object references in locals, but ensure that they are all spilled to the shadow stack before any safepoint? This is the approach taken in e.g. SpiderMonkey -- the GC can only trace the stack, so pointers are spilled to the stack before any operation that could GC. (The word "spill" may be a slight abuse of terminology as you're surely not doing full regalloc, but e.g. one shadow-stack slot per variable/binding in the original program should be fairly simple to implement, I think?)
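A toy model of the spill-before-safepoint idea, assuming a moving collector whose only roots are the shadow stack. `Runtime`, `alloc`, `collect` and `add_boxed` are hypothetical names. "Pointers" are plain indices into a vector, and the collector runs on every allocation so that the relocation is visible:

```rust
/// Toy runtime: one heap cell per object, roots only on the shadow stack.
struct Runtime {
    heap: Vec<i64>,
    shadow_stack: Vec<usize>, // indices into `heap`
}

impl Runtime {
    /// Allocation is a safepoint: this toy collector runs every time.
    fn alloc(&mut self, value: i64) -> usize {
        self.collect();
        self.heap.push(value);
        self.heap.len() - 1
    }

    /// Copying collection: only objects referenced from the shadow stack
    /// survive, and each surviving slot is updated to the new location.
    /// (An object referenced from two slots would be copied twice; a real
    /// collector uses forwarding pointers to avoid that.)
    fn collect(&mut self) {
        let mut new_heap = Vec::new();
        for slot in self.shadow_stack.iter_mut() {
            new_heap.push(self.heap[*slot]);
            *slot = new_heap.len() - 1;
        }
        self.heap = new_heap;
    }
}

/// Adds two boxed integers and returns the (possibly moved) operands
/// together with the freshly allocated result.
fn add_boxed(rt: &mut Runtime, mut a: usize, mut b: usize) -> (usize, usize, usize) {
    // Between safepoints the pointers live in locals and are used freely.
    let sum = rt.heap[a] + rt.heap[b];
    // Before the safepoint: spill every pointer that is live afterwards.
    rt.shadow_stack.push(a);
    rt.shadow_stack.push(b);
    let sum_box = rt.alloc(sum); // safepoint, objects may move
    // After the safepoint: reload the locals from the shadow stack.
    b = rt.shadow_stack.pop().expect("slot for b was pushed above");
    a = rt.shadow_stack.pop().expect("slot for a was pushed above");
    (a, b, sum_box)
}
```

If the heap starts as `[99, 10, 32]` with the first cell unreachable, calling `add_boxed(rt, 1, 2)` moves both operands down by one cell. A local that was not reloaded would then point at the wrong object.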
Also, while integration with Wasm GC would solve this really elegantly, it would also imply allocation outside your Wasm heap; I can imagine scenarios where it would be more desirable to use GC within the heap (e.g., snapshotting scenarios)
Regarding the safepoints: basically this is the right direction, and I'd like to add more data flow analysis code to my compiler so that it exploits the freedom between safepoints. (I did not put this into my first version as it is not needed for an MVP. Also, I'm still a bit skeptical whether it is sufficient to generate really fast code, but let's see.) However, as far as I understand, there is still one type of function call that is problematic: calling small functions that don't interact with the GC, and which don't need to be called at a safepoint. You can recognize those functions only by global analysis (and only at link time if you support separate compilation), simply because you need the information at the call site but it only becomes available when compiling the callee. Unfortunately such functions have quite some influence on speed (in C you'd declare them as inline).
Btw, I'm not calling for the GC extensions - my guess is it wouldn't help much for existing languages anyway, as you typically cannot change the memory layout without breaking tons of code.
I'm not a member of the OCaml core team but am well known there, and I talked to Xavier Leroy and Mark Shinwell about their thoughts. There's currently no dev capacity (they are busy with multicore), but they have rough plans for adding wasm support to the core compiler as a completely new backend where the codegen forks after the high-level optimizer. The interesting thing is that they don't want to reuse any of the existing backend code. I guess they understood that wasm is a different thing, and the existing functionality (like register allocator) cannot be easily changed to wasm.
Btw, I'm not calling for the GC extensions - my guess is it wouldn't help much for existing languages anyway, as you typically cannot change the memory layout without breaking tons of code.
Indeed, was just noting in general I guess! That said, there is an interesting question about what to do when wasm-GC does exist, and can expose host objects (e.g., JS objects in a Wasm-in-JS-engine context) -- most languages with their own GCs I imagine would want to build "proxy objects" of sorts to hold the refs and allow interaction. Incidentally there was an interesting talk about integration between multiple GCs (specifically, solving the inter-heap-cycle problem) by Ross Tate in the Wasm CG, and there might be a writeup somewhere. Anyway, lots of open questions here...
they don't want to reuse any of the existing backend code. I guess they understood that wasm is a different thing, and the existing functionality (like register allocator) cannot be easily changed to wasm.
Another interesting parallel! We also came to this conclusion as far as a long-term port of SpiderMonkey to emit Wasm is concerned. I don't know the details of OCaml's optimizing backend, but in our case it's a bit too much of an impedance mismatch to try to adapt a compiler that deals with CFGs, does regalloc, assumes a software-visible (and manipulable) callstack, and branch-level control flow, etc. The other advantage of a closer-to-1:1-mapping between guest-language concepts (exceptions, etc) and native Wasm primitives is the possibility of better cross-language interaction. E.g., imagine OCaml calls into JS calls into OCaml, and an exception is thrown; perhaps the intermediate JS can catch it, or otherwise, it should propagate through to the calling OCaml. There are probably a bunch of semantic details that this handwaves over but the general idea is cool :-)
FWIW apparently Alon Zakai has been experimenting with a "lowergc" pass in Binaryen, to translate wasm+gc to MVP. Could be an interesting option for WASI MVP deployments
Using the Rust crate wasmtime-wasi, I want to change the WasiCtxBuilder's stdout to log::info!(...). How do I do that?
You can use .set_stdout and pass a custom type implementing WasiFile. In this implementation you can line buffer and then use info!() when the line gets flushed.
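The line-buffering part can be sketched with the standard library alone. The block below does not implement WasiFile, whose exact methods depend on the wasmtime-wasi version. It only shows a hypothetical `LineSink` that implements `std::io::Write` and hands each complete line to a callback. In the real setup that callback would presumably call `log::info!`, and the custom WasiFile type would forward its writes to something like this:

```rust
use std::io::{self, Write};

/// Collects written bytes and passes each complete line to `sink`.
struct LineSink<F: FnMut(&str)> {
    pending: Vec<u8>, // bytes of the current, not yet terminated line
    sink: F,
}

impl<F: FnMut(&str)> LineSink<F> {
    fn new(sink: F) -> Self {
        LineSink { pending: Vec::new(), sink }
    }
}

impl<F: FnMut(&str)> Write for LineSink<F> {
    fn write(&mut self, buf: &[u8]) -> io::Result<usize> {
        self.pending.extend_from_slice(buf);
        // Emit every line that is now complete; keep the remainder buffered.
        while let Some(pos) = self.pending.iter().position(|&b| b == b'\n') {
            let line: Vec<u8> = self.pending.drain(..=pos).collect();
            // Drop the trailing '\n' and replace invalid UTF-8 lossily.
            let text = String::from_utf8_lossy(&line[..pos]);
            (self.sink)(text.trim_end_matches('\r'));
        }
        Ok(buf.len())
    }

    /// Flushing emits a trailing partial line, if there is one.
    fn flush(&mut self) -> io::Result<()> {
        if !self.pending.is_empty() {
            let rest = std::mem::take(&mut self.pending);
            let text = String::from_utf8_lossy(&rest);
            (self.sink)(&text);
        }
        Ok(())
    }
}
```

With the log crate available, a call such as `LineSink::new(|line| log::info!("{}", line))` would be the assumed way to connect it. That line is not part of the block so that the sketch stays free of external dependencies.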
Last updated: Jan 24 2025 at 00:11 UTC