Anyone here using wasmtime-go?
Getting an interesting issue where, having done nothing but create a store, module, and wasi instance and then start an http server, I reliably get a segfault in 2 minutes. I'm guessing the 2 minutes is a GC pause. If I actually use an instance of the module every so often I don't get the segfault.
Ok yeah, so the runtime forces a GC cycle every 2 minutes.
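For reference, the failing setup was roughly this shape (a minimal sketch from memory of the 0.16.0 API, not the exact code; the real thing is in the gist below):

```go
package main

import (
	"net/http"

	"github.com/bytecodealliance/wasmtime-go"
)

func main() {
	store := wasmtime.NewStore(wasmtime.NewEngine())
	wasm, err := wasmtime.Wat2Wasm(`(module)`)
	if err != nil {
		panic(err)
	}
	// in 0.16.0 modules are created from a store, not an engine
	module, err := wasmtime.NewModule(store, wasm)
	if err != nil {
		panic(err)
	}
	wasi, err := wasmtime.NewWasiInstance(store, wasmtime.NewWasiConfig(), "wasi_snapshot_preview1")
	if err != nil {
		panic(err)
	}
	_, _ = module, wasi

	// nothing above is touched again; roughly two minutes later (the
	// runtime's forced GC interval) the process segfaults
	http.ListenAndServe(":8080", nil)
}
```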
@Alex Vidal oh dear, do you have code I could peek at? I see CI for the repo still segfaults on macOS, but I unfortunately haven't been able to track down why
Also, have you tried the master branch of the git repo?
I haven't. I'm running 0.16.0 atm. I'll see if I can get a minimal reproduction.
Interesting...v0.16.0 isn't on the master timeline at all?
Oh, but it's ahead of master anyway
ah yeah it's a separate commit b/c it has binaries
roger. let me see if i can reproduce without all my bullshit
really annoying to iterate on this when it takes 2 minutes each time you want to reduce your test case
ah yeah there it goes
about to upload a gist, going to see if i can repro with an inline wat
@Alex Crichton https://gist.github.com/avidal/4710a963fbc8a80feae84663473ea971
Running another test without the wasi instance to see if it makes a difference. Wondering if a runtime.KeepAlive is missing in one of the wasi funcs?
Remove the wasi bits and it doesn't crash
oh I think I see what's wrong
ok, i can upload a segfault if you want it
nah that's ok
this is an ownership issue I forgot about
and didn't see it b/c I didn't write wasi tests :(
I noticed :)
But it follows the rust api very well so it wasn't difficult to figure out
Is it the wasi instance that needs a keep alive? I see that if you link the wasi instance in a linker, the linker calls keepalive on the wasi instance, but constructing the instance via NewWasiInstance doesn't.
(I know very little about cgo or generally about ffi stuff so this part is a bit over my head)
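For anyone following along, this is the classic cgo finalizer hazard. A minimal illustrative sketch with made-up names (not the actual wasmtime-go internals):

```go
package main

import "runtime"

// resource stands in for a wrapper like WasiInstance: a Go object whose
// finalizer frees the underlying C memory.
type resource struct{ ptr uintptr }

func cCall(p uintptr) {} // imagine a cgo call that dereferences p

func use(r *resource) {
	cCall(r.ptr)
	// Without this, r is unreachable as soon as r.ptr has been loaded, so
	// the GC may run its finalizer (freeing the C memory) while the C call
	// is still using the pointer.
	runtime.KeepAlive(r)
}

func main() {
	r := &resource{ptr: 1}
	runtime.SetFinalizer(r, func(r *resource) { /* free the C memory here */ })
	use(r)
}
```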
ok should be fixed in https://github.com/bytecodealliance/wasmtime-go/commit/e23af2e4825f3d9c67bddc329e16c954f75620e9
Very nice, will test it out right now
Oh, I'll wait for the tag and then get it instead of futzing with the compile step
@Alex Crichton when you get a chance could you push a v0.17.0 tag?
@Alex Vidal I've published 0.16.1 now!
sweet, thanks!
Left my program running for 5ish hours and no segfaults :) Thanks again Alex
Nice, thanks again for the report!
Submitted a PR to fix a segfault in wasi...but I also realized I need to figure out how to make this work reliably on macOS since I'll be using it at work. I personally don't have a macbook, but my coworkers do.
@Alex Vidal yeah I've been investigating on and off what's going wrong, but I can't figure anything out yet :(
I can reliably reproduce the issue on macos and can't reproduce at all on linux
just have no idea where to start with debugging
does it reproduce in a docker container with a macos host?
otherwise yeah i guess you have a non-zero chance of hitting it on every gc cycle which is a minimum of 2 minutes
you can probably reproduce faster with -tags debug, which forces a gc trigger way more often
(that's how I'm reproducing in tests)
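i.e. the debug build just shrinks the window; you can get a similar effect by forcing collections yourself in a test. A sketch of the idea, not the actual harness:

```go
package wasitest

import (
	"runtime"
	"testing"
)

// TestRepro hammers the allocate-then-collect cycle so a finalizer bug
// shows up in seconds instead of waiting on the runtime's two-minute
// forced GC. The loop body is a placeholder for the real setup.
func TestRepro(t *testing.T) {
	for i := 0; i < 1000; i++ {
		// ... create a store, module, and wasi instance here ...
		runtime.GC()
	}
}
```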
and I'm not sure if docker would work b/c I don't think you can virtualize macOS, right?
yep, i ended up changing the actions file on my fork to run three passes with debug enabled to get it to repro more often
also, i'm not positive how docker for mac works. i know docker runs in a lightweight vm but past that...:shrug:
considering renting a mac from macincloud.com to try this out
I adjusted the workflow to only run mac, skip the download steps (inlined the dependencies), and run the debug test build a bunch of times to see if i can find a pattern in the failures...but nothing so far. https://github.com/avidal/wasmtime-go/pull/1
Can you run it in lldb? `lldb -o run -o bt --batch -- executable arg1 arg2` should run it and show a backtrace when it crashes.
I would but no mac on my end. Not sure what Alex has tried
oh maybe i can do that in the action though
yes, that was what I meant.
FWIW I'm working to minimize this, so far I've got this program which will segfault:
```go
package main

// #cgo CFLAGS:-I${SRCDIR}/build/include
// #cgo LDFLAGS:-lwasmtime -lm -ldl -L${SRCDIR}/build/macos-x86_64
// #include <shims.h>
import "C"

import "runtime"

func main() {
	engine := C.wasm_engine_new()
	store := C.wasm_store_new(engine)
	// build a trivial () -> () function type from two empty valtype vecs
	var tys C.wasm_valtype_vec_t
	ty := C.wasm_functype_new(&tys, &tys)
	// create a host function through the cgo shim and call it once
	f := C.c_func_new_with_env(store, ty, 0, 0)
	C.wasmtime_func_call(f, nil, 0, nil, 0, nil)
	// force a collection
	runtime.GC()
}
```
oh very nice
fwiw, that minimal example segfaults for me on C.wasmtime_func_call
(on linux though)
well kind of, i didn't make it exactly the same so i should just shut up
@Alex Vidal I opened up https://github.com/bytecodealliance/wasmtime-go/issues/10 as a dedicated issue for this
very nice, thanks. hopefully we can get some help!
I'm going to try to run the minimal example in your ticket on MacOS but building and executing it in a docker container. I'm going to be shipping a wasmtime-go based local development proxy to my team in the next couple of weeks and it'd be nice to know I have a MacOS workaround.
@Alex Crichton looks like it was a regression in Go 1.14. I can't reproduce in 1.13.10. I set up a matrix of Go versions in my fork of wasmtime-go to confirm: https://github.com/avidal/wasmtime-go/pull/2
nice find!
I had a coworker run the test suite on MacOS Catalina (with the debug tag to trigger the GC often) using Go 1.13.10 about a hundred times and he never saw a segfault.
Perhaps we could update wasmtime-go CI and also add a note to the README?
I can open a PR for that. Add go 1.13.10 to CI yeah?
sounds good to me yeah
opened. not sure how much detail you want in the README
your comment in the README is a lot better :)
oh no worries just tweaking things :)
This is (probably) unrelated, but if you run the test suite using go 1.14 with -race, it'll enable the checkptr feature, which says that func.go#319 has unsafe pointer arithmetic (reproducible on linux)
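(For reference, checkptr fires on patterns like the one below; I'm not claiming this is literally what func.go#319 does:)

```go
package main

import "unsafe"

var sink unsafe.Pointer

func main() {
	buf := make([]byte, 4)
	p := unsafe.Pointer(&buf[0])
	sink = p // force buf to the heap so checkptr can see the allocation
	// Arithmetic through uintptr that lands outside the original
	// allocation is what checkptr (enabled by -race on Go 1.14+)
	// reports; `go run -race` panics on the next line.
	q := unsafe.Pointer(uintptr(p) + 100)
	_ = q
}
```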
just wanted to say thanks to you @Alex Crichton and the rest of the wasmtime team. My local proxy can trivially handle over 100 rps sustained. It takes the incoming http request, executes a wasm program that modifies the request and calls back into Go; Go sends the request to another service and returns the response into wasm, which hands it back to Go and on to the user.
Effectively the Go proxy is a runtime for middleware implemented (currently) in Rust, compiled to wasm.
100 rps with a 90th percentile latency of 11ms
(once I put a mutex around instantiating the module that is)
oh and that's with creating a new instance of the module for every single request which is how the middleware api is designed
If I have Rust code that imports a function whose "extern C" decl has an arg for a u32, and I link that function via wasmtime-go, I have to provide a function that accepts an int32. What if, on the Rust side, I call that function with the max u32? That won't fit into the int32 on the Go side.
Ok. I messed with this a bit more. Looks like I can just cast the param to uint32 in Go, which makes sense.
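i.e. something like this (a sketch; I'm using WrapFunc here, but the same applies to a linker's DefineFunc):

```go
package main

import (
	"fmt"

	"github.com/bytecodealliance/wasmtime-go"
)

func main() {
	store := wasmtime.NewStore(wasmtime.NewEngine())
	// A wasm i32 always arrives as int32 on the Go side; reinterpreting
	// the bits recovers the u32 the Rust caller passed.
	f := wasmtime.WrapFunc(store, func(x int32) {
		fmt.Println(uint32(x)) // u32::MAX shows up as -1; this prints 4294967295
	})
	_ = f
}
```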
Can you summarize the impact of the 0.17.0 change @Alex Crichton? The wasmtime change indicates module can be sent across threads. Do you still need to lock the store to a single thread at a time?
It looks like it's a change that could eventually unlock multiple instances on multiple threads, but we're not there yet, correct?
@Alex Vidal that's been fixed now! The API hasn't been updated yet, but a Module is tied to an Engine now, not a Store. Engine is thread-safe (as before), and Module is now thread-safe as well.
(I probably need to update the free code too...)
modules can be shared across threads, but instances cannot
instances will likely always be locked to one thread
Ok, so given a triple (module, store, engine) I can create (and use) N instances on N threads?
(and yeah, i don't mean to move an instance across threads)
I ended up rearchitecting to having a pool of instances that each have their own context (module, store, linker) and go back into the pool once they're done. But it sounds like I could instead go back to my original design where I have a single (module, store, linker) and create as many new instances as I want that are used on different threads?
you can have a single engine/module, but you still need a store-per-thread
and store represents the lifetime of an instance as well (in terms of memory allocations), so you likely want a store to be relatively short-lived
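So the pattern ends up looking roughly like this (a sketch against the later API shape where NewModule takes an Engine; the Go bindings hadn't caught up yet at this point):

```go
package main

import (
	"fmt"
	"sync"

	"github.com/bytecodealliance/wasmtime-go"
)

func main() {
	engine := wasmtime.NewEngine()
	wasm, err := wasmtime.Wat2Wasm(`(module)`)
	if err != nil {
		panic(err)
	}
	// compiled once; Engine and Module are safe to share across threads
	module, err := wasmtime.NewModule(engine, wasm)
	if err != nil {
		panic(err)
	}

	var wg sync.WaitGroup
	for i := 0; i < 4; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			// one short-lived store (and instance) per goroutine;
			// neither is ever shared across threads
			store := wasmtime.NewStore(engine)
			if _, err := wasmtime.NewInstance(store, module, nil); err != nil {
				panic(err)
			}
			fmt.Println("instantiated on goroutine", i)
		}(i)
	}
	wg.Wait()
}
```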
Ah, I didn't know that. I wasn't disposing the store.
note that this could all use better documentation as well
which is something I would like to get to!
Ok so. I can construct a module given an engine. And hold onto that module (I don't need to keep the engine around). And then I can construct a store, wasi instance, linker, and then an actual wasm instance on each request, using the same module
(didn't realize that about the store, probably because the module depended on it when i started, so i thought it was more of a general wasmtime context)
Just looked a bit more. I didn't realize your "That's been fixed!" comment referred to something past 0.17.0.
You may need to keep the engine around to create stores though
it is uncharted territory when two modules from different engines with different settings are used (might work though :shrug: )
Yeah. It's just a bit confusing because the c-api change from your PR didn't change the signature for wasm_module_new; it still takes a store, it just grabs the engine from the store and uses that instead. So I suppose that PR was internal (from the module c-api perspective).
But I suppose, for my purposes, the store that I supply isn't actually important? I can just create one to use to compile the module and then throw it away after?
wasm_module_new will not change; we are just API implementors. We can change wasmtime_module_new, and I did, but a last-minute HostRef refactoring changed it back.
wasm-c-api requires usage of wasm_module_share/wasm_module_obtain when a module is transferred between stores (and hence threads)
see e.g. https://github.com/WebAssembly/wasm-c-api/blob/master/example/threads.c#L32
but we can do whatever we want with rust api :)
Roger. Getting pretty far out of my domain but let me know if there's something I can help with in the Go library @Alex Crichton
Thanks for the example, that makes it a bit more clear!
nice work on reducing the macos bug report @Alex Crichton !