Stream: git-wasmtime

Topic: wasmtime / PR #9688 Show weirdness of Windows on CI


view this post on Zulip Wasmtime GitHub notifications bot (Nov 26 2024 at 21:56):

alexcrichton opened PR #9688 from alexcrichton:show-windows-weirdness to bytecodealliance:main:

Work-in-progress while I get to this to the point where I can share with folks. Not intended for merge

view this post on Zulip Wasmtime GitHub notifications bot (Nov 26 2024 at 21:57):

alexcrichton updated PR #9688.

view this post on Zulip Wasmtime GitHub notifications bot (Nov 26 2024 at 21:57):

alexcrichton updated PR #9688.

view this post on Zulip Wasmtime GitHub notifications bot (Nov 26 2024 at 21:59):

alexcrichton updated PR #9688.

view this post on Zulip Wasmtime GitHub notifications bot (Nov 26 2024 at 22:05):

alexcrichton updated PR #9688.

view this post on Zulip Wasmtime GitHub notifications bot (Nov 26 2024 at 22:09):

alexcrichton updated PR #9688.

view this post on Zulip Wasmtime GitHub notifications bot (Nov 26 2024 at 22:46):

alexcrichton updated PR #9688.

view this post on Zulip Wasmtime GitHub notifications bot (Nov 26 2024 at 22:50):

alexcrichton updated PR #9688.

view this post on Zulip Wasmtime GitHub notifications bot (Nov 26 2024 at 22:57):

alexcrichton commented on PR #9688:

@jsturtevant if you don't mind and if you're willing we could use some more Windows-specific advice/help on this. For background on this I was working on https://github.com/bytecodealliance/wasmtime/pull/9675 recently and was having a tough time landing it as it was consistently failing on CI. The failure was only on MinGW and not on other Windows targets (e.g. MSVC) or other platforms (e.g. macOS or Linux). The actual PR itself turned out to be largely unrelated to the crashes on MinGW since the reproduction was quite simple.

I've setup the PR here to showcase what's going on. Specifically we have a bit of C code in a helpers.c file around calling setjmp. We can't call setjmp from Rust as the Rust compiler has no concept of a function that returns twice (e.g. setjmp), so we have to rely on C for this. The current version I linked here is the one with the workaround I landed in #9675. Basically we want to write a function that looks like:

  platform_jmp_buf buf;
  if (platform_setjmp(buf) != 0) {
    return false;
  }
  *buf_storage = &buf;
  return body(payload, callee);

but this crashes on MinGW.

I've arranged a number of CI jobs here to showcase the various results of what's happening. Each job is run in release/debug mode to show the effect of a "fix" with/without optimizations. Each CI job is running cargo run test.wat which is executing the test.wat file in this PR in the CLI, basically just instantiating it. The test.wat file looks like:

(module (func unreachable) (start 0))

so a small wasm module that traps on instantiation. The longjmp in our trap handler (as unreachable compiles to ud2 so we execute a vectored exception handling on Windows) is what's crashing.

The results of CI look like:

build mode -DGOOD -DBAD -DBAD_V1 -DBAD -DBAD_V2
debug :check: :cross_mark: :check:
release :check: :cross_mark: :cross_mark:

For a bit easier to digest this is what we have for diffs:

-DBAD_V1

diff --git a/crates/wasmtime/src/runtime/vm/helpers.c b/crates/wasmtime/src/runtime/vm/helpers.c
index 84711450a..58ffea805 100644
--- a/crates/wasmtime/src/runtime/vm/helpers.c
+++ b/crates/wasmtime/src/runtime/vm/helpers.c
@@ -110,16 +110,16 @@ static bool wasmtime_setjmp_inverted(void **buf_storage,
                                      void *payload, void *callee) {
   platform_jmp_buf buf;
   if (platform_setjmp(buf) != 0) {
-    return true;
+    return false;
   }
   *buf_storage = &buf;
-  return !body(payload, callee);
+  return body(payload, callee);
 }

 bool VERSIONED_SYMBOL(wasmtime_setjmp)(void **buf_storage,
                                        bool (*body)(void *, void *),
                                        void *payload, void *callee) {
-  return !wasmtime_setjmp_inverted(buf_storage, body, payload, callee);
+  return wasmtime_setjmp_inverted(buf_storage, body, payload, callee);
 }

 void VERSIONED_SYMBOL(wasmtime_longjmp)(void *JmpBuf) {

-DBAD_V2

diff --git a/crates/wasmtime/src/runtime/vm/helpers.c b/crates/wasmtime/src/runtime/vm/helpers.c
index 84711450a..f40745099 100644
--- a/crates/wasmtime/src/runtime/vm/helpers.c
+++ b/crates/wasmtime/src/runtime/vm/helpers.c
@@ -110,16 +110,17 @@ static bool wasmtime_setjmp_inverted(void **buf_storage,
                                      void *payload, void *callee) {
   platform_jmp_buf buf;
   if (platform_setjmp(buf) != 0) {
-    return true;
+    return false;
   }
   *buf_storage = &buf;
-  return !body(payload, callee);
+  bool ret = body(payload, callee);
+  return ret;
 }

 bool VERSIONED_SYMBOL(wasmtime_setjmp)(void **buf_storage,
                                        bool (*body)(void *, void *),
                                        void *payload, void *callee) {
-  return !wasmtime_setjmp_inverted(buf_storage, body, payload, callee);
+  return wasmtime_setjmp_inverted(buf_storage, body, payload, callee);
 }

 void VERSIONED_SYMBOL(wasmtime_longjmp)(void *JmpBuf) {

the really weird thing is that -DBAD_V2, the only real change is "put the return value in a local variable", causes the test to be "fixed". In release mode this breaks down though.

So overall, is this something you'd be able to help us out debugging? We have a workaround for now so it's not too urgent, but the workaround is pretty wonky and feels quite brittle, so it seems like it'd be best to get a better understanding of it.

view this post on Zulip Wasmtime GitHub notifications bot (Dec 02 2024 at 17:22):

jsturtevant commented on PR #9688:

I will take look at it this week

view this post on Zulip Wasmtime GitHub notifications bot (Dec 07 2024 at 00:33):

jsturtevant commented on PR #9688:

Got some time to look into this today. I am able to reproduce it locally. This is a bit out of my expertise zone but doing some research I found a couple possible issues:

I have an env set up now (I hadn't used rust MinGW target before) and will spend some more time next week doing some comparisons in gdb and also see if I can get some help.

view this post on Zulip Wasmtime GitHub notifications bot (Dec 07 2024 at 01:10):

alexcrichton commented on PR #9688:

Thanks for taking a look! And to be clear this isn't urgent in the sense that the current workaround seems to at least not cause CI to fall over and MinGW isn't a critical platform of ours (e.g. tier 1).


Last updated: Dec 23 2024 at 12:05 UTC