Stream: git-wasmtime

Topic: wasmtime / Issue #2678 x86-64 Windows fastcall ABI support.


view this post on Zulip Wasmtime GitHub notifications bot (Feb 23 2021 at 09:06):

bjorn3 commented on Issue #2678:

I got a panic at cranelift/codegen/src/isa/x64/abi.rs:508 when trying this branch out. There is an unimplemented!() there under the assumption that no XMM registers are callee-save, which is an incorrect assumption for WindowsFastcall.

view this post on Zulip Wasmtime GitHub notifications bot (Feb 23 2021 at 18:57):

cfallin commented on Issue #2678:

Ah, I had missed that -- fixed now and test added.

view this post on Zulip Wasmtime GitHub notifications bot (Feb 23 2021 at 20:07):

bjorn3 commented on Issue #2678:

The sysroot compiled successfully. The only missing things for full Windows support in cg_clif are TLS support and a few small things on the cg_clif side.

view this post on Zulip Wasmtime GitHub notifications bot (Feb 27 2021 at 02:58):

cfallin commented on Issue #2678:

One thing I didn't see (and maybe I missed) is how "args above four" are pushed in a right-to-left fashion to the stack

I went back and looked to be sure -- this is somewhat confusing phrasing, but "pushed right-to-left" is functionally the same as "args in order are at ascending stack addresses", which is what this PR does. This quick experiment on Compiler Explorer shows arg 5 is at [rsp+40] and arg 6 is at [rsp+48] (40 because return address plus 32-byte shadow space). So I think we're OK here, and e.g. %f6 in the tests verifies this order (returns first of two on-stack args).

[CI test]

I'll look into adding this! We may need to temporarily disable some tests to make it work, since unwind info for Windows is not implemented yet.

An interesting question is: how do we actually validate that we match what other compilers provide and expect? Even for System V, we don't (I think) have an ABI test that explicitly links Cranelift-produced code with other native code. I suppose we implicitly get that when we run tests that make calls into the Wasmtime runtime; perhaps we should add a few "test weird ABI corners" VM calls just for this purpose.

view this post on Zulip Wasmtime GitHub notifications bot (Mar 03 2021 at 06:51):

cfallin commented on Issue #2678:

An update on this: I've been pulling on a thread the last few days in order to implement fully correct "non-Rust" i128 support and I'm starting to wonder if this is the right path.

This has sort of turned into a refactor of the entire ABI machinery in order to support by-ref values and cross-XMM/integer movement, and it is maddeningly complex. Basically, because we don't have a legalization-based approach wherein we can "reduce to a previously solved problem", we end up handling an "arg within an arg" (the by-ref pointer) which contorts the code quite a bit. By-ref support adds another stack area on the caller side whose layout we have to compute, and tweaks a bunch of the ABI traits because we need some temp registers in various places. It's asymmetric wrt the return case: fastcall passes 128-bit return values, even integers (__m128), in XMM registers; so our two-I64-reg integer handling that is consistent throughout the rest of the backend needs special-cased split/concat code for return values only.

To recap, this is necessary only for extern "C" calls from cg_clif with i128 values. Everything with cg_clif + Rust ABI works correctly; this is only for the hypothetical case that one links with a C library and passes 128-bit integers to it. I'm currently at a 1K-line diff and it's ~50% done.

This is the complexity I was worried about when I had suggested "just panicing on i128 in the non-Rust case" above; @bjorn3, your pushback then was that this is not Fastcall-compliant, and indeed that is true! However, I am wondering how much this corner case really matters at this point in time, versus the disproportionate cost of building it (and the opportunity cost of not working on other things); and perhaps if it is very important later, we can talk about properly allocating time for implementing this.

Thoughts?

view this post on Zulip Wasmtime GitHub notifications bot (Mar 03 2021 at 10:49):

bjorn3 commented on Issue #2678:

As far as I know the following pass modes are used for SystemV and Fastcall with cg_llvm:

what sysv pass mode sysv abi fastcall pass mode fastcall abi
argument
i128 Rust abi cast -> i128 two integer registers cast -> i128 two integer registers
i128 C abi cast -> i128 two integer registers indirect pass pointer in register
__m128 Rust abi indirect pass pointer in register indirect pass pointer in register
__m128 C abi vector i64x2 xmm register indirect pass pointer in register
128bit struct Rust abi cast -> i128 pass in two registers cast -> i128 pass in two registers
128bit struct C abi cast -> i128 pass in two registers indirect pass pointer in register
return value
i128 Rust abi cast -> i128 two integer registers cast -> i128 two integer registers
i128 C abi cast -> i128 two integer registers indirect pass return pointer in register
__m128 Rust abi indirect pass return pointer in register indirect pass return pointer in register
__m128 C abi vector i64x2 xmm register vector i64x2 xmm register
128bit struct Rust abi cast -> i128 pass in two registers cast -> i128 pass in two registers
128bit struct C abi cast -> i128 pass in two registers indirect pass return pointer in register

The pass mode here is the PassMode computed by rustc. When cast -> i128 is used, the value is always passed in two integer registers, even when the calling convention theoretically doesn't support it. If the calling convention doesn't support it, for the C abi, rustc will use an indirect pass mode.

I think Cranelift should follow LLVM and always pass i128 in two integer registers. If it needs to be passed as pointer or on the stack, the abi adjustement code of the user of Cranelift can do this. It is impossible to fully handle the abi in Cranelift anyway as it doesn't support structs. Cranelift should have the building blocks based on which the calling convention can be followed.

view this post on Zulip Wasmtime GitHub notifications bot (Mar 03 2021 at 18:40):

cfallin commented on Issue #2678:

Thank you for the detailed comparison table, @bjorn3!

So, yes, it seems we are in agreement: the by-ref argument passing can exist at a layer above the CLIF as finally seen by the codegen backend. This seems to be a better factorization of complexity.

Given this plan, I'll fix up the PR to address all of the other issues, then document that the i128-in-two-registers scheme is an LLVM/rustc-inspired extension. The open question is whether we want to do this by default (when the user just asks for WindowsFastcall) or only when explicitly indicated by a variant ABI (WindowsFastcallRust or WindowsFastcallLLVM for example) or by an ISA flag (fastcall_i128_in_integer_reg_pair for example). I think I lean toward the latter, to minimize confusion later -- it forces the Cranelift user to think about and understand these subtleties of Fastcall -- and we can link to this PR discussion as a fairly detailed exploration of the space :-)

view this post on Zulip Wasmtime GitHub notifications bot (Mar 03 2021 at 19:56):

bjorn3 commented on Issue #2678:

Maybe it could be called enable_llvm_abi_extensions?

view this post on Zulip Wasmtime GitHub notifications bot (Mar 04 2021 at 05:57):

cfallin commented on Issue #2678:

In the last few commits I fell down a bit of a CI rabbithole but now this PR also allows the Cranelift backend variant to be set via environment variable; I couldn't get a separate action/runner to work for Windows CI so this lets us add it to the main testing matrix instead. Let's see if this works...

view this post on Zulip Wasmtime GitHub notifications bot (Mar 04 2021 at 06:07):

cfallin commented on Issue #2678:

Hmm, actually, scratch that -- that won't work because our filetests need to be aware of the dynamic choice of backend as well (to enable the right ones).

This seems to be another case where a simple request ("run it on Windows CI") has turned into a deep rabbithole of maddening complexity. Given that we are planning to switch the default Wasmtime backend soon (at which point all of our tests will provide coverage for this), given the increasingly uncomfortable maze of features and environment-variable switches, and given that filetests are covering fastcall with golden-output matching for now, I would strongly prefer not to dive further down this rabbithole, instead just waiting until we switch over to have actual run-test coverage -- @abrown, what do you think?

view this post on Zulip Wasmtime GitHub notifications bot (Mar 04 2021 at 17:39):

cfallin commented on Issue #2678:

Updated again after consulting with alexcrichton and abrown on Zulip -- runs-on: windows and runs-on: windows-latest are entirely different things apparently :-) Let's see if this works better...

@sunfishcode, see the wasi-related test exclusion in the latest commit -- any thoughts on that? I was seeing failures of some symlink-related WASI tests on my local Windows machine (MinGW64 / MSYS2, Win10, possible complication: space in username/path (?)) but the shell script passes cleanly when excluding these.

view this post on Zulip Wasmtime GitHub notifications bot (Mar 04 2021 at 23:49):

cfallin commented on Issue #2678:

The segfault in CI is odd -- I can reproduce something similar locally but it seems to be a nondeterministic failure; it is always in __report_gsfailure in the Visual Studio runtime library, but I'm not yet able to trace how we're getting there (there is no rr on Windows sadly) -- though execution is through longjmp and for some very odd reason the disassembly also shows a call to __telemetry_main_return_trigger(), which appears to be telemetry (!) that MSVC inserts via the standard library. It's entirely possible that all of this is a red herring and the real issue is an ABI mismatch somewhere else; will continue to dig.

view this post on Zulip Wasmtime GitHub notifications bot (Mar 05 2021 at 02:58):

cfallin commented on Issue #2678:

More info: the segfault happens whenever a library call raises a trap (e.g. try running the table_fill Wasm spec tests); it results in the unit-test process exiting via abort in __report_gsfailure(). This is a failure of the "GS" stack-check mechanism in MSVC, which stores and checks stack cookies. Unfortunately it does not reproduce under gdb consistently (also MinGW gdb does not load symbols from the .exe, and MinGW LLDB hangs, so I'm debugging somewhat blind) -- I can get it to appear or not appear by causing the frame that calls setjmp to be a tail cail or not (with a debugging println). If not tail call, then failure still happens except under GDB.

I'm trying to learn more about the GS stack-cookie mechanism; it doesn't seem that it would have any sort of action-at-a-distance (check other stack frames, etc) behavior, and the frames on either side of the setjmp/longjmp are in rustc-compiled native code. It's possible that the fastcall impl is going way out of bounds, of course. Does anyone know more about this mechanism (@sunfishcode or @peterhuene maybe)?

view this post on Zulip Wasmtime GitHub notifications bot (Mar 05 2021 at 03:07):

cfallin commented on Issue #2678:

Hmm, actually, things go awry after an rdsspq instruction and mismatch in __report_gsfailure; this appears to be Intel CET (Control-flow Enforcement Technology?) which is a shadow stack thing. I'm wondering if we need to do something special in our generated code to keep this mechanism happy...

view this post on Zulip Wasmtime GitHub notifications bot (Mar 05 2021 at 03:29):

cfallin commented on Issue #2678:

I think I've worked out what's going on:

So our possible avenues forward seem to be:

If the third option is not possible, I'd favor #1, so this PR doesn't grow unboundedly; we can then include Windows CI when we add unwind (next) and get coverage of everything at once. Thoughts?

view this post on Zulip Wasmtime GitHub notifications bot (Mar 05 2021 at 18:03):

cfallin commented on Issue #2678:

Sounds good -- thanks for the review efforts!


Last updated: Jan 24 2025 at 00:11 UTC