unwind on Windows · cranelift · Zulip Chat Archive

Chris Fallin (Mar 05 2021 at 08:42):

Hi all -- I've been having an adventure learning about structured exception handling and unwind info on Windows, in an attempt to get the fastcall implementation (and Wasmtime generally) working in the new backend. I am starting to tend toward wondering if we might be able to avoid SEH altogether, by design; but would like feedback from others who know better (I rarely touch Win32 stuff):

By default, it seems we need unwind info if we touch longjmp/setjmp at all -- this seems to have come up in lots of different contexts online (earlier in wasmtime #291; also e.g. OCaml as a random example, and other compilers/JITs).
That's somewhat surprising, and makes me wonder if we might be able to get away with our own longjmp/setjmp. Wasm frames have no destructors that need to run, and in fact it would be more efficient to skip over them in O(1) than to unwind the linked list via RtlUnwind.
Some sources seem to suggest using __builtin_setjmp()/__builtin_longjmp() for this. These intrinsics exist in gcc (mingw) and clang, but not CL.exe (MSVC). It seems we need to be able to build with the latter -- or can we depend on Clang too even for windows-msvc builds?
In general, metadata generation that is in the "correctness critical path" rather than simply best-effort makes me very nervous; for best security/quality, I want a small, simple compiler core I can reason about, and don't want to have to trust the SEH unwind opcode translation and invariants / subtle connections to the prologue generation.
However, if we have to support SEH, then we have to do it. So: are there other things that force us to need correct unwind info? E.g. trap handling? I see we use AddVectoredExceptionHandler and it seems this runs before RtlUnwind would, but I'm not sure.
On that same note, does our current GC stack-walking rely on this unwind info? (The smoke test seems to pass without it, but...) If it does, we have a simpler possible scheme (#2459) with explicit rooting frames, that would also be faster, and (for correctness/security) would make me much less nervous.

So basically my thought at the top level is: do we really need SEH? If not, what are folks' thoughts on doing simpler/more explicit/more predictable things (simple longjmp, explicit GC rooting) instead?

(bonus round for another time: the above "let's make it simpler" but for debuginfo generation...)

Mingw64 + threads + system exception raised through longjmp() = crash · Issue #7638 · ocaml/ocaml

Original bug ID: 7638 Reporter: @xavierleroy Status: resolved (set by @xavierleroy on 2017-09-28T09:44:41Z) Resolution: fixed Priority: normal Severity: major Platform: Mingw64 OS: Windows 64 OS Ve...

Chris Fallin (Mar 05 2021 at 08:42):

@Peter Huene it seems you were involved last time we built this, for the old backend ^^

Chris Fallin (Mar 05 2021 at 08:45):

(also fwiw I have an almost working unwind generator in my x64-fastcall-unwind branch but it's currently failing in an inscrutable way when stack guard failures occur)

Carlo Kok (Mar 05 2021 at 08:59):

@Chris Fallin wouldn't rustc eventually need it? I know my compiler needs full SEH and landingpad before I can consider using cranelift as a backend.

Alex Crichton (Mar 05 2021 at 15:00):

I was under the impression that it's required because sometimes the kernel will do unwinding for some kinds of exceptions, but other than that we do unwind for a number of other purposes like backtrace generation and native debugging. I'm pretty surprised that the gc tests passed without unwind info because they definitely rely on unwinding...

Chris Fallin (Mar 05 2021 at 17:24):

Hmm, I think the GC smoke test was actually passing on my WIP-sorta-unwind branch, now that I look again.

OK, so it seems the consensus here is that we'll need SEH -- no problem, I'll keep debugging my WIP :-) I still remain somewhat uncomfortable relying on it for correctness-critical paths. I'll admit I'm influenced here by a JIT worldview: I'm placing relatively more emphasis on security, and assuming we'll build our own runtime mechanisms anyway. If we could make GC independent of libunwind (#2459) that would go a long way. I'm still curious if we can get O(1) unwind by doing our own longjmp across Wasm frames; @Peter Huene any win32 insights on that?

Alex Crichton (Mar 05 2021 at 17:26):

FWIW I think we basically rely on correctness of dwarf unwind information as well b/c of our usage of the backtrace crate to generate backtraces, if we get that wrong it segfaults at runtime

Chris Fallin (Mar 05 2021 at 17:28):

In my dream future, we generate backtraces from our own very simple metadata (line numbers only) walking our own very simple linked list of frames. But I'll grant that that may be a ways off...
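
The "very simple linked list of frames" idea can be sketched roughly as below. This is only an illustration, not Wasmtime's design: `FrameRecord`, `backtrace_lines`, and `demo_backtrace` are hypothetical names, and the records are built by hand here rather than by JIT prologues.

```rust
/// One record in a hypothetical runtime-maintained frame list.
struct FrameRecord<'a> {
    /// Link to the caller's record, or `None` at the outermost frame.
    caller: Option<&'a FrameRecord<'a>>,
    /// Source line for this frame (the "line numbers only" metadata).
    line: u32,
}

/// Walk from the innermost frame outward and collect the line numbers.
/// No unwind tables are consulted; the walk only follows `caller` links.
fn backtrace_lines(innermost: &FrameRecord<'_>) -> Vec<u32> {
    let mut lines = Vec::new();
    let mut cur = Some(innermost);
    while let Some(frame) = cur {
        lines.push(frame.line);
        cur = frame.caller;
    }
    lines
}

/// Build a three-frame chain by hand and walk it.
fn demo_backtrace() -> Vec<u32> {
    let outer = FrameRecord { caller: None, line: 10 };
    let middle = FrameRecord { caller: Some(&outer), line: 42 };
    let inner = FrameRecord { caller: Some(&middle), line: 7 };
    backtrace_lines(&inner)
}
```

The walker's only input is the chain itself, which is the property being argued for: its correctness would not depend on DWARF or SEH metadata matching the generated prologues.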

fitzgen (he/him) (Mar 05 2021 at 17:29):

isn't SEH needed in situations where we have: host catching frame -> wasm -> host exception throwing frame? is this something we intend to support?

also general ecosystem stuff, like Alex mentions about backtraces, but also sampling profilers and such

fitzgen (he/him) (Mar 05 2021 at 17:29):

or like attaching random debuggers, and them being able to get a stack trace without runtime help from us

Chris Fallin (Mar 05 2021 at 17:30):

hmm, yeah, I hadn't thought about embedder code e.g. tossing a C++ exception over all wasmtime and wasm frames

fitzgen (he/him) (Mar 05 2021 at 17:30):

agreed that for on-stack GC root identification, we want the thing we've talked about a bunch, but I think we still want to play nice with the existing OS/ecosystem

Alex Crichton (Mar 05 2021 at 17:31):

we sort of do and don't want jit frames to be native-code-unwindable, we don't actually throw rust panics across wasm, we catch the panic, longjmp, then rethrow the panic on the other side
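
A rough sketch of the "catch the panic, carry it over, rethrow on the other side" pattern, using only `std::panic`. The names `call_host`, `enter_wasm`, and `demo_rethrow` are hypothetical, and the longjmp over JIT frames is stood in for by an ordinary `Result` return, so this shows the shape of the idea rather than Wasmtime's actual code.

```rust
use std::any::Any;
use std::panic::{self, AssertUnwindSafe};

/// Payload captured from a panic in host code.
type PanicPayload = Box<dyn Any + Send + 'static>;

/// Hypothetical host-call shim: run `host_fn` and turn a panic into a plain
/// `Err` value, so no Rust unwinding crosses the (imagined) wasm frames.
fn call_host<T>(host_fn: impl FnOnce() -> T) -> Result<T, PanicPayload> {
    panic::catch_unwind(AssertUnwindSafe(host_fn))
}

/// Hypothetical entry point standing in for "call into wasm". A real runtime
/// would jump over the JIT frames; here the payload is simply returned.
fn enter_wasm<T>(wasm_body: impl FnOnce() -> Result<T, PanicPayload>) -> T {
    match wasm_body() {
        Ok(value) => value,
        // Back on the host side of the wasm frames: resume the same panic.
        Err(payload) => panic::resume_unwind(payload),
    }
}

/// Panic inside a host call, then observe the same payload at the embedder.
fn demo_rethrow() -> String {
    let result = panic::catch_unwind(|| {
        enter_wasm(|| call_host(|| -> u32 { panic!("host failure") }))
    });
    match result {
        Ok(_) => "no panic".to_string(),
        Err(payload) => payload
            .downcast_ref::<&str>()
            .map(|s| s.to_string())
            .unwrap_or_else(|| "unknown payload".to_string()),
    }
}
```

From the embedder's point of view the panic still looks continuous, even though no unwinder ever walked the frames in the middle.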

Alex Crichton (Mar 05 2021 at 17:31):

and I don't disagree that our gc stack walking should probably use a custom linked list and such

Alex Crichton (Mar 05 2021 at 17:31):

but every time I talk to someone or read about windows stuff for some reason windows seems to require unwinding to work at all times (I forget why though)

Alex Crichton (Mar 05 2021 at 17:32):

and yeah as Nick mentioned it's really nice to be able to use native debuggers and native tools where we can (but this I don't think is a hard requirement)

Chris Fallin (Mar 05 2021 at 17:32):

I'm sort of curious what other JIT compilers (e.g. SpiderMonkey) do here -- I'll go look in a bit

Alex Crichton (Mar 05 2021 at 17:33):

IIRC last I checked it's registering stuff for windows, although I'm not sure if it does anything for linux -- but I know that breakpad is real important for firefox and it'd be a bummer if you got nothing after JS whenever something crashed

Chris Fallin (Mar 05 2021 at 17:34):

yeah, for sure, the ergonomics of having rich info on crashes are really valuable

Chris Fallin (Mar 05 2021 at 17:35):

anyway I think I need to stew on all of these thoughts more: I'm hoping to keep things simple and the easiest way to do that is to not target N programmable machines (the CPU, the dwarf parser, the SEH unwinder) with matching programs, but maybe the right answer is just to think hard and get it right :-)

Alex Crichton (Mar 05 2021 at 17:37):

I think this is what firefox does -- https://searchfox.org/mozilla-central/source/js/src/jit/ProcessExecutableMemory.cpp#147

Alex Crichton (Mar 05 2021 at 17:37):

which isn't quite the same as us

Alex Crichton (Mar 05 2021 at 17:37):

but I think is roughly the same

Chris Fallin (Mar 05 2021 at 17:38):

hmm, fascinating... so it's just immediately vectoring off to its own exception handler on any unwind

Alex Crichton (Mar 05 2021 at 17:38):

this looks like it's not maintaining seh tables?

Chris Fallin (Mar 05 2021 at 17:39):

the unwindInfo is part of SEH but its "prologue length" is zero so it doesn't have any unwind opcodes

Alex Crichton (Mar 05 2021 at 17:39):

er yeah, just maintaining "trivial" unwind info

Alex Crichton (Mar 05 2021 at 17:40):

which may just do something like "to unwind anything in this region you zorp immediately back to the native code"
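
For reference, a "trivial" record of this kind could be encoded roughly as follows. This sketch follows the documented Windows x64 `UNWIND_INFO` layout (version and flags packed into one byte, then prolog size, code count, frame register/offset, then the handler address). The function name and the `handler_rva` parameter are assumptions for illustration, and registering the table with the OS is not shown.

```rust
/// `UNW_FLAG_EHANDLER`: the function has a language-specific exception handler.
const UNW_FLAG_EHANDLER: u8 = 0x1;

/// Encode an `UNWIND_INFO` with a zero-length prolog and no unwind codes.
/// `handler_rva` is a hypothetical image-relative offset of the handler thunk.
fn trivial_unwind_info(handler_rva: u32) -> Vec<u8> {
    let version: u8 = 1;
    let mut bytes = vec![
        version | (UNW_FLAG_EHANDLER << 3), // Version (3 bits) | Flags (5 bits)
        0,                                  // SizeOfProlog
        0,                                  // CountOfCodes
        0,                                  // FrameRegister (4 bits) | FrameOffset (4 bits)
    ];
    // With zero unwind codes, the handler address follows the header directly.
    bytes.extend_from_slice(&handler_rva.to_le_bytes());
    bytes
}
```

Because the code count is zero, nothing in the record has to agree with the generated prologue; everything is delegated to the handler.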

Alex Crichton (Mar 05 2021 at 17:40):

it's been awhile since I poked around with this though

Chris Fallin (Mar 05 2021 at 17:40):

right

Chris Fallin (Mar 05 2021 at 17:41):

so the tradeoff there is that a debugger couldn't follow frames but JIT frames are a custom format anyway

Alex Crichton (Mar 05 2021 at 17:41):

that seems reasonable to me

Alex Crichton (Mar 05 2021 at 17:41):

a debugger doesn't know anything about symbols/debuginfo mappings anyway

Alex Crichton (Mar 05 2021 at 17:42):

unless we provide it a ton of extra data, which we don't on Windows and only sometimes (and partially) do on Linux

Chris Fallin (Mar 05 2021 at 17:42):

we could almost get away with a "trivial" SEH record that just basically says "linked list of saved RBPs", except offsets are always off of RSP so we need to carefully translate rsp shifts into unwind opcodes
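
To make "translate rsp shifts into unwind opcodes" concrete, here is a hedged sketch of encoding a few Windows x64 unwind codes. The opcode numbers and size rules follow the documented `UNWIND_CODE` format; the function names are hypothetical, and a real generator would also need to emit the codes in descending prolog-offset order and handle the other operations.

```rust
const UWOP_PUSH_NONVOL: u8 = 0;
const UWOP_ALLOC_LARGE: u8 = 1;
const UWOP_ALLOC_SMALL: u8 = 2;

/// Encode one `sub rsp, size` as unwind code bytes.
/// `prolog_offset` is the offset of the end of that instruction in the prolog.
/// Returns `None` for sizes that are zero or not a multiple of 8.
fn encode_stack_alloc(prolog_offset: u8, size: u32) -> Option<Vec<u8>> {
    if size == 0 || size % 8 != 0 {
        return None;
    }
    if size <= 128 {
        // UWOP_ALLOC_SMALL: OpInfo holds (size - 8) / 8, covering 8..=128 bytes.
        let op_info = ((size - 8) / 8) as u8;
        Some(vec![prolog_offset, (op_info << 4) | UWOP_ALLOC_SMALL])
    } else if size <= 512 * 1024 - 8 {
        // UWOP_ALLOC_LARGE, OpInfo = 0: the next slot holds size / 8 as a u16.
        let mut code = vec![prolog_offset, UWOP_ALLOC_LARGE];
        code.extend_from_slice(&((size / 8) as u16).to_le_bytes());
        Some(code)
    } else {
        // UWOP_ALLOC_LARGE, OpInfo = 1: the next two slots hold the unscaled size.
        let mut code = vec![prolog_offset, (1 << 4) | UWOP_ALLOC_LARGE];
        code.extend_from_slice(&size.to_le_bytes());
        Some(code)
    }
}

/// Encode a push of a nonvolatile register (for example 5 for RBP).
fn encode_push(prolog_offset: u8, reg: u8) -> Vec<u8> {
    vec![prolog_offset, ((reg & 0x0f) << 4) | UWOP_PUSH_NONVOL]
}
```

Each adjustment of RSP in the prologue needs a matching code like these, which is the coupling between prologue generation and metadata that the discussion is worried about.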

Chris Fallin (Mar 05 2021 at 17:43):

though now that we're going down this route -- this is more or less what "custom longjmp that doesn't touch RtlUnwind" would give us

Chris Fallin (Mar 05 2021 at 17:44):

if we caught SEH unwinds at the hostcall side ("under" wasm frames), longjmp'd over wasm, and then re-threw (RtlUnwind again) on the other side, we'd preserve embedder exception continuity

Chris Fallin (Mar 05 2021 at 17:46):

anyway, will get actual SEH working for now; all this is an optimization/security-confidence win but not strictly necessary

bjorn3 (Mar 05 2021 at 17:54):

Alex Crichton said:

but every time I talk to someone or read about windows stuff for some reason windows seems to require unwinding to work at all times (I forget why though)

On Windows, kernel code can directly call user code, allowing kernel code to be sandwiched between user code. To make unwinding, signals and CPU exception handling work, the kernel needs to know how to unwind the stack. On Unix, unwinding is completely a userspace concept. CPU exceptions cause signals and the kernel never calls into user code.

Jubilee (Apr 27 2021 at 01:53):

I found out today that Windows puts its exception stuff on the stack, too, instead of allocating.

Under Itanium, throwing an exception typically involves allocating thread local memory to hold the exception, and calling into the EH runtime. The runtime identifies frames with appropriate exception handling actions, and successively resets the register context of the current thread to the most recently active frame with actions to run...

The Windows EH model does not use these successive register context resets. Instead, the active exception is typically described by a frame on the stack. In the case of C++ exceptions, the exception object is allocated in stack memory and its address is passed to __CxxThrowException. General purpose structured exceptions (SEH) are more analogous to Linux signals, and they are dispatched by userspace DLLs provided with Windows.

Last updated: Apr 07 2025 at 18:04 UTC

Stream: cranelift

Topic: unwind on Windows
