Hi all -- I've been having an adventure learning about structured exception handling and unwind info on Windows, in an attempt to get the fastcall implementation (and Wasmtime generally) working in the new backend. I'm starting to wonder if we might be able to avoid SEH altogether, by design, but I'd like feedback from others who know better (I rarely touch Win32 stuff):
[...] RtlUnwind. [...] __builtin_setjmp()/__builtin_longjmp() for this. These intrinsics exist in gcc (mingw) and clang, but not CL.exe (MSVC). It seems we need to be able to build with the latter -- or can we depend on Clang too even for windows-msvc builds?
[...] AddVectoredExceptionHandler and it seems this runs before RtlUnwind would, but I'm not sure.
So basically my thought at the top level is: do we really need SEH? If not, what are folks' thoughts on doing simpler/more explicit/more predictable things (simple longjmp, explicit GC rooting) instead?
(bonus round for another time: the above "let's make it simpler" but for debuginfo generation...)
@Peter Huene it seems you were involved last time we built this, for the old backend ^^
(also fwiw I have an almost working unwind generator in my x64-fastcall-unwind branch but it's currently failing in an inscrutable way when stack guard failures occur)
@Chris Fallin wouldn't rustc eventually need it? I know my compiler needs full SEH and landingpad before I can consider using cranelift as a backend.
I was under the impression that it's required because sometimes the kernel will do unwinding for some kinds of exceptions, but other than that we do unwind for a number of other purposes like backtrace generation and native debugging. I'm pretty surprised that the gc tests passed without unwind info because they definitely rely on unwinding...
Hmm, I think the GC smoke test was actually passing on my WIP-sorta-unwind branch, now that I look again.
OK, so it seems the consensus here is that we'll need SEH -- no problem, I'll keep debugging my WIP :-) I still remain somewhat uncomfortable relying on it for correctness-critical paths. I'll admit I'm influenced here by a JIT worldview: I'm placing relatively more emphasis on security, and assuming we'll build our own runtime mechanisms anyway. If we could make GC independent of libunwind (#2459) that would go a long way. I'm still curious if we can get O(1) unwind by doing our own longjmp across Wasm frames; @Peter Huene any win32 insights on that?
FWIW I think we basically rely on correctness of dwarf unwind information as well b/c of our usage of the backtrace crate to generate backtraces, if we get that wrong it segfaults at runtime
In my dream future, we generate backtraces from our own very simple metadata (line numbers only) walking our own very simple linked list of frames. But I'll grant that that may be a ways off...
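(A minimal sketch of the "linked list of frames" idea above, not anything Wasmtime actually does. The `FrameRecord` type, its fields, and `backtrace` are all hypothetical names; a real JIT would link these records on the machine stack in the function prologue rather than use Rust references.)

```rust
/// Hypothetical per-frame record. In a real JIT this would live on the
/// machine stack and be linked in by the function prologue.
struct FrameRecord<'a> {
    /// Link to the caller's record, or None at the host entry point.
    parent: Option<&'a FrameRecord<'a>>,
    /// Hypothetical index into a function table.
    func_index: u32,
    /// Source line of the active call site ("line numbers only").
    line: u32,
}

/// Walk from the innermost frame outward, collecting (function, line) pairs.
/// No DWARF or SEH tables are consulted; the list is the only metadata.
fn backtrace(innermost: &FrameRecord<'_>) -> Vec<(u32, u32)> {
    let mut out = Vec::new();
    let mut cur = Some(innermost);
    while let Some(frame) = cur {
        out.push((frame.func_index, frame.line));
        cur = frame.parent;
    }
    out
}
```

The walk costs one pointer chase per frame and cannot misparse anything, which is the simplicity being argued for; the price is that external tools see none of it.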
isn't SEH needed in situations where we have: host catching frame -> wasm -> host exception throwing frame? is this something we intend to support?
also general ecosystem stuff, like Alex mentions about backtraces, but also sampling profilers and such
or like attaching random debuggers, and them being able to get a stack trace without runtime help from us
hmm, yeah, I hadn't thought about embedder code e.g. tossing a C++ exception over all wasmtime and wasm frames
agreed that for on-stack GC root identification, we want the thing we've talked about a bunch, but I think we still want to play nice with the existing OS/ecosystem
we sort of do and don't want jit frames to be native-code-unwindable: we don't actually throw rust panics across wasm; we catch the panic, longjmp, then rethrow the panic on the other side
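(A rough sketch of the catch / rethrow halves of that pattern, using only `std::panic`. The longjmp over wasm frames in the middle is left out: here the payload is simply returned as a value. `HostOutcome`, `call_host`, and `finish` are hypothetical names for illustration, not Wasmtime's API.)

```rust
use std::any::Any;
use std::panic::{self, AssertUnwindSafe};

/// What a host function reports back across the (hypothetical) wasm boundary.
enum HostOutcome<T> {
    Ok(T),
    /// The panic payload, carried as plain data while wasm frames are skipped.
    Panicked(Box<dyn Any + Send + 'static>),
}

/// Run a host callback "under" wasm frames; a panic never unwinds out of here.
fn call_host<T>(f: impl FnOnce() -> T) -> HostOutcome<T> {
    match panic::catch_unwind(AssertUnwindSafe(f)) {
        Ok(v) => HostOutcome::Ok(v),
        Err(payload) => HostOutcome::Panicked(payload),
    }
}

/// On the host side "above" wasm: resume the original panic, if there was one.
fn finish<T>(outcome: HostOutcome<T>) -> T {
    match outcome {
        HostOutcome::Ok(v) => v,
        HostOutcome::Panicked(payload) => panic::resume_unwind(payload),
    }
}
```

Because the payload crosses the JIT frames as data, the native unwinder never has to interpret those frames for this particular path.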
and I don't disagree that our gc stack walking should probably use a custom linked list and such
but every time I talk to someone or read about windows stuff for some reason windows seems to require unwinding to work at all times (I forget why though)
and yeah as Nick mentioned it's really nice to be able to use native debuggers and native tools where we can (but this I don't think is a hard requirement)
I'm sort of curious what other JIT compilers (e.g. SpiderMonkey) do here -- I'll go look in a bit
IIRC last I checked it's registering stuff for windows, although I'm not sure if it does anything for linux -- but I know that breakpad is real important for firefox and it'd be a bummer if you got nothing after JS whenever something crashed
yeah, for sure, the ergonomics of having rich info on crashes are really valuable
anyway I think I need to stew on all of these thoughts more: I'm hoping to keep things simple and the easiest way to do that is to not target N programmable machines (the CPU, the dwarf parser, the SEH unwinder) with matching programs, but maybe the right answer is just to think hard and get it right :-)
I think this is what firefox does -- https://searchfox.org/mozilla-central/source/js/src/jit/ProcessExecutableMemory.cpp#147
which isn't quite the same as us
but I think is roughly the same
hmm, fascinating... so it's just immediately vectoring off to its own exception handler on any unwind
this looks like it's not maintaining seh tables?
the unwindInfo is part of SEH but its "prologue length" is zero so it doesn't have any unwind opcodes
er yeah, just maintaining "trivial" unwind info
which may just do something like "to unwind anything in this region you zorp immediately back to the native code"
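(To make "trivial" concrete, here is a sketch of what such a record looks like as bytes, assuming the documented Windows x64 UNWIND_INFO layout: version and flags, prologue size, unwind code count, frame register/offset, then the handler RVA. `trivial_unwind_info` and `handler_rva` are hypothetical names, and this is not a copy of what Firefox does.)

```rust
/// UNW_FLAG_EHANDLER from the Windows x64 unwind data format.
const UNW_FLAG_EHANDLER: u8 = 0x1;

/// Encode an UNWIND_INFO with version 1, a zero-length prologue, and no
/// unwind codes, followed by the RVA of an exception handler.
/// `handler_rva` is a hypothetical value supplied by the caller.
fn trivial_unwind_info(handler_rva: u32) -> Vec<u8> {
    let version: u8 = 1;
    let mut bytes = vec![
        version | (UNW_FLAG_EHANDLER << 3), // version: low 3 bits, flags: high 5 bits
        0, // size of prologue: zero
        0, // count of unwind codes: none
        0, // frame register (low nibble) and frame offset (high nibble): unused
    ];
    // With zero unwind codes, the handler RVA follows the header directly.
    bytes.extend_from_slice(&handler_rva.to_le_bytes());
    bytes
}
```

The whole record is eight bytes and never changes with the generated code, so there is nothing to keep in sync with the JIT's prologues.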
it's been awhile since I poked around with this though
right
so the tradeoff there is that a debugger couldn't follow frames but JIT frames are a custom format anyway
that seems reasonable to me
a debugger doesn't know anything about symbols/debuginfo mappings anyway
unless we provide it a ton of extra data, which we don't on windows and only sometimes (and only partially) do on linux
we could almost get away with a "trivial" SEH record that just basically says "linked list of saved RBPs", except offsets are always off of RSP so we need to carefully translate rsp shifts into unwind opcodes
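(A sketch of what "translate rsp shifts into unwind opcodes" involves, assuming the documented Windows x64 UNWIND_CODE encoding. The opcode numbers and register number are from that format; the `RspShift` type and the `slot` and `encode` helpers are hypothetical, and a real generator also has to emit the codes in reverse prologue order and handle the other opcodes.)

```rust
const UWOP_PUSH_NONVOL: u8 = 0;
const UWOP_ALLOC_LARGE: u8 = 1;
const UWOP_ALLOC_SMALL: u8 = 2;
const RBP: u8 = 5; // register number of RBP in the unwind format

/// A prologue action that moves RSP (hypothetical type for this sketch).
enum RspShift {
    PushRbp,
    Sub(u32), // `sub rsp, n`
}

/// Pack one 2-byte slot: the prologue offset just past the instruction,
/// then the op (low nibble) and op info (high nibble).
fn slot(prolog_offset: u8, op: u8, info: u8) -> [u8; 2] {
    [prolog_offset, op | (info << 4)]
}

/// Encode one RSP shift; returns None for sizes the format cannot express.
fn encode(prolog_offset: u8, shift: &RspShift) -> Option<Vec<u8>> {
    match *shift {
        RspShift::PushRbp => Some(slot(prolog_offset, UWOP_PUSH_NONVOL, RBP).to_vec()),
        // Allocations must be a nonzero multiple of 8.
        RspShift::Sub(n) if n == 0 || n % 8 != 0 => None,
        // 8..=128 bytes: size is stored as (n - 8) / 8 in the info nibble.
        RspShift::Sub(n) if n <= 128 => {
            Some(slot(prolog_offset, UWOP_ALLOC_SMALL, ((n - 8) / 8) as u8).to_vec())
        }
        // Up to 512K - 8 bytes: info 0, size / 8 in the next 16-bit slot.
        RspShift::Sub(n) if n <= 512 * 1024 - 8 => {
            let mut v = slot(prolog_offset, UWOP_ALLOC_LARGE, 0).to_vec();
            v.extend_from_slice(&((n / 8) as u16).to_le_bytes());
            Some(v)
        }
        // Larger: info 1, unscaled 32-bit size in the next two slots.
        RspShift::Sub(n) => {
            let mut v = slot(prolog_offset, UWOP_ALLOC_LARGE, 1).to_vec();
            v.extend_from_slice(&n.to_le_bytes());
            Some(v)
        }
    }
}
```

For example, `push rbp` ending at prologue offset 1 encodes as the bytes `[1, 0x50]`, and a `sub rsp, 32` ending at offset 5 encodes as `[5, 0x32]`.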
though now that we're going down this route -- this is more or less what "custom longjmp that doesn't touch RtlUnwind" would give us
if we caught SEH unwinds at the hostcall side ("under" wasm frames), longjmp'd over wasm, and then re-threw (RtlUnwind again) on the other side, we'd preserve embedder exception continuity
anyway, will get actual SEH working for now; all this is an optimization/security-confidence win but not strictly necessary
Alex Crichton said:
but every time I talk to someone or read about windows stuff for some reason windows seems to require unwinding to work at all times (I forget why though)
On Windows, kernel code can directly call user code, allowing kernel code to be sandwiched between user code. To make unwinding, signals and cpu exception handling work, the kernel needs to know how to unwind the stack. On Unix, unwinding is completely a userspace concept. Cpu exceptions cause signals and the kernel never calls into user code.
I found out today that Windows puts its exception stuff on the stack, too, instead of allocating.
Under Itanium, throwing an exception typically involves allocating thread local memory to hold the exception, and calling into the EH runtime. The runtime identifies frames with appropriate exception handling actions, and successively resets the register context of the current thread to the most recently active frame with actions to run...
The Windows EH model does not use these successive register context resets. Instead, the active exception is typically described by a frame on the stack. In the case of C++ exceptions, the exception object is allocated in stack memory and its address is passed to __CxxThrowException. General purpose structured exceptions (SEH) are more analogous to Linux signals, and they are dispatched by userspace DLLs provided with Windows.
Last updated: Dec 23 2024 at 12:05 UTC