Stream: git-wasmtime

Topic: wasmtime / PR #9689 Port wasmtime-fiber to `no_std` and a...


view this post on Zulip Wasmtime GitHub notifications bot (Nov 27 2024 at 02:02):

cfallin opened PR #9689 from cfallin:async-nostd to bytecodealliance:main:

… Wasmtime.

This PR allows a no_std Wasmtime build to be configured with the async feature. (Previously, a minimal no_std configuration could only run with sync entry points, without suspending of stacks.)

The main hurdle to this support was the wasmtime-fiber crate. Fortunately, the "unix" variant of fibers was almost entirely portable to a no_std environment, owing to the fact that it implements stack-switching manually in assembly itself. I moved the per-ISA implementations to a shared submodule and built the nostd platform backend for wasmtime-fiber with a stripped-down version of the unix backend.

The nostd backend does not support mmap'd stacks, does not support custom stack allocators, and does not propagate panics.

I've updated the min-platform example to ensure this builds but have not yet actually tested it (I am in the middle of a larger porting effort); hence, a draft PR for initial feedback.

<!--
Please make sure you include the following information:

Our development process is documented in the Wasmtime book:
https://docs.wasmtime.dev/contributing-development-process.html

Please ensure all communication follows the code of conduct:
https://github.com/bytecodealliance/wasmtime/blob/main/CODE_OF_CONDUCT.md
-->

view this post on Zulip Wasmtime GitHub notifications bot (Nov 27 2024 at 02:02):

cfallin commented on PR #9689:

cc @alexcrichton -- I think you're on vacation now but I'd be very curious what you think of this in general!

view this post on Zulip Wasmtime GitHub notifications bot (Nov 27 2024 at 02:02):

cfallin edited PR #9689.

view this post on Zulip Wasmtime GitHub notifications bot (Nov 27 2024 at 02:03):

cfallin edited PR #9689:

This PR allows a no_std Wasmtime build to be configured with the async feature. (Previously, a minimal no_std configuration could only run with sync entry points, without suspending of stacks.)

The main hurdle to this support was the wasmtime-fiber crate. Fortunately, the "unix" variant of fibers was almost entirely portable to a no_std environment, owing to the fact that it implements stack-switching manually in assembly itself. I moved the per-ISA implementations to a shared submodule and built the nostd platform backend for wasmtime-fiber with a stripped-down version of the unix backend.

The nostd backend does not support mmap'd stacks, does not support custom stack allocators, and does not propagate panics.

I've updated the min-platform example to ensure this builds but have not yet actually tested it (I am in the middle of a larger porting effort); hence, a draft PR for initial feedback.

<!--
Please make sure you include the following information:

Our development process is documented in the Wasmtime book:
https://docs.wasmtime.dev/contributing-development-process.html

Please ensure all communication follows the code of conduct:
https://github.com/bytecodealliance/wasmtime/blob/main/CODE_OF_CONDUCT.md
-->

view this post on Zulip Wasmtime GitHub notifications bot (Nov 27 2024 at 02:09):

cfallin updated PR #9689.

view this post on Zulip Wasmtime GitHub notifications bot (Nov 27 2024 at 10:32):

bjorn3 submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Nov 27 2024 at 10:32):

bjorn3 created PR review comment:

If you do unwind when the std feature is disabled, are you guaranteed to hit an extern "C" boundary to abort unwinding? Or will you get arbitrary corruption?

view this post on Zulip Wasmtime GitHub notifications bot (Nov 27 2024 at 16:29):

cfallin submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Nov 27 2024 at 16:29):

cfallin created PR review comment:

Excellent question! (@alexcrichton when you're around it'd be great to get your canonical answer on this)

Currently in my porting context I have a #[panic_handler] that will abort, so I haven't thought through this in much detail. One option is certainly to say that panic unwinding is not supported in no_std builds with async, and the build must provide a custom aborting panic handler or specify panic=abort. However if we wanted to support propagation of unwinds I think we might be able to do something custom here (the catch_unwind/resume_unwind functions in std::panic don't exist in core::panic but one could roll one's own unwinding). I wonder if there's a way to make this safe (build only when panic=abort for example) in the meantime?

view this post on Zulip Wasmtime GitHub notifications bot (Nov 27 2024 at 17:04):

cfallin updated PR #9689.

view this post on Zulip Wasmtime GitHub notifications bot (Nov 27 2024 at 17:06):

cfallin submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Nov 27 2024 at 17:06):

cfallin created PR review comment:

@sunfishcode helpfully pointed out that #[cfg(panic = "unwind")] and #[cfg(panic = "abort")] work, so I've added a compile_error!() to the fiber library when built with panic=unwind without std. So this is at least now safe!

view this post on Zulip Wasmtime GitHub notifications bot (Nov 27 2024 at 17:15):

cfallin updated PR #9689.

view this post on Zulip Wasmtime GitHub notifications bot (Nov 27 2024 at 17:17):

cfallin updated PR #9689.

view this post on Zulip Wasmtime GitHub notifications bot (Nov 27 2024 at 17:52):

cfallin updated PR #9689.

view this post on Zulip Wasmtime GitHub notifications bot (Dec 02 2024 at 15:53):

alexcrichton submitted PR review:

This all looks quite reasonable to me, happy to see this merged :+1:.

Could you also add a cargo test -p wasmtime-fiber --no-default-features in CI? I generally try to make sure that if there's nontrivial code added with no_std/etc (as is the case for the nostd.rs file here) that we have some at least smoke testing of what's going on. I think that this should be testable when the std feature is disabled? (might have to conditionally enable/disable some panicking tests, and it's ok to use std in tests just want to be sure to test nostd.rs somehow)

view this post on Zulip Wasmtime GitHub notifications bot (Dec 02 2024 at 15:53):

alexcrichton created PR review comment:

Was this necessary because it was otherwise broken? Or just keeping the same behavior tested?

If broken, could wasmtime enable the std feature as part of the async feature for now? (until that's fixed)

Otherwise if just keeping the same behavior mind also adding -p wasmtime-fiber --features std and -p wasmtime-fiber to this matrix?

view this post on Zulip Wasmtime GitHub notifications bot (Dec 02 2024 at 15:53):

alexcrichton created PR review comment:

It's ok to drop this if it's empty, and due to this being an internal crate I think it's reasonable to not have default features here anyway.

view this post on Zulip Wasmtime GitHub notifications bot (Dec 02 2024 at 15:53):

alexcrichton created PR review comment:

I think it's ok to leave this out for now and mostly just ensure this is cargo check'd somewhere in CI (to avoid someone copy/pasting this Cargo.toml example and forgetting to turn this back off)

view this post on Zulip Wasmtime GitHub notifications bot (Dec 02 2024 at 15:53):

alexcrichton created PR review comment:

I might recommend wasmtime-fiber?/std (note the ?) to only enable the wasmtime-fiber dependency if it's already activated via some other means.

view this post on Zulip Wasmtime GitHub notifications bot (Dec 02 2024 at 15:53):

alexcrichton created PR review comment:

Mind scoping this import to just where it's needed below? (avoid the duplicate #[cfg])

view this post on Zulip Wasmtime GitHub notifications bot (Dec 02 2024 at 15:53):

alexcrichton created PR review comment:

A nice trick to avoid core::result::Result here is to, in each module, change:

type Result<T> = some::Result<T>;

to

type Result<T, E = some::Error> = core::result::Result<T, E>;

That way you can use any Result type-alias as the actual real Result without worry

view this post on Zulip Wasmtime GitHub notifications bot (Dec 02 2024 at 15:53):

alexcrichton created PR review comment:

Could this be avoided by having a #[cfg] for std/unwind on the Panicked variant? That way we can statically provide that it's never constructed

view this post on Zulip Wasmtime GitHub notifications bot (Dec 02 2024 at 15:53):

alexcrichton created PR review comment:

Personally I think it's reasonable to skip the compile_error! here and just let this happen as normal. With panic = "abort" nothing should be unsafe, it just means the extra unwind bits here aren't too useful (and they'll get optimized out)

view this post on Zulip Wasmtime GitHub notifications bot (Dec 02 2024 at 15:54):

alexcrichton submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Dec 02 2024 at 15:54):

alexcrichton created PR review comment:

I commented below as well, but I think it's fine to rely on unconditional aborts via extern "C" boundary in the panic=unwind std=false mode and skip the compile_error!

view this post on Zulip Wasmtime GitHub notifications bot (Dec 03 2024 at 10:17):

bjorn3 submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Dec 03 2024 at 10:17):

bjorn3 created PR review comment:

This error only happens when using panic = "unwind", which is unsound if not using catch_unwind.

view this post on Zulip Wasmtime GitHub notifications bot (Dec 03 2024 at 16:50):

alexcrichton submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Dec 03 2024 at 16:50):

alexcrichton created PR review comment:

Could you clarify the unsoundness? Looking at this disassembly it looks like even in no_std mode unwinds are caught and abort correctly

view this post on Zulip Wasmtime GitHub notifications bot (Dec 03 2024 at 20:43):

cfallin submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Dec 03 2024 at 20:43):

cfallin created PR review comment:

@alexcrichton just so I understand: that example uses the C-unwind ABI; currently (and in this PR) the stackswitching bits in assembly are called with the C ABI. In a no_std world with panic=unwind and a custom unwinder (not my use-case, but what this error statically disables for now) I think we could otherwise get an unwind going off the top of a fiber stack. Or it's possible I'm missing something? I don't have all the compiler configuration options here fully internalized.

view this post on Zulip Wasmtime GitHub notifications bot (Dec 03 2024 at 20:57):

alexcrichton submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Dec 03 2024 at 20:57):

alexcrichton created PR review comment:

Ah yeah sorry that's just "faking" a non-inlined call to something that can theoretically unwind. You don't actually need C-unwind but with panic=unwind all normal rust functions can possibly unwind so a catch-and-panic should be inserted for all calls. Basically I'm still not sure how #![no_std] + -Cpanic=unwind is unsound myself

view this post on Zulip Wasmtime GitHub notifications bot (Dec 03 2024 at 23:22):

cfallin submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Dec 03 2024 at 23:22):

cfallin created PR review comment:

I see: my confusion is the dual to your confusion I think, that is, I'm still not sure how #![no_std] + -Cpanic=unwind is sound :-)

Restating mostly for my own benefit, but to verify we're on the same page: so with the current state of main, we wrap a call to the entrypoint on the new stack in a call to catch_unwind (e.g. here). If the fiber's body panics while on the fiber stack, system libunwind reaches this point, the panic is reified as a RunResult, and that RunResult is passed back to the caller on the original stack and rethrown with resume_unwind.

With this PR, in no_std, we don't have catch_unwind/resume_unwind. So a panic happens in a fiber's body; say that there is a custom unwinder (there must be I think, for one to be doing unwinding at all in no_std?); this unwinder iterates up the stack frames on the fiber stack. It reaches the top, there's no callsite in a catch_unwind; this is an unwind error / UB / something bad. Or is this prevented some other way?

view this post on Zulip Wasmtime GitHub notifications bot (Dec 03 2024 at 23:43):

alexcrichton submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Dec 03 2024 at 23:43):

alexcrichton created PR review comment:

Oh sorry! Good point getting on the same page (can also talk more during the Cranelift meeting tomorrow if helpful)

I think the main thing to add is that extern "C" fn foo() { panic!() } unconditionally, and soundly, aborts the process. That happens irrespective of -Cunwind=... settings. Starting in Rust 1.81 all extern "C" functions have an implicit catch-and-abort. That means that even if we were to remove the catch_unwind happening in wasmtime-fiber the crate would still be sound, just not too helpful on panics. The catch_unwind is mostly still there to assist with debugging panics by shepherding the panic across the boundary. That's always optional though since it's sound to abort the process at any time (just better to not do so if possible).

That to me is why it's currently sound if we skip the catch_unwind. A bit of a historical note though is that Rust 1.81 is very recent and prior versions of rustc didn't catch-and-abort at extern "C" boundaries. That means that historically catch_unwind was indeed required for soundness but that's no longer the case.

view this post on Zulip Wasmtime GitHub notifications bot (Dec 03 2024 at 23:50):

cfallin submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Dec 03 2024 at 23:50):

cfallin created PR review comment:

Ah! OK, that's the detail I had missed; thanks for your patience here. I guess since our MSRV is now 1.81, we can rely on this behavior unconditionally. I'll remove the compile-error and add a comment detailing this.

view this post on Zulip Wasmtime GitHub notifications bot (Dec 04 2024 at 00:22):

cfallin updated PR #9689.

view this post on Zulip Wasmtime GitHub notifications bot (Dec 04 2024 at 00:22):

cfallin submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Dec 04 2024 at 00:22):

cfallin created PR review comment:

Done!

view this post on Zulip Wasmtime GitHub notifications bot (Dec 04 2024 at 00:22):

cfallin submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Dec 04 2024 at 00:22):

cfallin created PR review comment:

Done.

view this post on Zulip Wasmtime GitHub notifications bot (Dec 04 2024 at 00:22):

cfallin created PR review comment:

Done.

view this post on Zulip Wasmtime GitHub notifications bot (Dec 04 2024 at 00:22):

cfallin submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Dec 04 2024 at 00:23):

cfallin submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Dec 04 2024 at 00:23):

cfallin created PR review comment:

Done!

view this post on Zulip Wasmtime GitHub notifications bot (Dec 04 2024 at 00:23):

cfallin submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Dec 04 2024 at 00:23):

cfallin created PR review comment:

I think I had added it at some point while debugging failures but indeed it should work with just async, no std -- so, restored that, added std separately, and added the wasmtime-fiber variants.

view this post on Zulip Wasmtime GitHub notifications bot (Dec 04 2024 at 00:23):

cfallin submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Dec 04 2024 at 00:23):

cfallin created PR review comment:

Done.

view this post on Zulip Wasmtime GitHub notifications bot (Dec 04 2024 at 00:23):

cfallin submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Dec 04 2024 at 00:23):

cfallin created PR review comment:

Done -- refactored a bit in Suspend::execute() to statically avoid constructing the panic case as well.

view this post on Zulip Wasmtime GitHub notifications bot (Dec 04 2024 at 00:25):

cfallin updated PR #9689.

view this post on Zulip Wasmtime GitHub notifications bot (Dec 04 2024 at 00:26):

cfallin has marked PR #9689 as ready for review.

view this post on Zulip Wasmtime GitHub notifications bot (Dec 04 2024 at 00:26):

cfallin requested alexcrichton for a review on PR #9689.

view this post on Zulip Wasmtime GitHub notifications bot (Dec 04 2024 at 00:26):

cfallin requested wasmtime-default-reviewers for a review on PR #9689.

view this post on Zulip Wasmtime GitHub notifications bot (Dec 04 2024 at 00:26):

cfallin requested wasmtime-core-reviewers for a review on PR #9689.

view this post on Zulip Wasmtime GitHub notifications bot (Dec 04 2024 at 00:26):

cfallin commented on PR #9689:

Updated and out of draft mode now; thanks!

view this post on Zulip Wasmtime GitHub notifications bot (Dec 04 2024 at 01:15):

alexcrichton submitted PR review:

Is it possible to add a `cargo test -p wasmtime-fiber --no-default-features to CI as well?

view this post on Zulip Wasmtime GitHub notifications bot (Dec 04 2024 at 05:35):

cfallin updated PR #9689.

view this post on Zulip Wasmtime GitHub notifications bot (Dec 04 2024 at 05:46):

cfallin updated PR #9689.

view this post on Zulip Wasmtime GitHub notifications bot (Dec 04 2024 at 05:50):

cfallin updated PR #9689.

view this post on Zulip Wasmtime GitHub notifications bot (Dec 04 2024 at 05:50):

cfallin commented on PR #9689:

Sure thing, done!

view this post on Zulip Wasmtime GitHub notifications bot (Dec 04 2024 at 05:52):

cfallin has enabled auto merge for PR #9689.

view this post on Zulip Wasmtime GitHub notifications bot (Dec 04 2024 at 06:01):

cfallin updated PR #9689.

view this post on Zulip Wasmtime GitHub notifications bot (Dec 04 2024 at 06:02):

cfallin has enabled auto merge for PR #9689.

view this post on Zulip Wasmtime GitHub notifications bot (Dec 04 2024 at 06:53):

cfallin commented on PR #9689:

The CI failures are fairly perplexing -- the only failing job I see is "report failure on cancelation", and a bunch of other jobs are canceled midway through their runs. @alexcrichton any ideas?

view this post on Zulip Wasmtime GitHub notifications bot (Dec 04 2024 at 15:36):

alexcrichton commented on PR #9689:

This happened on another PR the other night and I couldn't figure out what was happening. I then re-queued in the morning and it merged so I assumed it was a transient failure in CI infrastructure... Let's see if that still holds true?

view this post on Zulip Wasmtime GitHub notifications bot (Dec 04 2024 at 15:37):

alexcrichton submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Dec 04 2024 at 15:37):

alexcrichton created PR review comment:

Oh when adding a new job like this you'll need to also add a "join" to the ci-status below so it's handled for failure-vs-not.

view this post on Zulip Wasmtime GitHub notifications bot (Dec 04 2024 at 17:29):

alexcrichton updated PR #9689.

view this post on Zulip Wasmtime GitHub notifications bot (Dec 04 2024 at 17:29):

alexcrichton commented on PR #9689:

I've pushed a commit to run full CI on this PR where the PR run can't cancel itself, so let's see what happens...

view this post on Zulip Wasmtime GitHub notifications bot (Dec 04 2024 at 17:34):

alexcrichton commented on PR #9689:

Bleh ok so this is a known issue (at least to me) on github actions. For the merge queue there's no way to cancel a failed job so our main.yml self-cancels whenever a job fails. This works fine on all platforms except Windows. On Windows self-cancelling sometimes gives you a big red X on the job and sometimes gives you the icon of "this was cancelled by someone else". Basically there seems to be a race of the self-cancel and the job actually failing.

In any case it's the Windows build. Just some minor things to update.

view this post on Zulip Wasmtime GitHub notifications bot (Dec 04 2024 at 18:42):

cfallin updated PR #9689.

view this post on Zulip Wasmtime GitHub notifications bot (Dec 04 2024 at 18:43):

cfallin updated PR #9689.

view this post on Zulip Wasmtime GitHub notifications bot (Dec 04 2024 at 18:43):

cfallin created PR review comment:

Done!

view this post on Zulip Wasmtime GitHub notifications bot (Dec 04 2024 at 18:43):

cfallin submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Dec 04 2024 at 18:43):

cfallin commented on PR #9689:

Ah, that was entirely unintuitive, GitHub CI user-interface! Thanks for working that out.

view this post on Zulip Wasmtime GitHub notifications bot (Dec 04 2024 at 18:51):

cfallin updated PR #9689.

view this post on Zulip Wasmtime GitHub notifications bot (Dec 04 2024 at 19:48):

cfallin merged PR #9689.


Last updated: Dec 23 2024 at 12:05 UTC