cfallin opened PR #9689 from cfallin:async-nostd
to bytecodealliance:main
:
… Wasmtime.
This PR allows a
no_std
Wasmtime build to be configured with theasync
feature. (Previously, a minimalno_std
configuration could only run with sync entry points, without suspending of stacks.)The main hurdle to this support was the
wasmtime-fiber
crate. Fortunately, the "unix" variant of fibers was almost entirely portable to ano_std
environment, owing to the fact that it implements stack-switching manually in assembly itself. I moved the per-ISA implementations to a shared submodule and built the nostd platform backend forwasmtime-fiber
with a stripped-down version of the unix backend.The nostd backend does not support mmap'd stacks, does not support custom stack allocators, and does not propagate panics.
I've updated the
min-platform
example to ensure this builds but have not yet actually tested it (I am in the middle of a larger porting effort); hence, a draft PR for initial feedback.<!--
Please make sure you include the following information:
If this work has been discussed elsewhere, please include a link to that
conversation. If it was discussed in an issue, just mention "issue #...".Explain why this change is needed. If the details are in an issue already,
this can be brief.Our development process is documented in the Wasmtime book:
https://docs.wasmtime.dev/contributing-development-process.htmlPlease ensure all communication follows the code of conduct:
https://github.com/bytecodealliance/wasmtime/blob/main/CODE_OF_CONDUCT.md
-->
cfallin commented on PR #9689:
cc @alexcrichton -- I think you're on vacation now but I'd be very curious what you think of this in general!
cfallin edited PR #9689.
cfallin edited PR #9689:
This PR allows a
no_std
Wasmtime build to be configured with theasync
feature. (Previously, a minimalno_std
configuration could only run with sync entry points, without suspending of stacks.)The main hurdle to this support was the
wasmtime-fiber
crate. Fortunately, the "unix" variant of fibers was almost entirely portable to ano_std
environment, owing to the fact that it implements stack-switching manually in assembly itself. I moved the per-ISA implementations to a shared submodule and built the nostd platform backend forwasmtime-fiber
with a stripped-down version of the unix backend.The nostd backend does not support mmap'd stacks, does not support custom stack allocators, and does not propagate panics.
I've updated the
min-platform
example to ensure this builds but have not yet actually tested it (I am in the middle of a larger porting effort); hence, a draft PR for initial feedback.<!--
Please make sure you include the following information:
If this work has been discussed elsewhere, please include a link to that
conversation. If it was discussed in an issue, just mention "issue #...".Explain why this change is needed. If the details are in an issue already,
this can be brief.Our development process is documented in the Wasmtime book:
https://docs.wasmtime.dev/contributing-development-process.htmlPlease ensure all communication follows the code of conduct:
https://github.com/bytecodealliance/wasmtime/blob/main/CODE_OF_CONDUCT.md
-->
cfallin updated PR #9689.
bjorn3 submitted PR review.
bjorn3 created PR review comment:
If you do unwind when the std feature is disabled, are you guaranteed to hit an
extern "C"
boundary to abort unwinding? Or will you get arbitrary corruption?
cfallin submitted PR review.
cfallin created PR review comment:
Excellent question! (@alexcrichton when you're around it'd be great to get your canonical answer on this)
Currently in my porting context I have a
#[panic_handler]
that will abort, so I haven't thought through this in much detail. One option is certainly to say that panic unwinding is not supported inno_std
builds withasync
, and the build must provide a custom aborting panic handler or specifypanic=abort
. However if we wanted to support propagation of unwinds I think we might be able to do something custom here (thecatch_unwind
/resume_unwind
functions instd::panic
don't exist incore::panic
but one could roll one's own unwinding). I wonder if there's a way to make this safe (build only whenpanic=abort
for example) in the meantime?
cfallin updated PR #9689.
cfallin submitted PR review.
cfallin created PR review comment:
@sunfishcode helpfully pointed out that
#[cfg(panic = "unwind")]
and#[cfg(panic = "abort")]
work, so I've added acompile_error!()
to the fiber library when built withpanic=unwind
withoutstd
. So this is at least now safe!
cfallin updated PR #9689.
cfallin updated PR #9689.
cfallin updated PR #9689.
alexcrichton submitted PR review:
This all looks quite reasonable to me, happy to see this merged :+1:.
Could you also add a
cargo test -p wasmtime-fiber --no-default-features
in CI? I generally try to make sure that if there's nontrivial code added with no_std/etc (as is the case for thenostd.rs
file here) that we have some at least smoke testing of what's going on. I think that this should be testable when thestd
feature is disabled? (might have to conditionally enable/disable some panicking tests, and it's ok to usestd
in tests just want to be sure to testnostd.rs
somehow)
alexcrichton created PR review comment:
Was this necessary because it was otherwise broken? Or just keeping the same behavior tested?
If broken, could
wasmtime
enable thestd
feature as part of the async feature for now? (until that's fixed)Otherwise if just keeping the same behavior mind also adding
-p wasmtime-fiber --features std
and-p wasmtime-fiber
to this matrix?
alexcrichton created PR review comment:
It's ok to drop this if it's empty, and due to this being an internal crate I think it's reasonable to not have default features here anyway.
alexcrichton created PR review comment:
I think it's ok to leave this out for now and mostly just ensure this is
cargo check
'd somewhere in CI (to avoid someone copy/pasting thisCargo.toml
example and forgetting to turn this back off)
alexcrichton created PR review comment:
I might recommend
wasmtime-fiber?/std
(note the?
) to only enable thewasmtime-fiber
dependency if it's already activated via some other means.
alexcrichton created PR review comment:
Mind scoping this import to just where it's needed below? (avoid the duplicate
#[cfg]
)
alexcrichton created PR review comment:
A nice trick to avoid
core::result::Result
here is to, in each module, change:type Result<T> = some::Result<T>;
to
type Result<T, E = some::Error> = core::result::Result<T, E>;
That way you can use any
Result
type-alias as the actual realResult
without worry
alexcrichton created PR review comment:
Could this be avoided by having a
#[cfg]
for std/unwind on thePanicked
variant? That way we can statically provide that it's never constructed
alexcrichton created PR review comment:
Personally I think it's reasonable to skip the
compile_error!
here and just let this happen as normal. Withpanic = "abort"
nothing should be unsafe, it just means the extra unwind bits here aren't too useful (and they'll get optimized out)
alexcrichton submitted PR review.
alexcrichton created PR review comment:
I commented below as well, but I think it's fine to rely on unconditional aborts via
extern "C"
boundary in the panic=unwind std=false mode and skip thecompile_error!
bjorn3 submitted PR review.
bjorn3 created PR review comment:
This error only happens when using
panic = "unwind"
, which is unsound if not usingcatch_unwind
.
alexcrichton submitted PR review.
alexcrichton created PR review comment:
Could you clarify the unsoundness? Looking at this disassembly it looks like even in
no_std
mode unwinds are caught and abort correctly
cfallin submitted PR review.
cfallin created PR review comment:
@alexcrichton just so I understand: that example uses the
C-unwind
ABI; currently (and in this PR) the stackswitching bits in assembly are called with theC
ABI. In ano_std
world withpanic=unwind
and a custom unwinder (not my use-case, but what this error statically disables for now) I think we could otherwise get an unwind going off the top of a fiber stack. Or it's possible I'm missing something? I don't have all the compiler configuration options here fully internalized.
alexcrichton submitted PR review.
alexcrichton created PR review comment:
Ah yeah sorry that's just "faking" a non-inlined call to something that can theoretically unwind. You don't actually need
C-unwind
but withpanic=unwind
all normal rust functions can possibly unwind so a catch-and-panic should be inserted for all calls. Basically I'm still not sure how#![no_std]
+-Cpanic=unwind
is unsound myself
cfallin submitted PR review.
cfallin created PR review comment:
I see: my confusion is the dual to your confusion I think, that is, I'm still not sure how
#![no_std]
+-Cpanic=unwind
is sound :-)Restating mostly for my own benefit, but to verify we're on the same page: so with the current state of
main
, we wrap a call to the entrypoint on the new stack in a call tocatch_unwind
(e.g. here). If the fiber's body panics while on the fiber stack, systemlibunwind
reaches this point, the panic is reified as aRunResult
, and thatRunResult
is passed back to the caller on the original stack and rethrown withresume_unwind
.With this PR, in
no_std
, we don't havecatch_unwind
/resume_unwind
. So a panic happens in a fiber's body; say that there is a custom unwinder (there must be I think, for one to be doing unwinding at all inno_std
?); this unwinder iterates up the stack frames on the fiber stack. It reaches the top, there's no callsite in acatch_unwind
; this is an unwind error / UB / something bad. Or is this prevented some other way?
alexcrichton submitted PR review.
alexcrichton created PR review comment:
Oh sorry! Good point getting on the same page (can also talk more during the Cranelift meeting tomorrow if helpful)
I think the main thing to add is that
extern "C" fn foo() { panic!() }
unconditionally, and soundly, aborts the process. That happens irrespective of-Cunwind=...
settings. Starting in Rust 1.81 allextern "C"
functions have an implicit catch-and-abort. That means that even if we were to remove thecatch_unwind
happening in wasmtime-fiber the crate would still be sound, just not too helpful on panics. Thecatch_unwind
is mostly still there to assist with debugging panics by shepherding the panic across the boundary. That's always optional though since it's sound to abort the process at any time (just better to not do so if possible).That to me is why it's currently sound if we skip the
catch_unwind
. A bit of a historical note though is that Rust 1.81 is very recent and prior versions of rustc didn't catch-and-abort atextern "C"
boundaries. That means that historicallycatch_unwind
was indeed required for soundness but that's no longer the case.
cfallin submitted PR review.
cfallin created PR review comment:
Ah! OK, that's the detail I had missed; thanks for your patience here. I guess since our MSRV is now 1.81, we can rely on this behavior unconditionally. I'll remove the compile-error and add a comment detailing this.
cfallin updated PR #9689.
cfallin submitted PR review.
cfallin created PR review comment:
Done!
cfallin submitted PR review.
cfallin created PR review comment:
Done.
cfallin created PR review comment:
Done.
cfallin submitted PR review.
cfallin submitted PR review.
cfallin created PR review comment:
Done!
cfallin submitted PR review.
cfallin created PR review comment:
I think I had added it at some point while debugging failures but indeed it should work with just
async
, nostd
-- so, restored that, addedstd
separately, and added thewasmtime-fiber
variants.
cfallin submitted PR review.
cfallin created PR review comment:
Done.
cfallin submitted PR review.
cfallin created PR review comment:
Done -- refactored a bit in
Suspend::execute()
to statically avoid constructing the panic case as well.
cfallin updated PR #9689.
cfallin has marked PR #9689 as ready for review.
cfallin requested alexcrichton for a review on PR #9689.
cfallin requested wasmtime-default-reviewers for a review on PR #9689.
cfallin requested wasmtime-core-reviewers for a review on PR #9689.
cfallin commented on PR #9689:
Updated and out of draft mode now; thanks!
alexcrichton submitted PR review:
Is it possible to add a `cargo test -p wasmtime-fiber --no-default-features to CI as well?
cfallin updated PR #9689.
cfallin updated PR #9689.
cfallin updated PR #9689.
cfallin commented on PR #9689:
Sure thing, done!
cfallin has enabled auto merge for PR #9689.
cfallin updated PR #9689.
cfallin has enabled auto merge for PR #9689.
cfallin commented on PR #9689:
The CI failures are fairly perplexing -- the only failing job I see is "report failure on cancelation", and a bunch of other jobs are canceled midway through their runs. @alexcrichton any ideas?
alexcrichton commented on PR #9689:
This happened on another PR the other night and I couldn't figure out what was happening. I then re-queued in the morning and it merged so I assumed it was a transient failure in CI infrastructure... Let's see if that still holds true?
alexcrichton submitted PR review.
alexcrichton created PR review comment:
Oh when adding a new job like this you'll need to also add a "join" to the
ci-status
below so it's handled for failure-vs-not.
alexcrichton updated PR #9689.
alexcrichton commented on PR #9689:
I've pushed a commit to run full CI on this PR where the PR run can't cancel itself, so let's see what happens...
alexcrichton commented on PR #9689:
Bleh ok so this is a known issue (at least to me) on github actions. For the merge queue there's no way to cancel a failed job so our
main.yml
self-cancels whenever a job fails. This works fine on all platforms except Windows. On Windows self-cancelling sometimes gives you a big red X on the job and sometimes gives you the icon of "this was cancelled by someone else". Basically there seems to be a race of the self-cancel and the job actually failing.In any case it's the Windows build. Just some minor things to update.
cfallin updated PR #9689.
cfallin updated PR #9689.
cfallin created PR review comment:
Done!
cfallin submitted PR review.
cfallin commented on PR #9689:
Ah, that was entirely unintuitive, GitHub CI user-interface! Thanks for working that out.
cfallin updated PR #9689.
cfallin merged PR #9689.
Last updated: Dec 23 2024 at 12:05 UTC