posborne opened PR #11003 from posborne:stack-switching-cranelift to bytecodealliance:main:
These changes pull in the cranelift changes from #10177 with some additional stacked changes to resolve conflicts, align with previous changes in the stack-switching series, and address feedback items which were raised on previous iterations of the PR (but mostly not changing anything of significant substance). Tracking Issue: #10248.
The
stack-switchingfeature flag is retained and used minimally in this changeset in order to avoid compilation problems, but not really used beyond that. There is at least one item in the tracking issue already related to likely finding a way to drop these compilation flags in most places but I think it is worth deferring that here as it will required touching code more broadly.CC @frank-emrich @dhil
posborne requested wasmtime-compiler-reviewers for a review on PR #11003.
posborne requested wasmtime-core-reviewers for a review on PR #11003.
posborne requested alexcrichton for a review on PR #11003.
posborne submitted PR review.
posborne created PR review comment:
@frank-emrich @dhil I put this in here as passing as the only use of
allow_smallerhere is in the debug assertions which seemed like a potential defect, though I didn't do a deep dive into the full usage.
fitzgen requested fitzgen for a review on PR #11003.
fitzgen submitted PR review:
Thanks! Lots of comments below but for the most part these should be pretty straightforward to fix.
fitzgen created PR review comment:
This is fine to add, but fwiw the
sized_stack_slotsmember ispuband implementsIndex<StackSlot>so you can always do&func.sized_stack_slots[slot]
fitzgen created PR review comment:
Can we fold the
is_vmgcref_type()branch into thematchhere? Something like this:match ty.top() { WasmHeapTopType::Any | WasmHeapTopType::Extern => ..., WasmHeapTopType::Func => ..., ... }This should allow us to exhaustively match and remove the
_ => panic!(...)arm.
fitzgen created PR review comment:
If we move the
stack_switchingmodule from the top-level to being a submodule offunc_environ(similar to thefunc_environ::gcmodule) then we don't need to make all those members ofFuncEnvironmentbepub(crate), which is a little thing but a nice thing (especially if we consider all the future evolutions of this code and any changes toFuncEnvironmentmembers that will also need or not need to becomepub(crate)).
fitzgen created PR review comment:
And this could be a
smallvectoo.
fitzgen created PR review comment:
Instead of looking at the value's CLIF type, can you modify
translate_ref_is_nullto pass in the actual wasm type as an argument? We already map multiple Wasm types onto the same CLIF type, it just so happens that is not currently the case for the particular type used for pointers to stacks, but this code as-written is pretty fragile to rely on that incidental property.
fitzgen created PR review comment:
Nice to avoid the allocation when possible:
let mut args: SmallVec<[_; 6]> = smallvec![vmctx, table_index_arg, delta];(I think I counted the max args correctly but feel free to fix the inline capacity if I messed up)
fitzgen created PR review comment:
I'd prefer defining a type containing the fat pointer's parts, something like
pub struct ContObj { revision: ir::Value, contref: ir::Value, } impl ContObj { /// ... pub(crate) new(...) -> Self { /* ... */ } /// getters for the parts... }rather than trying to use
ir::types::I128here.Then we could also add
ContObj::loadandContObj::storehelper methods as needed.This lets us avoid the sketchy
transmutealluded to above (which I don't believe actually exists anyways, because we passVMContObjs by parts to libcalls already) and we also don't need to do all the packing and unpacking and extending/reducing that packing into and unpacing fromir::types::I128requires either.The only reason
ir::types::I128exists is for ABI compatibility with LLVM code, which isn't a concern we have here.
fitzgen created PR review comment:
My suggestion for a
ContObjtype instead ofir::types::I128is basically to do the same as what is happening here.
fitzgen created PR review comment:
TODO
fitzgen created PR review comment:
Similar here regarding exhaustively matching on the top type.
fitzgen created PR review comment:
Can we leave this as an error instead of a panic?
fitzgen created PR review comment:
Can we put this into a helper function something like this:
fn control_context_size(triple: &target_lexicon::Triple) -> WasmResult<usize> { match (triple.architecture, triple.operating_system) { (target_lexicon::Architecture::X86_64, target_lexicon::OperatingSystem::Linux) => Ok(24), _ => Err(wasm_unsupported!("stack switching is not supported on {triple}")), } }This will future proof the code and give us a nice place to slot in updates when we expand platform support.
fitzgen created PR review comment:
This is not a
VMHostArray, it is a pointer to aVMHostArray, right? Can we update the name to reflect that?VMHostArrayPtrorVMHostArrayRefperhaps?Also, I don't think we need to separate the static offset and dynamic base address because nothing is (for example) determining whether we need to emit explicit bounds checks or can rely on guard pages for these accesses, so we can just have a single
address: ir::Valuefield and trust Cranelift's instruction selection choose the best addressing mode it can given the input IR.
fitzgen created PR review comment:
These should be able to be
SmallVecs too.
fitzgen created PR review comment:
SmallVec
fitzgen created PR review comment:
Can you make sure there is an item for this in the tracking issue?
fitzgen created PR review comment:
Can we name the local something else? Having both a
resume_tableandresumetablevariable in scope at the same time is pretty confusing. Maybe we can havewasm_resume_tableandclif_resume_table?
fitzgen created PR review comment:
SmallVechere too.
fitzgen created PR review comment:
SmallVec
dhil submitted PR review.
dhil created PR review comment:
I think this code is harmless. It seems to be doing some confidence testing that the byte-encoding of the values will fit into the allocated space. If I read the above comment correctly (and the code below) then each encoded element is stored as if it had type
T, so for elements of typeSsuch that|S| < |T|, then there will be some padding between each element.I am not sure
allow_smalleris useful, perhaps we can just havedebug_assert!(size <= std::mem::size_of::<T>());, or perhaps we can dispense with this particular confidence checking altogether.
dhil created PR review comment:
/// The Cranelift type used to represent all of the following:
dhil created PR review comment:
Yes, good shout! I think I am to blame for the current state of affairs.
posborne updated PR #11003.
posborne submitted PR review.
posborne created PR review comment:
I think 7 here, will do a commit with the smallvec changes.
posborne edited PR review comment.
posborne created PR review comment:
For args and returns, I guessed at a smallvec max of 8; I didn't see anywhere where we are bound to this, but smallvec will still fall back on heap allocations if we go beyond this.
posborne submitted PR review.
posborne created PR review comment:
Added to tracking.
posborne submitted PR review.
posborne submitted PR review.
posborne created PR review comment:
For now, I'll keep a debug_assert! in place to check
size <= std::mem::size_of::<T>().
posborne submitted PR review.
posborne created PR review comment:
Tracing through this, just wanted to confirm that in order to do this we'll need to update wasm-tools/wasmparser to pass through the heap type information (similar to RefNull)? Or am I off in the weeds in tracking through the changes required?
fitzgen submitted PR review.
fitzgen created PR review comment:
Argh. I forgot that
ref.is_nulldoesn't have a type parameter. I'll see how hard it would be to add type information that can be used here (and elsewhere).
fitzgen submitted PR review.
fitzgen created PR review comment:
https://github.com/bytecodealliance/wasmtime/pull/11030 passes the wasm ref type through to
translate_ref_is_null. A couple things need to happen before that can land tho, see the PR description.
posborne updated PR #11003.
posborne submitted PR review.
posborne created PR review comment:
Just removed this for now as there's already an error at where this callsite will eventually be and its addition should be obvious. Probably to follow the work that @cfallin does to land other exception bits.
I confirmed the tracking issues already has an item to track
resume.throwaddition (outside of the initial impl).
posborne commented on PR #11003:
Pushed most of the updates but still need to make the suggested updates around the fat pointer stuff and resolve conflicts with upstream.
posborne updated PR #11003.
posborne updated PR #11003.
posborne submitted PR review.
posborne created PR review comment:
Updated to use the wasm type as part of merging upstream back into the branch.
posborne submitted PR review.
posborne created PR review comment:
Implemented this now.
posborne created PR review comment:
Discussed this change a bit more with @fitzgen and played around with an impl. Currently, we are thinking that to facilitate having a single wasm value be modeled by multiple ir values (as we want here), we'll need to do some updates to the code translator to support having a single translated wasm value map to multiple ir values.
This will be a larger change, so I am proposing that we move forward with changing the stack-switching code to be updated to use that mechanism when introduced in a later change.
posborne submitted PR review.
posborne updated PR #11003.
posborne commented on PR #11003:
@fitzgen I believe this is ready for review again, let me know if there's pieces I missed in my previous updates. As noted, I chose to defer fatpointer changes for now to be potentially addressed as part of (or following) changes to the translation stack.
fitzgen submitted PR review:
LGTM modulo a few final comments below, we should be able to merge this and move on to follow ups after they are addressed.
Thanks!
fitzgen created PR review comment:
Ditto
fitzgen created PR review comment:
And then construction would become something like this:
assert!(env.pointer_type().bits() <= 64); let contref_addr = pos.ins().uextend(ir::types::I28, contref_addr); let revision_counter = pos.ins().uextend(ir::types::I128, revision_counter); let shifted_counter = pos.ins().ishl_imm(revision_counter, 64); pos.ins().bor(shifted_counter, contref_addr)Note: if you want to switch the order of the contref pointer and the counter in the fat pointer, that's fine (I think I unwittingly switched it from what is in the code now), but the important point is that we should be able do all of this fat pointer construction and destruction without branching on endianness and pointer width.
fitzgen created PR review comment:
I think it makes sense to add
disastests for these operations intests/disas/stack-switching/*. Should be as easy as writing a couple.watfiles and then runningWASMTIME_TEST_BLESS=1 cargo test --test disasto initialize the golden output.Doesn't need to be in this PR, but if not in this PR, then should be added to the tracking issue.
fitzgen created PR review comment:
Can this just be something like
let ptr_ty = env.pointer_type(); assert!(ptr_ty.bits() <= 64); let contref = pos.ins().ireduce(ptr_ty, contobj); let shifted = pos.ins().ushr_imm(contobj, 64); let revision_counter = pos.ins().ireduce(it::types::I64, shifted); (revision_counter, contref)?
That is, we define the fat pointer as 128 bits where the upper 64 are the revision counter and the bottom
sizeof(pointer)bits are the pointer to the continuation. This way I don't think we ever have to branch on endianness or pointer width.
fitzgen created PR review comment:
This shouldn't need to be
pub(crate)after thestack_switchingmodule moved tofunc_environ::stack_switching, right?
posborne submitted PR review.
posborne created PR review comment:
Yeah, this makes sense to me, will adopt this approach (with a quick round of testing) until we can split into individual ir values.
posborne updated PR #11003.
posborne commented on PR #11003:
@fitzgen I am hunting down one regression with the latest, probably around the changes to table ops based on wasm types (though I haven't found a smoking gun yet). These tests passed prior to some of the latest changes on this branch and the test is doing a table.grow of continuations followed by a resume of the first added to the table which traps on a null reference.
posborne updated PR #11003.
posborne updated PR #11003.
posborne commented on PR #11003:
Pulled in and resolved conflicts (some pretty mild ones from https://github.com/bytecodealliance/wasmtime/pull/11216. @fitzgen Review would be appreciated; I think in our last convos I had played around with a different fatpointer impl but I'm fine with this as-is (possibly to revisit later on).
posborne updated PR #11003.
bjorn3 submitted PR review.
bjorn3 created PR review comment:
This field is already public.
bjorn3 created PR review comment:
Why is this necessary? Wasm doesn't have 128bit integers.
bjorn3 submitted PR review.
posborne submitted PR review.
posborne created PR review comment:
At least in this iteration of the changes, the stack switching code is using an i128 for it's vmcontobj fat pointer (consisting of vmcontref pointer and revision). @fitzgen and I have discussed a bit about how we can get rid of this and just have two ir values through the transformation, but it will require some additional changes to how we model the relationship between wasm and clif values to support having one of the former map to two distinct ir values.
posborne submitted PR review.
posborne created PR review comment:
I'll remove this in favor of using the pub field directly; probably just a miss on the visibility when the wasmfx contributors originally authored these changes (looks like field has been pub for a long time).
posborne updated PR #11003.
posborne updated PR #11003.
fitzgen submitted PR review:
LGTM! Couple final nitpicks before this lands, but once those are addressed, feel free to add it to the merge queue or ask me to do that (I forget if you have those permissions or not).
Thanks for sticking through with this one!
fitzgen created PR review comment:
This field is only accessed in a submodule, so it does not need to be
pub(crate).
fitzgen created PR review comment:
Similarly, this is only accessed in submodules as well, and not across crate boundaries at all, so it does not need to be
pub.
fitzgen created PR review comment:
And then this should be something like
u8::try_from(align( u32::from(self.vmcontobj_revision()) + u32::try_from(core::mem::size_of::<u64>()).unwrap(), u32::from(self.size()), )) .unwrap()
fitzgen created PR review comment:
To work across non-64-bit ISAs, which we support today via, for example, the 32-bit Pulley interpreter, this needs to be
self.size()
fitzgen created PR review comment:
I don't think this align(16) needs to be here anymore, as described in the deleted
FIXMEcomment? But if we are leaving it, then we should have a follow up item to investigate in the meta issue and also should adjust thesize_of_vmcontobjmethod above to align to 16 instead of the natural alignment that my suggestion comment sketched out.
posborne submitted PR review.
posborne created PR review comment:
I wasn't sure if we wanted to pad out this struct to always be packed as 128 bits or as two words. I think two words makes more sense and aligns with what you are saying but means that the cranelift generated code will definitely need to be updated to be able to support 32-bit target architectures (probably along with changes discussed before to change how we do the fatpointer handling).
I'll land changes after thinking this a bit more and update the meta-issue with any notes on 32-bit targets.
posborne updated PR #11003.
fitzgen commented on PR #11003:
@posborne is this ready for review now?
posborne commented on PR #11003:
@fitzgen Yes, should be ready for review -- I'll just push a fixup for the CI issues. I sent a side note in zulip but I'll repost here:
Ok, I got around to updating the branch with (hopefully) a final set of changes. Since we had been going back-and-forth on VMContObj alignment a bit I did end up changing this to natural alignment with revision now being defined as a usize.
Along with that I updated the cranelift fatpointer code and libcalls to work with the usize and to use an appropriate fat pointer container ir type for two words -- for this I had to plumb through a size type for that layer matching what exists for components (https://github.com/bytecodealliance/wasmtime/pull/11003/commits/078002aa024fe825f148f9dbcf864049bcf708ba).
Since the changes were a bit more substantive I will wait for a fresh review of that -- there were some merge conflicts related to the other changes in func_environ but they were pretty straightforward to address.
In parallel, I'll work on a rebase of the test changes to get a bit of extra validation as there isn't a lot in tree to cover things end-to-end on its own.
posborne updated PR #11003.
posborne commented on PR #11003:
@fitzgen I did run the test changes now -- a subset of those tests are now failing as it looks like there are some codepaths hit due to upstream changes where we end up calling
Val::null_topwhich currently panics for continuation types.
posborne updated PR #11003.
posborne updated PR #11003.
posborne commented on PR #11003:
@fitzgen I did run the test changes now -- a subset of those tests are now failing as it looks like there are some codepaths hit due to upstream changes where we end up calling Val::null_top which currently panics for continuation types.
To get tests going with the upstream stuff, I did end up adding a minimal
Val::ContRefmember which doesn't seem avoidable. I tried to keep this as small and I could bundle it with the test changes, but it does appear to be required in order to have ref.null globals for continuations and some other cases.
github-actions[bot] commented on PR #11003:
Subscribe to Label Action
cc @fitzgen
<details>
This issue or pull request has been labeled: "cranelift", "cranelift:area:x64", "wasmtime:api", "wasmtime:ref-types"Thus the following users have been cc'd because of the following labels:
- fitzgen: wasmtime:ref-types
To subscribe or unsubscribe from this label, edit the <code>.github/subscribe-to-label.json</code> configuration file.
Learn more.
</details>
posborne requested wasmtime-fuzz-reviewers for a review on PR #11003.
posborne requested alexcrichton for a review on PR #11003.
posborne updated PR #11003.
github-actions[bot] commented on PR #11003:
Subscribe to Label Action
cc @fitzgen
<details>
This issue or pull request has been labeled: "fuzzing", "wasmtime:c-api"Thus the following users have been cc'd because of the following labels:
- fitzgen: fuzzing
To subscribe or unsubscribe from this label, edit the <code>.github/subscribe-to-label.json</code> configuration file.
Learn more.
</details>
fitzgen submitted PR review:
LGTM with the below addressed, thanks!
fitzgen created PR review comment:
This should still zero the global's definition, no?
fitzgen created PR review comment:
And similar here
posborne submitted PR review.
posborne created PR review comment:
Yeah, definitely.
posborne updated PR #11003.
posborne has enabled auto merge for PR #11003.
posborne updated PR #11003.
posborne updated PR #11003.
posborne updated PR #11003.
fitzgen submitted PR review.
fitzgen created PR review comment:
This stuff should be a method on the
PtrSizetrait that we define invmoffsets.rs(maybefn vm_host_array_layout(&self) -> (u8, u32)) and ultimately we should have access to that method here viaenv.offsets.ptr.We should definitely avoid any kind of
std::mem::{size,align}_ofcalls, as that ends up being pretty fragile (doesn't work forf64across 32- and 64-bit architectures, for example, where the alignment is different despite size being the same).
fitzgen submitted PR review.
fitzgen created PR review comment:
To clarify a little more:
To avoid the
std::mem::{size,align}_ofcalls, you may have to have differentPtrSizemethods for each concreteTthat is used, and then boundTbyT: HostArrayLayoutand definetrait HostArrayLayout { fn host_array_layout<P: PtrSize>(p: &P) -> (u8, u32); }and implement that for each concreate
Tsuch that it calls the appropriatePtrSizemethod.
posborne submitted PR review.
posborne created PR review comment:
Makes sense, I'll take a look at making changes to revise VMHostArray and any other problematic parts I can find (or see if @dhil can help out). I wanted to get a basic version of the commit up to get this kind of feedback as doing operations on the generic T here (from the original impl) seemed problematic for the reasons you describe.
posborne updated PR #11003.
posborne requested fitzgen for a review on PR #11003.
posborne submitted PR review.
posborne created PR review comment:
Addressed in 9343ab99f2c07fb3f2a4dd778142422da9202d3f if my interpretation was correct.
fitzgen submitted PR review.
fitzgen created PR review comment:
Looks good, thanks!
fitzgen submitted PR review.
posborne merged PR #11003.
Last updated: Dec 06 2025 at 07:03 UTC