alexcrichton requested wasmtime-core-reviewers for a review on PR #10134.
alexcrichton requested dicej for a review on PR #10134.
alexcrichton opened PR #10134 from alexcrichton:riscv32
to bytecodealliance:main
:
This commit enables Wasmtime to build on platforms without 64-bit atomic instructions, such as Rust's
riscv32imac-unknown-none-elf
target. There are only two users of 64-bit atomics right now which are epochs and allocation ofStoreId
. This commit adds#[cfg]
to epoch-related infrastructure in the runtime to turn that all of if 64-bit atomics aren't available. The thinking is that epochs more-or-less don't work without 64-bit atomics so it's easier to just remove them entirely.Allocation of
StoreId
is trickier though because it's so core to Wasmtime and it basically can't be removed. I've opted to change the allocator to 32-bit indices instead of 64-bit indices. Note thatStoreId
requires unique IDs to be allocated for safety which means that while a 64-bit integer won't overflow for a few thousand years a 32-bit integer will overflow in a few hours from quickly creating stores. The rough hope though is that embeddings on platforms like this aren't churning through stores. Regardless if this condition is triggered it'll result in a panic rather than unsoundness, so we hopefully have at least that going for us.Closes #8768
<!--
Please make sure you include the following information:
If this work has been discussed elsewhere, please include a link to that
conversation. If it was discussed in an issue, just mention "issue #...".Explain why this change is needed. If the details are in an issue already,
this can be brief.Our development process is documented in the Wasmtime book:
https://docs.wasmtime.dev/contributing-development-process.htmlPlease ensure all communication follows the code of conduct:
https://github.com/bytecodealliance/wasmtime/blob/main/CODE_OF_CONDUCT.md
-->
alexcrichton requested wasmtime-default-reviewers for a review on PR #10134.
alexcrichton commented on PR #10134:
On a more abstract note this is part of my quest to "let's port everything everywhere" so I don't have anything specific in mind for the use case here. Instead it's more of "maybe others can do interesting things with this". The bit about
StoreId
I'm not particularly happy with because it's something where your tiny embedded device will crash after days online and you'll have no way to easily reproduce unless you can easily debug it. That seems like a pretty bad silent failure mode but at the same time I'm not sure what else to do...Mostly saying that I'm not 100% sold this is a good idea, but I also don't know of a different way to run without 64-bit atomics right now.
pchickey submitted PR review:
I agree this tradeoff feels a little unsavory but I can't think of a better alternative. We don't want to refuse to support a platform at all because one of the guarantees wasmtime provides isn't well suited to a particularly intense corner case on that platform.
alexcrichton updated PR #10134.
alexcrichton commented on PR #10134:
Ah I forgot a usage in component model resources. The usage there is only intended to have interior mutability and still be Send/Sync so this updates the type to using two
AtomicU32
values. No atomicity is required in this use case since bigger bugs are afoot if there are concurrent reads/writes to this value. The only thing we're trying to prevent is UB, which twoAtomicU32
values work just fine for.
hanna-kruppe commented on PR #10134:
Would a lock protecting a (non-atomic)
u64
would be an option? Of course in a generalno_std
context this would have to be a spinlock, which is bad for all sorts of reasons, but perhaps slightly better than the failure mode of running out ofStoreId
s after a few days of normal operation.
hanna-kruppe edited a comment on PR #10134:
Would a lock protecting a (non-atomic)
u64
would be an option forStoreId
? Of course in a generalno_std
context this would have to be a spinlock, which is bad for all sorts of reasons, but perhaps slightly better than the failure mode of running out ofStoreId
s after a few days of normal operation.
alexcrichton commented on PR #10134:
Yeah if
std
could be relied on I'd replace it with aMutex<u64>
on non-64-bit platforms. Personally I'd prefer to avoid spin locks or panic-on-contention for something so core as creating a new store. The failure mode here specifically is creating manyStore<T>
instances and destroying them, where eventually that'll panic. I'm not sure if low-end devices that don't have 64-bit atomics will be doing since since I'd otherwise suspect that they'd probably create a small handful of stores (if more than 1) and use those for their lifetime.Overall I don't know what the best tradeoff is. As-is
Store<T>::new
will panic if called rapidly. With a spin-lock you run the risk of long pauses if you get preempted while the lock is held. With panic-on-contention you risk periodic panics if there are multiple threads/actors in the system. None seem like a great option :(
cfallin commented on PR #10134:
Slightly wild idea inspired by number theory that might be close to working:
If we have two atomic u32s, and we increment them by relatively prime amounts (say, 3 and 5) that are also relatively prime to the modulus of the ring we're operating in (2^32), then the periodicity of the pair of them is the product of the periodicity of each. In other words, we walk through 2^64 possible tuples. They're not contiguous in ordinary numeric order but we don't need that -- only uniqueness.
The slight wrinkle is that if we do the atomic increments separately, the two halves of the tuple sequence can "slide" relative to each other -- and this reduces the periodicity / can lead to repeats. I think one can fix that by incrementing the first, the second, then the first again, and using the 3-tuple. My intuition is that that bounds the "slide" / recovers uniqueness, but I'm having some trouble thinking through it at the moment. Maybe it inspires someone else though?
cfallin commented on PR #10134:
Ah, OK, here is something that should work. It's not constant-time (it can retry) but it is non-blocking (in the lockless programming sense): there are no spinloops.
struct PairCounter { x: AtomicU32, y: AtomicU32 } impl PairCounter { fn next(&self) -> (u32, u32) { loop { let x = self.x.fetch_add(3, Ordering::SeqCst); let y = self.x.fetch_add(5, Ordering::SeqCst); let x_ = self.x.load(Ordering::SeqCst); if x.wrapping_add(3) == x_ { // Common case: no one else bumped self.x in the meantime. return (x, y); } } fn next_u64(&self) -> u64 { let (x, y) = self.next(); (u64::from(x) << 32) | u64::from(y) } }
relaxing the
SeqCst
to acquire/release left as an exercise to the (smarter than me) reader...
cfallin edited a comment on PR #10134:
Ah, OK, here is something that should work. It's not constant-time (it can retry) but it is non-blocking (in the lockless programming sense): there are no spinloops.
struct PairCounter { x: AtomicU32, y: AtomicU32 } impl PairCounter { fn next(&self) -> (u32, u32) { loop { let x = self.x.fetch_add(3, Ordering::SeqCst); let y = self.x.fetch_add(5, Ordering::SeqCst); let x_ = self.x.load(Ordering::SeqCst); if x.wrapping_add(3) == x_ { // Common case: no one else bumped self.x in the meantime. return (x, y); } } } fn next_u64(&self) -> u64 { let (x, y) = self.next(); (u64::from(x) << 32) | u64::from(y) } }
relaxing the
SeqCst
to acquire/release left as an exercise to the (smarter than me) reader...
cfallin edited a comment on PR #10134:
Ah, OK, here is something that should work. It's not constant-time (it can retry) but it is non-blocking (in the lockless programming sense): there are no spinloops.
struct PairCounter { x: AtomicU32, y: AtomicU32 } impl PairCounter { fn next(&self) -> (u32, u32) { loop { let x = self.x.fetch_add(3, Ordering::SeqCst); let y = self.y.fetch_add(5, Ordering::SeqCst); let x_ = self.x.load(Ordering::SeqCst); if x.wrapping_add(3) == x_ { // Common case: no one else bumped self.x in the meantime. return (x, y); } } } fn next_u64(&self) -> u64 { let (x, y) = self.next(); (u64::from(x) << 32) | u64::from(y) } }
relaxing the
SeqCst
to acquire/release left as an exercise to the (smarter than me) reader...
alexcrichton commented on PR #10134:
@cfallin would you feel confident enough in that to have it be the default algorithm? From a speed perspective I'm not too worried about that being significantly slower than the current allocation (I'd expect
Store<T>
allocation in general to dwarf the atomic ops). What I would be worried about though is that if something as complex as that is not on-by-default then we won't catch any possible issues with it
cfallin commented on PR #10134:
Maybe? I'm by no means a niche expert in lockless algorithms -- I can try to sketch a correctness proof though. I don't want to block your work here so by all means land with 32-bit atomics and the (unfortunate but best option) "time bomb" panic and I can do this as followup.
alexcrichton updated PR #10134.
alexcrichton commented on PR #10134:
Nah makes sense, but on reflection I've switched this to using
crate::sync::RwLock<T>
which is basically "put a lock around it". Ourno_std
locks currently all panic on contention under the assumption that you probably only have on thread in no_std contexts. In any case this is no worse than what Wasmtime had before this change and it's arguably more correct with respect toStoreId
. This means that we'll still preserve 64-bit uniqueness guarantees and it'll just be a bit slower without 64-bit atomics.If the panic-on-contention bit becomes a problem then it's already one we have to solve anyway, so this way if that's a problem we still only have one problem to solve as opposed to two.
alexcrichton updated PR #10134.
alexcrichton updated PR #10134.
alexcrichton has enabled auto merge for PR #10134.
alexcrichton updated PR #10134.
alexcrichton has enabled auto merge for PR #10134.
alexcrichton merged PR #10134.
Last updated: Feb 28 2025 at 02:27 UTC