wasmtime / issue #5026 Remove `PackedOption` and change `... · git-wasmtime

Stream: git-wasmtime

Topic: wasmtime / issue #5026 Remove `PackedOption` and change `...

Wasmtime GitHub notifications bot (Oct 06 2022 at 13:08):

jakubDoka opened issue #5026:

Feature

Removal of PackedOption. Making EntityRef a generic struct, where T would be the type it is pointing to.

Benefit

PackedOption is making code verbose and unnatural. This proposal allows using Option while preserving advantages of PackedOptions compactness. Also eliminates need for ReservedValue.

Changing EntityRef from trait to generic struct can greatly simplify cranelift_entity implementation and save us some names when defining entity data. The structure can also implement all of the useful traits on one place so no shorthand macro is needed.

Implementation

Here is a quick demonstration of how this can work. This solution requires (forever) unstable feature.

#![feature(rustc_attrs)]

use std::{marker::PhantomData, mem, ops::{Index, IndexMut}};

pub struct PrimaryMap<T> {
    data: Vec<T>,
}

impl<T> PrimaryMap<T> {
    pub fn insert(&mut self, value: T) -> EntityRef<T> {
        let index = self.data.len();
        self.data.push(value);
        EntityRef::new(index as u32).expect("no more entities can be spawned")
    }
}

impl<T> Index<EntityRef<T>> for PrimaryMap<T> {
    type Output = T;

    fn index(&self, index: EntityRef<T>) -> &Self::Output {
        &self.data[index.index()]
    }
}

impl<T> IndexMut<EntityRef<T>> for PrimaryMap<T> {
    fn index_mut(&mut self, index: EntityRef<T>) -> &mut Self::Output {
        &mut self.data[index.index()]
    }
}

// We might want to choose a different index representation but most common one is
// as default. Also the name is a bit long for something common, `ERef` or `VRef`
// (Virtual Reference) might be a better fit.
#[repr(transparent)]
pub struct EntityRef<T, R = NonMaxU32>(R, PhantomData<*const T>);

impl<T, R: EntityRefRepr> EntityRef<T, R> {
    pub fn new(args: R::InitArgs) -> Option<Self> {
        R::new(args).map(|r| Self(r, PhantomData))
    }

    pub fn index(&self) -> usize {
        self.0.index()
    }

    pub fn args(self) -> R::InitArgs {
        self.0.args()
    }
}

// Making representation as flexible as possible.
pub trait EntityRefRepr: Sized {
    type InitArgs;

    fn new(args: Self::InitArgs) -> Option<Self>;

    fn index(&self) -> usize;

    fn args(self) -> Self::InitArgs;
}

// Compiler is able to optimize the size
const _: () = assert!(std::mem::size_of::<Option<EntityRef<()>>>() == 4);
// Works even on structures that contain non-optional `EntityRef`
const _: () = assert!(std::mem::size_of::<Option<(EntityRef<(), u32)>>>() == 8);

// If we cannot use nightly, this solution is basically useless :(
#[rustc_layout_scalar_valid_range_end(4294967294)]
#[repr(transparent)]
pub struct NonMaxU32(u32);

impl NonMaxU32 {
    pub fn new(value: u32) -> Option<Self> {
        unsafe { mem::transmute(Self(value)) } // might not be correct implementation
    }

    pub fn get(&self) -> u32 {
        self.0
    }
}

impl EntityRefRepr for NonMaxU32 {
    type InitArgs = u32;

    fn new(args: Self::InitArgs) -> Option<Self> {
        Self::new(args)
    }

    fn index(&self) -> usize {
        self.get() as usize
    }

    fn args(self) -> Self::InitArgs {
        self.get()
    }
}

More extensive demonstration can be found here.

Wasmtime GitHub notifications bot (Oct 06 2022 at 13:11):

bjorn3 commented on issue #5026:

rust // If we cannot use nightly, this solution is basically useless :(

Yeah, we can't use unstable.

Wasmtime GitHub notifications bot (Oct 06 2022 at 13:15):

jakubDoka commented on issue #5026:

Okay, was not sure about that, I'll close this.

Wasmtime GitHub notifications bot (Oct 06 2022 at 13:16):

jakubDoka closed issue #5026:

Feature

Removal of PackedOption. Making EntityRef a generic struct, where T would be the type it is pointing to.

Benefit

PackedOption is making code verbose and unnatural. This proposal allows using Option while preserving advantages of PackedOptions compactness. Also eliminates need for ReservedValue.

Implementation

Here is a quick demonstration of how this can work. This solution requires (forever) unstable feature.

#![feature(rustc_attrs)]

use std::{marker::PhantomData, mem, ops::{Index, IndexMut}};

pub struct PrimaryMap<T> {
    data: Vec<T>,
}

impl<T> PrimaryMap<T> {
    pub fn insert(&mut self, value: T) -> EntityRef<T> {
        let index = self.data.len();
        self.data.push(value);
        EntityRef::new(index as u32).expect("no more entities can be spawned")
    }
}

impl<T> Index<EntityRef<T>> for PrimaryMap<T> {
    type Output = T;

    fn index(&self, index: EntityRef<T>) -> &Self::Output {
        &self.data[index.index()]
    }
}

impl<T> IndexMut<EntityRef<T>> for PrimaryMap<T> {
    fn index_mut(&mut self, index: EntityRef<T>) -> &mut Self::Output {
        &mut self.data[index.index()]
    }
}

// We might want to choose a different index representation but most common one is
// as default. Also the name is a bit long for something common, `ERef` or `VRef`
// (Virtual Reference) might be a better fit.
#[repr(transparent)]
pub struct EntityRef<T, R = NonMaxU32>(R, PhantomData<*const T>);

impl<T, R: EntityRefRepr> EntityRef<T, R> {
    pub fn new(args: R::InitArgs) -> Option<Self> {
        R::new(args).map(|r| Self(r, PhantomData))
    }

    pub fn index(&self) -> usize {
        self.0.index()
    }

    pub fn args(self) -> R::InitArgs {
        self.0.args()
    }
}

// Making representation as flexible as possible.
pub trait EntityRefRepr: Sized {
    type InitArgs;

    fn new(args: Self::InitArgs) -> Option<Self>;

    fn index(&self) -> usize;

    fn args(self) -> Self::InitArgs;
}

// Compiler is able to optimize the size
const _: () = assert!(std::mem::size_of::<Option<EntityRef<()>>>() == 4);
// Works even on structures that contain non-optional `EntityRef`
const _: () = assert!(std::mem::size_of::<Option<(EntityRef<(), u32)>>>() == 8);

// If we cannot use nightly, this solution is basically useless :(
#[rustc_layout_scalar_valid_range_end(4294967294)]
#[repr(transparent)]
pub struct NonMaxU32(u32);

impl NonMaxU32 {
    pub fn new(value: u32) -> Option<Self> {
        unsafe { mem::transmute(Self(value)) } // might not be correct implementation
    }

    pub fn get(&self) -> u32 {
        self.0
    }
}

impl EntityRefRepr for NonMaxU32 {
    type InitArgs = u32;

    fn new(args: Self::InitArgs) -> Option<Self> {
        Self::new(args)
    }

    fn index(&self) -> usize {
        self.get() as usize
    }

    fn args(self) -> Self::InitArgs {
        self.get()
    }
}

More extensive demonstration can be found here.

Wasmtime GitHub notifications bot (Oct 24 2022 at 20:37):

jameysharp commented on issue #5026:

PackedOption has been bothering me too, so I was recently also thinking about ways to remove it.

I'd considered using NonZeroU32 and making the reserved value be 0 instead of u32::MAX. That would work on stable rustc, but I haven't gotten past idly thinking about it, so maybe there are other difficulties I haven't thought of. For example, does subtracting 1 to get a valid array index in SecondaryMap impose a significant compile-time cost? Individually that's very cheap, but we index into those maps a lot, so maybe it matters.

If you want to try a few things I'd be happy to answer questions and review PRs.

Regardless, I'm re-opening this issue because unless we have evidence that the current design is the best option we have, I think it's worth considering alternatives to PackedOption.

Wasmtime GitHub notifications bot (Oct 24 2022 at 20:37):

jameysharp reopened issue #5026:

Feature

Removal of PackedOption. Making EntityRef a generic struct, where T would be the type it is pointing to.

Benefit

PackedOption is making code verbose and unnatural. This proposal allows using Option while preserving advantages of PackedOptions compactness. Also eliminates need for ReservedValue.

Implementation

Here is a quick demonstration of how this can work. This solution requires (forever) unstable feature.

#![feature(rustc_attrs)]

use std::{marker::PhantomData, mem, ops::{Index, IndexMut}};

pub struct PrimaryMap<T> {
    data: Vec<T>,
}

impl<T> PrimaryMap<T> {
    pub fn insert(&mut self, value: T) -> EntityRef<T> {
        let index = self.data.len();
        self.data.push(value);
        EntityRef::new(index as u32).expect("no more entities can be spawned")
    }
}

impl<T> Index<EntityRef<T>> for PrimaryMap<T> {
    type Output = T;

    fn index(&self, index: EntityRef<T>) -> &Self::Output {
        &self.data[index.index()]
    }
}

impl<T> IndexMut<EntityRef<T>> for PrimaryMap<T> {
    fn index_mut(&mut self, index: EntityRef<T>) -> &mut Self::Output {
        &mut self.data[index.index()]
    }
}

// We might want to choose a different index representation but most common one is
// as default. Also the name is a bit long for something common, `ERef` or `VRef`
// (Virtual Reference) might be a better fit.
#[repr(transparent)]
pub struct EntityRef<T, R = NonMaxU32>(R, PhantomData<*const T>);

impl<T, R: EntityRefRepr> EntityRef<T, R> {
    pub fn new(args: R::InitArgs) -> Option<Self> {
        R::new(args).map(|r| Self(r, PhantomData))
    }

    pub fn index(&self) -> usize {
        self.0.index()
    }

    pub fn args(self) -> R::InitArgs {
        self.0.args()
    }
}

// Making representation as flexible as possible.
pub trait EntityRefRepr: Sized {
    type InitArgs;

    fn new(args: Self::InitArgs) -> Option<Self>;

    fn index(&self) -> usize;

    fn args(self) -> Self::InitArgs;
}

// Compiler is able to optimize the size
const _: () = assert!(std::mem::size_of::<Option<EntityRef<()>>>() == 4);
// Works even on structures that contain non-optional `EntityRef`
const _: () = assert!(std::mem::size_of::<Option<(EntityRef<(), u32)>>>() == 8);

// If we cannot use nightly, this solution is basically useless :(
#[rustc_layout_scalar_valid_range_end(4294967294)]
#[repr(transparent)]
pub struct NonMaxU32(u32);

impl NonMaxU32 {
    pub fn new(value: u32) -> Option<Self> {
        unsafe { mem::transmute(Self(value)) } // might not be correct implementation
    }

    pub fn get(&self) -> u32 {
        self.0
    }
}

impl EntityRefRepr for NonMaxU32 {
    type InitArgs = u32;

    fn new(args: Self::InitArgs) -> Option<Self> {
        Self::new(args)
    }

    fn index(&self) -> usize {
        self.get() as usize
    }

    fn args(self) -> Self::InitArgs {
        self.get()
    }
}

More extensive demonstration can be found here.

Wasmtime GitHub notifications bot (Oct 24 2022 at 20:55):

cfallin commented on issue #5026:

It's definitely worth thinking about options (... uh, pun definitely not intended) here. That said, I'd be a little concerned about an "off by one" invariant: it feels like a bug waiting to happen, or at the very least, a source of confusion when debugging. In addition to that I suspect it might actually carry a measurable cost, given how frequently we access entities given their IDs.

IMHO we probably need to wait until we can encode the invariant for the compiler so it can reason about the niche on its own (i.e. wait for something like the above to become possible in stable Rust).

Wasmtime GitHub notifications bot (Oct 24 2022 at 20:57):

sunfishcode commented on issue #5026:

An RFC for a way to encode these kinds of invariants for the compiler to reason about was just opened yesterday: https://github.com/rust-lang/rfcs/pull/3334

Wasmtime GitHub notifications bot (Jul 14 2024 at 18:25):

meithecatte commented on issue #5026:

I'd be a little concerned about an "off by one" invariant: it feels like a bug waiting to happen, or at the very least, a source of confusion when debugging.

The way I'd approach this is to encapsulate this into a NonMaxU32 type, that does the +/-1 internally during conversions. Then you get a seamless transition once Rust supports something like this natively, and in the meantime, you don't need to worry about correctness.

I am not sure how much actual debuggers are used by cranelift developers (as opposed to dbg!). Handling them properly would require some kind of debugger integration. But cranelift already uses many debugger-unfriendly things – such as data that can't really be interpreted without external context structs, so I imagine that dbg!-style debugging would be much more common – which can be addressed with a proper Debug implementation.

In addition to that I suspect it might actually carry a measurable cost, given how frequently we access entities given their IDs.

I wouldn't be too surprised if this turned out to be in the noise. When used as an index, the +/-1 will just be folded into the addressing mode. An extra byte to fetch, and probably zero cost when actually executing (this is me going purely off vibes, I don't know the exact numbers for address generation latency and I don't have the proficiency with the relevant documentation to look it up in reasonable time).

It might even be the case that it'll be a performance win, because the niche at 0 will be easier to test for than u32::MAX.

In fact, there is a chance that the biggest difference will happen whether Rust has support for custom niches or not – the current implementation is "unsafe", with the check for accidentally creating the reserved value by converting a u32 being only a debug_assert!. Switching to NonZeroU32 will force actually performing this bounds-check.

Wasmtime GitHub notifications bot (Jul 16 2024 at 11:35):

jakubDoka commented on issue #5026:

you could also use an array container optimized for 1-based indexing, meaning that the base pointer is always offset by -1 to cancel the index +1 offset, tho miri will be angry

Wasmtime GitHub notifications bot (Jul 16 2024 at 11:46):

jakubDoka edited a comment on issue #5026:

you could also use an array container optimized for 1-based indexing, meaning that the base pointer is always offset by -1 to cancel the index +1 offset, tho miri will be angry

edit: another way that surprisingly respects provenance, is offsetting the base by length and subtracting the index, which inverts the order of everything

Wasmtime GitHub notifications bot (Jan 07 2025 at 18:23):

hanna-kruppe commented on issue #5026:

The sign-bound crate demonstrates a neat trick that works on stable and avoids extra arithmetic on indexing as well as unusual addressing modes that are less efficient on some architectures:

Define a repr(u8) enum that has variants for every valid value of the most significant byte of your integers. sign-bound uses this for positive and negative integers, so it has two enums of 127 variants each, but for cranelift-entity's purpose the range can be 0..=254.

Define a a #[repr(C, align(4))] struct that represents an u32 with the MSB stored as this enum, plus three u8s for the remaining bytes. The correct order of these fields depends on cfg(target_endian). Doing this for all integer types, as sign-bound does, needs some serious macro-fu, but cranelift-entity only needs the u32 case.

I'll call this struct NonMaxU32 for brevity, although that's not quite correct. This type is guaranteed to have the same representation in memory as an u32 integer thanks to the repr annotations, so it can be transmuted into and from u32 -- as long as the u32 is less than 0xFF_00_00_00. This restriction on the MSB enables rustc to make Option<NonMaxU32> the same size. While this layout optimization is not guaranteed, it's fairly simple and has existed for a long time, so I wouldn't be worried about relying on it for performance. However, there's some other limitations:

There are no guarantees about the memory layout of Option<NonMaxU32>, nor about how it's handled by calling conventions (unlike repr(transparent) wrappers like NonZeroU32). So the guarantee about PackedOption's layout added in https://github.com/bytecodealliance/wasmtime/pull/9697 can't be salvaged with this approach.

Getting good performance relies on NonZeroU32 being passed around and operated on as an ordinary integer, not as an aggregate with four fields. This is not true for the C calling convention on several platforms. Empirically it seems to work fine with the Rust calling convention today (see https://rust.godbolt.org/z/3sd3K9sdK), but it's not guaranteed.

On some architectures, 0xFF_00_00_00 is a more awkward to encode as immediate operand than u32::MAX and 0. Thus, the comparisons done by EntityRef::new and by pattern matching on Option<NonMaxU32> may sometimes require an extra instruction or a longer instruction encoding.

Because the entire range 0xFF_00_00_00..=u32::MAX is invalid, a checked conversion u32 -> Option<NonMaxU32> has to do a comparison and normalize out-of-range values to the None representative (unlike u32 -> Option<NonZeroU32>, which is just a transmute).

It can't represent quite as many distinct values as a type that only excludes u32::MAX, but the difference is tiny, relatively speaking (4.278 billion vs 4.295 billion).

Wasmtime GitHub notifications bot (Jan 08 2025 at 11:35):

hanna-kruppe edited a comment on issue #5026:

The sign-bound crate demonstrates a neat trick that works on stable and avoids extra arithmetic on indexing as well as unusual addressing modes that are less efficient on some architectures:

Define a repr(u8) enum that has variants for every valid value of the most significant byte of your integers. sign-bound uses this for positive and negative integers, so it has two enums of 127 variants each, but for cranelift-entity's purpose the range can be 0..=254.

Define a a #[repr(C, align(4))] struct that represents an u32 with the MSB stored as this enum, plus three u8s for the remaining bytes. The correct order of these fields depends on cfg(target_endian). Doing this for all integer types, as sign-bound does, needs some serious macro-fu, but cranelift-entity only needs the u32 case.

I'll call this struct NonMaxU32 for brevity, although that's not quite correct. This type is guaranteed to have the same representation in memory as an u32 integer thanks to the repr annotations, so it can be transmuted into and from u32 -- as long as the u32 is less than 0xFF_00_00_00. This restriction on the MSB enables rustc to make Option<NonMaxU32> the same size. While this layout optimization is not guaranteed, it's fairly simple and has existed for a long time, so I wouldn't be worried about relying on it for performance. However, there's some other limitations:

There are no guarantees about the memory layout of Option<NonMaxU32>, nor about how it's handled by calling conventions (unlike repr(transparent) wrappers like NonZeroU32). So the guarantee about PackedOption's layout added in https://github.com/bytecodealliance/wasmtime/pull/9697 can't be salvaged with this approach.

Getting good performance relies on the struct being passed around and operated on as an ordinary integer, not as an aggregate with four fields. This is not true for the C calling convention on several platforms. Empirically it seems to work fine with the Rust calling convention today (see https://rust.godbolt.org/z/3sd3K9sdK), but it's not guaranteed.

On some architectures, 0xFF_00_00_00 is a more awkward to encode as immediate operand than u32::MAX and 0. Thus, the comparisons done by EntityRef::new and by pattern matching on Option<NonMaxU32> may sometimes require an extra instruction or a longer instruction encoding.

Because the entire range 0xFF_00_00_00..=u32::MAX is invalid, a checked conversion u32 -> Option<NonMaxU32> has to do a comparison and normalize out-of-range values to the None representative (unlike u32 -> Option<NonZeroU32>, which is just a transmute).

It can't represent quite as many distinct values as a type that only excludes u32::MAX, but the difference is tiny, relatively speaking (4.278 billion vs 4.295 billion).

Last updated: Apr 18 2025 at 12:05 UTC