Stream: cranelift

Topic: ISLE Lowering, Extractors, MemFlags


Notification Bot (Mar 04 2026 at 15:18):

10 messages were moved here from #cranelift > Silly questions by Akshanabha Chakraborty.

Akshanabha Chakraborty (Mar 04 2026 at 15:40):

/// Flags for AtomicCas instruction
///
/// Each of these flags introduces a limited form of undefined behavior. The flags each enable
/// certain optimizations that need to make additional assumptions. Generally, the semantics of a
/// program do not change when a flag is removed, but adding a flag can change them.
///
/// In addition, the flags determine the endianness of the memory access.  By default,
/// any memory access uses the native endianness determined by the target ISA.  This can
/// be overridden for individual accesses by explicitly specifying little- or big-endian
/// semantics via the flags.
#[derive(Clone, Copy, Debug, Hash, PartialEq, Eq)]
#[cfg_attr(feature = "enable-serde", derive(serde::Serialize, serde::Deserialize))]
pub struct AtomicCasMemFlags {
    // Initialized to all zeros to have all flags have their default value.
    // This is interpreted through various methods below. Currently the bits of
    // this are defined as:
    //
    // * 0/1/2/3/4/5/6/7 - trap code
    // * 8/9/10 - atomic ordering
    // * 11 - aligned flag
    // * 12 - little endian
    // * 13 - big endian
    // * 14/15 - alias region
    // * 16 (overflow) - checked bit (to be seen)
    //
    // Current properties upheld are:
    //
    // * only one of little/big endian is set
    // * only one alias region can be set - once set it cannot be changed
    bits: u32,
}

I have been able to squeeze it down to 17 bits. Are there any bits that can be shaved off from here?
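To make the layout in the comment concrete, here is a minimal sketch of how a few of those fields could be packed and read back. The type name `PackedCasFlags` and every method on it are hypothetical and are not Cranelift's API; the trap code and aligned bit are left out for brevity. It only illustrates why the checked bit at position 16 is the one that no longer fits in a `u16`.

```rust
/// Hypothetical stand-in for the 17-bit layout described in the struct comment.
#[derive(Clone, Copy, Debug, Default, PartialEq, Eq)]
pub struct PackedCasFlags {
    bits: u32,
}

const ORDERING_SHIFT: u32 = 8; // bits 8..=10
const ORDERING_MASK: u32 = 0b111 << ORDERING_SHIFT;
const LITTLE_BIT: u32 = 1 << 12;
const BIG_BIT: u32 = 1 << 13;
const ALIAS_SHIFT: u32 = 14; // bits 14..=15
const ALIAS_MASK: u32 = 0b11 << ALIAS_SHIFT;
const CHECKED_BIT: u32 = 1 << 16; // the bit that overflows a u16

impl PackedCasFlags {
    pub fn new() -> Self {
        Self::default()
    }

    /// Store a 3-bit ordering code, replacing any previous one.
    pub fn with_ordering(mut self, ordering: u8) -> Self {
        assert!(ordering < 8, "ordering must fit in 3 bits");
        self.bits = (self.bits & !ORDERING_MASK) | ((ordering as u32) << ORDERING_SHIFT);
        self
    }

    pub fn ordering(self) -> u8 {
        ((self.bits & ORDERING_MASK) >> ORDERING_SHIFT) as u8
    }

    /// Setting one endianness clears the other, so at most one is ever set.
    pub fn with_little_endian(mut self) -> Self {
        self.bits = (self.bits & !BIG_BIT) | LITTLE_BIT;
        self
    }

    pub fn with_big_endian(mut self) -> Self {
        self.bits = (self.bits & !LITTLE_BIT) | BIG_BIT;
        self
    }

    pub fn is_little_endian(self) -> bool {
        self.bits & LITTLE_BIT != 0
    }

    pub fn is_big_endian(self) -> bool {
        self.bits & BIG_BIT != 0
    }

    /// An alias region (1..=3 here) can be set once and not changed afterwards.
    pub fn with_alias_region(mut self, region: u8) -> Self {
        assert!((1..=3).contains(&region), "region must be 1, 2 or 3");
        assert_eq!(self.bits & ALIAS_MASK, 0, "alias region already set");
        self.bits |= (region as u32) << ALIAS_SHIFT;
        self
    }

    pub fn with_checked(mut self) -> Self {
        self.bits |= CHECKED_BIT;
        self
    }

    /// True when the value would still fit in a 16-bit field.
    pub fn fits_in_u16(self) -> bool {
        self.bits <= u16::MAX as u32
    }
}
```

With this sketch, any combination that leaves the checked bit clear reports `fits_in_u16() == true`, and setting the checked bit is the single step that makes it false.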

Chris Fallin (Mar 04 2026 at 16:13):

Do we need to merge MemFlags in as well? I'd really prefer not to lose that distinction if we can help it. We've already got the bits in AtomicRmwOp to represent the atomic ordering, which was the original goal, right?

Akshanabha Chakraborty (Mar 04 2026 at 16:20):

            atomic_rmw: Builder::new("AtomicRmw")
                .imm(&imm.memflags)
                .imm(&imm.atomic_rmw_data)
                .value()
                .value()
                .build(),

            atomic_cas: Builder::new("AtomicCas")
                .imm(&imm.atomic_cas_memflags)
                .value()
                .value()
                .value()
                .typevar_operand(2)
                .build(),

As I found out, AtomicCas is also saturated, just like AtomicRmwOp, because of the 3 Values it takes, so we would have to merge MemFlags and AtomicOrdering as well (just for Cas). I only found this out after finishing the Rmw operation merge.

Though I think a separate MemFlags type for AtomicCas is the best option, since 2 bits (readonly, can_move) are always set to 0 and hence can be reused. We are exactly one bit off from packing it all in there.
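A rough model of what "saturated" means here, under stated assumptions: a 16-byte budget per instruction, 32-bit value handles, a 16-bit flags field, and one byte each for the variant tag and the opcode. The two structs below are hypothetical stand-ins, not Cranelift's actual `InstructionData` layout; `#[repr(C)]` is used only so the sizes are predictable.

```rust
use std::mem::size_of;

/// Hypothetical model of an AtomicCas-like payload: tag + opcode + flags + 3 values.
#[allow(dead_code)]
#[repr(C)]
struct CasLike {
    tag: u8,
    opcode: u8,
    flags: u16,
    args: [u32; 3],
}

/// The same payload with one extra byte for a separate ordering field.
#[allow(dead_code)]
#[repr(C)]
struct CasWithOrdering {
    tag: u8,
    opcode: u8,
    flags: u16,
    ordering: u8,
    args: [u32; 3],
}

/// Size of the baseline model: 1 + 1 + 2 + 12 bytes, no padding.
pub fn cas_like_size() -> usize {
    size_of::<CasLike>()
}

/// Size once a separate ordering byte is added; alignment padding rounds it up.
pub fn cas_with_ordering_size() -> usize {
    size_of::<CasWithOrdering>()
}
```

In this model the baseline is exactly 16 bytes, and a single extra byte pushes the struct to 20 because the `u32` array needs 4-byte alignment. That is why any additional information has to be folded into the existing flags field.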

Akshanabha Chakraborty (Mar 04 2026 at 16:56):

@Chris Fallin I think we can somehow express it as a product of state counts: ordering (5 states) * alias region (3 states) * endianness (3 states) * aligned (2 states) * checked (2 states) = 180 states. That should mathematically be representable in 8 bits (180 < 256), which means we can pivot away from one-bit-per-flag packing into a single combined, mixed-radix number.

I have a proof of concept here: https://play.rust-lang.org/?version=stable&mode=release&edition=2024&gist=fe6c47600459c6f568f8e4138fb2ff44
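For readers who do not open the playground link, the idea of packing 180 combined states into one byte can be sketched as a mixed-radix encoding. This is not the linked proof of concept; the function names and the field order (ordering, alias region, endianness, a fourth two-state field that is presumably the aligned flag, checked) are assumptions made for illustration.

```rust
/// Number of states per field, in encoding order:
/// ordering, alias region, endianness, fourth two-state flag (assumed: aligned), checked.
const RADICES: [u8; 5] = [5, 3, 3, 2, 2];

/// Pack five small field values into one byte using mixed-radix positional notation.
pub fn encode(fields: [u8; 5]) -> u8 {
    let mut code: u16 = 0;
    for (value, radix) in fields.iter().zip(RADICES.iter()) {
        assert!(*value < *radix, "field value out of range");
        code = code * (*radix as u16) + (*value as u16);
    }
    // The largest possible code is 5 * 3 * 3 * 2 * 2 - 1 = 179, so this never truncates.
    code as u8
}

/// Recover the five field values by repeated division, last field first.
pub fn decode(mut code: u8) -> [u8; 5] {
    assert!(code < 180, "code outside the 180 valid states");
    let mut fields = [0u8; 5];
    for i in (0..RADICES.len()).rev() {
        fields[i] = code % RADICES[i];
        code /= RADICES[i];
    }
    fields
}
```

The trade-off is that every field read now costs a division and a modulo instead of a mask and a shift, and adding a state to any field changes the encoding of all the others.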

Chris Fallin (Mar 04 2026 at 18:28):

I think that if we have to do that, then we're getting a little too complex for maintainability, especially for a feature that is a nice-to-have at this point but not essential for performance (and unlikely to have any perf impact in practice).

We definitely cannot steal more bits from MemFlags; we have plans to use more bits there in a way that could actually improve performance with more alias regions (i.e., it's actually load-bearing for Wasmtime plans).

When we do get to the point of doing code-motion to a degree that atomic orderings matter for codegen quality, I think probably the way to go for AtomicCas is to externalize the info somehow -- we can't box it (InstructionData is Copy) but we could define a new entity type or something. I don't think now is the time to build that though.
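As a rough illustration of what "externalize the info" via an entity type could look like, here is a self-contained sketch: the extra data lives in a side table, and the instruction would hold only a small `Copy` index into it. All names (`CasInfoRef`, `CasInfo`, `CasInfoTable`) are hypothetical, this does not use Cranelift's own entity machinery, and how such a handle would fit into the instruction format is left open.

```rust
/// Hypothetical handle: a plain index, so it stays `Copy` and needs no allocation.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct CasInfoRef(u32);

/// Hypothetical out-of-line data that does not fit in the instruction itself.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct CasInfo {
    pub ordering: u8,
    pub checked: bool,
}

/// Side table owned by the function, indexed by `CasInfoRef`.
#[derive(Default)]
pub struct CasInfoTable {
    entries: Vec<CasInfo>,
}

impl CasInfoTable {
    /// Store `info` and return a handle that can be embedded in an instruction.
    pub fn push(&mut self, info: CasInfo) -> CasInfoRef {
        let index = u32::try_from(self.entries.len()).expect("too many entries");
        self.entries.push(info);
        CasInfoRef(index)
    }

    /// Look up the data behind a handle; panics if the handle came from another table.
    pub fn get(&self, handle: CasInfoRef) -> CasInfo {
        self.entries[handle.0 as usize]
    }
}
```

The design choice is the usual one for this kind of table: the handle is cheap to copy and compare, while the cost moves to an extra lookup whenever the ordering is needed.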

Chris Fallin (Mar 04 2026 at 18:29):

So I guess what I'm coming to is: this exploration has been valuable (thank you!) but maybe it's not the right time to do it

Akshanabha Chakraborty (Mar 05 2026 at 03:03):

Exactly, you are right. I shall still record my findings for others.


Last updated: Mar 23 2026 at 16:19 UTC