wasmtime / issue #3205 Cranelift: formalize, document, an... · git-wasmtime

Stream: git-wasmtime

Topic: wasmtime / issue #3205 Cranelift: formalize, document, an...

Wasmtime GitHub notifications bot (Aug 18 2021 at 17:03):

In #3202 and a number of past issues as well, we've repeatedly encountered some uncertainty as to (i) how boolean values of varying widths are stored in registers, (ii) whether they can be loaded/stored or not and what representation they take in memory (see e.g. #3102 where we addressed this for trampoline args), and (iii) how this interacts with boolean<->integer conversion ops like bint and bmask.

The understanding we seem to generally agree on is that booleans of width > 1 bit have all bits set (true) or all bits cleared (false). And a b1 is stored in a register with the same invariant as other integer values: the upper bits (in this case everything above the LSB) are undefined.

However, this is far from being confusion-free and unambiguous; thus, it seems reasonable to (i) agree definitively on how bools are represented in registers and memory, and (ii) audit our compliance to whatever invariants we decide on.

Wasmtime GitHub notifications bot (Aug 18 2021 at 17:03):

cfallin labeled issue #3205:

In #3202 and a number of past issues as well, we've repeatedly encountered some uncertainty as to (i) how boolean values of varying widths are stored in registers, (ii) whether they can be loaded/stored or not and what representation they take in memory (see e.g. #3102 where we addressed this for trampoline args), and (iii) how this interacts with boolean<->integer conversion ops like bint and bmask.

The understanding we seem to generally agree on is that booleans of width > 1 bit have all bits set (true) or all bits cleared (false). And a b1 is stored in a register with the same invariant as other integer values: the upper bits (in this case everything above the LSB) are undefined.

However, this is far from being confusion-free and unambiguous; thus, it seems reasonable to (i) agree definitively on how bools are represented in registers and memory, and (ii) audit our compliance to whatever invariants we decide on.

Wasmtime GitHub notifications bot (Aug 18 2021 at 17:04):

cfallin commented on issue #3205:

cc @abrown @afonso360 @bjorn3 @alexcrichton @akirilov-arm for recent folks who've discussed this topic.

Wasmtime GitHub notifications bot (Aug 18 2021 at 17:22):

cfallin commented on issue #3205:

I'd like to propose the following for loads and stores:

Define loads and stores of boolean values. We haven't done this in the past, but it is a somewhat incongruous hole in our semantics, IMHO, and we do load and store booleans sometimes behind the scenes (see: trampoline args). I think it would be better if we formally say what it means to do so.

Let's define the size of a b1 to be one byte, and others (b8 and up) as the size of the equivalent integer type.

A store of a b1 stores 0 or 1 (i.e., it is zero-extended to 8 bits); a store of a b8..b128 stores all zeroes or all ones.

A load of a boolean type has undefined behavior if the value loaded from memory is not canonical as per the above bullet point.

The last is probably the most controversial; disallowing loads and stores of bools avoids the semantic question, which may have been why this was done (@sunfishcode comments?). But we already allow bitcasts, so we already have some notion of the bits underlying a bool, so IMHO it's better to allow loading/storing those bits.

An alternative is to canonicalize a boolean when loading. For a b1 this is a mask (and op); for a b8 and up we could just compare to zero and if the loaded value is nonzero (any bit is set), canonicalize to true (all bits set). I actually prefer this option, after a bit of thought: it gives well-defined semantics and avoids illegal bool values (e.g. b128 with half the bits set) that might cause other undefined behavior.

Then for bitcasts:

A raw_bitcast or bitcast from a bool to an int produces the same value that a store would produce.

A raw_bitcast or bitcast from an int to a bool canonicalizes just as a load does.

All of the above should fully pin down how we store the values in registers as well: we (i) have a well-defined integer interpretation of the bool, (ii) always store it in canonicalized form (all zeroes or all ones in defined bits), and (iii) keep to the same "upper bits undefined" invariant as for integers. With one exception: for a b1 type I think that we should maintain the lowest 8 bits defined, i.e. lowest byte of 0x00 or 0x01 only, so that a store can be done without a mask.

Thoughts?

Wasmtime GitHub notifications bot (Aug 18 2021 at 17:24):

cfallin edited a comment on issue #3205:

I'd like to propose the following for loads and stores:

Define loads and stores of boolean values. We haven't done this in the past, but it is a somewhat incongruous hole in our semantics, IMHO, and we do load and store booleans sometimes behind the scenes (see: trampoline args). I think it would be better if we formally say what it means to do so.

Let's define the size of a b1 to be one byte, and others (b8 and up) as the size of the equivalent integer type.

A store of a b1 stores 0 or 1 (i.e., it is zero-extended to 8 bits); a store of a b8..b128 stores all zeroes or all ones.

<strike>A load of a boolean type has undefined behavior if the value loaded from memory is not canonical as per the above bullet point.</strike> better option, see below

The last is probably the most controversial; disallowing loads and stores of bools avoids the semantic question, which may have been why this was done (@sunfishcode comments?). But we already allow bitcasts, so we already have some notion of the bits underlying a bool, so IMHO it's better to allow loading/storing those bits.

An alternative is to canonicalize a boolean when loading. For a b1 this is a mask (and op); for a b8 and up we could just compare to zero and if the loaded value is nonzero (any bit is set), canonicalize to true (all bits set). I actually prefer this option, after a bit of thought: it gives well-defined semantics and avoids illegal bool values (e.g. b128 with half the bits set) that might cause other undefined behavior.

Then for bitcasts:

A raw_bitcast or bitcast from a bool to an int produces the same value that a store would produce.

A raw_bitcast or bitcast from an int to a bool canonicalizes just as a load does.

All of the above should fully pin down how we store the values in registers as well: we (i) have a well-defined integer interpretation of the bool, (ii) always store it in canonicalized form (all zeroes or all ones in defined bits), and (iii) keep to the same "upper bits undefined" invariant as for integers. With one exception: for a b1 type I think that we should maintain the lowest 8 bits defined, i.e. lowest byte of 0x00 or 0x01 only, so that a store can be done without a mask.

Thoughts?

Wasmtime GitHub notifications bot (Aug 18 2021 at 20:20):

cfallin commented on issue #3205:

I've just added this to the agenda for Monday's public Cranelift biweekly meeting as well.

Wasmtime GitHub notifications bot (Aug 18 2021 at 20:38):

abrown commented on issue #3205:

I think a side effect of going doing the path above would be that raw_bitcast is no longer necessary. bitcast says: "A bitcast is equivalent to storing one type and loading the other type from the same address." If we can load and store booleans I think that eliminates the reason for raw_bitcast's existence (?).

Wasmtime GitHub notifications bot (Aug 18 2021 at 20:43):

cfallin commented on issue #3205:

That's a good point; the distinction between the two is pretty confusing, so if we could remove raw_bitcast as well and have only one kind of bitcast, all the better.

Wasmtime GitHub notifications bot (Aug 18 2021 at 22:17):

fitzgen commented on issue #3205:

A store of a b1 stores 0 or 1 (i.e., it is zero-extended to 8 bits); a store of a b8..b128 stores all zeroes or all ones.

Why not have b1 store all ones? The inconsistency between b1 and b8..b128 is a bit of a surprise.

Wasmtime GitHub notifications bot (Aug 18 2021 at 22:23):

cfallin commented on issue #3205:

Why not have b1 store all ones?

So my thought process is: b1 is, semantically, one bit wide; so "all ones" is just 0x01. But because memory is byte-oriented, we have to store a whole byte, so we zero-extend. I definitely recognize this is kind of subjective and "all ones in all bits that are written" is just as much a valid perspective though.

Pragmatic reasons: b1 will be most common, and generating a zero or one is slightly easier than zero or -1 (all ones), e.g. the one instruction cset on aarch64; and bint (bool to int) from a b1 with a zero-or-one value is a no-op (just a copy) while it's a masking operation with zero or -1.

Wasmtime GitHub notifications bot (Aug 18 2021 at 22:35):

cfallin commented on issue #3205:

I should probably add another possibility too: we could define all of b1..b128 to use 0 for false and 1 for true.

The main reason for the all-ones representation, to my understanding, was to be able to use e.g. a b128 as a SIMD mask after a raw_bitcast (or at least, that's the way it has been explained to me -- many issues back I think someone linked some code that had assumed this). But this is sort of getting a desired higher-level semantic behavior implicitly by setting up the low-level definitions in a certain way; it would be better to require the "turn this bool into an all-ones mask" use-case to use bmask, which already exists for this purpose. Then we get to use the (IMHO) slightly more intuitive 0/1 values which work well for most cases, and we have a fully consistent behavior. We would just need to look for use-cases that assume all-ones currently and ensure they use bmask instead.

This probably needs a sanity check from a SIMD perspective though -- @abrown / @jlb6740 / @akirilov-arm, thoughts?

Wasmtime GitHub notifications bot (Aug 18 2021 at 22:42):

cfallin commented on issue #3205:

Ah, here is one example of a dependence on all-ones in cg_clif: link (linked by @afonso360 in #3003)

Wasmtime GitHub notifications bot (Aug 18 2021 at 22:48):

abrown commented on issue #3205:

I would mention that the all-ones/all-zeroes representation for SIMD is not just because using the output of a comparison as a mask is convenient, it's also that the Wasm SIMD spec specifies that representation (but obviously without the boolean type) and that machines (x64, aarch64) represent things this way as well.

Wasmtime GitHub notifications bot (Aug 18 2021 at 23:35):

cfallin commented on issue #3205:

@abrown this has me thinking further: the existing Wasm-SIMD implementation (lowering in cranelift-wasm specifically) does not use b128 (or any other boolean type), right?

So one fundamental question I have is: what is the use-case for boolean types wider than 1 bit? If for an actual SIMD use-case we represent the mask as an I8X16 (that's what I gather from here and the implementation of optionally_bitcast_vector), then it seems that the question of what Wasm-SIMD masks require is separate from the question of what b128 means.

In other words, what we decide here doesn't impact Wasm-SIMD, because Wasm-SIMD doesn't depend on CLIF-level bool types; so it's better to base on decision on whatever makes the most semantic sense and leads to the least confusion, IMHO.

If b128 is just "a 1-bit-narrow value stored in a much wider space", then IMHO it makes as much sense to zero-extend it (as one would zero-extend an i8 stored in an i128 worth of space) as sign-extend it.

Argued from another perspective: If we say that booleans (abstractly) have two possible values, then storing such a value in anything wider than 1 bit grants us the freedom to choose the bit representation for each of those two possible values. In other words, b128 is still just semantically true or false; the bit-patterns that represent those are invisible at the CLIF-semantics level.

So I'm starting to convince myself a bit more than the 0/1 approach is cleanest, and then we need to look for uses of raw_bitcast (as in cg_clif) to peek under the hood and see the actual bits, and convert those uses into bmask or bint as appropriate. But I'm (still) very curious how that sits with others...

Wasmtime GitHub notifications bot (Aug 19 2021 at 06:21):

bjorn3 commented on issue #3205:

Why not have b1 store all ones? The inconsistency between b1 and b8..b128 is a bit of a surprise.

Rust stores bools as 0 or 1. Storing b1 as all zeros or all ones would require an extra instruction in the common case to convert it to 0 or 1 when storing it. The larger boolean types are pretty much only useful for vector types where it is common for masks to have all zeros or all ones as value.

Wasmtime GitHub notifications bot (Aug 23 2021 at 12:18):

akirilov-arm commented on issue #3205:

@cfallin I am a bit confused about your comments on B128 and its relation to SIMD - isn't that a scalar type? I.e. in the SIMD case wouldn't one use something like B16X8?

Pragmatic reasons: b1 will be most common, and generating a zero or one is slightly easier than zero or -1 (all ones), e.g. the one instruction cset on aarch64...

The all ones case is actually a single AArch64 instruction as well, CSETM (assuming that the upper bits for types like B16 are undefined, so that we can set them to all ones).

Wasmtime GitHub notifications bot (Aug 23 2021 at 15:25):

cfallin commented on issue #3205:

@akirilov-arm indeed, that was more or less my point as well -- in the past it has been described as all-ones "because SIMD" and e.g. see the raw_bitcast linked above that uses it as a mask, but really it's a true/false value still, and so the fundamental question is "what does a multi-bit scalar boolean mean".

Wasmtime GitHub notifications bot (May 04 2022 at 22:58):

cfallin labeled issue #3205:

In #3202 and a number of past issues as well, we've repeatedly encountered some uncertainty as to (i) how boolean values of varying widths are stored in registers, (ii) whether they can be loaded/stored or not and what representation they take in memory (see e.g. #3102 where we addressed this for trampoline args), and (iii) how this interacts with boolean<->integer conversion ops like bint and bmask.

The understanding we seem to generally agree on is that booleans of width > 1 bit have all bits set (true) or all bits cleared (false). And a b1 is stored in a register with the same invariant as other integer values: the upper bits (in this case everything above the LSB) are undefined.

However, this is far from being confusion-free and unambiguous; thus, it seems reasonable to (i) agree definitively on how bools are represented in registers and memory, and (ii) audit our compliance to whatever invariants we decide on.

Wasmtime GitHub notifications bot (May 04 2022 at 22:58):

cfallin labeled issue #3205:

In #3202 and a number of past issues as well, we've repeatedly encountered some uncertainty as to (i) how boolean values of varying widths are stored in registers, (ii) whether they can be loaded/stored or not and what representation they take in memory (see e.g. #3102 where we addressed this for trampoline args), and (iii) how this interacts with boolean<->integer conversion ops like bint and bmask.

The understanding we seem to generally agree on is that booleans of width > 1 bit have all bits set (true) or all bits cleared (false). And a b1 is stored in a register with the same invariant as other integer values: the upper bits (in this case everything above the LSB) are undefined.

However, this is far from being confusion-free and unambiguous; thus, it seems reasonable to (i) agree definitively on how bools are represented in registers and memory, and (ii) audit our compliance to whatever invariants we decide on.

Wasmtime GitHub notifications bot (Oct 04 2022 at 21:20):

elliottt commented on issue #3205:

Here's a link to the 2021-08-23 meeting notes where this was discussed originally.

Wasmtime GitHub notifications bot (Oct 04 2022 at 23:25):

elliottt commented on issue #3205:

There has been discussion recently about whether or not we need the boolean types at all, and we seem to be converging on the opinion that we should remove them rather than handle the issues surrounding load/store with b1 values. Effectively, this would mean that we will remove all boolean types, scalar and vector, replacing them with their bit-width integer equivalent (except for b1 that would be replaced with i8).

I'd like to propose the following semantics for instructions that return boolean values now:

Instructions that currently return b1 values will switch to returning i8 values in the range [0,1]. This doesn't immediately address the current issues surrounding load/store with b1 values, and I'm interested to hear how we should address cases that look like x = load.i8 ...; brz x ....

All other boolean types would return integer values of the same bit-width, where true would be represented as all 1s and false would be represented as all 0s. I believe this would be effectively no value-level change from the current behavior.

Wasmtime GitHub notifications bot (Oct 04 2022 at 23:48):

cfallin commented on issue #3205:

Thanks @elliottt -- I think the above is largely a good plan and will result in nice simplifications and removal of awkward special cases!

x = load.i8 ...; brz x ...

For this case specifically I think the semantics would be for brz to branch if x != 0; in fact I think it already works this way (accepted types defined here and the mnemonics are "branch if zero" and "branch if not zero"). So the 0/1 encodings of a bool work as always, and other nonzero values are also truthy.

Wasmtime GitHub notifications bot (Oct 05 2022 at 16:15):

abrown commented on issue #3205:

I agree. IIRC, there may be an optimization or two that rely on the boolean type but given the move towards e-graphs those optimizations will need to be re-worked anyway.

Wasmtime GitHub notifications bot (Oct 17 2022 at 23:00):

elliottt closed issue #3205:

In #3202 and a number of past issues as well, we've repeatedly encountered some uncertainty as to (i) how boolean values of varying widths are stored in registers, (ii) whether they can be loaded/stored or not and what representation they take in memory (see e.g. #3102 where we addressed this for trampoline args), and (iii) how this interacts with boolean<->integer conversion ops like bint and bmask.

The understanding we seem to generally agree on is that booleans of width > 1 bit have all bits set (true) or all bits cleared (false). And a b1 is stored in a register with the same invariant as other integer values: the upper bits (in this case everything above the LSB) are undefined.

However, this is far from being confusion-free and unambiguous; thus, it seems reasonable to (i) agree definitively on how bools are represented in registers and memory, and (ii) audit our compliance to whatever invariants we decide on.

Last updated: Apr 18 2025 at 18:04 UTC