wasmtime / PR #2488 Make endianness explicit in Cranelift... · git-wasmtime

Stream: git-wasmtime

Topic: wasmtime / PR #2488 Make endianness explicit in Cranelift...

Wasmtime GitHub notifications bot (Dec 08 2020 at 13:13):

uweigand opened PR #2488 from endian-memory-v4 to main:

WebAssembly memory operations are by definition little-endian even on
big-endian target platforms. However, other memory accesses will require
native target endianness (e.g. to access parts of the VMContext that is
also accessed by VM native code). This means on big-endian targets,
the code generator will have to handle both little- and big-endian
memory accesses. However, there is currently no way to encode that
distinction into the Cranelift IR that describes memory accesses.

This patch provides such a way by adding an explicit endianness marker
to each instance of MemFlags. Since each Cranelift IR instruction that
describes memory accesses already has an instance of MemFlags attached,
that will now automatically provide endianness information.

Beyond the core mechanics of adding the endianness marker to MemFlags,
this patch also updates all creators of MemFlags instances. In order
to ensure each MemFlags has the correct endianness marker, it is now
mandatory to pass such a marker whenever constructing a MemFlags.

This in turn requires all places constructing a MemFlags (except for
the WebAssembly translator) to pass in the native target endianness.
This can be determined from the TargetIsa where that is readily
available; front-end code wil retrieve the target endianness from a new
member of the TargetFrontendConfig structure filled by the back end.

In addition, the GlobalValueData::Load structure was changed to hold
a full MemFlags instead of just a "readonly" flag, which not only
simplifies related code a bit but now allows passing through of
endianness information as well.

Finally, IR parsing and debug printing routines were updated to
handle the new endianness marker; this in turn required updating
Cranelift "filetests" test cases that contain memory instructions.

This patch addresses issue #2124.

Wasmtime GitHub notifications bot (Dec 08 2020 at 13:31):

uweigand updated PR #2488 from endian-memory-v4 to main:

WebAssembly memory operations are by definition little-endian even on
big-endian target platforms. However, other memory accesses will require
native target endianness (e.g. to access parts of the VMContext that is
also accessed by VM native code). This means on big-endian targets,
the code generator will have to handle both little- and big-endian
memory accesses. However, there is currently no way to encode that
distinction into the Cranelift IR that describes memory accesses.

This patch provides such a way by adding an explicit endianness marker
to each instance of MemFlags. Since each Cranelift IR instruction that
describes memory accesses already has an instance of MemFlags attached,
that will now automatically provide endianness information.

Beyond the core mechanics of adding the endianness marker to MemFlags,
this patch also updates all creators of MemFlags instances. In order
to ensure each MemFlags has the correct endianness marker, it is now
mandatory to pass such a marker whenever constructing a MemFlags.

This in turn requires all places constructing a MemFlags (except for
the WebAssembly translator) to pass in the native target endianness.
This can be determined from the TargetIsa where that is readily
available; front-end code wil retrieve the target endianness from a new
member of the TargetFrontendConfig structure filled by the back end.

In addition, the GlobalValueData::Load structure was changed to hold
a full MemFlags instead of just a "readonly" flag, which not only
simplifies related code a bit but now allows passing through of
endianness information as well.

Finally, IR parsing and debug printing routines were updated to
handle the new endianness marker; this in turn required updating
Cranelift "filetests" test cases that contain memory instructions.

This patch addresses issue #2124.

Wasmtime GitHub notifications bot (Dec 08 2020 at 13:42):

bjorn3 created PR Review Comment:

I think using the native endian by default would be nicer. Except for WASM you almost always want to use the native endian.

Wasmtime GitHub notifications bot (Dec 08 2020 at 13:42):

bjorn3 submitted PR Review.

Wasmtime GitHub notifications bot (Dec 08 2020 at 13:45):

uweigand submitted PR Review.

Wasmtime GitHub notifications bot (Dec 08 2020 at 13:45):

uweigand created PR Review Comment:

But where would you get that? It depends on the target.

Wasmtime GitHub notifications bot (Dec 08 2020 at 13:48):

bjorn3 submitted PR Review.

Wasmtime GitHub notifications bot (Dec 08 2020 at 13:48):

bjorn3 created PR Review Comment:

An empty MemFlags would mean native endian, but you could add a flag to override it to either little endian or big endian. A little endian target will then look if the big endian flag exists and use little endian otherwise. A big endian target will look if the little endian flag exists and use big endian otherwise. You could add a method to MemFlags that accepts the native endian and computes the real endian of the memory instruction to avoid duplication of this logic.

Wasmtime GitHub notifications bot (Dec 08 2020 at 13:48):

uweigand submitted PR Review.

Wasmtime GitHub notifications bot (Dec 08 2020 at 13:48):

uweigand created PR Review Comment:

Also, the point of requiring every MemFlags to carry explicit endianness was to make it impossible to accidentally "lose" endianness e.g. during IR transformations. This is of course a decision that can be made either way: is it more important to try and prevent this type of accidental bug, at the cost of added verbosity, or the other way round?

Wasmtime GitHub notifications bot (Dec 08 2020 at 14:06):

uweigand updated PR #2488 from endian-memory-v4 to main:

WebAssembly memory operations are by definition little-endian even on
big-endian target platforms. However, other memory accesses will require
native target endianness (e.g. to access parts of the VMContext that is
also accessed by VM native code). This means on big-endian targets,
the code generator will have to handle both little- and big-endian
memory accesses. However, there is currently no way to encode that
distinction into the Cranelift IR that describes memory accesses.

This patch provides such a way by adding an explicit endianness marker
to each instance of MemFlags. Since each Cranelift IR instruction that
describes memory accesses already has an instance of MemFlags attached,
that will now automatically provide endianness information.

Beyond the core mechanics of adding the endianness marker to MemFlags,
this patch also updates all creators of MemFlags instances. In order
to ensure each MemFlags has the correct endianness marker, it is now
mandatory to pass such a marker whenever constructing a MemFlags.

This in turn requires all places constructing a MemFlags (except for
the WebAssembly translator) to pass in the native target endianness.
This can be determined from the TargetIsa where that is readily
available; front-end code wil retrieve the target endianness from a new
member of the TargetFrontendConfig structure filled by the back end.

In addition, the GlobalValueData::Load structure was changed to hold
a full MemFlags instead of just a "readonly" flag, which not only
simplifies related code a bit but now allows passing through of
endianness information as well.

Finally, IR parsing and debug printing routines were updated to
handle the new endianness marker; this in turn required updating
Cranelift "filetests" test cases that contain memory instructions.

This patch addresses issue #2124.

Wasmtime GitHub notifications bot (Dec 08 2020 at 18:14):

cfallin submitted PR Review.

Wasmtime GitHub notifications bot (Dec 08 2020 at 18:14):

cfallin created PR Review Comment:

I think this is a case where we want to prefer explicitness: otherwise it's too easy to introduce assumptions that break only on big-endian systems. Hopefully we eventually have such a system on CI, but it's still better to lift the safety to the API/type level.

Wasmtime GitHub notifications bot (Dec 09 2020 at 01:32):

cfallin submitted PR Review.

Wasmtime GitHub notifications bot (Dec 09 2020 at 01:32):

cfallin submitted PR Review.

Wasmtime GitHub notifications bot (Dec 09 2020 at 01:32):

cfallin created PR Review Comment:

If we have doc-comments here, I think we should start with what the flag means then what the consequences (undefined behavior if you violate the implication of the flag) are. So something like:

notrap: Program promises that the access will not trap; if it does, the result is undefined (usually a fatal signal). Causes backends to omit trap metadata for resulting access. Set by MemFlags::trusted(), and in a virtual-machine / sandboxing context, usually used for "trusted" loads to VM data structures, i.e., addresses not controlled by untrusted guest code.

aligned: Program promises that the access is aligned to the natural alignment for the element type. If not, result is undefined (may result in traps on some machines).

readonly: Program promises that the access is to readonly data. If used for a store, or if the underlying memory content is changed, then results are undefined. In practice, allows the compiler to assume that one loaded value can be reused and assumed to be up-to-date.

bigendian: (your comment)

Wasmtime GitHub notifications bot (Dec 09 2020 at 01:32):

cfallin created PR Review Comment:

Outdated doc-comment? In principle, any of the MemFlags bits could apply to a global value load, not just readonly, I think.

Wasmtime GitHub notifications bot (Dec 09 2020 at 01:32):

cfallin created PR Review Comment:

Also, to add to above: it might be nice to add some rationale here; surely some readers will be confused by this, as it feels a bit odd coming from e.g. LLVM. We should state that this arises from the goal to have a completely machine-independent IR, whose meaning doesn't change across targets; in contrast, other IRs often are generated with target information already in mind. The client that generates IR may wish to consult target info to optimally choose "native" endian for cases where it doesn't matter, but it's better to be explicit about this in all cases.

Wasmtime GitHub notifications bot (Dec 09 2020 at 01:35):

cfallin edited PR Review Comment.

Wasmtime GitHub notifications bot (Dec 14 2020 at 22:00):

cfallin closed without merge PR #2488.

Last updated: Apr 18 2025 at 08:04 UTC