wasmtime / PR #9775 pulley: Implement more of loads/stores · git-wasmtime

Stream: git-wasmtime

Topic: wasmtime / PR #9775 pulley: Implement more of loads/stores

Wasmtime GitHub notifications bot (Dec 10 2024 at 16:18):

alexcrichton opened PR #9775 from alexcrichton:pulley-loads-and-stores to bytecodealliance:main:

This commit gets the address.wast spec test working by filling out more load/store infrastructure in Pulley. In doing so I've done a refactoring of the existing load/store methods. Changes here are:

All load/stores are *_offset32 now instead of optionally having no offset, a 8-bit offset, or a 64-bit offset.

All x-register loads/stores are prefixed with x now.

All loads/stores have "le" for little-endian in their name.

Loads/stores are refactored to have 8 and 16-bit variants.

Sign-extending loads now either extend to 32 or 64 depending on the opcode.

Float loads/stores are added.

Big-endian is handled with explicit big-endian loads/stores instead bswap to handle this transparently in the backend (e.g. for stores not ISLE-generated) and to handle floats.

Wasmtime GitHub notifications bot (Dec 10 2024 at 16:18):

alexcrichton requested fitzgen for a review on PR #9775.

Wasmtime GitHub notifications bot (Dec 10 2024 at 16:18):

alexcrichton requested wasmtime-compiler-reviewers for a review on PR #9775.

Wasmtime GitHub notifications bot (Dec 10 2024 at 16:18):

alexcrichton requested wasmtime-default-reviewers for a review on PR #9775.

Wasmtime GitHub notifications bot (Dec 10 2024 at 16:18):

alexcrichton requested wasmtime-core-reviewers for a review on PR #9775.

Wasmtime GitHub notifications bot (Dec 10 2024 at 16:19):

alexcrichton submitted PR review.

Wasmtime GitHub notifications bot (Dec 10 2024 at 16:19):

alexcrichton created PR review comment:

Curious on your thoughts on this in particular @fitzgen.

I'll note that we don't currently do this well, e.g. xconst8 always writes the full 64-bits and br_if has no variant that only reads 32-bits. I think we'll want to change that over time though.

Wasmtime GitHub notifications bot (Dec 10 2024 at 16:20):

alexcrichton commented on PR #9775:

At a high level the spec_testsuite/*.wast still can't be executed, but that's the next PR after this to get that executing on CI (and a large number of tests already pass)

Wasmtime GitHub notifications bot (Dec 10 2024 at 16:26):

alexcrichton updated PR #9775.

Wasmtime GitHub notifications bot (Dec 10 2024 at 16:26):

alexcrichton requested wasmtime-fuzz-reviewers for a review on PR #9775.

Wasmtime GitHub notifications bot (Dec 10 2024 at 16:27):

alexcrichton submitted PR review.

Wasmtime GitHub notifications bot (Dec 10 2024 at 16:27):

alexcrichton created PR review comment:

Also curious on your thoughts here @fitzgen of the deletion of this fuzzer. Personally I feel like it's a bit of a pain to maintain while not adding much more over the wasm-smith fuzzing we have already.

Wasmtime GitHub notifications bot (Dec 10 2024 at 16:34):

alexcrichton updated PR #9775.

Wasmtime GitHub notifications bot (Dec 10 2024 at 16:44):

alexcrichton submitted PR review.

Wasmtime GitHub notifications bot (Dec 10 2024 at 16:44):

alexcrichton created PR review comment:

Also double-check me on this, is this right?

Wasmtime GitHub notifications bot (Dec 10 2024 at 16:51):

fitzgen submitted PR review.

Wasmtime GitHub notifications bot (Dec 10 2024 at 16:51):

fitzgen created PR review comment:

I think we could remove the parenthetical here. They must be fast to decode in software (perhaps worth adding that qualifier) at runtime and various bit-packing schemes will either be fast enough or not. I don't think it necessarily makes sense to say we won't do any bit-packing at all, since (a) we already do a little, and (b) we need to balance the decoding with instruction density and cache usage.

Wasmtime GitHub notifications bot (Dec 10 2024 at 16:51):

fitzgen created PR review comment:

We are (rightfully) burning a lot of opcode space for loads and stores, but I wonder if we can make an exception for the "only write the bottom bits" and always extend to the full register width? This would mean we wouldn't need both xload8_s32_offset32 and xload8_s64_offset32 and would effectively have only the latter. This would roughly cut in half the number of load/store instructions we have but wouldn't constrain the cranelift backend at all. The cost would be a couple extra extends on 32-bit platforms, which I don't think is too bad a trade off in this particular case.

And FWIW, I still think the default position should be to only write the low bits of registers when doing 32-bit adds/ands/xors/etc, we are just running into all the edge cases now :)

Wasmtime GitHub notifications bot (Dec 10 2024 at 16:51):

fitzgen created PR review comment:

For xconst* I think filling the whole register is what we want regardless. They are a little different from, say, bitwise arithmetic since they aren't really an operation themselves and are more about getting data in place to perform actual operations.

Although IIRC we are still zero-extending rather than sign-extending in const*. The latter is going to save us code size, since otherwise all negative immediates will need to be loaded in registers with xconst64s. But maybe we did fix this already at one point...

Wasmtime GitHub notifications bot (Dec 10 2024 at 16:51):

fitzgen created PR review comment:

This actually _increases_ the number of opcodes, but _decreases_ the number of instructions in most programs. That is, with macro ops we are doing _more work per instruction_. And the reason we want to do more work per instruction is because one of the biggest costs of interpretation compared to compilation is the interpreter loop's per-instruction overhead.

Wasmtime GitHub notifications bot (Dec 10 2024 at 16:51):

fitzgen created PR review comment:

Maybe also add a note about signed vs unsigned naming conventions

Wasmtime GitHub notifications bot (Dec 10 2024 at 16:51):

fitzgen created PR review comment:

nitpick: Use the new helper here and elsewhere

Wasmtime GitHub notifications bot (Dec 10 2024 at 16:53):

fitzgen submitted PR review.

Wasmtime GitHub notifications bot (Dec 10 2024 at 16:53):

fitzgen created PR review comment:

I don't think so, since (unlike addition) subtraction lets high bits (which should be invisible) move down to the low bits (and become visible), right?

Wasmtime GitHub notifications bot (Dec 10 2024 at 16:59):

alexcrichton submitted PR review.

Wasmtime GitHub notifications bot (Dec 10 2024 at 16:59):

alexcrichton created PR review comment:

Good point, I'll add notes about the balance. I was mostly thinking here of things like RISC-V instructions where the registers are spread throughout the instruction in various locations to fit around encodings of other instructions. That's the "super fancy" part but I also don't think we're at risk of getting there any time soon.

Wasmtime GitHub notifications bot (Dec 10 2024 at 17:05):

fitzgen submitted PR review.

Wasmtime GitHub notifications bot (Dec 10 2024 at 17:05):

fitzgen created PR review comment:

actually I think this works correctly

Wasmtime GitHub notifications bot (Dec 10 2024 at 17:06):

fitzgen submitted PR review.

Wasmtime GitHub notifications bot (Dec 10 2024 at 17:06):

fitzgen created PR review comment:

let's merge this and get further in spec tests, and eventually differential fuzzing, which will definitely catch any issues here

Wasmtime GitHub notifications bot (Dec 10 2024 at 17:25):

alexcrichton submitted PR review.

Wasmtime GitHub notifications bot (Dec 10 2024 at 17:25):

alexcrichton created PR review comment:

This I feel is a bit entangled with the question below of "do we want xload8_s32_offset32". I'm a bit worried about xconst8 always filling the entire register mostly from the perspective of 32-bit platforms. If most constants in a program are actually 32-bits then we're in theory spending a lot of instructions to fill the upper half of the register that's never read. Cache-wise that's probably not too problematic but it's possibly problematic from an icache perspective of the implementation of xconst8 itself.

I definitely agree though that for 64-bit platforms it doesn't matter at all.

For now I'm going to go ahead with this PR as-is (with xload8_s32_offset32 for example) and these docs since we don't have a great answer one way or another and I think we're both on the fence. I'm going to add some notes to https://github.com/bytecodealliance/wasmtime/issues/9747 to come back to reevaluate though.

Wasmtime GitHub notifications bot (Dec 10 2024 at 17:26):

alexcrichton updated PR #9775.

Wasmtime GitHub notifications bot (Dec 10 2024 at 17:26):

alexcrichton has enabled auto merge for PR #9775.

Wasmtime GitHub notifications bot (Dec 10 2024 at 18:13):

alexcrichton updated PR #9775.

Wasmtime GitHub notifications bot (Dec 10 2024 at 18:13):

alexcrichton has enabled auto merge for PR #9775.

Wasmtime GitHub notifications bot (Dec 10 2024 at 18:47):

alexcrichton merged PR #9775.

Last updated: Apr 17 2025 at 18:04 UTC