Stream: git-wasmtime

Topic: wasmtime / PR #9775 pulley: Implement more of loads/stores


view this post on Zulip Wasmtime GitHub notifications bot (Dec 10 2024 at 16:18):

alexcrichton opened PR #9775 from alexcrichton:pulley-loads-and-stores to bytecodealliance:main:

This commit gets the address.wast spec test working by filling out more load/store infrastructure in Pulley. In doing so I've done a refactoring of the existing load/store methods. Changes here are:

<!--
Please make sure you include the following information:

Our development process is documented in the Wasmtime book:
https://docs.wasmtime.dev/contributing-development-process.html

Please ensure all communication follows the code of conduct:
https://github.com/bytecodealliance/wasmtime/blob/main/CODE_OF_CONDUCT.md
-->

view this post on Zulip Wasmtime GitHub notifications bot (Dec 10 2024 at 16:18):

alexcrichton requested fitzgen for a review on PR #9775.

view this post on Zulip Wasmtime GitHub notifications bot (Dec 10 2024 at 16:18):

alexcrichton requested wasmtime-compiler-reviewers for a review on PR #9775.

view this post on Zulip Wasmtime GitHub notifications bot (Dec 10 2024 at 16:18):

alexcrichton requested wasmtime-default-reviewers for a review on PR #9775.

view this post on Zulip Wasmtime GitHub notifications bot (Dec 10 2024 at 16:18):

alexcrichton requested wasmtime-core-reviewers for a review on PR #9775.

view this post on Zulip Wasmtime GitHub notifications bot (Dec 10 2024 at 16:19):

alexcrichton submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Dec 10 2024 at 16:19):

alexcrichton created PR review comment:

Curious on your thoughts on this in particular @fitzgen.

I'll note that we don't currently do this well, e.g. xconst8 always writes the full 64-bits and br_if has no variant that only reads 32-bits. I think we'll want to change that over time though.

view this post on Zulip Wasmtime GitHub notifications bot (Dec 10 2024 at 16:20):

alexcrichton commented on PR #9775:

At a high level the spec_testsuite/*.wast still can't be executed, but that's the next PR after this to get that executing on CI (and a large number of tests already pass)

view this post on Zulip Wasmtime GitHub notifications bot (Dec 10 2024 at 16:26):

alexcrichton updated PR #9775.

view this post on Zulip Wasmtime GitHub notifications bot (Dec 10 2024 at 16:26):

alexcrichton requested wasmtime-fuzz-reviewers for a review on PR #9775.

view this post on Zulip Wasmtime GitHub notifications bot (Dec 10 2024 at 16:27):

alexcrichton submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Dec 10 2024 at 16:27):

alexcrichton created PR review comment:

Also curious on your thoughts here @fitzgen of the deletion of this fuzzer. Personally I feel like it's a bit of a pain to maintain while not adding much more over the wasm-smith fuzzing we have already.

view this post on Zulip Wasmtime GitHub notifications bot (Dec 10 2024 at 16:34):

alexcrichton updated PR #9775.

view this post on Zulip Wasmtime GitHub notifications bot (Dec 10 2024 at 16:44):

alexcrichton submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Dec 10 2024 at 16:44):

alexcrichton created PR review comment:

Also double-check me on this, is this right?

view this post on Zulip Wasmtime GitHub notifications bot (Dec 10 2024 at 16:51):

fitzgen submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Dec 10 2024 at 16:51):

fitzgen created PR review comment:

I think we could remove the parenthetical here. They must be fast to decode in software (perhaps worth adding that qualifier) at runtime and various bit-packing schemes will either be fast enough or not. I don't think it necessarily makes sense to say we won't do any bit-packing at all, since (a) we already do a little, and (b) we need to balance the decoding with instruction density and cache usage.

view this post on Zulip Wasmtime GitHub notifications bot (Dec 10 2024 at 16:51):

fitzgen created PR review comment:

We are (rightfully) burning a lot of opcode space for loads and stores, but I wonder if we can make an exception for the "only write the bottom bits" and always extend to the full register width? This would mean we wouldn't need both xload8_s32_offset32 and xload8_s64_offset32 and would effectively have only the latter. This would roughly cut in half the number of load/store instructions we have but wouldn't constrain the cranelift backend at all. The cost would be a couple extra extends on 32-bit platforms, which I don't think is too bad a trade off in this particular case.

And FWIW, I still think the default position should be to only write the low bits of registers when doing 32-bit adds/ands/xors/etc, we are just running into all the edge cases now :)

view this post on Zulip Wasmtime GitHub notifications bot (Dec 10 2024 at 16:51):

fitzgen created PR review comment:

For xconst* I think filling the whole register is what we want regardless. They are a little different from, say, bitwise arithmetic since they aren't really an operation themselves and are more about getting data in place to perform actual operations.

Although IIRC we are still zero-extending rather than sign-extending in const*. The latter is going to save us code size, since otherwise all negative immediates will need to be loaded in registers with xconst64s. But maybe we did fix this already at one point...

view this post on Zulip Wasmtime GitHub notifications bot (Dec 10 2024 at 16:51):

fitzgen created PR review comment:

This actually _increases_ the number of opcodes, but _decreases_ the number of instructions in most programs. That is, with macro ops we are doing _more work per instruction_. And the reason we want to do more work per instruction is because one of the biggest costs of interpretation compared to compilation is the interpreter loop's per-instruction overhead.

view this post on Zulip Wasmtime GitHub notifications bot (Dec 10 2024 at 16:51):

fitzgen created PR review comment:

Maybe also add a note about signed vs unsigned naming conventions

view this post on Zulip Wasmtime GitHub notifications bot (Dec 10 2024 at 16:51):

fitzgen created PR review comment:

nitpick: Use the new helper here and elsewhere

view this post on Zulip Wasmtime GitHub notifications bot (Dec 10 2024 at 16:53):

fitzgen submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Dec 10 2024 at 16:53):

fitzgen created PR review comment:

I don't think so, since (unlike addition) subtraction lets high bits (which should be invisible) move down to the low bits (and become visible), right?

view this post on Zulip Wasmtime GitHub notifications bot (Dec 10 2024 at 16:59):

alexcrichton submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Dec 10 2024 at 16:59):

alexcrichton created PR review comment:

Good point, I'll add notes about the balance. I was mostly thinking here of things like RISC-V instructions where the registers are spread throughout the instruction in various locations to fit around encodings of other instructions. That's the "super fancy" part but I also don't think we're at risk of getting there any time soon.

view this post on Zulip Wasmtime GitHub notifications bot (Dec 10 2024 at 17:05):

fitzgen submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Dec 10 2024 at 17:05):

fitzgen created PR review comment:

actually I think this works correctly

view this post on Zulip Wasmtime GitHub notifications bot (Dec 10 2024 at 17:06):

fitzgen submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Dec 10 2024 at 17:06):

fitzgen created PR review comment:

let's merge this and get further in spec tests, and eventually differential fuzzing, which will definitely catch any issues here

view this post on Zulip Wasmtime GitHub notifications bot (Dec 10 2024 at 17:25):

alexcrichton submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Dec 10 2024 at 17:25):

alexcrichton created PR review comment:

This I feel is a bit entangled with the question below of "do we want xload8_s32_offset32". I'm a bit worried about xconst8 always filling the entire register mostly from the perspective of 32-bit platforms. If most constants in a program are actually 32-bits then we're in theory spending a lot of instructions to fill the upper half of the register that's never read. Cache-wise that's probably not too problematic but it's possibly problematic from an icache perspective of the implementation of xconst8 itself.

I definitely agree though that for 64-bit platforms it doesn't matter at all.

For now I'm going to go ahead with this PR as-is (with xload8_s32_offset32 for example) and these docs since we don't have a great answer one way or another and I think we're both on the fence. I'm going to add some notes to https://github.com/bytecodealliance/wasmtime/issues/9747 to come back to reevaluate though.

view this post on Zulip Wasmtime GitHub notifications bot (Dec 10 2024 at 17:26):

alexcrichton updated PR #9775.

view this post on Zulip Wasmtime GitHub notifications bot (Dec 10 2024 at 17:26):

alexcrichton has enabled auto merge for PR #9775.

view this post on Zulip Wasmtime GitHub notifications bot (Dec 10 2024 at 18:13):

alexcrichton updated PR #9775.

view this post on Zulip Wasmtime GitHub notifications bot (Dec 10 2024 at 18:13):

alexcrichton has enabled auto merge for PR #9775.

view this post on Zulip Wasmtime GitHub notifications bot (Dec 10 2024 at 18:47):

alexcrichton merged PR #9775.


Last updated: Dec 23 2024 at 13:07 UTC