Stream: git-wasmtime

Topic: wasmtime / issue #4080 Rework x64 addressing-mode lowerin...


view this post on Zulip Wasmtime GitHub notifications bot (Apr 29 2022 at 22:25):

cfallin commented on issue #4080:

@fitzgen or @abrown, this should be ready for review now. I've stacked it
on top of #4091 as otherwise it's a merge conflict. Updated description:

This PR refactors the x64 backend address-mode lowering to use an
incremental-build approach, where it considers each node in a tree of
iadds that feed into a load/store address and, at each step, builds
the best possible Amode. It will combine an arbitrary number of
constant offsets (an extension beyond the current rules), and can
capture a left-shifted (scaled) index in any position of the tree
(another extension).

This doesn't have any measurable performance improvement on our Wasm
benchmarks in Sightglass, unfortunately, because the IR lowered from
wasm32 will do address computation in 32 bits and then uextend it to
add to the 64-bit heap base. We can't quite lift the 32-bit adds to 64
bits because this loses the wraparound semantics.

(We could label adds as "expected not to overflow", and allow those to
be lifted to 64 bit operations; wasm32 heap address computation should
fit this. This is add nuw (no unsigned wrap) in LLVM IR terms. That's
likely my next step.)

Nevertheless, (i) this generalizes the cases we can handle, which should
be a good thing, all other things being equal (and in this case, no
compile time impact was measured); and (ii) might benefit non-Wasm
frontends.

view this post on Zulip Wasmtime GitHub notifications bot (May 02 2022 at 17:45):

fitzgen commented on issue #4080:

(We could label adds as "expected not to overflow", and allow _those_ to
be lifted to 64 bit operations; wasm32 heap address computation should
fit this. This is add nuw (no unsigned wrap) in LLVM IR terms. That's
likely my next step.)

This is a pretty big change for CLIF: right now it is ~basically UB-free other than memory loads/stores (and invalid raw_bitcasts?). Division by zero, for example, is not UB and semantically raises a trap. Adding the equivalent of add nuw would be the first instance of UB we have in basic arithmetic instructions, and that's something we shouldn't do lightly. I think this will have subtle and far-reaching effects that aren't immediately obvious to us now. For example, verification/superoptimization would have to keep track of UB, explicitly represent it in the SMT queries they create, and then check refinement rather than equivalence.

view this post on Zulip Wasmtime GitHub notifications bot (May 02 2022 at 21:39):

cfallin commented on issue #4080:

Updated! @abrown let me know if the additional comments make sense.

In the process of tweaking priorities to try to get everything to apply in the right order, I was hitting some weird behavior and suspect there may be an issue with priorities in general. I'm going to go play with that a bit now, but it shouldn't affect correctness here (the order in which rules apply seems wrong sometimes but the rules themselves are applied correctly).

view this post on Zulip Wasmtime GitHub notifications bot (May 02 2022 at 23:52):

cfallin commented on issue #4080:

This is a pretty big change for CLIF: right now it is ~basically UB-free other than memory loads/stores (and invalid raw_bitcasts?). Division by zero, for example, is not UB and semantically raises a trap. Adding the equivalent of add nuw would be the first instance of UB we have in basic arithmetic instructions, and that's something we shouldn't do lightly. I think this will have subtle and far-reaching effects that aren't immediately obvious to us now. For example, verification/superoptimization would have to keep track of UB, explicitly represent it in the SMT queries they create, and then check refinement rather than equivalence.

Yeah, I've been digging into cases where I thought this might help and it seems that the adds are coming from bona fine i32.adds in the Wasm, not some part of the legalization of heap_addr with offsets; in this case we actually have to implement the wrapping behavior faithfully, so nevermind the above. I do agree as well actually that nondeterminism isn't the best answer here; if we come back to a similar issue later maybe the right answer is to just use i64s in the right spots (or the host pointer type) to avoid the extends.


Last updated: Nov 22 2024 at 16:03 UTC