wasmtime / PR #3746 s390x: Add support for all remaining ... · git-wasmtime

Stream: git-wasmtime

Topic: wasmtime / PR #3746 s390x: Add support for all remaining ...

Wasmtime GitHub notifications bot (Feb 01 2022 at 08:51):

uweigand opened PR #3746 from s390x-atomicops to main:

This adds support for all atomic operations that were unimplemented
so far in the s390x back end:

atomic_rmw operations xchg, nand, smin, smax, umin, umax

$I8 and $I16 versions of atomic_rmw and atomic_cas

little endian versions of atomic_rmw and atomic_cas

All of these have to be implemented by a compare-and-swap loop;
and for the $I8 and $I16 versions the actual atomic instruction
needs to operate on the surrounding aligned 32-bit word.

Since we cannot emit new control flow during ISLE instruction
selection, these compare-and-swap loops are emitted as a single
meta-instruction to be expanded at emit time.

However, since there is a large number of different versions of
the loop required to implement all the above operations, I've
implemented a facility to allow specifying the loop bodies
from within ISLE after all, by creating a vector of MInst
structures that will be emitted as part of the meta-instruction.

There are still restrictions, in particular instructions that
are part of the loop body may not modify any virtual register.
But even so, this approach looks preferable to doing everything
in emit.rs.

A few instructions needed in those compare-and-swap loop bodies
were added as well, in particular the RxSBG family of instructions
as well as the LOAD REVERSED in-register byte-swap instructions.

This patch also adds filetest runtests to verify the semantics
of all operations, in particular the subword and little-endian
variants (those are currently only executed on s390x).

CC @cfallin @fitzgen

Wasmtime GitHub notifications bot (Feb 01 2022 at 18:16):

fitzgen created PR review comment:

I think this kind of thing that is explicitly about side effects should be quarantined to inst.isle.

Wasmtime GitHub notifications bot (Feb 01 2022 at 18:16):

fitzgen created PR review comment:

Could we move casloop_emit and casloop_result to inst.isle and then provide a combined version that passes the result of casloop_emit to caseloop_result for you? This combined helper would present a little more of an expression-oriented facade, like we prefer to use inside of lower.isle.

Similar for casloop_emit followed by casloop_rotate_result.

Wasmtime GitHub notifications bot (Feb 01 2022 at 18:16):

fitzgen created PR review comment:

Is it possible to expand the loop bodies inside isle.rs, rather than creating a new meta instruction variant? If that's possible, I think this would allow regalloc to see the loop, and loosen the restrictions on loop bodies.

Wasmtime GitHub notifications bot (Feb 01 2022 at 18:16):

fitzgen submitted PR review.

Wasmtime GitHub notifications bot (Feb 01 2022 at 18:16):

fitzgen created PR review comment:

Does "iv" stand for "instruction vec"? For readability, what do you think of calling this inst_builder_new and similar for the other iv_* decls?

Wasmtime GitHub notifications bot (Feb 01 2022 at 18:16):

fitzgen submitted PR review.

Wasmtime GitHub notifications bot (Feb 02 2022 at 10:14):

uweigand updated PR #3746 from s390x-atomicops to main.

Wasmtime GitHub notifications bot (Feb 02 2022 at 10:14):

uweigand submitted PR review.

Wasmtime GitHub notifications bot (Feb 02 2022 at 10:14):

uweigand created PR review comment:

I was wondering myself where to put those helpers. I've now moved them to inst.isle.

Wasmtime GitHub notifications bot (Feb 02 2022 at 10:15):

uweigand submitted PR review.

Wasmtime GitHub notifications bot (Feb 02 2022 at 10:15):

uweigand created PR review comment:

Done.

Wasmtime GitHub notifications bot (Feb 02 2022 at 10:15):

uweigand submitted PR review.

Wasmtime GitHub notifications bot (Feb 02 2022 at 10:15):

uweigand created PR review comment:

Done.

Wasmtime GitHub notifications bot (Feb 02 2022 at 10:21):

uweigand submitted PR review.

Wasmtime GitHub notifications bot (Feb 02 2022 at 10:21):

uweigand created PR review comment:

I don't see how this would be possible. Emitting separate instructions instead of a single meta-instruction gets us back to the very problem the meta-instruction is intended to avoid: we'd need to create branches (and their target labels), which we cannot do since we cannot modify the CFG at this point. And even if could do that somehow, we'd also have to model the loop dataflow correctly: it's not just that we're writing to a register, it is that this register is a loop-carried dependency - so if we split out separate instructions, we'd have to create a new PHI node to correctly represent that ...

In any case, the restriction to hard registers works out fine for this use case on s390x (and I don't really foresee any other use cases): we need to have %r0 and %r1 as reserved registers anyway for various reasons, and using those two actually reduces register pressure in the CAS loop, which already needs quite a few of them.

Wasmtime GitHub notifications bot (Feb 02 2022 at 21:10):

cfallin submitted PR review.

Wasmtime GitHub notifications bot (Feb 02 2022 at 21:10):

cfallin created PR review comment:

@fitzgen indeed we have a restriction here that we cannot create new control-flow at the VCode level (the succs and preds and associated bits are already computed, and contiguous block IDs are already assigned). This practice of meta-instructions that contain mini-CFGs (always with a single-in single-out property, modulo traps) has been our solution so far, and we have some other examples in aarch64 atomics (CAS loops), divides (special-case checks and traps), br_tables (bounds checks), and maybe a few others. Maybe we could loosen this in the future if we think we might want opts after lowering that could make use of visibility into the instructions.

FWIW elsewhere we've mostly done this in a way that is specific to each use-case rather than a generic combinator (loop with "body" field) but if that's what works here, I'm not really opposed to it.

Wasmtime GitHub notifications bot (Feb 03 2022 at 19:22):

fitzgen submitted PR review.

Wasmtime GitHub notifications bot (Feb 03 2022 at 19:22):

fitzgen created PR review comment:

Gotcha. Makes sense, but seems like a little bit of an unfortunate restriction.

Wasmtime GitHub notifications bot (Feb 03 2022 at 19:22):

fitzgen submitted PR review.

Wasmtime GitHub notifications bot (Feb 03 2022 at 19:28):

cfallin submitted PR review.

Wasmtime GitHub notifications bot (Feb 03 2022 at 19:28):

cfallin created PR review comment:

I agree, and we could potentially relax it if it turns out we need more general control flow expansion later; there is a bit of a performance tradeoff here though that a more general approach (reconstructing succs/preds and numbering blocks after lowering) would impose. As a ballpark number, when I reworked lowering to work in a single-pass, emit-blocks-in-their-final-order way, with no edits to the CFG afterward, it was IIRC about a 10% compile-time improvement.

Wasmtime GitHub notifications bot (Feb 08 2022 at 20:47):

uweigand updated PR #3746 from s390x-atomicops to main.

Wasmtime GitHub notifications bot (Feb 08 2022 at 21:48):

cfallin merged PR #3746.

Last updated: Apr 18 2025 at 04:04 UTC