Stream: git-wasmtime

Topic: wasmtime / PR #10276 x64: Refactor assembler ISLE constru...


view this post on Zulip Wasmtime GitHub notifications bot (Feb 22 2025 at 22:30):

alexcrichton requested abrown for a review on PR #10276.

view this post on Zulip Wasmtime GitHub notifications bot (Feb 22 2025 at 22:30):

alexcrichton requested wasmtime-compiler-reviewers for a review on PR #10276.

view this post on Zulip Wasmtime GitHub notifications bot (Feb 22 2025 at 22:30):

alexcrichton opened PR #10276 from alexcrichton:assembler-isle-refactor to bytecodealliance:main:

This commit is spawned out of discussion between me and Andrew in conjunction with the thoughts in #10238. The goal here is to pave a way forward for various kinds of instructions in the future as well as give access to more instructions today we already have formats for.

The major changes in this commit are:

Using this new support various x64_*_mem instructions have now moved over to the new assembler and using that instead. Looking to the future this is intended to make it easier to generate constructors that return ProducesFlags or ConsumesFlags such as x64_adc and friends. This will require more refactoring to enable this but the goal is to move roughly in such a direction.

I've attempted to make this abstract enough that it'll be relatively easily extensible in the future to more ISLE constructors with minimal changes, so some abstractions here may not be fully used just yet but the hope is that they will be in the near future.

<!--
Please make sure you include the following information:

Our development process is documented in the Wasmtime book:
https://docs.wasmtime.dev/contributing-development-process.html

Please ensure all communication follows the code of conduct:
https://github.com/bytecodealliance/wasmtime/blob/main/CODE_OF_CONDUCT.md
-->

view this post on Zulip Wasmtime GitHub notifications bot (Feb 22 2025 at 22:31):

alexcrichton commented on PR #10276:

I'll also note that this creates a lot of conflicts with https://github.com/bytecodealliance/wasmtime/pull/10273, and I'm happy to have that go through first and rebase around that, or also have this go in first and I can help that PR rebase around this. Either way works for me

view this post on Zulip Wasmtime GitHub notifications bot (Feb 22 2025 at 23:44):

github-actions[bot] commented on PR #10276:

Subscribe to Label Action

cc @cfallin, @fitzgen

<details>
This issue or pull request has been labeled: "cranelift", "cranelift:area:x64", "isle"

Thus the following users have been cc'd because of the following labels:

To subscribe or unsubscribe from this label, edit the <code>.github/subscribe-to-label.json</code> configuration file.

Learn more.
</details>

view this post on Zulip Wasmtime GitHub notifications bot (Feb 23 2025 at 03:17):

alexcrichton commented on PR #10276:

An example of extending this is https://github.com/alexcrichton/wasmtime/commit/ea299c4bc6bc05cc7662f5a80903a350a5a22751 where I've updated the add/adc instructions to using ProducesFlags and ConsumesFlags in ISLE. That enabled removing more adc/add constructors using the now-old MInst variants. It generates a lot of constructors that we don't actually use today, but they're all there and much more easily accessible than they are today.

view this post on Zulip Wasmtime GitHub notifications bot (Feb 23 2025 at 05:48):

cfallin requested cfallin for a review on PR #10276.

view this post on Zulip Wasmtime GitHub notifications bot (Feb 23 2025 at 05:49):

cfallin commented on PR #10276:

Adding myself as a reviewer here -- want to take a look, excited to see progress on untangling some of the layers of abstraction.

view this post on Zulip Wasmtime GitHub notifications bot (Feb 23 2025 at 18:16):

alexcrichton commented on PR #10276:

Some before/after of ISLE constructors for this PR are:

before

(decl x64_addb_mi (AssemblerReadWriteGprMem AssemblerImm8) AssemblerReadWriteGprMem)
(extern constructor x64_addb_mi x64_addb_mi)
(decl x64_addb_mr (AssemblerReadWriteGprMem AssemblerReadGpr) AssemblerReadWriteGprMem)
(extern constructor x64_addb_mr x64_addb_mr)
(decl x64_addb_rm (AssemblerReadWriteGpr AssemblerReadGprMem) AssemblerReadWriteGpr)
(extern constructor x64_addb_rm x64_addb_rm)

after

(decl x64_addb_mi_raw (GprMem AssemblerImm8) MInstAndGprMem)
(extern constructor x64_addb_mi_raw x64_addb_mi_raw)
(decl x64_addb_mi_mem (Amode AssemblerImm8) SideEffectNoResult)
(rule (x64_addb_mi_mem rm8 imm8) (side_effect_minst_and_gpr_mem (x64_addb_mi_raw rm8 imm8)))
(decl x64_addb_mi (Gpr AssemblerImm8) Gpr)
(rule (x64_addb_mi rm8 imm8) (emit_minst_and_gpr_mem (x64_addb_mi_raw rm8 imm8)))

(decl x64_addb_mr_raw (GprMem Gpr) MInstAndGprMem)
(extern constructor x64_addb_mr_raw x64_addb_mr_raw)
(decl x64_addb_mr_mem (Amode Gpr) SideEffectNoResult)
(rule (x64_addb_mr_mem rm8 r8) (side_effect_minst_and_gpr_mem (x64_addb_mr_raw rm8 r8)))
(decl x64_addb_mr (Gpr Gpr) Gpr)
(rule (x64_addb_mr rm8 r8) (emit_minst_and_gpr_mem (x64_addb_mr_raw rm8 r8)))

(decl x64_addb_rm_raw (Gpr GprMem) MInstAndGpr)
(extern constructor x64_addb_rm_raw x64_addb_rm_raw)
(decl x64_addb_rm (Gpr GprMem) Gpr)
(rule (x64_addb_rm r8 rm8) (emit_minst_and_gpr (x64_addb_rm_raw r8 rm8)))

view this post on Zulip Wasmtime GitHub notifications bot (Feb 24 2025 at 16:54):

abrown commented on PR #10276:

Ok, I'll take a look at this today. After we talked, I took the approach a few steps further so I'll try to reconcile that with this.

view this post on Zulip Wasmtime GitHub notifications bot (Feb 24 2025 at 22:48):

abrown commented on PR #10276:

Ok, as a point of comparison, here is where I ended up for the same instructions above:

(decl x64_addb_mi (GprMem u8) AssemblerOutputs)
(extern constructor x64_addb_mi x64_addb_mi)
(decl emit_x64_addb_mi (GprMem u8) Gpr)
(rule (emit_x64_addb_mi GprMem u8)
  (let (
    (out AssemblerOutputs (x64_addb_mi GprMem u8))
    (inst MInst (asm_out_inst out))
    (gpr Gpr (asm_out_gpr out))
    (_ Unit (emit inst))
  )
  gpr))

(decl x64_addb_mr (GprMem Gpr) AssemblerOutputs)
(extern constructor x64_addb_mr x64_addb_mr)
(decl emit_x64_addb_mr (GprMem Gpr) Gpr)
(rule (emit_x64_addb_mr GprMem Gpr)
  (let (
    (out AssemblerOutputs (x64_addb_mr GprMem Gpr))
    (inst MInst (asm_out_inst out))
    (gpr Gpr (asm_out_gpr out))
    (_ Unit (emit inst))
  )
  gpr))

(decl x64_addb_rm (Gpr GprMem) AssemblerOutputs)
(extern constructor x64_addb_rm x64_addb_rm)
(decl emit_x64_addb_rm (Gpr GprMem) Gpr)
(rule (emit_x64_addb_rm Gpr GprMem)
  (let (
    (out AssemblerOutputs (x64_addb_rm Gpr GprMem))
    (inst MInst (asm_out_inst out))
    (gpr Gpr (asm_out_gpr out))
    (_ Unit (emit inst))
  )
  gpr))

What's going on in this version is that (1) the x64_addb_* ISLE now returns an AssemblerOutputs instead of actually emitting the instruction; the emission part is left to (2) some emit_x64_addb_* ISLE which we generate right beside it. Why? I was a bit concerned that if we keep on generating new variants of these things (_mem, _flags, etc.) we end up muddying the waters; instead, we can use the "return an inst" version in (1) to fit into all the existing conventions ISLE defines: ProducesFlags, ConsumesFlags, SideEffect, etc.

What is AssemblerOutputs, you ask?

(type AssemblerOutputs (enum
      (WritesMemory (inst MInst))
      (WritesGpr (inst MInst) (gpr Gpr))
))

I was thinking that this core enum can be matched on or auto-converted (AssemblerOutputs.WritesMemory --> SideEffect) into the existing compiler but don't perpetuate all the *Flags and SideEffect logic in the generator. If we were to change any of those conventions, I was hoping this approach would reduce the blast radius. Thoughts?

view this post on Zulip Wasmtime GitHub notifications bot (Feb 24 2025 at 22:51):

abrown edited a comment on PR #10276:

Ok, as a point of comparison, here is where I ended up for the same instructions above:

(decl x64_addb_mi (GprMem u8) AssemblerOutputs)
(extern constructor x64_addb_mi x64_addb_mi)
(decl emit_x64_addb_mi (GprMem u8) Gpr)
(rule (emit_x64_addb_mi GprMem u8)
  (let (
    (out AssemblerOutputs (x64_addb_mi GprMem u8))
    (inst MInst (asm_out_inst out))
    (gpr Gpr (asm_out_gpr out))
    (_ Unit (emit inst))
  )
  gpr))

(decl x64_addb_mr (GprMem Gpr) AssemblerOutputs)
(extern constructor x64_addb_mr x64_addb_mr)
(decl emit_x64_addb_mr (GprMem Gpr) Gpr)
(rule (emit_x64_addb_mr GprMem Gpr)
  (let (
    (out AssemblerOutputs (x64_addb_mr GprMem Gpr))
    (inst MInst (asm_out_inst out))
    (gpr Gpr (asm_out_gpr out))
    (_ Unit (emit inst))
  )
  gpr))

(decl x64_addb_rm (Gpr GprMem) AssemblerOutputs)
(extern constructor x64_addb_rm x64_addb_rm)
(decl emit_x64_addb_rm (Gpr GprMem) Gpr)
(rule (emit_x64_addb_rm Gpr GprMem)
  (let (
    (out AssemblerOutputs (x64_addb_rm Gpr GprMem))
    (inst MInst (asm_out_inst out))
    (gpr Gpr (asm_out_gpr out))
    (_ Unit (emit inst))
  )
  gpr))

What's going on in this version is that (1) the x64_addb_* ISLE now returns an AssemblerOutputs instead of actually emitting the instruction; the emission part is left to (2) some emit_x64_addb_* ISLE which we generate right beside it. Why? I was a bit concerned that if we keep on generating new variants of these things (_mem, _flags, etc.) we end up muddying the waters; instead, we can use the "return an inst" version in (1) to fit into all the existing conventions ISLE defines: ProducesFlags, ConsumesFlags, SideEffect, etc.

What is AssemblerOutputs, you ask?

(type AssemblerOutputs (enum
      (WritesMemory (inst MInst))
      (WritesGpr (inst MInst) (gpr Gpr))
))

I was thinking that this core enum can be matched on or auto-converted (AssemblerOutputs.WritesMemory --> SideEffect) into the existing compiler but don't perpetuate all the *Flags and SideEffect logic in the generator. If we were to change any of those conventions, I was hoping this approach would reduce the blast radius. Thoughts?

[edit: I know that that up above we don't want some of those emit_* ISLE functions to return a Gpr... my branch is still a work in progress so that is not all the way there.]

view this post on Zulip Wasmtime GitHub notifications bot (Feb 24 2025 at 22:51):

abrown edited a comment on PR #10276:

Ok, as a point of comparison, here is where I ended up for the same instructions above:

(decl x64_addb_mi (GprMem u8) AssemblerOutputs)
(extern constructor x64_addb_mi x64_addb_mi)
(decl emit_x64_addb_mi (GprMem u8) Gpr)
(rule (emit_x64_addb_mi GprMem u8)
  (let (
    (out AssemblerOutputs (x64_addb_mi GprMem u8))
    (inst MInst (asm_out_inst out))
    (gpr Gpr (asm_out_gpr out))
    (_ Unit (emit inst))
  )
  gpr))

(decl x64_addb_mr (GprMem Gpr) AssemblerOutputs)
(extern constructor x64_addb_mr x64_addb_mr)
(decl emit_x64_addb_mr (GprMem Gpr) Gpr)
(rule (emit_x64_addb_mr GprMem Gpr)
  (let (
    (out AssemblerOutputs (x64_addb_mr GprMem Gpr))
    (inst MInst (asm_out_inst out))
    (gpr Gpr (asm_out_gpr out))
    (_ Unit (emit inst))
  )
  gpr))

(decl x64_addb_rm (Gpr GprMem) AssemblerOutputs)
(extern constructor x64_addb_rm x64_addb_rm)
(decl emit_x64_addb_rm (Gpr GprMem) Gpr)
(rule (emit_x64_addb_rm Gpr GprMem)
  (let (
    (out AssemblerOutputs (x64_addb_rm Gpr GprMem))
    (inst MInst (asm_out_inst out))
    (gpr Gpr (asm_out_gpr out))
    (_ Unit (emit inst))
  )
  gpr))

What's going on in this version is that (1) the x64_addb_* ISLE now returns an AssemblerOutputs instead of actually emitting the instruction; the emission part is left to (2) some emit_x64_addb_* ISLE which we generate right beside it. Why? I was a bit concerned that if we keep on generating new variants of these things (_mem, _flags, etc.) we end up muddying the waters; instead, we can use the "return an inst" version in (1) to fit into all the existing conventions ISLE defines: ProducesFlags, ConsumesFlags, SideEffect, etc.

What is AssemblerOutputs, you ask?

(type AssemblerOutputs (enum
      (WritesMemory (inst MInst))
      (WritesGpr (inst MInst) (gpr Gpr))
))

I was thinking that this core enum can be matched on or auto-converted (AssemblerOutputs.WritesMemory --> SideEffect) into the existing compiler but don't perpetuate all the *Flags and SideEffect logic in the generator. If we were to change any of those conventions, I was hoping this approach would reduce the blast radius. Thoughts?

[edit: I know that up above we don't want some of those emit_* ISLE functions to return a Gpr... my branch is still a work in progress so that is not all the way there.]

view this post on Zulip Wasmtime GitHub notifications bot (Feb 24 2025 at 23:00):

abrown commented on PR #10276:

Another thought: at some point soon we could move all the code that generates ISLE out of cranelift-assembler-x64-meta and into cranelift-codegen-meta.

view this post on Zulip Wasmtime GitHub notifications bot (Feb 25 2025 at 00:31):

cfallin commented on PR #10276:

Two thoughts to add:

Basically I'm hoping for simplicity via reduction in layers of abstraction as far as possible without losing type safety; that was the original state of play, the flags abstraction added a bit but it was seen as necessary, now the separate assembler layer whose types are being kept separate (rather than propagated through Cranelift) is ratcheting the boilerplate up past a critical threshold IMHO. Curious to hear your thoughts.

view this post on Zulip Wasmtime GitHub notifications bot (Feb 25 2025 at 03:18):

alexcrichton commented on PR #10276:

@abrown

I think you and I basically converged on the same idea. In this PR the *_raw constructors return either MInst (your WritesMemory variant), MInstAndGpr (your WritesGpr variant), or MInstAndGprMem (same as your AssemblerOutputs type). The "middle parts" of then converting from the "raw" thing to the output thing is basically inline ISLE for you vs helper functions for you. Basically I think we're thinking the same thing, but the main difference for me is that no constructor takes GprMem as a first argument, but instead splits that into two constructors -- one taking Gpr and one taking Amode.

Also I can confirm that with something like AssemblerOutputs (or the more specific types I have in this PR) there need not be any changes to handle {Produces,Consumes}Flags. The commit above has no changes to *_raw and enables new flags-aware constructors.


@cfallin

The ideal IMHO is that we actually reuse types directly -- so rather than define our own GprMem/GprMemImm on the Cranelift side, we use the assembler types throughout

:100: from me on this, agreed that should be the end state.

so we have the lowest-level constructors that either (i) emit the inst and return e.g. a Gpr (and that's it, no more types needed), or (ii) return a ProducesFlags/ConsumesFlags and we pass these to with_flags, which does the emission and returns Gpr / pair of Gprs / whatever.

To me this is what it comes down to, the question of "what is the lowest level thing?" As-is today it's not possible to create {Produces,Consumes}Flags in ISLE because the emit happens in the Rust FFI layer. With your suggestion I think it would effectively double-the FFI layer by having one "raw" function return a Gpr and another return an MInst, which is something I was trying to avoid. In playing around with the assembler/generated code the FFI layer is the hardest to debug and reason about because the compiler doesn't show errors in the generated code, only on the asm::generate!() invocation. I've thought a bit about how to fix this and kept coming up blank.

In the end my goal here was to have the "raw" constructor be as general-purpose as possible, aka the "narrow waist" that all ISLE abstractions would be built on. It's more-or-less @abrown's AssemblerOutputs suggestion and personally I like that design over what I have here. It's just one type to think about and then ISLE constructors figure out what to do with it. The forseeable changes I can think of to AssemblerOutputs are (a) a variant returning two Gpr values for mul and (b) adding a variant returning an Xmm register instead of a Gpr.


Overall I feel like we're all basically in agreement about how to design this -- a "raw" thing with fancy ISLE abstractions on top -- and are more-or-less trying to figure out what to paint the bikeshed. What I might concretely propose is to hand-write this in ISLE:

(type AssemblerOutputs (enum
    ;; used for stores, instructions that modify memory, traps (?), etc
    (SideEffect (inst MInst))
    ;; used for anything that returns a gpr, including
    ;; "mr" variants where the first argument was a Gpr
    (RetGpr (inst MInst) (gpr Gpr))
    ;; used for `imul`, `mul` , `div`, etc
    (RetTwoGpr (inst MInst) (gpr1 Gpr) (gpr2 Gpr))
    ;; SSE/AVX/etc
    (RetXmm (inst MInst) (gpr Xmm))

    ;; maybe `RetXmm2`? Unsure if we have any of those
))

The FFI layer in Rust would always return an AssemblerOutputs from each and every method. Most raw constructors would statically return just one variant, for example "RM" encodings would always return RetGpr. The "MR" variants, however, would dynamically return either SideEffect or RetGpr depending on the input. The ISLE constructor would know which it passed in and there'd be one constructor for each (e.g. x64_addb_mi and x64_addb_mi_mem above).

To me this strikes a nice balance of:

I think it would also make sense to define all the helper functions that converts AssemblerOutputs to ctor outputs (e.g. AssemblerOutputs => Gpr) as hand-written functions in ISLE. These would be just below the AssemblerOutputs definition which would be easy to skim and would be relatively easily connect-able to the machine-generated instructions too.


That's a lot of words, but WDYT?

view this post on Zulip Wasmtime GitHub notifications bot (Feb 25 2025 at 04:24):

cfallin commented on PR #10276:

To me this is what it comes down to, the question of "what is the lowest level thing?" As-is today it's not possible to create {Produces,Consumes}Flags in ISLE because the emit happens in the Rust FFI layer. With your suggestion I think it would effectively double-the FFI layer by having one "raw" function return a Gpr and another return an MInst, which is something I was trying to avoid.

Not quite; I'm imaging that the external Rust constructor for (e.g.) a variant of add returns only a ProducesFlags -- that's the fundamental thing -- and one could either use that in a with_flags or (via an automatic conversion) get a Gpr by pairing with a null ConsumesFlags. (call it something like (decl ignore_flags (ProducesFlags) Gpr) maybe?)

Some of this goes back to early ISLE design philosophy, but: my intent with the initial design was to hew as close to a "value semantics" view of the program operators and instructions as possible, because this is what enables reasoning about expression equivalence, makes verification possible, etc. From that point of view, it makes sense that some instruction that is a binary operator (i) takes two registers and (ii) returns a register, just like its CLIF cousins. The bit of lowering logic that bridges the gap is hiding the emit in the constructor and returning the value. The magic of data dependencies means that it is physically impossible to call the ctors in an order that is not a valid toposort of operations, so emitting-when-constructed is always valid.

We started carrying MInsts around instead when we invented the ProducesFlags/ConsumesFlags abstraction; that was the first time that we wanted to have some other way of controlling emission time, to ensure flags aren't clobbered. But we carefully wrapped the insts inside another enum and provided a combinator (with_flags) that itself hid the emission inside, so we were back to a world of value semantics with ensured ordering in the end.

This is part of the reason I'm not so sure that "return the MInst" is more fundamental: it's exposing more of the mechanism, certainly, but it's moving to a world where one has access to sharp tools that can be combined in the wrong way (emitting out of order), and where one has more pieces that one has to put together, more complex overall.

So if possible I'd prefer to bias us back toward implicit emission everywhere, and I do think it's possible here, as described above -- the fact that the instruction produces flags is fundamental, so that one is the one that we define; and an implicit converter can throw away the flags if we don't care. Thoughts?

view this post on Zulip Wasmtime GitHub notifications bot (Feb 25 2025 at 17:23):

alexcrichton commented on PR #10276:

Personally I feel that the principles you're thinking of make sense, but I don't see the need to apply them to generated code -- instead only to the surface area of the generated code. Code generating code to me is always a hairy problem and having a precise return value like ProducesFlags for all instructions to me will be a bit sisyphean. For example:

Basically while I agree with where you're coming from I don't think that ProducesFlags and ConsumesFlags is a great abstraction to build on. I'm also not sure what the loss is by having these *_raw constructors return a "thing" which is a bit non-standard since they're never manually used anyway. In terms of integrating with verification it seems like the "leaves" of verification would be the typed instruction lowerings rather than the *_raw lowerings, which are also all auto-generated.

view this post on Zulip Wasmtime GitHub notifications bot (Feb 25 2025 at 17:31):

cfallin commented on PR #10276:

That's fair enough; I'm not deeply wedded to any of the suggestions, and we should do what works before we try to contort too far toward an ideal. I do agree that if we could find a better abstraction for working with flags, that might be welcome. (I'll let this percolate through my slightly creaky neurons some more but I wonder if it might be simpler to go back to a world of explicitly represented flags values, track which flags are current in the lowering infra, and panic if an instruction sequence is emitted that clobbers the expected flags.) Anyway -- you both have been iterating on this in a more hands-on way so I'll trust your instincts here!

view this post on Zulip Wasmtime GitHub notifications bot (Feb 25 2025 at 18:33):

abrown commented on PR #10276:

I just want to say that this discussion has been very helpful to change my mind about a couple of things:

What do we do next? I like Alex's plan a few posts up; how much more needs to change to get there from this PR?

view this post on Zulip Wasmtime GitHub notifications bot (Feb 26 2025 at 19:59):

alexcrichton commented on PR #10276:

To put the plan down in writing (and make sure I didn't miss anything) the consensus from today's Cranelift meeting was:

view this post on Zulip Wasmtime GitHub notifications bot (Feb 26 2025 at 23:31):

abrown updated PR #10276.

view this post on Zulip Wasmtime GitHub notifications bot (Feb 26 2025 at 23:33):

abrown updated PR #10276.

view this post on Zulip Wasmtime GitHub notifications bot (Feb 27 2025 at 00:06):

cfallin submitted PR review:

This looks reasonable, thanks!

view this post on Zulip Wasmtime GitHub notifications bot (Feb 27 2025 at 00:06):

cfallin created PR review comment:

Can we add an example here, just to see at a glance what the constructors look like? It's a little hard to pick out from the format string below otherwise.

view this post on Zulip Wasmtime GitHub notifications bot (Feb 27 2025 at 02:27):

abrown submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Feb 27 2025 at 02:27):

abrown created PR review comment:

Like inline in the code comment or in this PR?

view this post on Zulip Wasmtime GitHub notifications bot (Feb 27 2025 at 02:28):

cfallin submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Feb 27 2025 at 02:28):

cfallin created PR review comment:

In the code comment, yep, similar to how elsewhere there are some examples of (decl ...) and the like.

view this post on Zulip Wasmtime GitHub notifications bot (Feb 27 2025 at 18:12):

abrown submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (Feb 27 2025 at 18:12):

abrown created PR review comment:

Now that I'm looking at this, there are some examples up at the top of this function already like we've done elsewhere... But they are a bit sparse — I'll add a bit more detail.

view this post on Zulip Wasmtime GitHub notifications bot (Feb 27 2025 at 18:22):

abrown updated PR #10276.

view this post on Zulip Wasmtime GitHub notifications bot (Feb 27 2025 at 18:22):

abrown has enabled auto merge for PR #10276.

view this post on Zulip Wasmtime GitHub notifications bot (Feb 27 2025 at 18:58):

abrown merged PR #10276.


Last updated: Feb 28 2025 at 02:27 UTC