Stream: git-wasmtime

Topic: wasmtime / PR #11639 [Cranelift] fold `(or ...) + (neg .....


view this post on Zulip Wasmtime GitHub notifications bot (Sep 08 2025 at 13:11):

bongjunj opened PR #11639 from bongjunj:fold-add-or-neg to bytecodealliance:main:

This adds the following simplification rule to Cranelift's mid-end:

(rule (simplify (iadd ty (bor ty x y) (ineg ty y)))
      (band ty x (bnot ty y)))

view this post on Zulip Wasmtime GitHub notifications bot (Sep 08 2025 at 13:11):

bongjunj requested alexcrichton for a review on PR #11639.

view this post on Zulip Wasmtime GitHub notifications bot (Sep 08 2025 at 13:11):

bongjunj requested wasmtime-compiler-reviewers for a review on PR #11639.

view this post on Zulip Wasmtime GitHub notifications bot (Sep 08 2025 at 13:21):

bongjunj commented on PR #11639:

Seems like iadd ... (ineg ty y) -> isub ... y wins over this rule.
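
(For context, the existing rewrite referred to here is roughly of the following form; this is a sketch, and the exact spelling in Cranelift's mid-end .isle files may differ:)

(rule (simplify (iadd ty x (ineg ty y)))
      (isub ty x y))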

view this post on Zulip Wasmtime GitHub notifications bot (Sep 08 2025 at 14:39):

alexcrichton commented on PR #11639:

That might be fixable by adjusting the per-opcode costs, perhaps? We could make arithmetic operations like iadd/isub slightly more costly than bitwise operations like band/bnot.

view this post on Zulip Wasmtime GitHub notifications bot (Sep 08 2025 at 15:43):

cfallin commented on PR #11639:

I'll note that the mid-end does keep both around, rather than destructively rewriting (because egraphs!), so in the future if we have a more sophisticated cost function extractor there may be cases for both -- e.g. if the slightly more expensive form uses partial results that are already computed somewhere else. Perhaps different ISAs will have different cost functions too (they should all be 1-cycle ALU ops on any reasonable machine, but maybe some combinations of instructions fold together or compressed instruction forms are available or ...).

All that said, I'm curious @bongjunj -- are you driving your exploration with some sort of overall goodness metric? In other words, are you finding any and all equivalences, or is your goal to find those that seem to simplify somehow? And for this particular one, did you see instances where it leads to useful simplifications?

(I'm not opposed at all to building up a nice database of simplifications in general; as we've sometimes said, "rules are cheap" with ISLE's DSL compiler combining their matching. Just curious where all this is going.)

view this post on Zulip Wasmtime GitHub notifications bot (Sep 08 2025 at 15:45):

github-actions[bot] commented on PR #11639:

Subscribe to Label Action

cc @cfallin, @fitzgen

This issue or pull request has been labeled: "cranelift", "isle"; the users above have been cc'd because of those labels.

To subscribe or unsubscribe from this label, edit the .github/subscribe-to-label.json configuration file.

view this post on Zulip Wasmtime GitHub notifications bot (Sep 09 2025 at 00:34):

bongjunj commented on PR #11639:

Just realized that this is another version of the simplification in https://github.com/bytecodealliance/wasmtime/pull/10979.

In addition, to answer @cfallin's comment: all my simplification rules are inspired by LLVM InstCombine rules. For example, this particular rule resembles the following LLVM optimization:

define i32 @src(i32 %A) {
  %B = or i32 %A, 123
  %C = add i32 %B, -123
  ret i32 %C
}

define i32 @tgt(i32 %A) {
  %C = and i32 %A, -124
  ret i32 %C
}

(https://alive2.llvm.org/ce/z/QY4j7V)

So basically, what I'm doing now is to observe discrepancies between the LLVM InstCombine pass and Cranelift's mid-end optimizer, and then add rules to Cranelift for the missed optimization opportunities. In other words, the goodness metric we are looking for here is a kind of "LLVM-ness". But I'm not sure how we can measure the usefulness of this particular rule with a well-established metric, or find a concrete instance where it fires.

view this post on Zulip Wasmtime GitHub notifications bot (Sep 09 2025 at 15:38):

cfallin commented on PR #11639:

So basically, what I'm doing now is to observe discrepancies between the LLVM InstCombine pass and Cranelift's mid-end optimizer, and then add rules to Cranelift for the missed optimization opportunities. In other words, the goodness metric we are looking for here is a kind of "LLVM-ness".

That sounds great, then! I wanted to make sure we had some ground truth indicating these rewrites could be useful, and "LLVM does it" is a very strong argument for that. Thanks for putting in this effort!

view this post on Zulip Wasmtime GitHub notifications bot (Sep 09 2025 at 15:42):

cfallin commented on PR #11639:

To the immediate question of making this rule actually fire: since we already rewrite (iadd _ x (ineg _ y)) to (isub _ x y), could you rewrite the left-hand side to cascade on that, and match on (isub _ (bor _ x y) y)?
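
(A sketch of that cascaded form, reusing this PR's right-hand side; the exact rule ultimately committed may differ:)

(rule (simplify (isub ty (bor ty x y) y))
      (band ty x (bnot ty y)))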

view this post on Zulip Wasmtime GitHub notifications bot (Sep 09 2025 at 17:54):

fitzgen commented on PR #11639:

Small clarification on the following, because I think it is pretty important when we are talking about rewrites that aren't necessarily beneficial on their own:

I'm not opposed at all to building up a nice database of simplifications in general; as we've sometimes said, "rules are cheap" with ISLE's DSL compiler combining their matching.

Rules are cheap, but e-nodes are expensive. (At least, expensive relative to rules, and there is always the risk of accidentally expanding to exponential numbers of e-nodes, which can be subtly easy to do.)

So adding all the commutative versions of a beneficial simplification is cheap, but adding basic commutation rules for every commutative operation (e.g. a+b --> b+a and a|b --> b|a) is expensive.
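
(To make that distinction concrete, here is an illustrative ISLE sketch built around this PR's rule; these are examples for explanation, not rules proposed in the PR:)

;; Cheap: explicitly listing the commuted variants of one beneficial rewrite
;; costs extra rules, but no e-nodes beyond those the rewrite creates anyway.
(rule (simplify (iadd ty (bor ty x y) (ineg ty y))) (band ty x (bnot ty y)))
(rule (simplify (iadd ty (ineg ty y) (bor ty x y))) (band ty x (bnot ty y)))
(rule (simplify (iadd ty (bor ty y x) (ineg ty y))) (band ty x (bnot ty y)))

;; Expensive: a generic commutation rule adds a commuted e-node to every
;; iadd e-class, whether or not any later rule ever benefits from it.
(rule (simplify (iadd ty x y)) (iadd ty y x))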

Similarly, if we have an input A that matches rule r to produce B which then matches rule s to produce the final output C (A --r--> B --s--> C) then creating a "macro rule" that composes r and s (A --rs--> C) is a win in our system (from a cost perspective) since we are trading away an intermediate e-node (expensive) for an additional rule (cheap).[^0]

[^0]: It would be super cool if, given an input set of basic rules, the ISLE compiler automatically derived a set of macro rules based on them. Seems like a potentially fun/fruitful research project.
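
(In this PR's terms, a sketch of such a macro rule; the labels r, s, and rs are illustrative only, not real rule names:)

;; r:  (iadd x (ineg y))   --> (isub x y)          (existing arithmetic rewrite)
;; s:  (isub (bor x y) y)  --> (band x (bnot y))   (the cascaded form discussed above)
;; rs: matching the pre-r shape directly, so the intermediate isub e-node
;;     is not needed to reach the band/bnot form.
(rule (simplify (iadd ty (bor ty x y) (ineg ty y)))
      (band ty x (bnot ty y)))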

So adding rules that create new e-nodes which aren't beneficial on their own, but might be useful for some other beneficial rewrite, is something that should ultimately be approached with care. That doesn't mean we shouldn't ever do it, but we should at least put in the effort to check that there actually exists another beneficial rewrite that could fire afterwards, and make sure the rule isn't so general that it results in tons of intermediate e-nodes that might not actually lead to some other beneficial rule firing.

