fitzgen opened Issue #1709:
E.g. it may make sense for cg_clif (the Cranelift-based backend for rustc) to have its own peephole optimizations pass that contains optimizations specific to code generated by cg_clif.
github-actions[bot] commented on Issue #1709:
Subscribe to Label Action
cc @fitzgen
<details>
This issue or pull request has been labeled: "cranelift:area:peepmatic"
Thus the following users have been cc'd because of the following labels:
- fitzgen: cranelift:area:peepmatic
To subscribe or unsubscribe from this label, edit the <code>.github/subscribe-to-label.json</code> configuration file.
Learn more.
</details>
github-actions[bot] commented on Issue #1709:
Subscribe to Label Action
cc @bnjbvr
<details>
This issue or pull request has been labeled: "cranelift"
Thus the following users have been cc'd because of the following labels:
- bnjbvr: cranelift
To subscribe or unsubscribe from this label, edit the <code>.github/subscribe-to-label.json</code> configuration file.
Learn more.
</details>
MaxGraey commented on Issue #1709:
Regarding other peepmatic transform rules, I'm wondering whether it is possible to implement such a simplification for integers that actually behave as booleans. Something like:
(=> (when (icmp_imm eq 0 $x) (bit-width $x 1)) (bxor_imm 1 $x))
and other transforms on values of i1 (boolean) type. Or does it not make sense for peepmatic?
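The proposed rule can be sanity-checked by brute force. The following is a minimal Rust sketch, not Cranelift or peepmatic code: the helper names are hypothetical, and the 1-bit value is modeled as a `u8` holding 0 or 1. It also checks an 8-bit variant to illustrate why the `bit-width` precondition would be needed.

```rust
/// Models `icmp_imm eq 0 $x`: yields 1 when x == 0, else 0.
/// (Hypothetical helper; the result is modeled as an integer, not a Cranelift `b1`.)
fn icmp_eq_zero(x: u8) -> u8 {
    (x == 0) as u8
}

/// Models `bxor_imm 1 $x` on a value of the given bit width (1..=8),
/// masking the result so that only `bits` low bits survive.
fn bxor_one(x: u8, bits: u32) -> u8 {
    let mask = ((1u16 << bits) - 1) as u8;
    (x ^ 1) & mask
}

/// True if both forms agree for every 1-bit input (0 and 1).
fn rule_holds_for_one_bit() -> bool {
    (0u8..=1).all(|x| icmp_eq_zero(x) == bxor_one(x, 1))
}

/// True if both forms agree for every 8-bit input.
/// This is expected to fail, e.g. for x = 2: (2 == 0) is 0, but 2 ^ 1 is 3.
fn rule_holds_for_eight_bits() -> bool {
    (0u8..=255).all(|x| icmp_eq_zero(x) == bxor_one(x, 8))
}
```

Calling `rule_holds_for_one_bit()` and `rule_holds_for_eight_bits()` from a `main` function or a unit test shows that the rewrite is only sound when the operand is one bit wide.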
fitzgen commented on Issue #1709:
Although there is a b1 type for 1-bit booleans, there isn't really an i1 type for 1-bit integers.
As for specifically replacing an icmp_imm with a bxor_imm, I'm not convinced it would improve codegen since both instructions have a single uop and latency of 1 cycle on x86_64 (I don't know about aarch64).
On the more general topic of "what kind of optimizations make sense for peepmatic?": I don't think it makes sense to focus on adding one-off peepmatic optimizations to Cranelift at this time. Instead, I am planning on writing
- a left-hand side extractor, that extracts candidate LHSes from clif IR into Souper IR, and
- a Souper optimization to peepmatic DSL translator.
With these two bits in hand, we will be able to automatically find missing optimizations, and generate the optimal RHS for each of our given LHSes. This should be much more fruitful (and less buggy!) than writing optimizations by hand.
But of course by the time we have these synthesized optimizations from Souper, we need a way to hook them into Cranelift, which is what resolving this issue should provide :)
If you're interested in helping out with this stuff, let me know and I can try and divide up these tasks into smaller bits! Also, I probably didn't explain everything super well, so if you have questions about what I am talking about, don't hesitate to ask questions.
MaxGraey edited a comment on Issue #1709:
a left-hand side extractor, that extracts candidate LHSes from clif IR into Souper IR
we will be able to automatically find missing optimizations, and generate the optimal RHS for each of our given LHSes
That sounds very cool! As I understand it, progress is going on here?
As for specifically replacing an icmp_imm with a bxor_imm, I'm not convinced it would improve codegen since both instructions have a single uop and latency of 1 cycle on x86_64 (I don't know about aarch64).
Thanks for the explanation. I only ask because LLVM's cost model seems to always prefer bit-wise xor for i1/b1 types in this case.
fitzgen commented on Issue #1709:
That sounds very cool! As I understand it, progress is going on here?
Sort of. Jubi and I are working closely together, but her project is taking a slightly different path (translating Souper optimizations into Rust source code that implements a peephole pass directly, rather than using peepmatic, which didn't exist when she started). That said, we are sharing notes and brainstorming together.
I haven't started working on the extractor or the Souper-to-peepmatic stuff yet, because I've been busy with reference types.
I only ask because LLVM's cost model seems to always prefer bit-wise xor for i1/b1 types in this case.
It could be canonicalization, like you mentioned in the other thread. Or perhaps on some microarchs it makes a difference.
MaxGraey edited a comment on Issue #1709:
I haven't started working on the extractor or the Souper-to-peepmatic stuff yet, because I've been busy with reference types
Ah, got it. Yeah, implementing reference types is really puzzling for VMs and compilers. I heard LLVM / LLD has some difficulties with that as well.
MaxGraey commented on Issue #1709:
I'm wondering whether it makes sense to add an optimization for a double equal-to-zero. For example, Binaryen (wasm target), when optimizing for size, usually prefers
(i32.eqz (i32.eqz (local.get $x)))
instead of
(i32.ne (local.get $x) (i32.const 0))
because the first version is one byte smaller.
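The equivalence and the size difference can be checked with a small Rust sketch. The function names are hypothetical and this is not Binaryen or Cranelift code. The opcode bytes are taken from the WebAssembly binary format (`local.get` = 0x20, `i32.const` = 0x41, `i32.eqz` = 0x45, `i32.ne` = 0x47), and the sketch assumes `$x` is local index 0.

```rust
/// Semantics of `(i32.eqz (i32.eqz x))`: 1 if x is non-zero, else 0.
fn double_eqz(x: i32) -> i32 {
    let inner = (x == 0) as i32; // first i32.eqz
    (inner == 0) as i32 // second i32.eqz
}

/// Semantics of `(i32.ne x (i32.const 0))`: 1 if x is non-zero, else 0.
fn ne_zero(x: i32) -> i32 {
    (x != 0) as i32
}

/// Binary encoding of `(i32.eqz (i32.eqz (local.get 0)))`.
fn encode_double_eqz() -> Vec<u8> {
    vec![
        0x20, 0x00, // local.get 0
        0x45, // i32.eqz
        0x45, // i32.eqz
    ]
}

/// Binary encoding of `(i32.ne (local.get 0) (i32.const 0))`.
fn encode_ne_zero() -> Vec<u8> {
    vec![
        0x20, 0x00, // local.get 0
        0x41, 0x00, // i32.const 0 (the immediate is a one-byte LEB128)
        0x47, // i32.ne
    ]
}
```

Comparing `encode_double_eqz().len()` with `encode_ne_zero().len()` gives the one-byte saving mentioned above, while `double_eqz` and `ne_zero` return the same result for any input. Whether the shorter form is also preferable for a native backend such as Cranelift is a separate question that this sketch does not address.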
Last updated: Jan 24 2025 at 00:11 UTC