afonso360 commented on issue #5926:
That is unfortunate :oh_no:. And yeah, it looks like egraphs is not the right place to do it (for now!).
> The new load must be placed at the same point in the side-effecting skeleton as the original, or we could be changing where traps are detected. We don't have any way to guarantee that in egraph rules right now.

> It's only an improvement if all users of the value want the bitcasted type. If any want the original type then you'd need to insert a bitcast in the other direction, or worse, keep both loads. We can't tell in the egraph whether a value is used until the elaboration step at the end, which I think is too late for this.
Oh yeah! I was sort of expecting that it wouldn't perform the rewrite if there was more than one user of the load; doing the load twice doesn't seem like a great option.
I think we can keep all those constraints if we do this via `simple_preopt` (maybe, I haven't thought a lot about this!). But I think it would be nice to have some numbers on how much this affects perf before adding more stuff there.

Thank you guys for looking at this though!
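For context, here is a rough CLIF sketch of the rewrite being discussed. The concrete types and values are made up for illustration; they are not taken from the issue.

```
;; Before: a load whose only user is a bitcast.
function %before(i64) -> f32 {
block0(v0: i64):
    v1 = load.i32 v0
    v2 = bitcast.f32 little v1
    return v2
}

;; After the rewrite: load directly as the bitcasted type. The new load has to
;; sit at the same point in the side-effecting skeleton (so the trap point is
;; unchanged), and it is only a win if nothing else still needs v1 as an i32.
function %after(i64) -> f32 {
block0(v0: i64):
    v1 = load.f32 v0
    return v1
}
```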
jameysharp commented on issue #5926:
I'm curious whether the backends can do this optimization, if it's just about load-sinking. I'm not sure what it would take to get the load-sinking machinery to "see through" bitcasts.
I think the most general bitcast optimizations would be target-specific anyway. Something like, if both types are stored in the same register class, then the lowering for bitcast can just return the input register. I'm not sure how endianness interacts with that and I've probably missed other important details, but maybe there's something easy to try in there.
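As a hedged illustration of the register-class point (the types here are illustrative): for two vector types that live in the same register class on common backends, a bitcast is a pure reinterpretation, so a lowering could in principle hand back the input register, endianness and lane-order caveats aside.

```
;; Illustrative only: i32x4 and f32x4 typically share the vector register class,
;; so this bitcast could lower to "return the input register" on such backends.
function %vec_bitcast(i32x4) -> f32x4 {
block0(v0: i32x4):
    v1 = bitcast.f32x4 little v0
    return v1
}
```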
cfallin commented on issue #5926:
> I'm curious whether the backends can do this optimization, if it's just about load-sinking. I'm not sure what it would take to get the load-sinking machinery to "see through" bitcasts.
We could possibly do the same thing that we do on conditional branches with the `maybe_uextend` extractor: basically, build a little helper manually to do that "seeing through". It seems like a pretty reasonable, manageable (i.e., not too high) level of complexity to me, vs. trying to reason about the combination in the mid-end.
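To make that concrete, this is the kind of shape such a helper would let backend rules match; the helper name and the specific consumer below are hypothetical, chosen only to mirror how `maybe_uextend` is used for branches.

```
;; Illustrative only: a lowering rule for fadd that can "see through" v3's
;; bitcast back to the load (via a hypothetical maybe_bitcast-style helper)
;; could sink the load into a memory-operand form of the add, on ISAs that
;; support it, instead of emitting a separate load plus a register move.
function %sink(i64, f32) -> f32 {
block0(v0: i64, v1: f32):
    v2 = load.i32 v0
    v3 = bitcast.f32 little v2
    v4 = fadd v3, v1
    return v4
}
```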