alexcrichton opened PR #6022 from x64-refactor-extend to main:
This commit refactors the
{s,u}extendinstruction lowerings for the x64 backend to share a bit more code and move heavy lifting toextend_to_gprwhich is called from a number of other locations as well. Additionally special cases forpextr{b,w}are added to this as more special-cases where the zero-extend can be elided due to the automatic zero-extensions of the original instructions.
jameysharp submitted PR review.
jameysharp submitted PR review.
jameysharp created PR review comment:
has_typedoes a lot of work if you use it here. Inlower,srcis anInst, but here it's aValue. As a result, first there's an implicit conversion to get the instruction that defined this value, thenhas_typegets the first result of that instruction. But in all of these cases, that's the sameValuethat we started with. So let's usevalue_typedirectly to bypass all that.And while the existing comment was true, that we can't do this with an internal extractor today, I think we can do it with an internal constructor instead.
(decl pure partial zeroes_upper (Inst) Unit) (rule (zeroes_upper (iadd _ _)) unit) (rule (zeroes_upper (isub _ _)) unit) ... (rule 1 (extend_to_gpr src @ (value_type $I32) $I64 (ExtendKind.Zero)) (if (zeroes_upper src)) src)I don't like the name I picked for
zeroes_upperso please replace that, but I think the overall structure should work.
jameysharp created PR review comment:
In addition to the comments about
uextendthat apply tosextend, I'd also like to renumber the rule priorities here. Maybe just match the priorities used inuextend: the$I128and$I64cases don't specify a priority and default to 0, and thefits_in_32case is at priority -1.
jameysharp created PR review comment:
Pre-existing, but would you mind changing
(and val (value_type ty))toval @ (value_type ty)while you're here?
jameysharp created PR review comment:
This seems like a nice optimization for
extractlanebut I'm concerned that it may be fragile. Correct me if I'm missing something, but I think it's relying on the lowering rules forextractlaneto produce specific instructions, without having any clear connection between the two sets of rules. Not only are they far removed from each other, but this rule checks that the result typefits_in_16, while that rule checks that the argument type ismulti_lane 8ormulti_lane 16.At minimum I think each place needs a comment referring to the other. Ideally it'd be great to connect them in some way that's harder to get wrong, though I haven't come up with a good way to do that.
jameysharp created PR review comment:
I guess this comment should include
I64now.Thanks to #6018, I now know that you don't need to check that the source of an extend fits in 64 bits if its destination is
I128. The CLIF verifier checks that the source is narrower than the destination. So this could just match on(uextend src). I'm not fully decided on whether there's value in keeping the more verbose pattern as documentation, but I'm inclined to think that it's more confusing than helpful.Similarly, the
src_tyconstraints in the other twouextendrules can be removed.This also all applies to
sextendbelow.In addition, I can't select the next few lines to comment on them but I'd prefer the
$I64case to be at the same priority as$I128. For the moment the(fits_in_32 dst_ty)case still has to be at a different priority (either higher or lower, doesn't matter much) but I'm hoping to fix that someday.
alexcrichton updated PR #6022 from x64-refactor-extend to main.
alexcrichton submitted PR review.
alexcrichton created PR review comment:
Oh nice suggestion! IIRC the desire was to reuse the predicate "does this zero the upper bits" in other instructions, but I at least don't personally remember the context of if there's other locations to use this predicate. In any case I've refactored it out to avoid some extra boilerplate.
alexcrichton submitted PR review.
alexcrichton created PR review comment:
True!
One way this could be tackled is to just go ahead and emit the instruction right here in the helper. For the case of
extractlanethat's pretty easy since it's a single instruction and theextractlanerules themselves aren't fancy. That's still somewhat brittle though as it doesn't handle evolution ofextractlaneover time (if at all).Perhapes more worrisome, though, is the preexisting cases for dropping a
uextend. The pattern matches for 32-bit I think have the same trait where they don't necessarily guarantee a particular instruction is being used, we're just relying on things being implemented in a particular way. That part I don't know how best to solve because I definitely wouldn't want to inline all theiaddrules here, for example.Do you think it's worthwhile to comment all of these though? While perhaps exhaustive it may be getting a bit out of hand since a good handful of the binops span multiple terminals of rules and if the purpose of the comment is to say "please always use the
subintruction here or something else that zeros the upper 32-bits foruextendelision" then it's easy to miss that if it's only located at the top of the rules.I suppose an even crazier strategy would be to postprocess the instructions after lowering, but that naturally has no existing infrastructure and is almost certainly overkill for this one use case.
jameysharp submitted PR review.
jameysharp created PR review comment:
Oh, I guess I assumed that we could pretty well count on any implementation of e.g. 32-bit add having this behavior on x86. I thought of
pextrbas a little more magical because it zero-extends to a narrower width.I'm not saying this is a good plan, but I guess another approach would be to scatter the individual
extend_to_gprandvalue32_zeros_upper32rules out to place them next to thelowerrules that they rely on. Actually, I don't entirely hate that.We don't necessarily have to solve this in this PR, I'm just a little nervous about it long-term. I am curious if @cfallin or others have any other suggestions for this.
cfallin submitted PR review.
cfallin created PR review comment:
Yeah, unfortunately this is the status-quo for
iaddand friends too. Placing rules that are intertwined in this way (producer that zeroes upper bits, consumer that relies on it) next to each other is not a bad idea; or at least, we could add comments nothing the implicit extra invariant everywhere it is either enforced or relied upon.At various points we've talked about "demanded-bits" and "known-bits" optimizations that could solve this sort of issue at the lowering-framework level: for example, in the upward scan during isel, we could record "requires zero-extended value" when we see a uextend and otherwise pass through the source register, then generate the extend after the producer if any consumer needed it. That needs a bit more thought to refine the right abstractions though.
We could also optimize VCode; at various points we've talked about post-lowering GVN, cprop, etc., and eliding redundant uextends could be a part of that. But again, much more plumbing needed.
Finally, this is exactly the sort of invariant that the verification project encodes in annotations, and checks on both ends. So I'm optimistic that eventually we'll be in a situation where it's actually enforced, and we can sleep easy at night. In the meantime IMHO it's probably enough to rely on documentation via comments and/or placing intertwined rules together.
cfallin edited PR review comment.
alexcrichton submitted PR review.
alexcrichton created PR review comment:
I should note that the
pextrbdocumentation I'm pretty sure indicates that it zero-extends to the whole 64-bit register, but I'm not certain enough to add the rule just yet and otherwise trying to write manual tests for this is a bit of a pain. I'm also mostly interested in this from a wasm perspective wherei8x16.extract_lane_ufor example unconditionally does auextendtoi32to obey wasm semantics but theuextendcan be elided in all cases for the x86 instruction (and probably other platforms too which I should look into).One point of hesitation I would have to scatter the
(rule ...)s to various places would be that priorities are needed for the(rule ...)annotations and it's easy enough to reason about priorities when they're written in one place but having to group them from lots of different places could make things difficult.Given though that the main motivation is the wasm lowering here, I could also just add this one specific rule for
uextendaround the place thatextractlaneis codegen'd since they'd both use the singlepextr{b,w}instruction.I'm still not sure how best to resolve this, but if you're mainly worried about the
extractlanecases being the odd ones out, how about I add comments to the(rule)with extractlane here and comments on the lowering ofextractlaneas well pointing to each other?
jameysharp submitted PR review.
jameysharp created PR review comment:
I think adding comments just for extractlane is plenty for this PR. Thanks!
I agree with your concerns about scattering rules around that rely on priorities. I think we can deal with that, and also I think all our options are bad in different ways, which is why I said I don't hate that option. But in any case it doesn't have to happen right now.
alexcrichton updated PR #6022 from x64-refactor-extend to main.
alexcrichton submitted PR review.
alexcrichton created PR review comment:
Ok I have added some more words to help connect the dots here :+1:
jameysharp submitted PR review.
jameysharp created PR review comment:
Perfect, thank you!
jameysharp submitted PR review.
alexcrichton merged PR #6022.
Last updated: Dec 13 2025 at 21:03 UTC