alexcrichton commented on issue #6876:
This might be good for a rule like on x64 where much of the logic of bitselect can be skipped if the mask is known to have a particular pattern? (e.g. generated by icmp or fcmp)
afonso360 commented on issue #6876:
Yeah, that's a neat way of de-duplicating these rules. I also think it might be worth copying that
vconst_all_ones_or_zerospattern into an egraphs rule, since it looks like we should be able to const propagate that
afonso360 edited a comment on issue #6876:
Yeah, that's a neat way of de-duplicating these rules. I also think it might be worth copying that
vconst_all_ones_or_zerospattern into an egraphs rule, since it looks like we should be able to const propagate that.
afonso360 edited a comment on issue #6876:
Yeah, that's a neat way of de-duplicating these rules. I also think it might be worth copying that
vconst_all_ones_or_zerospattern into an egraphs rule, since it looks like we should be able to const propagate that.Edit: Hmm actually I don't think we can quite deduplicate this as neatly since we have different ways of generating a
icmpmask andfcmpmask.
afonso360 edited a comment on issue #6876:
Yeah, that's a neat way of de-duplicating these rules. I also think it might be worth copying that
vconst_all_ones_or_zerospattern into an egraphs rule, since it looks like we should be able to const propagate that.Edit: Hmm actually I don't think we can quite deduplicate this as neatly since we have different ways of generating a
icmpmask andfcmpmask.Edit2: Wait, now I'm confused. We're already sort of skipping the main bitselect lowering by using
vmergeinstead, that merges whole lanes instead of bits. We can't skip generating theicmp/fcmpmask since in RISC-V the masks are 1 bit per lane, so we always need to make the conversion from 1 bit per bit, into 1 bit per lane. Is that what you meant?
alexcrichton commented on issue #6876:
Oh this may not be applicable for riscv64, but on x64 at least there's no bitselect instruction so it uses the and/or/not/etc combo by default, but x64 does have lane-based select so if a bitselect's input mask is a comparison then the and/or/not/etc combo can be skipped in favor of a single lane-based instruction. I figured the same might be possible for riscv64 where bitselect could skip the combo of instructions and generate a single
vmerge*(which I interpreted, perhaps mistakenly, as a lane-based instruction) after the normal instruction generated for the comparison. That way the rules forbitselectwouldn't generate icmp/fcmp instructions at all, instead the output of those instructions will be fed directly intovmerge.since we have different ways of generating a icmp mask and fcmp mask.
On x64 I believe this is handled by
bitselectnot actually generating the icmp/fcmp, instead basically callingput_in_regfor the output of the icmp/fcmp which then will later use other lowering rules to generate the icmp/fcmp.We can't skip generating the icmp/fcmp mask since in RISC-V the masks are 1 bit per lane, so we always need to make the conversion from 1 bit per bit, into 1 bit per lane. Is that what you meant?
Right yeah, my thinking was that like on x64 the lowering for
bitselectwouldn't generate the cmp, only thevmerge, but because the mask of thebitselectis only pattern-matched on but we don't use the sub-operands that it would still generate the cmp naturally via other lowering rules.
afonso360 commented on issue #6876:
Oh, okay so it's pretty much the same on RISC-V. We don't have a direct
bitselectequivalent, we lower toand/or/notcombo as well by default.vmergeis a lanewise select as you mentioned.That way the rules for bitselect wouldn't generate icmp/fcmp instructions at all, instead the output of those instructions will be fed directly into vmerge.
Here's where this diverges.
vmergedoes not support the regular masks that are fed intobitselect. For ai8x16we need a 16bit mask instead of a 128bit mask. (Or a 8 bit mask fori16x8, etc...)In the regular
icmpandfcmpinstruction lowerings we do two things, first we generate the small mask (1 bit per lane), and then we do a second step of expanding that into a big mask.In this PR
gen_icmp_maskandgen_fcmp_maskonly generate the small mask, that is compatible withvmerge. But we never expand it, which is what would happen if we let the regularicmpandfcmpinstructions get lowered.Does that make sense? I think I might be missing something, sorry if I am.
afonso360 edited a comment on issue #6876:
Oh, okay so it's pretty much the same on RISC-V. We don't have a direct
bitselectequivalent, we lower toand/or/notcombo as well by default.vmergeis a lanewise select as you mentioned.That way the rules for bitselect wouldn't generate icmp/fcmp instructions at all, instead the output of those instructions will be fed directly into vmerge.
Here's where this diverges.
vmergedoes not support the regular masks that are fed intobitselect. For ai8x16we need a 16bit mask instead of a 128bit mask. (Or a 8 bit mask fori16x8, etc...)In the regular
icmpandfcmpinstruction lowerings we do two things, first we generate the small mask (1 bit per lane), and then we do a second step of expanding that into a big mask.In this PR
gen_icmp_maskandgen_fcmp_maskonly generate the small mask, that is compatible withvmerge. But we never expand it, which is what would happen if we let the regularicmpandfcmpinstructions get lowered.Does that make sense? I think I might be missing something, sorry if I am.
--
Edit:Yeah I think this is where x64 and RISC-V are different. Reading the spec for the
blendinstruction it looks like it makes a decision based on the MSB on each lane, which allows it to directly use the input forbitselect, if we know its all the same for the entire lane.On RISC-V our masks are completely different we only use the lowest
nbits of the entire register (wherenis the lane count). So they are sort of "compressed" masks. But that makes them incompatible with WASM and Cranelift'sicmp/fcmp, so we need to pretend that they are the same by expanding them, and that prevents just generatingvmergeas part ofbitselecteven if know that the input is aicmp/fcmp.
jameysharp commented on issue #6876:
A different way to deduplicate these rules would be along the lines of
maybe_uextend; riscv could use amaybe_bitcastextractor to have just one bitselect rule for each of icmp and fcmp, instead of two for each.But I'm not sure that would be more broadly useful so I'm not sure it's worth doing. Just a thought in case it helps with something later.
Last updated: Dec 06 2025 at 06:05 UTC