abrown commented on Issue #2819:
I'm introducing this PR as a draft to solicit opinions about how best to integrate this:
- any preference to switch to a builder pattern (e.g.
Evex::new().v128().prefix(...).map(...).reg(...).rm(...).w(true).opcode(0x1F).encode()
matching the manual's syntax, EVEX.128.66.0F38.W1 1F /r) over the current
encode_evex(...)`?- a better integration with
Inst
thanXmmUnaryRmREvex { op: Avx512Opcode, ...}
? I cannot just add a booleanevex
field toXmmUnaryRmR
because some AVX512 instructions (e.g.VPABSQ
) don't exist so they should not be included inSseOpcode
. But theXmmUnaryRmREvex
approach is not going to scale to a bunch of other instructions well.
abrown edited a comment on Issue #2819:
I'm introducing this PR as a draft to solicit opinions about how best to integrate this:
- any preference to switch to a builder pattern (e.g.
Evex::new().v128().prefix(...).map(...).reg(...).rm(...).w(true).opcode(0x1F).encode()
matching the manual's syntax,EVEX.128.66.0F38.W1 1F /r
) over the currentencode_evex(...)
?- a better integration with
Inst
thanXmmUnaryRmREvex { op: Avx512Opcode, ...}
? I cannot just add a booleanevex
field toXmmUnaryRmR
because some AVX512 instructions (e.g.VPABSQ
) don't exist so they should not be included inSseOpcode
. But theXmmUnaryRmREvex
approach is not going to scale to a bunch of other instructions well.
bnjbvr commented on Issue #2819:
any preference to switch to a builder pattern (e.g. Evex::new().v128().prefix(...).map(...).reg(...).rm(...).w(true).opcode(0x1F).encode() matching the manual's syntax, EVEX.128.66.0F38.W1 1F /r) over the current encode_evex(...)?
Yeah, this could be nice! If the flags don't have any relationship with each other, it might be possible to pass a
&sink
to thenew()
function and encode it as we call the builder functions. Alternatively, deferring it all and building inencode(&sink)
sounds good too! And if this works out well, we could refactor a bit the previous encode functions; their API really looks a bit too C-ish.a better integration with Inst than XmmUnaryRmREvex { op: Avx512Opcode, ...}? I cannot just add a boolean evex field to XmmUnaryRmR because some AVX512 instructions (e.g. VPABSQ) don't exist so they should not be included in SseOpcode. But the XmmUnaryRmREvex approach is not going to scale to a bunch of other instructions well.
Why wouldn't this scale well, out of curiosity? In general I think we should tend to APIs that make it impossible to create invalid instructions. It might not be the case for existing APIs, which should be refactored ideally...
abrown commented on Issue #2819:
@bnjbvr, the latest commit moves to the builder pattern. It's a hybrid of your idea: the EVEX 4-byte prefix is built up as the builder methods are called but we still need to call
.encode(sink)
to emit later because some methods, e.g..reg(...)
, modify both the prefix and the ModR/M byte, which are emitted at completely different times.Being a bit paranoid about performance, I benchmarked the
encode_evex
function approach and the currentEvexInstruction
builder approach by encoding the same instruction repeatedly inside bencher. For some reason, the builder approach turns out to be faster:test isa::x64::inst::encoding::evex::tests::encode_with_function ... bench: 17 ns/iter (+/- 0 test isa::x64::inst::encoding::evex::tests::encode_with_builder ... bench: 6 ns/iter (+/- 0)
I didn't dig too deep into this since @cfallin mentioned that emission is hardly the long pole in the tent. I was happy enough that this builder approach was not slower and left it at that. I think this is ok to review and merge understanding that there are still pieces coming (e.g. this only supports reg-reg addressing).
Why wouldn't this scale well, out of curiosity?
I mean mental scaling, not codegen performance or anything like that (some of us operate on limited brain RAM). Finding the right
Inst
variant when adding a new instruction can be tricky: "so this is unary, right, so I'm going to use that... no wait, I need to use the Xmm form... no, hold on, this is AVX512 so I need to find the Evex form of that." It's a developer experience thing more than anything. I agree that we should restrict the inputs somehow to only generate valid instructions but this "which Inst variant?" question seems a bit different.
abrown edited a comment on Issue #2819:
@bnjbvr, the latest commit moves to the builder pattern. It's a hybrid of your idea: the EVEX 4-byte prefix is built up as the builder methods are called but we still need to call
.encode(sink)
to emit later because some methods, e.g..reg(...)
, modify both the prefix and the ModR/M byte, which are emitted at different times.Being a bit paranoid about performance, I benchmarked the
encode_evex
function approach against theEvexInstruction
builder approach by encoding the same instruction repeatedly inside bencher. For some reason, the builder approach turns out to be faster:test isa::x64::inst::encoding::evex::tests::encode_with_function ... bench: 17 ns/iter (+/- 0 test isa::x64::inst::encoding::evex::tests::encode_with_builder ... bench: 6 ns/iter (+/- 0)
I didn't dig too deep into this since @cfallin mentioned that emission is hardly the long pole in the tent. I was happy enough that this builder approach was not slower and left it at that. I think this is ok to review and merge understanding that there are still pieces coming (e.g. this only supports reg-reg addressing).
Why wouldn't this scale well, out of curiosity?
I mean mental scaling, not codegen performance or anything like that (some of us operate on limited brain RAM). Finding the right
Inst
variant when adding a new instruction can be tricky: "so this is unary, right, so I'm going to use that... no wait, I need to use the Xmm form... no, hold on, this is AVX512 so I need to find the Evex form of that." It's a developer experience thing more than anything. I agree that we should restrict the inputs somehow to only generate valid instructions but this "which Inst variant?" question seems a bit different.
abrown edited a comment on Issue #2819:
@bnjbvr, the latest commit moves to the builder pattern. It's a hybrid of your idea: the EVEX 4-byte prefix is built up as the builder methods are called but we still need to call
.encode(sink)
to emit later because some methods, e.g..reg(...)
, modify both the prefix and the ModR/M byte, which are emitted at different times.Being a bit paranoid about performance, I benchmarked the
encode_evex
function approach against theEvexInstruction
builder approach by encoding the same instruction repeatedly inside bencher. For some reason, the builder approach turns out to be faster:test isa::x64::inst::encoding::evex::tests::encode_with_function ... bench: 17 ns/iter (+/- 0 test isa::x64::inst::encoding::evex::tests::encode_with_builder ... bench: 6 ns/iter (+/- 0)
I didn't dig too deep into this since @cfallin mentioned that emission is hardly the long pole in the tent. I was happy enough that this builder approach was not slower and left it at that. I think this is OK to review and merge understanding that there are still pieces coming (e.g. this only supports reg-reg addressing).
Why wouldn't this scale well, out of curiosity?
I mean mental scaling, not codegen performance or anything like that (some of us operate on limited brain RAM). Finding the right
Inst
variant when adding a new instruction can be tricky: "so this is unary, right, so I'm going to use that... no wait, I need to use the Xmm form... no, hold on, this is AVX512 so I need to find the Evex form of that." It's a developer experience thing more than anything. I agree that we should restrict the inputs somehow to only generate valid instructions but this "which Inst variant?" question seems a bit different.
abrown commented on Issue #2819:
@bnjbvr, can you take another look at this when you get a chance?
bnjbvr commented on Issue #2819:
@bnjbvr, can you take another look at this when you get a chance?
Yep, happy to take a look in the next few days!
Last updated: Jan 24 2025 at 00:11 UTC