Stream: git-wasmtime

Topic: wasmtime / Issue #2819 x64: lower iabs.i64x2 using a sing...


view this post on Zulip Wasmtime GitHub notifications bot (Apr 08 2021 at 17:57):

abrown commented on Issue #2819:

I'm introducing this PR as a draft to solicit opinions about how best to integrate this:

view this post on Zulip Wasmtime GitHub notifications bot (Apr 08 2021 at 17:57):

abrown edited a comment on Issue #2819:

I'm introducing this PR as a draft to solicit opinions about how best to integrate this:

view this post on Zulip Wasmtime GitHub notifications bot (Apr 12 2021 at 09:24):

bnjbvr commented on Issue #2819:

any preference to switch to a builder pattern (e.g. Evex::new().v128().prefix(...).map(...).reg(...).rm(...).w(true).opcode(0x1F).encode() matching the manual's syntax, EVEX.128.66.0F38.W1 1F /r) over the current encode_evex(...)?

Yeah, this could be nice! If the flags don't have any relationship with each other, it might be possible to pass a &sink to the new() function and encode it as we call the builder functions. Alternatively, deferring it all and building in encode(&sink) sounds good too! And if this works out well, we could refactor a bit the previous encode functions; their API really looks a bit too C-ish.

a better integration with Inst than XmmUnaryRmREvex { op: Avx512Opcode, ...}? I cannot just add a boolean evex field to XmmUnaryRmR because some AVX512 instructions (e.g. VPABSQ) don't exist so they should not be included in SseOpcode. But the XmmUnaryRmREvex approach is not going to scale to a bunch of other instructions well.

Why wouldn't this scale well, out of curiosity? In general I think we should tend to APIs that make it impossible to create invalid instructions. It might not be the case for existing APIs, which should be refactored ideally...

view this post on Zulip Wasmtime GitHub notifications bot (Apr 13 2021 at 18:58):

abrown commented on Issue #2819:

@bnjbvr, the latest commit moves to the builder pattern. It's a hybrid of your idea: the EVEX 4-byte prefix is built up as the builder methods are called but we still need to call .encode(sink) to emit later because some methods, e.g. .reg(...), modify both the prefix and the ModR/M byte, which are emitted at completely different times.

Being a bit paranoid about performance, I benchmarked the encode_evex function approach and the current EvexInstruction builder approach by encoding the same instruction repeatedly inside bencher. For some reason, the builder approach turns out to be faster:

test isa::x64::inst::encoding::evex::tests::encode_with_function ... bench:          17 ns/iter (+/- 0
test isa::x64::inst::encoding::evex::tests::encode_with_builder  ... bench:           6 ns/iter (+/- 0)

I didn't dig too deep into this since @cfallin mentioned that emission is hardly the long pole in the tent. I was happy enough that this builder approach was not slower and left it at that. I think this is ok to review and merge understanding that there are still pieces coming (e.g. this only supports reg-reg addressing).

Why wouldn't this scale well, out of curiosity?

I mean mental scaling, not codegen performance or anything like that (some of us operate on limited brain RAM). Finding the right Inst variant when adding a new instruction can be tricky: "so this is unary, right, so I'm going to use that... no wait, I need to use the Xmm form... no, hold on, this is AVX512 so I need to find the Evex form of that." It's a developer experience thing more than anything. I agree that we should restrict the inputs somehow to only generate valid instructions but this "which Inst variant?" question seems a bit different.

view this post on Zulip Wasmtime GitHub notifications bot (Apr 13 2021 at 18:59):

abrown edited a comment on Issue #2819:

@bnjbvr, the latest commit moves to the builder pattern. It's a hybrid of your idea: the EVEX 4-byte prefix is built up as the builder methods are called but we still need to call .encode(sink) to emit later because some methods, e.g. .reg(...), modify both the prefix and the ModR/M byte, which are emitted at different times.

Being a bit paranoid about performance, I benchmarked the encode_evex function approach against the EvexInstruction builder approach by encoding the same instruction repeatedly inside bencher. For some reason, the builder approach turns out to be faster:

test isa::x64::inst::encoding::evex::tests::encode_with_function ... bench:          17 ns/iter (+/- 0
test isa::x64::inst::encoding::evex::tests::encode_with_builder  ... bench:           6 ns/iter (+/- 0)

I didn't dig too deep into this since @cfallin mentioned that emission is hardly the long pole in the tent. I was happy enough that this builder approach was not slower and left it at that. I think this is ok to review and merge understanding that there are still pieces coming (e.g. this only supports reg-reg addressing).

Why wouldn't this scale well, out of curiosity?

I mean mental scaling, not codegen performance or anything like that (some of us operate on limited brain RAM). Finding the right Inst variant when adding a new instruction can be tricky: "so this is unary, right, so I'm going to use that... no wait, I need to use the Xmm form... no, hold on, this is AVX512 so I need to find the Evex form of that." It's a developer experience thing more than anything. I agree that we should restrict the inputs somehow to only generate valid instructions but this "which Inst variant?" question seems a bit different.

view this post on Zulip Wasmtime GitHub notifications bot (Apr 13 2021 at 18:59):

abrown edited a comment on Issue #2819:

@bnjbvr, the latest commit moves to the builder pattern. It's a hybrid of your idea: the EVEX 4-byte prefix is built up as the builder methods are called but we still need to call .encode(sink) to emit later because some methods, e.g. .reg(...), modify both the prefix and the ModR/M byte, which are emitted at different times.

Being a bit paranoid about performance, I benchmarked the encode_evex function approach against the EvexInstruction builder approach by encoding the same instruction repeatedly inside bencher. For some reason, the builder approach turns out to be faster:

test isa::x64::inst::encoding::evex::tests::encode_with_function ... bench:          17 ns/iter (+/- 0
test isa::x64::inst::encoding::evex::tests::encode_with_builder  ... bench:           6 ns/iter (+/- 0)

I didn't dig too deep into this since @cfallin mentioned that emission is hardly the long pole in the tent. I was happy enough that this builder approach was not slower and left it at that. I think this is OK to review and merge understanding that there are still pieces coming (e.g. this only supports reg-reg addressing).

Why wouldn't this scale well, out of curiosity?

I mean mental scaling, not codegen performance or anything like that (some of us operate on limited brain RAM). Finding the right Inst variant when adding a new instruction can be tricky: "so this is unary, right, so I'm going to use that... no wait, I need to use the Xmm form... no, hold on, this is AVX512 so I need to find the Evex form of that." It's a developer experience thing more than anything. I agree that we should restrict the inputs somehow to only generate valid instructions but this "which Inst variant?" question seems a bit different.

view this post on Zulip Wasmtime GitHub notifications bot (Apr 14 2021 at 16:09):

abrown commented on Issue #2819:

@bnjbvr, can you take another look at this when you get a chance?

view this post on Zulip Wasmtime GitHub notifications bot (Apr 14 2021 at 16:20):

bnjbvr commented on Issue #2819:

@bnjbvr, can you take another look at this when you get a chance?

Yep, happy to take a look in the next few days!


Last updated: Jan 24 2025 at 00:11 UTC