jameysharp commented on issue #6850:
I think it's a good sign that the verifier is rejecting those tests. It's nice to confirm that it's detecting the problem that it was supposed to check for!
That said, I think the way we should fix these tests is by changing the parser for the textual CLIF representation, in
cranelift/reader/src/parser.rs
.The first thing to note is that I believe
parse_inst_operands
should only be called for anInstructionFormat::UnaryImm
ifexplicit_control_type
is notNone
; I thinkiconst
requires that a type always be provided, and it's the only instruction which uses that instruction format.So when parsing operands for that format, we can report a syntax error if that argument is
None
, and otherwise match on the type to decide whether to callself.match_imm8/16/32
. Those return signed types likei8
, which needs to be converted to unsigned so it's zero-extended. So something likeImm64::from(self.match_imm8(...) as u8)
should check that it fits in range for the appropriate type.Does that make sense? Feel free to do that in this PR, or make a separate PR for that change which we'll merge first, either way.
timjrd commented on issue #6850:
After looking at the CLIF parser, it does make total sense. Thanks for the detailed explanation. :smile:
I have a somewhat working patch, I need to fix a few things and it should be good!
timjrd commented on issue #6850:
Hi @jameysharp :wave: please see the message of my last commit: I'm facing some test breakage, I fixed a few ones, but many others seem to be related to the conversion from signed to unsigned integers. Maybe I should adapt the ranges in the verifier to allow both signed and unsigned use (-128 .. 255 instead of 0 .. 255 for i8) as initially proposed by @afonso360? :smile:
jameysharp commented on issue #6850:
One more thought: The fact that these backends produce different code based on the supposedly-ignored high bits of
iconst
instructions is potentially a sign of bugs in the backends. Hopefully these cases are all harmless but it's hard to tell at a glance. So these test failures are a good sign! They indicate that this PR will help us ensure that there aren't any surprising ways to affect codegen from a Cranelift frontend.
afonso360 commented on issue #6850:
Hey, Just wanted to say that this is awesome work! Thanks for working on it. I'm going to look into the RISC-V failures, but It might take a bit.
bjorn3 commented on issue #6850:
Would it make sense to store Uimm64 instead of Imm64 in
iconst
given that sign extending is no longer allowed?
jameysharp commented on issue #6850:
Would it make sense to store Uimm64 instead of Imm64 in
iconst
given that sign extending is no longer allowed?Yes, I think so—but I think that's a much bigger change and we shouldn't try to do it in this PR.
timjrd commented on issue #6850:
Alright! I'm waiting for a greenlight (or a fix) before updating the outputs of the tests, since I'm a total ISAs / assembly / compiler noob.
After this PR is merged, I would be happy to work on replacing
Imm64
byUimm64
, if the task is suitable for a compiler beginner with limited time resource. It's a related change, so at least I know where to start. :smile:
jameysharp commented on issue #6850:
Now that #6915 is merged (thanks @afonso360!), hopefully most of the test failures will disappear in this PR. Are you comfortable rebasing on main and taking another look at which tests fail now?
timjrd commented on issue #6850:
I updated the egraphs file-tests. There are 2 remaining failures:
FAIL filetests/filetests/isa/x64/simd-lane-access-compile.clif: compile
line 185--- expected +++ actual @@ -2,7 +2,7 @@ pushq %rbp movq %rsp, %rbp block0: - movl $-1, %ecx + movl $65535, %ecx movd %ecx, %xmm1 pshuflw $0, %xmm1, %xmm3 pshufd $0, %xmm3, %xmm0 @@ -15,7 +15,7 @@ pushq %rbp movq %rsp, %rbp block1: ; offset 0x4 - movl $0xffffffff, %ecx + movl $0xffff, %ecx movd %ecx, %xmm1 pshuflw $0, %xmm1, %xmm3 pshufd $0, %xmm3, %xmm0
FAIL filetests/filetests/isa/s390x/constants.clif: compile
line 4--- expected +++ actual @@ -1,9 +1,9 @@ VCode: block0: - lhi %r2, -1 + lhi %r2, 255 br %r14 Disassembled: block0: ; offset 0x0 - lhi %r2, -1 + lhi %r2, 0xff br %r14
jameysharp commented on issue #6850:
This feels very close!
@abrown and @uweigand, could you take a look at the test failures happening on this PR and check my reasoning about them? The diffs in the precise-output compile tests are in a comment above.
For s390x, the failing test is: https://github.com/bytecodealliance/wasmtime/blob/62fdafa14ab3e9adb31ac4f26348c939126c4fb9/cranelift/filetests/filetests/isa/s390x/constants.clif#L4-L18
I think the lowering rule in question is this one: https://github.com/bytecodealliance/wasmtime/blob/62fdafa14ab3e9adb31ac4f26348c939126c4fb9/cranelift/codegen/src/isa/s390x/inst.isle#L2985-L2989
I'm guessing that callers are expected to ignore all but the least-significant 8 bits of
%r2
if a function returns anI8
, so it wouldn't matter that in this case the constant has been sign-extended past 8 bits. It also doesn't matter to Wasmtime, I think, since we don't use 8-bit constants and for anything wider there are type-specific lowering rules. But I think it's hard to prove that there aren't any other situations where a sign-extended 8-bit constant could cause other instructions to produce incorrect results.That said, my guess is that we can just update this test expectation from "-1" to "255"/"0xff" and everything else will be good enough as-is.
For x64, this is the failing test: https://github.com/bytecodealliance/wasmtime/blob/62fdafa14ab3e9adb31ac4f26348c939126c4fb9/cranelift/filetests/filetests/isa/x64/simd-lane-access-compile.clif#L185-L215
When the x64 backend lowers
(splat (iconst x))
wherex
is ani16
, it emits a 32-bit constant, which I think is because the matching lowering rule invokesbitcast_gpr_to_xmm $I32
. As a result, it includes bits in the constant which are supposed to be ignored.I believe this is harmless because the value is used by
pshuflw
which is told to read only the lowest 16 bits, overwriting the uninitialized bits. If so, we can safely update the test expectation in this PR because only unused bits change. But I wonder if we should change the generated code further for this case? For example, doesmovw $-1, %cx
have a shorter encoding thanmovl $-1, %ecx
, or weird costs? Would it be worth matching onsplat
oficonst.i16
and precomputing the result ofpshuflw
as a 32-bit constant, if we have to emit a 32-bit constant anyway? etc.
uweigand commented on issue #6850:
I'm guessing that callers are expected to ignore all but the least-significant 8 bits of
%r2
if a function returns anI8
, so it wouldn't matter that in this case the constant has been sign-extended past 8 bits. It also doesn't matter to Wasmtime, I think, since we don't use 8-bit constants and for anything wider there are type-specific lowering rules. But I think it's hard to prove that there aren't any other situations where a sign-extended 8-bit constant could cause other instructions to produce incorrect results.The general assumption in the s390x back-end is that when holding an
i8
in a register, the upper bits are undefined and don't-care. (For example,i8
addition uses a 32-bit addition ISA instruction, which will change upper bits.) From that perspective, usinglhi .., -1
andlhi .., 255
should be equivalent when loading ani8
value. (Also note thatlhi
only modifies the low 32 bits of the 64-bit register in the first place, the high bits are left unmodified. So either way this doesn't deterministically set all bits in the full register.)As to the caller's expectations, the platform ABI does indeed require all scalar integer types to be zero- or sign-extended to the full register width when passed in registers. However, this is implemented at the cranelift IR level by using
i8 uext
ori8 sext
as argument or return type, not a plaini8
. (The latter is only used in rare cases like when passing a single-byte struct by value. In those cases the ABI does indeed state that the upper bits are undefined, soi8
correctly represents that semantics.)That said, my guess is that we can just update this test expectation from "-1" to "255"/"0xff" and everything else will be good enough as-is.
Agreed.
abrown commented on issue #6850:
I believe this is harmless because the value is used by pshuflw which is told to read only the lowest 16 bits, overwriting the uninitialized bits. If so, we can safely update the test expectation in this PR because only unused bits change. But I wonder if we should change the generated code further for this case? For example, does movw $-1, %cx have a shorter encoding than movl $-1, %ecx, or weird costs? Would it be worth matching on splat of iconst.i16 and precomputing the result of pshuflw as a 32-bit constant, if we have to emit a 32-bit constant anyway? etc.
@jameysharp:
- Your assumption is correct:
pshuflw
reads only the low 16 bits; you could just change the test- I don't think you're going to get smaller code with a word-
mov
; it's the same single byte opcode as a doubleword (32 bits in x86)- I like your idea of changing the codegen: I would just double up the 16-bit constant in the 32-bit
mov
and then use a singlepshufd
: for constanta
,movl <a>:<a>, %ecx; movd %ecx, %xmm1; pshufd $0, %xmm1, %xmm1
. (Another option would be to use apshufb
but then you have to setup a separatexmm
register for the mask).
jameysharp commented on issue #6850:
Awesome: besides sorting out the writer change that's affecting the egraph tests, the last step for this PR is to update these two test expectations to match what the new parser implementation does.
I don't think you're going to get smaller code with a word-
mov
; it's the same single byte opcode as a doubleword (32 bits in x86)I see it's the same opcode, but the immediate operand is only two bytes instead of four, right? Looking more carefully, I'm confused by the use of
operand_size_of_type_32_64
in theimm
constructor used byiconst
. Why doesn't it use smaller move-from-immediate instructions for smaller types? Is this just to ensure that the upper bits in the register are initialized, or perhaps because nothing else needed smaller operand sizes? https://github.com/bytecodealliance/wasmtime/blob/62fdafa14ab3e9adb31ac4f26348c939126c4fb9/cranelift/codegen/src/isa/x64/inst.isle#L2568-L2572
jameysharp commented on issue #6850:
Oh, though you still have it marked as a draft PR. Do _you_ feel it's ready to merge? You should click the "Ready for review" button if so.
timjrd commented on issue #6850:
@jameysharp I think it's OK except for the 2 comments I made above. :smile:
jameysharp commented on issue #6850:
Those two diffs are the same s390x instruction printed by two different disassemblers, so it's actually only one change. I looked at it before approving the PR earlier and decided that, although I don't know what the
iilf
instruction does, it's clearly emitting a 32-bit immediate constant and the instruction it came from is aniconst.i32
, so it's probably fine. It's possible that @uweigand may want to change codegen based on this, iflhi
is a more efficient instruction encoding or something, but I think that can safely land after this PR if it's worth doing at all.
timjrd commented on issue #6850:
Nice to see this PR merged! Thanks for all your detailed explanations and guidance. This project if definitely beginner-friendly! :rocket:
timjrd edited a comment on issue #6850:
Nice to see this PR merged! Thanks for all your detailed explanations and guidance. This project is definitely beginner-friendly! :rocket:
uweigand commented on issue #6850:
Those two diffs are the same s390x instruction printed by two different disassemblers, so it's actually only one change. I looked at it before approving the PR earlier and decided that, although I don't know what the
iilf
instruction does, it's clearly emitting a 32-bit immediate constant and the instruction it came from is aniconst.i32
, so it's probably fine. It's possible that @uweigand may want to change codegen based on this, iflhi
is a more efficient instruction encoding or something, but I think that can safely land after this PR if it's worth doing at all.This is now https://github.com/bytecodealliance/wasmtime/pull/6952
Last updated: Dec 23 2024 at 12:05 UTC