jameysharp requested fitzgen for a review on PR #8508.
jameysharp opened PR #8508 from jameysharp:fix-evex-disasm
to bytecodealliance:main
:
The operand collector had these operands in src1/src2/dst order, but the pretty-printer had dst/src1/src2 order instead.
Note that fixing the pretty-printer makes it disagree with the disassembly from Capstone. I have stared at the emit code and the Intel reference manual and can't figure out how to reconcile these.
However, I have verified that the
vpsraq
instruction is executed on my laptop in a runtest (cranelift/filetests/filetests/runtests/simd-sshr.clif
), and its runtime behavior matches the CLIF interpreter. So this does not appear to be a codegen/correctness bug.I'm hoping @abrown or @alexcrichton can help explain the disassembly discrepancy.
jameysharp requested wasmtime-compiler-reviewers for a review on PR #8508.
fitzgen requested alexcrichton for a review on PR #8508.
fitzgen requested abrown for a review on PR #8508.
abrown submitted PR review:
I don't understand:
- if this disagrees with the Capstone disassembly, shouldn't some test fail here?
- why does changing the order of pretty-print invocation matter here? Is that thing still stateful?
- are we aiming for Intel syntax here or AT&T?
jameysharp commented on PR #8508:
- why does changing the order of pretty-print invocation matter here? Is that thing still stateful?
Right now, the order that the backend's implementation of
get_operands
invokes methods likecollector.reg_use
must match the order thatpretty_print
oremit
invokes methods likeallocs.next
. The first puts operands into the input for regalloc2, and the others consume allocations from the output of regalloc2, and the only thing keeping them in sync is having all three places agree on the ordering.So there's definitely one bug here in that
x64_get_operands
did thereg_use
calls beforereg_def
, andemit
calledallocs.next
in that same order, butpretty_print
calledallocs.next
in a different order. And this PR fixes that bug, but then reveals that something doesn't line up, somewhere.I have a commit locally to get rid of this ordering dependency from
pretty_print
andemit
, so that onlyget_operands
matters and it can use any order it wants, and we'll never have this problem again. But that commit changes the pretty-printing behavior in the same way that this PR does, by implicitly correcting the order in the pretty-printer. So I want to get this fixed and reviewed by itself first, so we can think about it without the rest of the implications of that change.
- if this disagrees with the Capstone disassembly, shouldn't some test fail here?
As far as I know, there are no tests which compare our pretty-printer output with Capstone's output. Maybe there should be! It's tricky in cases like pseudo-instructions or synthetic address modes, but we could probably test a lot of instructions easily enough.
So the only way this shows up right now is that for precise-output compile filetests we list our pretty-printer output, and then we list Capstone's output, and a human has to decide whether those are close enough. In this PR, I used
CRANELIFT_TEST_BLESS
to update the one filetest which
used this instruction, and saw that the pretty-printer output changed but the Capstone output didn't, and now I don't know what to do.The actual binary machine code that's emitted isn't changing, so it's not surprising that none of the runtests failed, or that the Capstone output didn't change either.
Note that there are currently exactly two instructions which use our
Inst::XmmRmREvex
representation,vpmullq
andvpsraq
. We don't currently have any filetests which containvpmullq
, and there's exactly one use of this non-immediate form ofvpsraq
. So I don't know whether both instructions are affected.
- are we aiming for Intel syntax here or AT&T?
We're using roughly AT&T syntax in the pretty-printer output. There are good arguments for switching to Intel syntax, but Trevor tried doing that a while ago and found it's a lot of work. So instead we configure Capstone into AT&T mode to be consistent with our existing choices.
Before this PR, our pretty-printer was effectively printing
{src2}, {dst}, {src1}
. I don't think that makes sense for either Intel or AT&T syntax, does it?In our sole filetest example, it happens that
dst
andsrc1
get assigned the same register, because of ABI constraints I guess, so I suppose Capstone could be printing this in{src2}, {src1}, {dst}
order and we wouldn't be able to tell the difference. Would that be reasonable for AT&T syntax?I think one possibility is that Capstone's version of AT&T syntax has the operands backwards for this particular instruction. I wouldn't blame them if the non-Intel (and non-default) printing mode on a single random SIMD instruction wasn't checked carefully.
jameysharp updated PR #8508.
jameysharp commented on PR #8508:
Based on our discussion in the Cranelift meeting today I've updated this PR, and everything makes sense to me now. Thank you so much to @abrown and @fitzgen for helping me figure this out! I've updated the commit message to explain what happened:
The operand collector had these operands in src1/src2/dst order, but the pretty-printer fetched the allocations in dst/src1/src2 order instead.
Although our pretty-printer looked like it was printing src1/src2/dst, because it consumed operands in the wrong order, what it actually printed was src2/dst/src1.
Meanwhile, Capstone actually uses src2/src1/dst order in AT&T mode. (GNU objdump agrees.)
In the only filetest covering the vpsraq instruction, our output agreed with Capstone because register allocation picked the same register for both src1 and dst, so the two orders were indistinguishable. I've extended the filetest to force register allocation to pick different registers.
This format is also used for vpmullq, but we didn't have any compile filetests covering that instruction, so I've added one with the same register allocation pattern.
Now our pretty-printer agrees with Capstone on both instructions.
abrown submitted PR review:
LGTM!
abrown submitted PR review:
LGTM! (With a couple of emit tests to fix up...)
jameysharp updated PR #8508.
jameysharp updated PR #8508.
jameysharp updated PR #8508.
jameysharp commented on PR #8508:
Would you review again now that I've fixed the emit-tests? I found several more things to fix in the process.
abrown submitted PR review.
abrown merged PR #8508.
Last updated: Jan 24 2025 at 00:11 UTC