Stream: git-wasmtime

Topic: wasmtime / PR #8508 cranelift/x64: Fix XmmRmREvex pretty-...


view this post on Zulip Wasmtime GitHub notifications bot (Apr 29 2024 at 21:30):

jameysharp requested fitzgen for a review on PR #8508.

view this post on Zulip Wasmtime GitHub notifications bot (Apr 29 2024 at 21:30):

jameysharp opened PR #8508 from jameysharp:fix-evex-disasm to bytecodealliance:main:

The operand collector had these operands in src1/src2/dst order, but the pretty-printer had dst/src1/src2 order instead.

Note that fixing the pretty-printer makes it disagree with the disassembly from Capstone. I have stared at the emit code and the Intel reference manual and can't figure out how to reconcile these.

However, I have verified that the vpsraq instruction is executed on my laptop in a runtest (cranelift/filetests/filetests/runtests/simd-sshr.clif), and its runtime behavior matches the CLIF interpreter. So this does not appear to be a codegen/correctness bug.

I'm hoping @abrown or @alexcrichton can help explain the disassembly discrepancy.

view this post on Zulip Wasmtime GitHub notifications bot (Apr 29 2024 at 21:30):

jameysharp requested wasmtime-compiler-reviewers for a review on PR #8508.

view this post on Zulip Wasmtime GitHub notifications bot (Apr 29 2024 at 22:01):

fitzgen requested alexcrichton for a review on PR #8508.

view this post on Zulip Wasmtime GitHub notifications bot (Apr 29 2024 at 22:01):

fitzgen requested abrown for a review on PR #8508.

view this post on Zulip Wasmtime GitHub notifications bot (Apr 30 2024 at 21:16):

abrown submitted PR review:

I don't understand:

view this post on Zulip Wasmtime GitHub notifications bot (Apr 30 2024 at 23:54):

jameysharp commented on PR #8508:

Right now, the order that the backend's implementation of get_operands invokes methods like collector.reg_use must match the order that pretty_print or emit invokes methods like allocs.next. The first puts operands into the input for regalloc2, and the others consume allocations from the output of regalloc2, and the only thing keeping them in sync is having all three places agree on the ordering.

So there's definitely one bug here in that x64_get_operands did the reg_use calls before reg_def, and emit called allocs.next in that same order, but pretty_print called allocs.next in a different order. And this PR fixes that bug, but then reveals that something doesn't line up, somewhere.

I have a commit locally to get rid of this ordering dependency from pretty_print and emit, so that only get_operands matters and it can use any order it wants, and we'll never have this problem again. But that commit changes the pretty-printing behavior in the same way that this PR does, by implicitly correcting the order in the pretty-printer. So I want to get this fixed and reviewed by itself first, so we can think about it without the rest of the implications of that change.

As far as I know, there are no tests which compare our pretty-printer output with Capstone's output. Maybe there should be! It's tricky in cases like pseudo-instructions or synthetic address modes, but we could probably test a lot of instructions easily enough.

So the only way this shows up right now is that for precise-output compile filetests we list our pretty-printer output, and then we list Capstone's output, and a human has to decide whether those are close enough. In this PR, I used CRANELIFT_TEST_BLESS to update the one filetest which
used this instruction, and saw that the pretty-printer output changed but the Capstone output didn't, and now I don't know what to do.

The actual binary machine code that's emitted isn't changing, so it's not surprising that none of the runtests failed, or that the Capstone output didn't change either.

Note that there are currently exactly two instructions which use our Inst::XmmRmREvex representation, vpmullq and vpsraq. We don't currently have any filetests which contain vpmullq, and there's exactly one use of this non-immediate form of vpsraq. So I don't know whether both instructions are affected.

We're using roughly AT&T syntax in the pretty-printer output. There are good arguments for switching to Intel syntax, but Trevor tried doing that a while ago and found it's a lot of work. So instead we configure Capstone into AT&T mode to be consistent with our existing choices.

Before this PR, our pretty-printer was effectively printing {src2}, {dst}, {src1}. I don't think that makes sense for either Intel or AT&T syntax, does it?

In our sole filetest example, it happens that dst and src1 get assigned the same register, because of ABI constraints I guess, so I suppose Capstone could be printing this in {src2}, {src1}, {dst} order and we wouldn't be able to tell the difference. Would that be reasonable for AT&T syntax?

I think one possibility is that Capstone's version of AT&T syntax has the operands backwards for this particular instruction. I wouldn't blame them if the non-Intel (and non-default) printing mode on a single random SIMD instruction wasn't checked carefully.

view this post on Zulip Wasmtime GitHub notifications bot (May 01 2024 at 16:22):

jameysharp updated PR #8508.

view this post on Zulip Wasmtime GitHub notifications bot (May 01 2024 at 16:26):

jameysharp commented on PR #8508:

Based on our discussion in the Cranelift meeting today I've updated this PR, and everything makes sense to me now. Thank you so much to @abrown and @fitzgen for helping me figure this out! I've updated the commit message to explain what happened:

The operand collector had these operands in src1/src2/dst order, but the pretty-printer fetched the allocations in dst/src1/src2 order instead.

Although our pretty-printer looked like it was printing src1/src2/dst, because it consumed operands in the wrong order, what it actually printed was src2/dst/src1.

Meanwhile, Capstone actually uses src2/src1/dst order in AT&T mode. (GNU objdump agrees.)

In the only filetest covering the vpsraq instruction, our output agreed with Capstone because register allocation picked the same register for both src1 and dst, so the two orders were indistinguishable. I've extended the filetest to force register allocation to pick different registers.

This format is also used for vpmullq, but we didn't have any compile filetests covering that instruction, so I've added one with the same register allocation pattern.

Now our pretty-printer agrees with Capstone on both instructions.

view this post on Zulip Wasmtime GitHub notifications bot (May 01 2024 at 16:53):

abrown submitted PR review:

LGTM!

view this post on Zulip Wasmtime GitHub notifications bot (May 01 2024 at 16:55):

abrown submitted PR review:

LGTM! (With a couple of emit tests to fix up...)

view this post on Zulip Wasmtime GitHub notifications bot (May 01 2024 at 18:01):

jameysharp updated PR #8508.

view this post on Zulip Wasmtime GitHub notifications bot (May 01 2024 at 18:17):

jameysharp updated PR #8508.

view this post on Zulip Wasmtime GitHub notifications bot (May 01 2024 at 18:20):

jameysharp updated PR #8508.

view this post on Zulip Wasmtime GitHub notifications bot (May 01 2024 at 18:20):

jameysharp commented on PR #8508:

Would you review again now that I've fixed the emit-tests? I found several more things to fix in the process.

view this post on Zulip Wasmtime GitHub notifications bot (May 01 2024 at 20:48):

abrown submitted PR review.

view this post on Zulip Wasmtime GitHub notifications bot (May 01 2024 at 21:13):

abrown merged PR #8508.


Last updated: Oct 23 2024 at 20:03 UTC