Stream: git-wasmtime

Topic: wasmtime / PR #2068 x64: finish implementing wasm MVP sup...


view this post on Zulip Wasmtime GitHub notifications bot (Jul 23 2020 at 14:38):

bnjbvr opened PR #2068 from x64 to main:

This mixed bag of commits implements enough support so that Cranelift-in-Spidermonkey runs all the wasm test cases correctly, including multi-value, reference types, explicit bounds checks, etc. So with respect to Spidermonkey, this brings the x64 level support to the same as aarch64's (on Linux at least; Win64 fastcall support is still to be implemented); wasmtime might require additional work.

Next step is setting up CI so this is actually tested in automation.

view this post on Zulip Wasmtime GitHub notifications bot (Jul 23 2020 at 14:38):

bnjbvr requested cfallin, julian-seward1 and jlb6740 for a review on PR #2068.

view this post on Zulip Wasmtime GitHub notifications bot (Jul 23 2020 at 14:38):

bnjbvr requested cfallin, julian-seward1 and jlb6740 for a review on PR #2068.

view this post on Zulip Wasmtime GitHub notifications bot (Jul 23 2020 at 14:38):

bnjbvr requested cfallin, julian-seward1 and jlb6740 for a review on PR #2068.

view this post on Zulip Wasmtime GitHub notifications bot (Jul 23 2020 at 19:30):

cfallin submitted PR Review.

view this post on Zulip Wasmtime GitHub notifications bot (Jul 23 2020 at 19:30):

cfallin submitted PR Review.

view this post on Zulip Wasmtime GitHub notifications bot (Jul 23 2020 at 19:30):

cfallin created PR Review Comment:

{max,min}{ss,sd}?

view this post on Zulip Wasmtime GitHub notifications bot (Jul 23 2020 at 19:30):

cfallin created PR Review Comment:

This could use some explanatory comments (it's very clever though!). It looks like it's emulating a rounding mode by converting input/2, except rounding up by OR'ing in the original LSB?

view this post on Zulip Wasmtime GitHub notifications bot (Jul 23 2020 at 19:30):

cfallin created PR Review Comment:

Can we pull this out as a top-level helper function?

view this post on Zulip Wasmtime GitHub notifications bot (Jul 23 2020 at 19:30):

cfallin created PR Review Comment:

(Ah, I see comments now below for this. Perhaps just add a forward-reference here, as in "see below for explanation".)

view this post on Zulip Wasmtime GitHub notifications bot (Jul 23 2020 at 19:30):

cfallin created PR Review Comment:

Would it make sense to merge CvtFloatToSintSeq and CvtFloatToUintSeq with a flag is_signed: bool?

view this post on Zulip Wasmtime GitHub notifications bot (Jul 23 2020 at 19:30):

cfallin created PR Review Comment:

Perhaps instead change add_stackmap to take a parameter StackmapExtent, which is an enum with variants StackmapExtent::UpcomingBytes(len) and StackmapExtent::PriorBytes(len) or something like that?

view this post on Zulip Wasmtime GitHub notifications bot (Jul 23 2020 at 19:30):

cfallin created PR Review Comment:

Should this be 0xffff_ffff_ffff_ffff (i.e. u64::MAX)? Or is the RegMemImm::imm sign-extended? (Comment if so?)

view this post on Zulip Wasmtime GitHub notifications bot (Jul 23 2020 at 19:30):

cfallin created PR Review Comment:

Looking at the sequences above for these conversions, it appears that src is sometimes accessed after the temps. A def is allowed to reuse the source's register if the source is dead after this instruction, so we might inadvertently overwrite the source when we initialize the temp. Could you verify that we copy the source out to a temp as the first action in each of the sequences above?

view this post on Zulip Wasmtime GitHub notifications bot (Jul 23 2020 at 19:30):

cfallin created PR Review Comment:

Extract this sequence to a helper emit_fcmp, analogous to emit_cmp above?

view this post on Zulip Wasmtime GitHub notifications bot (Jul 24 2020 at 11:29):

julian-seward1 submitted PR Review.

view this post on Zulip Wasmtime GitHub notifications bot (Jul 24 2020 at 11:29):

julian-seward1 created PR Review Comment:

nit: could you add a comment here, explaining how the Movd/Movq cases are distinguished? Presumably via REX.W ? Because the two match- arms here are identical, and it's confusing to look at at first (kinda looks like a bug :)

view this post on Zulip Wasmtime GitHub notifications bot (Jul 24 2020 at 11:29):

julian-seward1 submitted PR Review.

view this post on Zulip Wasmtime GitHub notifications bot (Jul 24 2020 at 11:29):

julian-seward1 created PR Review Comment:

ditto

view this post on Zulip Wasmtime GitHub notifications bot (Jul 24 2020 at 11:37):

julian-seward1 submitted PR Review.

view this post on Zulip Wasmtime GitHub notifications bot (Jul 24 2020 at 11:37):

julian-seward1 created PR Review Comment:

This is (indeed as Chris says) very clever. Just a note that it won't work unchanged if we ever come to support X86_32, since cvtsi2s{d,s} will only take a 32 bit input in that case.

view this post on Zulip Wasmtime GitHub notifications bot (Jul 24 2020 at 11:38):

julian-seward1 submitted PR Review.

view this post on Zulip Wasmtime GitHub notifications bot (Jul 24 2020 at 11:38):

julian-seward1 created PR Review Comment:

Please, for clarity, specify whether the said "unsigned int" is 32 or 64 bits.

view this post on Zulip Wasmtime GitHub notifications bot (Jul 24 2020 at 11:44):

julian-seward1 submitted PR Review.

view this post on Zulip Wasmtime GitHub notifications bot (Jul 24 2020 at 11:44):

julian-seward1 created PR Review Comment:

It seems like the two comments are inconsistent:

/// Converts an unsigned int64 to a float64.
/// Is the target a 64-bits or 32-bits register?

view this post on Zulip Wasmtime GitHub notifications bot (Jul 24 2020 at 11:47):

julian-seward1 submitted PR Review.

view this post on Zulip Wasmtime GitHub notifications bot (Jul 24 2020 at 11:47):

julian-seward1 created PR Review Comment:

We (collectively) should discuss register conventions, regarding get_regs and how it interacts with temp regs in these sequences, since I think we are in danger of falling into a nasty correctness hole otherwise. Not just here; the in-progress aarch64 atomics work has the same problem. Once we have a clear common understanding, we can go back and audit this stuff. I'm just noting it here; doesn't need to block landing right now.

view this post on Zulip Wasmtime GitHub notifications bot (Jul 24 2020 at 11:49):

julian-seward1 submitted PR Review.

view this post on Zulip Wasmtime GitHub notifications bot (Jul 24 2020 at 11:49):

julian-seward1 created PR Review Comment:

See also my comment above about need for clear analysis/policy/audit of temp register uses.

view this post on Zulip Wasmtime GitHub notifications bot (Jul 24 2020 at 12:05):

julian-seward1 submitted PR Review.

view this post on Zulip Wasmtime GitHub notifications bot (Jul 24 2020 at 12:05):

julian-seward1 created PR Review Comment:

Since Andps already exists, is there any point in adding Andpd? Similar question for Orpd. They are functionally identical, so the only reason for doing this is if we believe we'll get some kind of domain-crossing penalty by applying (eg) Andps to the output of some insn that generates F64x2 rather than F32x4. Reading the opt guide sec 3.5.2.3 "Bypass between execution domains" leads me to believe that Andpd/Andps/Orpd/Orps would all "live" in the "integer domain".

view this post on Zulip Wasmtime GitHub notifications bot (Jul 24 2020 at 12:06):

julian-seward1 submitted PR Review.

view this post on Zulip Wasmtime GitHub notifications bot (Jul 24 2020 at 12:06):

julian-seward1 created PR Review Comment:

Or maybe all in the FP domain (who knows). Point is, I can't see that the implied 32x4 vs 64x2 differentiation would make any difference.

view this post on Zulip Wasmtime GitHub notifications bot (Jul 24 2020 at 12:12):

julian-seward1 submitted PR Review.

view this post on Zulip Wasmtime GitHub notifications bot (Jul 24 2020 at 12:12):

julian-seward1 created PR Review Comment:

Similar comments to the Andps/pd/Orps/pd duality.

view this post on Zulip Wasmtime GitHub notifications bot (Jul 24 2020 at 12:27):

julian-seward1 submitted PR Review.

view this post on Zulip Wasmtime GitHub notifications bot (Jul 24 2020 at 12:27):

julian-seward1 created PR Review Comment:

Would it be safer to panic on any cases that we don't expect to ever get -- are there any? Specifically, for equality, signfulness is meaningless, and for overflows -- aren't there both signed and unsigned overflows? Certainly on x64 there are (rflags.C vs rflags.O).

view this post on Zulip Wasmtime GitHub notifications bot (Jul 24 2020 at 12:28):

julian-seward1 edited PR Review Comment.

view this post on Zulip Wasmtime GitHub notifications bot (Jul 24 2020 at 12:35):

julian-seward1 submitted PR Review.

view this post on Zulip Wasmtime GitHub notifications bot (Jul 24 2020 at 14:12):

bnjbvr created PR Review Comment:

@julian-seward1 Indeed, fixed the above comment for CvtUint64ToFloatSeq so it says to a float32/float64, thanks!

@cfallin I had thought of this; it just makes the emit code very long, and it was nice to have one axis to cut them. I hesitated between signed vs unsigned, or saturated vs not saturated, but the sat instructions emit very similar code, so splitting signed and unsigned made more sense. We could merge them back at some point, maybe using an enum to avoid manipulating the two booleans.

view this post on Zulip Wasmtime GitHub notifications bot (Jul 24 2020 at 14:12):

bnjbvr submitted PR Review.

view this post on Zulip Wasmtime GitHub notifications bot (Jul 24 2020 at 14:21):

bnjbvr submitted PR Review.

view this post on Zulip Wasmtime GitHub notifications bot (Jul 24 2020 at 14:21):

bnjbvr created PR Review Comment:

Good catch for CvtUint64ToFloatSeq, scary that testing didn't stumble upon it. Agreed we'll need a better policy in this kind of situation; in the meantime, I'll mark it as mod instead of use, and will make sure to add a copy of the real src to a temporary fed to this Inst's src; this is sufficient to ensure the src and tmps will get different registers (otherwise, they'd be conflicting writes). Thanks!

Other sequences use mod for the src operand for this same reason.

[and several minutes later] Even scarier, the checked div or rem sequence had the same potential bug. Changed it to mark the divisor operand as modified too.

view this post on Zulip Wasmtime GitHub notifications bot (Jul 24 2020 at 14:28):

bnjbvr created PR Review Comment:

It's sign-extended, i'll add a comment!

view this post on Zulip Wasmtime GitHub notifications bot (Jul 24 2020 at 14:28):

bnjbvr submitted PR Review.

view this post on Zulip Wasmtime GitHub notifications bot (Jul 24 2020 at 14:30):

bnjbvr submitted PR Review.

view this post on Zulip Wasmtime GitHub notifications bot (Jul 24 2020 at 14:30):

bnjbvr created PR Review Comment:

I wasn't sure about this, to be honest: there aren't many opportunities for reuse, since the actual Fcmp lowering does things quite differently. Ok on principle, though.

view this post on Zulip Wasmtime GitHub notifications bot (Jul 24 2020 at 14:45):

bnjbvr submitted PR Review.

view this post on Zulip Wasmtime GitHub notifications bot (Jul 24 2020 at 14:45):

bnjbvr created PR Review Comment:

Ah, good catch! At first, when the cvt instructions were not handled here, this would also set the RexFlags. Now we could unify movd and movq into a single enum variant, since they effectively have the same effect here. Will consider for later; it's nice to have the proper name in the enum list...

view this post on Zulip Wasmtime GitHub notifications bot (Jul 24 2020 at 14:46):

bnjbvr submitted PR Review.

view this post on Zulip Wasmtime GitHub notifications bot (Jul 24 2020 at 14:46):

bnjbvr created PR Review Comment:

Noted!

view this post on Zulip Wasmtime GitHub notifications bot (Jul 24 2020 at 14:50):

bnjbvr submitted PR Review.

view this post on Zulip Wasmtime GitHub notifications bot (Jul 24 2020 at 14:50):

bnjbvr created PR Review Comment:

Good question, I like having the properly typed names, but since the F64 variants may be one byte longer (because of 66-prefix)... That's a question for our Intel experts :-)

@abrown @jlb6740 Is there a preference to use e.g. andps over andpd, if we know the inputs are actually doubles? (See above reference to Intel's manual that Julian quoted.)

view this post on Zulip Wasmtime GitHub notifications bot (Jul 24 2020 at 14:50):

bnjbvr edited PR Review Comment.

view this post on Zulip Wasmtime GitHub notifications bot (Jul 24 2020 at 14:56):

bnjbvr submitted PR Review.

view this post on Zulip Wasmtime GitHub notifications bot (Jul 24 2020 at 14:56):

bnjbvr created PR Review Comment:

Hmm, reverted this change; this was code motion from the aarch64 code, but your comment made me realize the returned values likely only apply to aarch64, and eventually x64 didn't use it.

view this post on Zulip Wasmtime GitHub notifications bot (Jul 24 2020 at 15:04):

bnjbvr updated PR #2068 from x64 to main:

This mixed bag of commits implements enough support so that Cranelift-in-Spidermonkey runs all the wasm test cases correctly, including multi-value, reference types, explicit bounds checks, etc. So with respect to Spidermonkey, this brings the x64 level support to the same as aarch64's (on Linux at least; Win64 fastcall support is still to be implemented); wasmtime might require additional work.

Next step is setting up CI so this is actually tested in automation.

view this post on Zulip Wasmtime GitHub notifications bot (Jul 24 2020 at 16:45):

abrown created PR Review Comment:

leads me to believe that Andpd/Andps/Orpd/Orps would all "live" in the "integer domain".

@julian-seward1, reading that section, I would rather think that Andpd/Andps/Orpd/Orps all live in the FP domain, not in the SIMD (a.k.a. "integer") domain. The example there switches between ADDPS (FP) and PAND (SIMD) and back to ADDPS (FP). So I would expect that to keep everything in the same domain, to follow the manual's example, we would need ADDPS; ANDPS; ADDPS. Or am I missing something?

Since Andps already exists, is there any point in adding Andpd?

The question still stands, though, and since I don't know of any penalty like the one mentioned (and I believe packed single floats and packed double floats to be in the same FP domain), then the assembler could avoid knowing about another instruction. But what is the benefit of that? We can always choose ANDPS in the lowering if we are concerned about code size? I would leave it in so that lowering users have the option to use it if it feels more natural.

I think where I'm coming from (and I'm still developing this thought, so bear with me) is that at the assembler level, which is what we are talking about here, the best naming convention in the assembler code is one that matches the ISA naming convention 1:1. Instead of writing Inst::XmmAbcDef(SseOpcode::Andps, ...) (or its helper function) I am suggesting the abstraction should look more like ...::Andps(...) so that we more directly think in the ISA namespace than in Cranelift's custom namespace. From that viewpoint, it makes sense to me to have more instruction variants available (e.g. ANDPD) to reduce friction during lowering. (I understand that's still not strongest argument for keeping ANDPD but I'm really thinking more about the ISA naming point... it was a bit frustrating to have to think in terms of an ad hoc convention instead when I was addding SIMD FP arithmetic support).

view this post on Zulip Wasmtime GitHub notifications bot (Jul 24 2020 at 16:45):

abrown submitted PR Review.

view this post on Zulip Wasmtime GitHub notifications bot (Jul 24 2020 at 16:46):

bnjbvr updated PR #2068 from x64 to main:

This mixed bag of commits implements enough support so that Cranelift-in-Spidermonkey runs all the wasm test cases correctly, including multi-value, reference types, explicit bounds checks, etc. So with respect to Spidermonkey, this brings the x64 level support to the same as aarch64's (on Linux at least; Win64 fastcall support is still to be implemented); wasmtime might require additional work.

Next step is setting up CI so this is actually tested in automation.

view this post on Zulip Wasmtime GitHub notifications bot (Jul 24 2020 at 17:29):

bnjbvr merged PR #2068.


Last updated: Oct 23 2024 at 20:03 UTC