wasmtime / Issue #2504 Draft: I128 support (partial) on x64. · git-wasmtime

Stream: git-wasmtime

Topic: wasmtime / Issue #2504 Draft: I128 support (partial) on x64.

Wasmtime GitHub notifications bot (Dec 13 2020 at 06:34):

For context, I should also note: this (I128 support) is one of the few pieces still missing from the new x64 backend. The Wasm frontend does not use it, but rustc_codegen_clif does (cc @bjorn3). I expect we would need a few more ops before we can try this with cg_clif, though.

Wasmtime GitHub notifications bot (Dec 13 2020 at 07:41):

julian-seward1 commented on Issue #2504:

@cfallin As a general comment, I see the dilemma. I haven't yet read the patch, but wanted to ask: part of the motivation for going this route was that we would then be able to remove the complexity, the semantic murkyness, and the potential for generating poor code, caused by having to explicitly represent the carry flag in CLIF. If we revert to expanding out doubleword arithmetic at the CLIF level, is there some other way we can avoid those disadvantages?

Also .. does it matter that, doing this per-target (per this patch) potentially makes it possible to use target-specific sequences, which wouldn't be possible if it were done at the CLIF level?

Wasmtime GitHub notifications bot (Dec 13 2020 at 07:58):

julian-seward1 commented on Issue #2504:

doing this per-target (per this patch) potentially makes it possible to use target-specific sequences

For example, doubleword comparison on x64 is a bit elaborate, but for aarch64 it's just cmp ; ccmp.

Wasmtime GitHub notifications bot (Dec 13 2020 at 09:44):

bjorn3 commented on Issue #2504:

I just tested this with cg_clif. As far as I can see this works fine, but there are still missing pieces both related and unrelated to 128bit int support which mean that I can't completely test this yet.

support necessary in any case:

isplit/iconcat

icmp.i128

tls_value

AbiPurpose::StructArg

support necessary to not require modifications of cg_clif:

load/store of i128

rotl.i128/rotr.i128

bitrev.i8

Wasmtime GitHub notifications bot (Dec 13 2020 at 09:45):

julian-seward1 commented on Issue #2504:

One thing that does concern me is struct ValueRegs, in particular the potential inefficiency induced by it. It's obviously fully general, in the sense that it can hold 4 unrelated Reg values, but it does require passing around a 128-bit thing where before we were just passing around a 32-bit thing (Reg). And, given that it requires indexing, I reckon it ain't gonna wind up in a vector register, hence will always be in memory.

I had wondered if such generality is really necessary. Given that -- IIUC -- all the result virtual register fields are allocated before isel starts, would it be adequate to have an representation which specifies a base virtual register, plus zero, one or two subsequently-numbered vregs of the same class? Then that would fit in 34 bits (hence, a uint64_t) and could be passed around in a (host) register just as Reg is now.

Even if that won't work, does ValueRegs really need to be template-typed? Are there any other types of things we need to store in there?

Wasmtime GitHub notifications bot (Dec 13 2020 at 09:56):

bjorn3 edited a comment on Issue #2504:

I just tested this with cg_clif. As far as I can see this works fine, but there are still missing pieces both related and unrelated to 128bit int support which mean that I can't completely test this yet.

support necessary in any case:

isplit/iconcat (https://github.com/cfallin/wasmtime/pull/14)

icmp.i128

tls_value

AbiPurpose::StructArg

support necessary to not require modifications of cg_clif:

load/store of i128

rotl.i128/rotr.i128

bitrev.i8

Wasmtime GitHub notifications bot (Dec 13 2020 at 18:41):

bjorn3 edited a comment on Issue #2504:

I just tested this with cg_clif. As far as I can see this works fine, but there are still missing pieces both related and unrelated to 128bit int support which mean that I can't completely test this yet.

support necessary in any case:

isplit/iconcat (https://github.com/cfallin/wasmtime/pull/14)

icmp.i128 (https://github.com/cfallin/wasmtime/pull/15)

tls_value

AbiPurpose::StructArg

support necessary to not require modifications of cg_clif:

load/store of i128

rotl.i128/rotr.i128

bitrev.i8

brz.i128/brnz.i128

Wasmtime GitHub notifications bot (Dec 13 2020 at 18:42):

bjorn3 edited a comment on Issue #2504:

I just tested this with cg_clif. As far as I can see this works fine, but there are still missing pieces both related and unrelated to 128bit int support which mean that I can't completely test this yet.

support necessary in any case:

isplit/iconcat (https://github.com/cfallin/wasmtime/pull/14)

icmp.i128 (https://github.com/cfallin/wasmtime/pull/15)

tls_value

AbiPurpose::StructArg

call with i128 arguments

support necessary to not require modifications of cg_clif:

load/store of i128

rotl.i128/rotr.i128

bitrev.i8

brz.i128/brnz.i128

Wasmtime GitHub notifications bot (Dec 13 2020 at 19:52):

cfallin commented on Issue #2504:

So I'm starting to warm up to this approach relative to falling back on existing legalizations; it's really not too much more work to get to the end-point and the legalization is pretty hairy too. I'm working on the additional ops now; hopefully ready soon!

To a few design-question points:

One thing that does concern me is struct ValueRegs, in particular the potential inefficiency induced by it. It's obviously fully general, in the sense that it can hold 4 unrelated Reg values, but it does require passing around a 128-bit thing where before we were just passing around a 32-bit thing (Reg). And, given that it requires indexing, I reckon it ain't gonna wind up in a vector register, hence will always be in memory.

I played with this a bit with Compiler Explorer (link) -- it seems it's passed by reference as an arg, but returned in two regs (rdx:rax). So, yeah, not as efficient as I had hoped, but at least it's not malloc'ing... though see below for alternative idea.

I had wondered if such generality is really necessary. Given that -- IIUC -- all the result virtual register fields are allocated before isel starts, would it be adequate to have an representation which specifies a base virtual register, plus zero, one or two subsequently-numbered vregs of the same class? Then that would fit in 34 bits (hence, a uint64_t) and could be passed around in a (host) register just as Reg is now.

Tempting idea! This doesn't handle the real-register case though (needed at least for ABI specs). I suppose we could special-case that but then we bifurcate some other APIs (gen load/spill, move, ...).

I'm thinking that we might actually be able to optimize in another (much simpler) way -- on 64-bit machines and with our largest types being 128-bit, we can get away with the 1 / 2-reg cases. Perhaps we could define a static constant for size and, with some config magic, select 2 unless arm32 (or later, x86-32, etc.) are compiled in. Then we're just passing around 64 bits, which with padding is probably free relative to our current 32-bit size in most cases.

Even if that won't work, does ValueRegs really need to be template-typed? Are there any other types of things we need to store in there?

Yes, I did this to try to keep to the spirit of Reg, WritableReg, VirtualReg, RealReg. At least for the former two, I tried at first doing Writable<ValueRegs> instead, but Writable is defined in regalloc and requires implementation of a private trait, i.e., it's closed to new uses. IMHO it's somewhat nice for regs() to return (an array of) the correctly-typed thing in many cases...

Wasmtime GitHub notifications bot (Dec 13 2020 at 20:18):

julian-seward1 commented on Issue #2504:

@cfallin Great work, btw! I should have said that earlier.

Wasmtime GitHub notifications bot (Dec 14 2020 at 00:49):

cfallin commented on Issue #2504:

Just added implementations for (I think) all of the relevant i128 ops. Will make a cleanup pass and address some of the comments above later. @bjorn3, I didn't get to the TLS support or StructArg support yet, but this might be close enough to hack something together? If not, I'll return to this in a bit :-)

Wasmtime GitHub notifications bot (Dec 14 2020 at 02:54):

cfallin commented on Issue #2504:

@bjorn3 I just added what I think might be a sufficient implementation of ELF TLS support (basically just copying your recipe in the old x86 backend), and also permitted StructReturn-purpose args to compile. I'm not sure if any more needs to be done for the latter (does cg_clif rely on some sort of legalization that lowers into a struct-return pointer?).

Let me know if this is sufficient to get cg_clif working! If so, I'll clean this up a bit and split out some pieces to actually land it.

Wasmtime GitHub notifications bot (Dec 14 2020 at 03:01):

cfallin edited a comment on Issue #2504:

@bjorn3 I just added what I think might be a sufficient implementation of ELF TLS support (basically just copying your recipe in the old x86 backend), and also permitted StructReturn (edit: and StructArgument)-purpose args to compile. I'm not sure if any more needs to be done for the latter (does cg_clif rely on some sort of legalization that lowers into a struct-return pointer?).

Let me know if this is sufficient to get cg_clif working! If so, I'll clean this up a bit and split out some pieces to actually land it.

Wasmtime GitHub notifications bot (Dec 14 2020 at 03:01):

cfallin edited a comment on Issue #2504:

@bjorn3 I just added what I think might be a sufficient implementation of ELF TLS support (basically just copying your recipe in the old x86 backend), and also permitted StructReturn (edit: and StructArgument)-purpose args to compile. I'm not sure if any more needs to be done for the latter (does cg_clif rely on some sort of legalization that lowers into a struct-arg/return pointer?).

Let me know if this is sufficient to get cg_clif working! If so, I'll clean this up a bit and split out some pieces to actually land it.

Wasmtime GitHub notifications bot (Dec 14 2020 at 09:13):

bjorn3 commented on Issue #2504:

Let me know if this is sufficient to get cg_clif working!

Almost. The standard library compiles with the above TLS fix, but std_example fails at several points:

saturating_sub::<i128>() (select.i128?)

u128 -> float cast (incorrect return value)

__m128i -> [u8; 16] transmute (raw_bitcast between i64x2 and i128?)

Wasmtime GitHub notifications bot (Dec 14 2020 at 09:21):

bjorn3 edited a comment on Issue #2504:

Let me know if this is sufficient to get cg_clif working!

Almost. The standard library compiles with the above TLS fix, but std_example fails at several points:

saturating_sub::<i128>() (select.i128?)

u128 -> float cast (incorrect return value)

<strike>__m128i -> [u8; 16] transmute (raw_bitcast between i64x2 and i128?)</strike> (edit: it fixed itself somehow)

Wasmtime GitHub notifications bot (Dec 14 2020 at 10:01):

bjorn3 commented on Issue #2504:

I got a SIGSEGV on simple-raytracer. I reduced it to:

struct Vec3 {
    x: f64,
    y: f64,
    z: f64,
}

impl Vec3 {
    fn neg(self) -> Vec3 {
        Vec3 {
            x: -self.x,
            y: -self.y,
            z: -self.z,
        }
    }
}

fn main() {
    Vec3 {
        x: 0.0,
        y: 0.0,
        z: 0.0,
    }.neg();
}

(rr) cont
Continuing.

Program received signal SIGSEGV, Segmentation fault.
0x0000560560f17c2a in main::Vec3::neg () at src/bin/main.rs:10
10                  x: -self.x,
(rr) disassemble/r
Dump of assembler code for function main::Vec3::neg:
   0x0000560560f17c17 <+0>:     55      push   %rbp
   0x0000560560f17c18 <+1>:     48 89 e5        mov    %rsp,%rbp
   0x0000560560f17c1b <+4>:     48 b8 00 00 00 00 00 00 00 80   movabs $0x8000000000000000,%rax
   0x0000560560f17c25 <+14>:    66 48 0f 6e c0  movq   %rax,%xmm0
=> 0x0000560560f17c2a <+19>:    66 0f 57 06     xorpd  (%rsi),%xmm0
   0x0000560560f17c2e <+23>:    48 b8 00 00 00 00 00 00 00 80   movabs $0x8000000000000000,%rax
   0x0000560560f17c38 <+33>:    66 48 0f 6e c8  movq   %rax,%xmm1
   0x0000560560f17c3d <+38>:    66 0f 57 4e 08  xorpd  0x8(%rsi),%xmm1
   0x0000560560f17c42 <+43>:    48 b8 00 00 00 00 00 00 00 80   movabs $0x8000000000000000,%rax
   0x0000560560f17c4c <+53>:    66 48 0f 6e d0  movq   %rax,%xmm2
   0x0000560560f17c51 <+58>:    66 0f 57 56 10  xorpd  0x10(%rsi),%xmm2
   0x0000560560f17c56 <+63>:    f2 0f 11 07     movsd  %xmm0,(%rdi)
   0x0000560560f17c5a <+67>:    f2 0f 11 4f 08  movsd  %xmm1,0x8(%rdi)
   0x0000560560f17c5f <+72>:    f2 0f 11 57 10  movsd  %xmm2,0x10(%rdi)
   0x0000560560f17c64 <+77>:    48 89 ec        mov    %rbp,%rsp
   0x0000560560f17c67 <+80>:    5d      pop    %rbp
   0x0000560560f17c68 <+81>:    c3      retq
End of assembler dump.
(rr) info reg rsi
rsi            0x7ffd802b5678      140726753777272
(rr) p *(0x7ffd802b5678 as *const usize)
$1 = 0

Wasmtime GitHub notifications bot (Dec 17 2020 at 14:14):

bjorn3 edited a comment on Issue #2504:

I got a SIGSEGV on simple-raytracer. I reduced it to:

struct Vec3 {
    x: f64,
    y: f64,
    z: f64,
}

impl Vec3 {
    fn neg(self) -> Vec3 {
        Vec3 {
            x: -self.x,
            y: -self.y,
            z: -self.z,
        }
    }
}

fn main() {
    Vec3 {
        x: 0.0,
        y: 0.0,
        z: 0.0,
    }.neg();
}

(rr) cont
Continuing.

Program received signal SIGSEGV, Segmentation fault.
0x0000560560f17c2a in main::Vec3::neg () at src/bin/main.rs:10
10                  x: -self.x,
(rr) disassemble/r
Dump of assembler code for function main::Vec3::neg:
   0x0000560560f17c17 <+0>:     55      push   %rbp
   0x0000560560f17c18 <+1>:     48 89 e5        mov    %rsp,%rbp
   0x0000560560f17c1b <+4>:     48 b8 00 00 00 00 00 00 00 80   movabs $0x8000000000000000,%rax
   0x0000560560f17c25 <+14>:    66 48 0f 6e c0  movq   %rax,%xmm0
=> 0x0000560560f17c2a <+19>:    66 0f 57 06     xorpd  (%rsi),%xmm0
   0x0000560560f17c2e <+23>:    48 b8 00 00 00 00 00 00 00 80   movabs $0x8000000000000000,%rax
   0x0000560560f17c38 <+33>:    66 48 0f 6e c8  movq   %rax,%xmm1
   0x0000560560f17c3d <+38>:    66 0f 57 4e 08  xorpd  0x8(%rsi),%xmm1
   0x0000560560f17c42 <+43>:    48 b8 00 00 00 00 00 00 00 80   movabs $0x8000000000000000,%rax
   0x0000560560f17c4c <+53>:    66 48 0f 6e d0  movq   %rax,%xmm2
   0x0000560560f17c51 <+58>:    66 0f 57 56 10  xorpd  0x10(%rsi),%xmm2
   0x0000560560f17c56 <+63>:    f2 0f 11 07     movsd  %xmm0,(%rdi)
   0x0000560560f17c5a <+67>:    f2 0f 11 4f 08  movsd  %xmm1,0x8(%rdi)
   0x0000560560f17c5f <+72>:    f2 0f 11 57 10  movsd  %xmm2,0x10(%rdi)
   0x0000560560f17c64 <+77>:    48 89 ec        mov    %rbp,%rsp
   0x0000560560f17c67 <+80>:    5d      pop    %rbp
   0x0000560560f17c68 <+81>:    c3      retq
End of assembler dump.
(rr) info reg rsi
rsi            0x7ffd802b5678      140726753777272
(rr) p *(0x7ffd802b5678 as *const usize)
$1 = 0

Edit: This bug is unrelated to this PR. I opened #2507.

Wasmtime GitHub notifications bot (Dec 28 2020 at 08:38):

cfallin commented on Issue #2504:

@bjorn3 I just pushed a number of fixes/updates; this I think addresses everything above except for the StructReturn fix and whatever is causing the u128 to/from float conversion error. (I'm testing with a local rustc_codegen_cranelift checkout; I see the same error that you're presumably seeing on std_example.rs line 79, converting u128->f32).

I'm not sure if the latter has to do with the former; it seems that the u128->f32 conversion is a call into libgcc so I suspect it's an ABI issue rather than an i128 arithmetic issue. (Thank goodness for that; I started to work out the open-coded sequence for each of the u128 to/from f32/f64 cases, signed/unsigned, and quickly ran away...)

I won't have more time to look at this for a bit (I'm mostly away until after the new year) but if you have any further debugging insights or want to try to implement the StructReturn fix (I think we just need to preprend a StructReturn return value if an arg exists, and wire the value through) please feel free!

(Also it mostly goes without saying but since this PR has grown quite a bit beyond just i128 arithmetic, I plan to split it up into proper pieces for separate review once we confirm that cg_clif works.)

Wasmtime GitHub notifications bot (Dec 28 2020 at 09:42):

bjorn3 commented on Issue #2504:

I'm not sure if the latter has to do with the former; it seems that the u128->f32 conversion is a call into libgcc so I suspect it's an ABI issue rather than an i128 arithmetic issue.

On my system it calls into rust's compiler-builtins.

or want to try to implement the StructReturn fix (I think we just need to preprend a StructReturn return value if an arg exists, and wire the value through)

That and use the correct register. It isn't very important though as the old backend didn't implement it either.

Wasmtime GitHub notifications bot (Dec 28 2020 at 10:23):

bjorn3 commented on Issue #2504:

Minimal repro:

fn convert(i: u128) -> f32 {
    if i == 0 {
        // this branch is taken
        return 0.0;
    }

    extract_sign(i); // except when this call is removed

    100.0
}

fn main() {
    assert_eq!(convert(100u128), 100.0);
}

fn extract_sign(i: u128) {}

Assembly of convert:

00000000000f4af5 <_ZN11std_example7convert17ha5c22c0f04088fadE>:
   f4af5:       55                      push   %rbp
   f4af6:       48 89 e5                mov    %rsp,%rbp
   f4af9:       48 83 ff 00             cmp    $0x0,%rdi
   f4afd:       40 0f 94 c0             sete   %al
   f4b01:       48 83 fe 00             cmp    $0x0,%rsi
   f4b05:       40 0f 94 c1             sete   %cl
   f4b09:       48 21 c1                and    %rax,%rcx
   f4b0c:       0f 84 08 00 00 00       je     f4b1a <_ZN11std_example7convert17ha5c22c0f04088fadE+0x25>
   f4b12:       0f 57 c0                xorps  %xmm0,%xmm0
   f4b15:       48 89 ec                mov    %rbp,%rsp
   f4b18:       5d                      pop    %rbp
   f4b19:       c3                      retq
   f4b1a:       e8 a1 01 00 00          callq  f4cc0 <_ZN11std_example12extract_sign17h8b332672bd416c0aE>
   f4b1f:       be 00 00 c8 42          mov    $0x42c80000,%esi
   f4b24:       66 0f 6e c6             movd   %esi,%xmm0
   f4b28:       48 89 ec                mov    %rbp,%rsp
   f4b2b:       5d                      pop    %rbp
   f4b2c:       c3                      retq

Wasmtime GitHub notifications bot (Dec 28 2020 at 21:56):

cfallin commented on Issue #2504:

@bjorn3 thanks for that repro -- the disassembly made it pretty clear what was wrong: brz/brnz on an i128 used an incorrect width of and/or instruction. Pushed a fix just now.

Locally, ./test.sh in rustc_codegen_cranelift using this branch now seems to pass all the tests; it dies later during benchmarking but I'm not sure what's wrong:
Benchmark #2: ../build/cargo.sh build
Error: Command terminated with non-zero exit code. Use the '-i'/'--ignore-failure' option if you want to ignore this. Alternatively, use the '--show-output' option to debug what went wrong.
(Also, in case it's relevant, I set JIT_SUPPORTED=0 in scripts/config.sh as that seemed to get me past an issue related to the TLS relocation not being handled in cranelift-jit. Known issue?)

In any case, I'm happy that this seems to allow cg_clif to pass the tests in examples/ :-)

Wasmtime GitHub notifications bot (Dec 28 2020 at 22:17):

bjorn3 commented on Issue #2504:

Locally, ./test.sh in rustc_codegen_cranelift using this branch now seems to pass all the tests

:tada:

it dies later during benchmarking but I'm not sure what's wrong:

Can you add --show-output to the hyperfine invocation in scripts/tests.sh? I can't test it myself until tomorrow. You could also cd to simple-raytracer and then run ../build/cargo.sh build directly to avoid having to build the sysroot again.

Wasmtime GitHub notifications bot (Dec 29 2020 at 02:03):

cfallin commented on Issue #2504:

I tried building simple-raytracer, and the cg_clif invocation to compile the tiff crate is dying with a segfault:
   Compiling tiff v0.2.1
error: could not compile `tiff`

Caused by:
  process didn't exit successfully: `/home/cfallin/work/rustc_codegen_cranelift/build/bin/cg_clif --crate-name tiff [...]` (signal: 11, SIGSEGV: invalid memory reference)
I tracked this back to a nonsense pointer passed into slice::copy_from_slice and tracked it backward with rr for a while; but couldn't reach the origin (it's a somewhat manual process tracing through disassemblies and setting memory watchpoints). Valgrind shows that the pointer itself is undefined data. So we're missing some sort of dataflow linkage (a store? ABI semantics related to wide values?) but I've hit my quota for today. Curious to see if you can discover more!

(I implemented StructReturn slightly more properly but this didn't seem to be the problem; wasn't likely, as you noted, but wanted to be sure.)

Wasmtime GitHub notifications bot (Dec 29 2020 at 06:06):

bjorn3 commented on Issue #2504:

This is a proc macro that has an ABI incompatibility with the cg_llvm comliled rustc due to StructArgument not yet being implemented.

Wasmtime GitHub notifications bot (Dec 29 2020 at 09:07):

cfallin commented on Issue #2504:

@bjorn3 I went ahead and implemented StructArgument as well. With that done, simple-raytracer builds, and its binary runs to completion. (The following is the first result computed via rustc -> cg_clif -> x64-new-backend, aside from compilations themselves: result.png!)

There do seem to be a few last test failures later in the test.sh run (in the stdlib I guess?): 1139 passed; 16 failed, mostly some i8/i16 ops and some FP string formatting issues. The former doesn't surprise me too much as narrow-value support is still somewhat young relative to the 32/64-bit paths that are well-hammered by Wasm. In any case, if you (or anyone else) feel like taking a look at those last issues, please do feel free!

I'll clean up this mess of a draft branch when I'm actually "working" again on Jan 4 :-)

Wasmtime GitHub notifications bot (Dec 29 2020 at 22:14):

cfallin commented on Issue #2504:

Last two fixes pushed: DIV on 8-bit values places remainders in AH, not DL (the weird and wonderful depths of x86 never cease to amuse!), and our select impl wasn't recognizing that booleans are integers and need integer moves.

All stdlib tests pass now! I really will close my laptop now and read some books until I clean this up in the new year :-)

Wasmtime GitHub notifications bot (Dec 29 2020 at 22:36):

bjorn3 commented on Issue #2504:

Yay! I hope you will have a great New Year!

Wasmtime GitHub notifications bot (Dec 30 2020 at 08:30):

julian-seward1 commented on Issue #2504:

@cfallin

DIV on 8-bit values places remainders in AH, not DL
That's surely a case of the general redundant-REX (0x40) rule, or in this case the lack of 0x40? Does the remainder go in DL if there's an 0x40 prefix?

Wasmtime GitHub notifications bot (Dec 30 2020 at 08:30):

julian-seward1 edited a comment on Issue #2504:

@cfallin

DIV on 8-bit values places remainders in AH, not DL

That's surely a case of the general redundant-REX (0x40) rule, or in this case the lack of 0x40? Does the remainder go in DL if there's an 0x40 prefix?

Wasmtime GitHub notifications bot (Dec 30 2020 at 17:56):

cfallin commented on Issue #2504:

@cfallin

DIV on 8-bit values places remainders in AH, not DL

That's surely a case of the general redundant-REX (0x40) rule, or in this case the lack of 0x40? Does the remainder go in DL if there's an 0x40 prefix?

Sadly no, a REX prefix doesn't change anything -- it's just the way the 8-bit variant is defined (reference). A likely remnant of the 8086 days when AL and AH were separately-useful registers for byte-sized values...

Wasmtime GitHub notifications bot (Dec 30 2020 at 17:59):

cfallin edited a comment on Issue #2504:

@cfallin

DIV on 8-bit values places remainders in AH, not DL

That's surely a case of the general redundant-REX (0x40) rule, or in this case the lack of 0x40? Does the remainder go in DL if there's an 0x40 prefix?

Sadly no, a REX prefix doesn't change anything -- it's just the way the 8-bit variant is defined (reference). A likely remnant of the 8086 days when AL and AH were separately-useful registers for byte-sized values...

(Edit: here's another reference showing more -- a REX byte will change whether the divisor arg refers to low byte of one of the 16 x86-64 registers, or a low/high byte of one of the 8 original registers; but remainder always goes into AH.)

Wasmtime GitHub notifications bot (Jan 04 2021 at 06:02):

cfallin commented on Issue #2504:

Did some rebase/squash surgery on this branch -- I will split it into I think 8 PRs:

Multi-reg value framework

I128 ops for x64 backend

TLS support in x64 backend

StructArg/StructRet ABI support in x64/aarch64

Fix to #2508 (REX prefix fixes for 8-bit instructions)

Fix to #2507 (Avoid unaligned SSE loads)

urem/srem bugfix for 8-bit DIV

select.b1 bugfix

Several of the above are dependent on the multi-reg refactor but the rest should be reviewable in parallel.

Wasmtime GitHub notifications bot (Jan 04 2021 at 06:37):

cfallin commented on Issue #2504:

The pieces were a bit more interrelated than I had hoped so this became four PRs: #2538 (framework), #2539 (i128 ops), #2540 (TLS), #2541 (struct args). 2539 depends on 2538; and 2540/2541 depend on 2538 and 2539.

Wasmtime GitHub notifications bot (Jan 05 2021 at 17:39):

cfallin commented on Issue #2504:

Closing this draft PR now that the real PRs resulting from it are under review.

Last updated: Apr 18 2025 at 14:03 UTC