wasmtime / PR #2149 This patch fills in the missing piece... · git-wasmtime

Stream: git-wasmtime

Topic: wasmtime / PR #2149 This patch fills in the missing piece...

Wasmtime GitHub notifications bot (Aug 20 2020 at 05:40):

julian-seward1 opened PR #2149 from atomics-x64-CL to main:

… on newBE/x64. It does this by providing an implementation of the CLIF instructions
AtomicRmw, AtomicCas, AtomicLoad, AtomicStore and Fence.

The translation is straightforward. AtomicCas is translated into x64 cmpxchg, AtomicLoad
becomes a normal load because x64-TSO provides adequate sequencing, AtomicStore becomes a
normal store followed by mfence, and Fence becomes mfence. AtomicRmw is the only
complex case: it becomes a normal load, followed by a loop which computes an updated value,
tries to cmpxchg it back to memory, and repeats if necessary.
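The AtomicRmw shape described above (load, compute, try to compare-and-swap, retry) can be sketched with Rust's standard atomics. This is only an illustration of the control flow, not the Cranelift lowering itself; the helper name `atomic_rmw` is an assumption made for this sketch.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// Hypothetical helper: atomically replaces the value in `cell` with `f(old)`
/// and returns the old value, using a compare-exchange retry loop.
fn atomic_rmw(cell: &AtomicU64, f: impl Fn(u64) -> u64) -> u64 {
    // Step 1: an ordinary load of the current value.
    let mut old = cell.load(Ordering::SeqCst);
    loop {
        // Step 2: compute the updated value from what we last observed.
        let new = f(old);
        // Step 3: try to write it back; this succeeds only if memory still holds `old`.
        match cell.compare_exchange(old, new, Ordering::SeqCst, Ordering::SeqCst) {
            Ok(previous) => return previous,
            // Step 4: another thread changed the value first, so retry with what it wrote.
            Err(current) => old = current,
        }
    }
}
```

Calling `atomic_rmw(&cell, |v| v + 2)` on a cell holding 40 would return 40 and leave 42 in the cell.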

This is a minimum-effort initial implementation. AtomicRmw could be implemented more
efficiently using LOCK-prefixed integer read-modify-write instructions in the case where the old
value in memory is not required. Subsequent work could add that, if required.
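The possible follow-up described above could be pictured as a small decision function. This is a rough sketch: the `RmwOp` and `Lowering` enums and the `choose_lowering` function are hypothetical names for illustration and do not come from the patch. Xchg is simply left on the loop path here.

```rust
/// Hypothetical mirror of the read-modify-write operations discussed in this thread.
#[allow(dead_code)]
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum RmwOp {
    Add,
    Sub,
    And,
    Or,
    Xor,
    Xchg,
}

/// Hypothetical lowering strategies.
#[derive(Debug, PartialEq, Eq)]
enum Lowering {
    /// A single LOCK-prefixed instruction operating directly on memory.
    LockPrefixedRmw,
    /// The load, compute, cmpxchg, retry loop described in this PR.
    CmpxchgLoop,
}

/// Picks the cheaper form only when the old value in memory is not needed.
fn choose_lowering(op: RmwOp, old_value_used: bool) -> Lowering {
    match op {
        RmwOp::Add | RmwOp::Sub | RmwOp::And | RmwOp::Or | RmwOp::Xor
            if !old_value_used =>
        {
            Lowering::LockPrefixedRmw
        }
        // Everything else keeps the general loop in this sketch.
        _ => Lowering::CmpxchgLoop,
    }
}
```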

The x64 emitter has been updated to emit the new instructions. The LegacyPrefix
mechanism has been revised to handle multiple prefix bytes, not just one, since it is now
sometimes necessary to emit both 0x66 (Operand Size Override) and 0xF0 (Lock).
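To illustrate why more than one prefix byte can be needed, here is a minimal sketch of a prefix set that emits zero, one or two bytes. The enum and its variant names are assumptions for this example and are not the names used in the patch; the order of the two prefix bytes is also just a choice made here.

```rust
/// Hypothetical prefix set; each variant expands to zero or more prefix bytes.
#[allow(dead_code)]
#[derive(Clone, Copy)]
enum LegacyPrefixes {
    None,
    OperandSize,
    Lock,
    OperandSizeAndLock,
}

impl LegacyPrefixes {
    /// Appends the prefix bytes for this variant to `sink`.
    fn emit(self, sink: &mut Vec<u8>) {
        match self {
            LegacyPrefixes::None => {}
            LegacyPrefixes::OperandSize => sink.push(0x66),
            LegacyPrefixes::Lock => sink.push(0xF0),
            LegacyPrefixes::OperandSizeAndLock => {
                sink.push(0x66);
                sink.push(0xF0);
            }
        }
    }
}

/// Encodes a 16-bit `lock cmpxchg %cx, (%rdx)`, which needs both prefixes.
fn encode_lock_cmpxchg16() -> Vec<u8> {
    let mut sink = Vec::new();
    LegacyPrefixes::OperandSizeAndLock.emit(&mut sink);
    // Two-byte opcode for CMPXCHG r/m, r.
    sink.extend_from_slice(&[0x0F, 0xB1]);
    // ModRM byte: mod=00, reg=001 (cx), rm=010 (memory at rdx).
    sink.push(0x0A);
    sink
}
```

A 16-bit locked compare-and-swap is the case that needs both bytes: 0x66 selects the 16-bit operand size and 0xF0 makes the operation atomic.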

In the aarch64 implementation of atomics, there has been some minor renaming for the sake of
clarity, and for consistency with this x64 implementation.

Wasmtime GitHub notifications bot (Aug 20 2020 at 10:36):

julian-seward1 updated PR #2149 from atomics-x64-CL to main (description unchanged).

Wasmtime GitHub notifications bot (Aug 20 2020 at 11:08):

julian-seward1 updated PR #2149 from atomics-x64-CL to main (description unchanged).

Wasmtime GitHub notifications bot (Aug 20 2020 at 11:23):

julian-seward1 requested bnjbvr for a review on PR #2149.

Wasmtime GitHub notifications bot (Aug 20 2020 at 15:33):

bnjbvr submitted PR Review.

Wasmtime GitHub notifications bot (Aug 20 2020 at 15:33):

bnjbvr created PR Review Comment:

Can you expand the names here, or at least add comments? I have no clue what M/L/S mean.

Wasmtime GitHub notifications bot (Aug 20 2020 at 15:33):

bnjbvr created PR Review Comment:

Maybe reuse the aarch64 version then, and hoist it into the machinst code or a new isa/common directory?

Wasmtime GitHub notifications bot (Aug 20 2020 at 15:33):

bnjbvr created PR Review Comment:

nit: here and below, can you use doc comments instead, so they show up in LSP hovers/docs.rs, etc?

Wasmtime GitHub notifications bot (Aug 20 2020 at 15:33):

bnjbvr created PR Review Comment:

nit: we do not use /* comments in general, can you use // instead?

Wasmtime GitHub notifications bot (Aug 20 2020 at 15:33):

bnjbvr created PR Review Comment:

With the new load helper introduced in the SIMD PR (that should land soonish, probably), you might be able to just use Inst::load here.

Wasmtime GitHub notifications bot (Aug 20 2020 at 15:33):

bnjbvr created PR Review Comment:

nit: can you expand insn, and make it precise that "second" really means the second instruction in the loop?

Wasmtime GitHub notifications bot (Aug 20 2020 at 15:33):

bnjbvr created PR Review Comment:

Can you commonize all these match arms together? => Inst::alu_rmi_r(true, AluRmiROpcode::from(op), r10_rmi, r11_w) to avoid code duplication

Wasmtime GitHub notifications bot (Aug 20 2020 at 15:33):

bnjbvr created PR Review Comment:

nit: here too, please don't use shorthands for read/write

Wasmtime GitHub notifications bot (Aug 20 2020 at 15:33):

bnjbvr created PR Review Comment:

nit (twice): what does rd mean?

Wasmtime GitHub notifications bot (Aug 20 2020 at 15:33):

bnjbvr created PR Review Comment:

nit: please expand CAS at least once, with acronym in parenthesis.

Wasmtime GitHub notifications bot (Aug 20 2020 at 15:33):

bnjbvr created PR Review Comment:

Can you open an issue for the improvements, please, and refer to it with a TODO comment mentioning the issue number?

Wasmtime GitHub notifications bot (Aug 20 2020 at 15:33):

bnjbvr created PR Review Comment:

ditto insn

Wasmtime GitHub notifications bot (Aug 20 2020 at 15:33):

bnjbvr created PR Review Comment:

Could we remove this comment? It's a bit odd to read it here, since there's a sequence of vcode insts, and the vcode inst that actually trashes this register already says so via the code in get_regs.

Wasmtime GitHub notifications bot (Aug 20 2020 at 15:33):

bnjbvr created PR Review Comment:

Can you remove the r_ prefixes, for consistency with the rest of this file?

Wasmtime GitHub notifications bot (Aug 20 2020 at 15:33):

bnjbvr created PR Review Comment:

I've tried to keep "Seq" in the name for synthetic sequences of instructions, can you put it as a suffix here please?

Wasmtime GitHub notifications bot (Aug 20 2020 at 15:33):

bnjbvr created PR Review Comment:

nit: here and below, "mfence".to_string()

Wasmtime GitHub notifications bot (Aug 20 2020 at 15:33):

bnjbvr created PR Review Comment:

nit: expand insn

Wasmtime GitHub notifications bot (Aug 20 2020 at 15:33):

bnjbvr created PR Review Comment:

nit: can you expand mod too?

Wasmtime GitHub notifications bot (Aug 20 2020 at 15:33):

bnjbvr created PR Review Comment:

nit: expand instruction

Wasmtime GitHub notifications bot (Aug 20 2020 at 15:41):

julian-seward1 submitted PR Review.

Wasmtime GitHub notifications bot (Aug 20 2020 at 15:41):

julian-seward1 created PR Review Comment:

Well those names are what Intel calls them: mfence, sfence and lfence. I could expand them to what Intel describes them as: Memory Fence, Store Fence and Load Fence respectively. Or maybe I should just add comments on the enum?
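One way the "comments on the enum" option could look is sketched below. The enum name `FenceKind` and its methods are assumptions for this example and are not taken from the patch; the byte sequences are the standard x86 encodings of the three fence instructions.

```rust
/// Hypothetical enum naming the three x86 fence instructions.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum FenceKind {
    /// `mfence` ("Memory Fence")
    MFence,
    /// `lfence` ("Load Fence")
    LFence,
    /// `sfence` ("Store Fence")
    SFence,
}

impl FenceKind {
    /// The Intel mnemonic, as it would be pretty-printed.
    fn mnemonic(self) -> &'static str {
        match self {
            FenceKind::MFence => "mfence",
            FenceKind::LFence => "lfence",
            FenceKind::SFence => "sfence",
        }
    }

    /// The three-byte machine encoding of the instruction.
    fn encoding(self) -> [u8; 3] {
        match self {
            FenceKind::MFence => [0x0F, 0xAE, 0xF0],
            FenceKind::LFence => [0x0F, 0xAE, 0xE8],
            FenceKind::SFence => [0x0F, 0xAE, 0xF8],
        }
    }
}
```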

Wasmtime GitHub notifications bot (Aug 20 2020 at 15:42):

julian-seward1 submitted PR Review.

Wasmtime GitHub notifications bot (Aug 20 2020 at 15:42):

julian-seward1 created PR Review Comment:

Yeah, I just spotted that.

Wasmtime GitHub notifications bot (Aug 20 2020 at 22:54):

abrown submitted PR Review.

Wasmtime GitHub notifications bot (Aug 20 2020 at 22:54):

abrown created PR Review Comment:

Should be in main now.

Wasmtime GitHub notifications bot (Aug 21 2020 at 08:48):

julian-seward1 submitted PR Review.

Wasmtime GitHub notifications bot (Aug 21 2020 at 08:48):

julian-seward1 created PR Review Comment:

Done. Although the result is actually longer:

            let i3 = if op == AtomicRMWOp::Xchg {
                // Xchg needs no ALU operation: just move the new value into place.
                Inst::mov_r_r(true, r10, r11_w)
            } else {
                let alu_op = match op {
                    AtomicRMWOp::Add => AluRmiROpcode::Add,
                    AtomicRMWOp::Sub => AluRmiROpcode::Sub,
                    AtomicRMWOp::And => AluRmiROpcode::And,
                    AtomicRMWOp::Or => AluRmiROpcode::Or,
                    AtomicRMWOp::Xor => AluRmiROpcode::Xor,
                    // Handled by the `if` branch above.
                    AtomicRMWOp::Xchg => unreachable!(),
                };
                Inst::alu_rmi_r(true, alu_op, r10_rmi, r11_w)
            };

I tend to assume that rustc/LLVM will do tail-merging and hence cause all the duplication to disappear in the final machine code. I don't know that that's true, though.
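For comparison, the conversion suggested in the review could also be written as a fallible conversion, since Xchg has no matching ALU opcode. This is a self-contained sketch with stand-in enums: `AtomicRmwOp` and `AluOpcode` here are hypothetical local types, not the Cranelift ones.

```rust
use std::convert::TryFrom;

/// Stand-in for the atomic read-modify-write operation kind.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum AtomicRmwOp {
    Add,
    Sub,
    And,
    Or,
    Xor,
    Xchg,
}

/// Stand-in for the ALU opcode used by the register/memory/immediate form.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum AluOpcode {
    Add,
    Sub,
    And,
    Or,
    Xor,
}

impl TryFrom<AtomicRmwOp> for AluOpcode {
    type Error = ();

    /// Maps each arithmetic or logical op to its ALU opcode; Xchg has none.
    fn try_from(op: AtomicRmwOp) -> Result<Self, Self::Error> {
        match op {
            AtomicRmwOp::Add => Ok(AluOpcode::Add),
            AtomicRmwOp::Sub => Ok(AluOpcode::Sub),
            AtomicRmwOp::And => Ok(AluOpcode::And),
            AtomicRmwOp::Or => Ok(AluOpcode::Or),
            AtomicRmwOp::Xor => Ok(AluOpcode::Xor),
            AtomicRmwOp::Xchg => Err(()),
        }
    }
}
```

With this shape the caller can branch on the `Err` case to emit the plain move for Xchg, so no `unreachable!()` arm is needed.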

Wasmtime GitHub notifications bot (Aug 21 2020 at 08:51):

julian-seward1 created PR Review Comment:

Read; I fixed all of these, and the wr and mod too.

Wasmtime GitHub notifications bot (Aug 21 2020 at 08:51):

julian-seward1 submitted PR Review.

Wasmtime GitHub notifications bot (Aug 21 2020 at 09:06):

julian-seward1 edited PR Review Comment.

Wasmtime GitHub notifications bot (Aug 21 2020 at 09:27):

julian-seward1 submitted PR Review.

Wasmtime GitHub notifications bot (Aug 21 2020 at 09:27):

julian-seward1 created PR Review Comment:

Filed as PR #2153.

Wasmtime GitHub notifications bot (Aug 21 2020 at 10:07):

julian-seward1 submitted PR Review.

Wasmtime GitHub notifications bot (Aug 21 2020 at 10:07):

julian-seward1 created PR Review Comment:

I moved it into a new file src/machinst/inst_common.rs.

Wasmtime GitHub notifications bot (Aug 21 2020 at 10:20):

julian-seward1 updated PR #2149 from atomics-x64-CL to main (description unchanged).

Wasmtime GitHub notifications bot (Aug 21 2020 at 10:23):

julian-seward1 updated PR #2149 from atomics-x64-CL to main (description unchanged).

Wasmtime GitHub notifications bot (Aug 21 2020 at 10:48):

julian-seward1 submitted PR Review.

Wasmtime GitHub notifications bot (Aug 21 2020 at 10:48):

julian-seward1 created PR Review Comment:

Done.

Wasmtime GitHub notifications bot (Aug 21 2020 at 10:50):

julian-seward1 updated PR #2149 from atomics-x64-CL to main (description unchanged).

Wasmtime GitHub notifications bot (Aug 21 2020 at 11:00):

julian-seward1 updated PR #2149 from atomics-x64-CL to main (description unchanged).

Wasmtime GitHub notifications bot (Aug 21 2020 at 12:00):

julian-seward1 requested bnjbvr for a review on PR #2149.

Wasmtime GitHub notifications bot (Aug 21 2020 at 14:46):

bnjbvr submitted PR Review.

Wasmtime GitHub notifications bot (Aug 21 2020 at 14:46):

bnjbvr created PR Review Comment:

With the approach suggested above for Xchg, we could spare the r10 register here, alleviating register pressure.

Wasmtime GitHub notifications bot (Aug 21 2020 at 14:46):

bnjbvr created PR Review Comment:

nit: please make this a doc comment ///

Wasmtime GitHub notifications bot (Aug 21 2020 at 14:46):

bnjbvr created PR Review Comment:

Could we not emit it when the opcode is Xchg? (That is, push it down within the else branch below)
Or even better, it seems all the moves could be avoided in the case of Xchg, since r10 could be passed as the read-only input of the cmpxchg instruction? (I know x86 chips eat moves for dinner, but it seems better to not do any useless decoding work if we can avoid it!)

Wasmtime GitHub notifications bot (Aug 21 2020 at 14:46):

bnjbvr created PR Review Comment:

Just an idea: if it's generating the same thing as a load, could the lowering be commonized with the rest of the Load-related instructions?

Wasmtime GitHub notifications bot (Aug 21 2020 at 14:46):

bnjbvr created PR Review Comment:

I sympathize with the need to fix this comment, but I think this one is a bit imprecise too: can you write any virtual regs instead of just any regs?

Wasmtime GitHub notifications bot (Aug 21 2020 at 14:46):

bnjbvr created PR Review Comment:

Here, you can use let rm = input_to_reg_mem(ctx, inputs[0]); and remove the comment about using 0(addr).

Wasmtime GitHub notifications bot (Aug 21 2020 at 14:46):

bnjbvr created PR Review Comment:

I think you'll be able to use lower_to_amode once #2146 lands; if you happen to land before this, can you add a TODO in my PR please?

Wasmtime GitHub notifications bot (Aug 21 2020 at 14:46):

bnjbvr created PR Review Comment:

ditto for lower_to_amode

Wasmtime GitHub notifications bot (Aug 21 2020 at 14:46):

bnjbvr created PR Review Comment:

nit: can you use the usual Rust camel casing, AtomicRmwOp, please?

Wasmtime GitHub notifications bot (Aug 24 2020 at 07:41):

julian-seward1 submitted PR Review.

Wasmtime GitHub notifications bot (Aug 24 2020 at 07:41):

julian-seward1 created PR Review Comment:

Well, probably yes; but I'd prefer to keep it separate as it logically belongs to the atomics group. Also there is an atomics-specific assertion and atomics-specific comments there.

Wasmtime GitHub notifications bot (Aug 24 2020 at 07:43):

julian-seward1 submitted PR Review.

Wasmtime GitHub notifications bot (Aug 24 2020 at 07:43):

julian-seward1 created PR Review Comment:

We could do that. I'd prefer to leave such improvements to the followup PR #2153 though. Also, it could possibly be fixed even better by using lock xchg .. in this case. That's definitely PR #2153 territory, though.

Wasmtime GitHub notifications bot (Aug 24 2020 at 07:53):

julian-seward1 created PR Review Comment:

That doesn't work. It produces movzbq %v5Jb, %v6J, but the original was movzbq 0(%v5J), %v6J.

Wasmtime GitHub notifications bot (Aug 24 2020 at 07:53):

julian-seward1 submitted PR Review.

Wasmtime GitHub notifications bot (Aug 24 2020 at 07:58):

bjorn3 submitted PR Review.

Wasmtime GitHub notifications bot (Aug 24 2020 at 07:58):

bjorn3 created PR Review Comment:

lock xchg is equivalent to xchg: https://stackoverflow.com/questions/3144335/on-a-multicore-x86-is-a-lock-necessary-as-a-prefix-to-xchg

Wasmtime GitHub notifications bot (Aug 24 2020 at 09:02):

julian-seward1 updated PR #2149 from atomics-x64-CL to main (description unchanged).

Wasmtime GitHub notifications bot (Aug 24 2020 at 09:50):

julian-seward1 merged PR #2149.

Last updated: Apr 18 2025 at 07:03 UTC