Stream: git-wasmtime

Topic: wasmtime / PR #2149 This patch fills in the missing piece...


view this post on Zulip Wasmtime GitHub notifications bot (Aug 20 2020 at 05:40):

julian-seward1 opened PR #2149 from atomics-x64-CL to main:

… on newBE/x64. It does

this by providing an implementation of the CLIF instructions AtomicRmw, AtomicCas,
AtomicLoad, AtomicStore and Fence.

The translation is straightforward. AtomicCas is translated into x64 cmpxchg, AtomicLoad
becomes a normal load because x64-TSO provides adequate sequencing, AtomicStore becomes a
normal store followed by mfence, and Fence becomes mfence. AtomicRmw is the only
complex case: it becomes a normal load, followed by a loop which computes an updated value,
tries to cmpxchg it back to memory, and repeats if necessary.

This is a minimum-effort initial implementation. AtomicRmw could be implemented more
efficiently using LOCK-prefixed integer read-modify-write instructions in the case where the old
value in memory is not required. Subsequent work could add that, if required.

The x64 emitter has been updated to emit the new instructions, obviously. The LegacyPrefix
mechanism has been revised to handle multiple prefix bytes, not just one, since it is now
sometimes necessary to emit both 0x66 (Operand Size Override) and F0 (Lock).

In the aarch64 implementation of atomics, there has been some minor renaming for the sake of
clarity, and for consistency with this x64 implementation.

<!--

Please ensure that the following steps are all taken care of before submitting
the PR.

Please ensure all communication adheres to the code of conduct.
-->

view this post on Zulip Wasmtime GitHub notifications bot (Aug 20 2020 at 10:36):

julian-seward1 updated PR #2149 from atomics-x64-CL to main:

… on newBE/x64. It does

this by providing an implementation of the CLIF instructions AtomicRmw, AtomicCas,
AtomicLoad, AtomicStore and Fence.

The translation is straightforward. AtomicCas is translated into x64 cmpxchg, AtomicLoad
becomes a normal load because x64-TSO provides adequate sequencing, AtomicStore becomes a
normal store followed by mfence, and Fence becomes mfence. AtomicRmw is the only
complex case: it becomes a normal load, followed by a loop which computes an updated value,
tries to cmpxchg it back to memory, and repeats if necessary.

This is a minimum-effort initial implementation. AtomicRmw could be implemented more
efficiently using LOCK-prefixed integer read-modify-write instructions in the case where the old
value in memory is not required. Subsequent work could add that, if required.

The x64 emitter has been updated to emit the new instructions, obviously. The LegacyPrefix
mechanism has been revised to handle multiple prefix bytes, not just one, since it is now
sometimes necessary to emit both 0x66 (Operand Size Override) and F0 (Lock).

In the aarch64 implementation of atomics, there has been some minor renaming for the sake of
clarity, and for consistency with this x64 implementation.

<!--

Please ensure that the following steps are all taken care of before submitting
the PR.

Please ensure all communication adheres to the code of conduct.
-->

view this post on Zulip Wasmtime GitHub notifications bot (Aug 20 2020 at 11:08):

julian-seward1 updated PR #2149 from atomics-x64-CL to main:

… on newBE/x64. It does

this by providing an implementation of the CLIF instructions AtomicRmw, AtomicCas,
AtomicLoad, AtomicStore and Fence.

The translation is straightforward. AtomicCas is translated into x64 cmpxchg, AtomicLoad
becomes a normal load because x64-TSO provides adequate sequencing, AtomicStore becomes a
normal store followed by mfence, and Fence becomes mfence. AtomicRmw is the only
complex case: it becomes a normal load, followed by a loop which computes an updated value,
tries to cmpxchg it back to memory, and repeats if necessary.

This is a minimum-effort initial implementation. AtomicRmw could be implemented more
efficiently using LOCK-prefixed integer read-modify-write instructions in the case where the old
value in memory is not required. Subsequent work could add that, if required.

The x64 emitter has been updated to emit the new instructions, obviously. The LegacyPrefix
mechanism has been revised to handle multiple prefix bytes, not just one, since it is now
sometimes necessary to emit both 0x66 (Operand Size Override) and F0 (Lock).

In the aarch64 implementation of atomics, there has been some minor renaming for the sake of
clarity, and for consistency with this x64 implementation.

<!--

Please ensure that the following steps are all taken care of before submitting
the PR.

Please ensure all communication adheres to the code of conduct.
-->

view this post on Zulip Wasmtime GitHub notifications bot (Aug 20 2020 at 11:23):

julian-seward1 requested bnjbvr for a review on PR #2149.

view this post on Zulip Wasmtime GitHub notifications bot (Aug 20 2020 at 15:33):

bnjbvr submitted PR Review.

view this post on Zulip Wasmtime GitHub notifications bot (Aug 20 2020 at 15:33):

bnjbvr submitted PR Review.

view this post on Zulip Wasmtime GitHub notifications bot (Aug 20 2020 at 15:33):

bnjbvr created PR Review Comment:

Can you expand the names here, or at least add comments? I have no clues what M/L/S mean.

view this post on Zulip Wasmtime GitHub notifications bot (Aug 20 2020 at 15:33):

bnjbvr created PR Review Comment:

Maybe hoist the aarch64 version then, and hoist it in the machinst code or a new isa/common directory?

view this post on Zulip Wasmtime GitHub notifications bot (Aug 20 2020 at 15:33):

bnjbvr created PR Review Comment:

nit: here and below, can you use doc comments instead, so they show up in LSP hovers/docs.rs, etc?

view this post on Zulip Wasmtime GitHub notifications bot (Aug 20 2020 at 15:33):

bnjbvr created PR Review Comment:

nit: we do not use /* comments in general, can you use // instead?

view this post on Zulip Wasmtime GitHub notifications bot (Aug 20 2020 at 15:33):

bnjbvr created PR Review Comment:

With the new load helper introduced in the SIMD PR (that should land soonish, probably), you might be able to just use Inst::load here.

view this post on Zulip Wasmtime GitHub notifications bot (Aug 20 2020 at 15:33):

bnjbvr created PR Review Comment:

nit: can you expand insn + precise that the second really means second in the loop?

view this post on Zulip Wasmtime GitHub notifications bot (Aug 20 2020 at 15:33):

bnjbvr created PR Review Comment:

Can you commonize all these match arms together? => Inst::alu_rmi_r(true, AluRmiROpcode::from(op), r10_rmi, r11_w) to avoid code duplication

view this post on Zulip Wasmtime GitHub notifications bot (Aug 20 2020 at 15:33):

bnjbvr created PR Review Comment:

nit: here too, please don't use shorthands for read/write

view this post on Zulip Wasmtime GitHub notifications bot (Aug 20 2020 at 15:33):

bnjbvr created PR Review Comment:

nit (twice): what does rd mean?

view this post on Zulip Wasmtime GitHub notifications bot (Aug 20 2020 at 15:33):

bnjbvr created PR Review Comment:

nit: please expand CAS at least once, with acronym in parenthesis.

view this post on Zulip Wasmtime GitHub notifications bot (Aug 20 2020 at 15:33):

bnjbvr created PR Review Comment:

Can you open an issue for the improvements, please, and refer to it with a TODO comment mentioning the issue number?

view this post on Zulip Wasmtime GitHub notifications bot (Aug 20 2020 at 15:33):

bnjbvr created PR Review Comment:

ditto insn

view this post on Zulip Wasmtime GitHub notifications bot (Aug 20 2020 at 15:33):

bnjbvr created PR Review Comment:

Could we remove this comment? It's a bit weird to read this here, since there's a sequence of vcode insts, and the vcode inst actually trashing this register already mentions it does that by the code in get_regs.

view this post on Zulip Wasmtime GitHub notifications bot (Aug 20 2020 at 15:33):

bnjbvr created PR Review Comment:

Can you remove the r_ prefixes, for consistency with the rest of this file?

view this post on Zulip Wasmtime GitHub notifications bot (Aug 20 2020 at 15:33):

bnjbvr created PR Review Comment:

I've tried to keep "Seq" in the name for synthetic sequences of instruction, can you put it as a suffix here please?

view this post on Zulip Wasmtime GitHub notifications bot (Aug 20 2020 at 15:33):

bnjbvr created PR Review Comment:

nit: here and below, "mfence".to_string()

view this post on Zulip Wasmtime GitHub notifications bot (Aug 20 2020 at 15:33):

bnjbvr created PR Review Comment:

nit: expand insn

view this post on Zulip Wasmtime GitHub notifications bot (Aug 20 2020 at 15:33):

bnjbvr created PR Review Comment:

nit: can you expand mod too?

view this post on Zulip Wasmtime GitHub notifications bot (Aug 20 2020 at 15:33):

bnjbvr created PR Review Comment:

nit: expand instruction

view this post on Zulip Wasmtime GitHub notifications bot (Aug 20 2020 at 15:41):

julian-seward1 submitted PR Review.

view this post on Zulip Wasmtime GitHub notifications bot (Aug 20 2020 at 15:41):

julian-seward1 created PR Review Comment:

Well those names are what Intel calls them: mfence, sfence and lfence. I could expand them to what Intel describes them as: Memory Fence, Store Fence and Load Fence respectively. Or maybe I should just add comments on the enum?

view this post on Zulip Wasmtime GitHub notifications bot (Aug 20 2020 at 15:42):

julian-seward1 submitted PR Review.

view this post on Zulip Wasmtime GitHub notifications bot (Aug 20 2020 at 15:42):

julian-seward1 created PR Review Comment:

Yeah, I just spotted that.

view this post on Zulip Wasmtime GitHub notifications bot (Aug 20 2020 at 22:54):

abrown submitted PR Review.

view this post on Zulip Wasmtime GitHub notifications bot (Aug 20 2020 at 22:54):

abrown created PR Review Comment:

Should be in main now.

view this post on Zulip Wasmtime GitHub notifications bot (Aug 21 2020 at 08:48):

julian-seward1 submitted PR Review.

view this post on Zulip Wasmtime GitHub notifications bot (Aug 21 2020 at 08:48):

julian-seward1 created PR Review Comment:

Done. Although the result is actually longer:

            let i3 = if op ==  AtomicRMWOp::Xchg {
                AtomicRMWOp::Xchg => Inst::mov_r_r(true, r10, r11_w),
            } else {
                let alu_op = match op {
                    AtomicRMWOp::Add => AluRmiROpcode::Add,
                    AtomicRMWOp::Sub => AluRmiROpcode::Sub,
                    AtomicRMWOp::And => AluRmiROpcode::And,
                    AtomicRMWOp::Or => AluRmiROpcode::Or,
                    AtomicRMWOp::Xor => AluRmiROpcode::Xor,
                    AtomicRMWOp::Xchg => unreachable!(),
                }
                Inst::alu_rmi_r(true, alu_op, r10_rmi, r11_w)
            };

I tend to assume that rustc/LLVM will do tail-merging and hence cause all the duplication to disappear in the final machine code. I don't know that that's true, though.

view this post on Zulip Wasmtime GitHub notifications bot (Aug 21 2020 at 08:51):

julian-seward1 created PR Review Comment:

Read; I fixed all of these, and the wr and mod too.

view this post on Zulip Wasmtime GitHub notifications bot (Aug 21 2020 at 08:51):

julian-seward1 submitted PR Review.

view this post on Zulip Wasmtime GitHub notifications bot (Aug 21 2020 at 09:06):

julian-seward1 edited PR Review Comment.

view this post on Zulip Wasmtime GitHub notifications bot (Aug 21 2020 at 09:27):

julian-seward1 submitted PR Review.

view this post on Zulip Wasmtime GitHub notifications bot (Aug 21 2020 at 09:27):

julian-seward1 created PR Review Comment:

Filed as PR #2153.

view this post on Zulip Wasmtime GitHub notifications bot (Aug 21 2020 at 10:07):

julian-seward1 submitted PR Review.

view this post on Zulip Wasmtime GitHub notifications bot (Aug 21 2020 at 10:07):

julian-seward1 created PR Review Comment:

I moved it into a new file src/machinst/inst_common.rs.

view this post on Zulip Wasmtime GitHub notifications bot (Aug 21 2020 at 10:20):

julian-seward1 updated PR #2149 from atomics-x64-CL to main:

… on newBE/x64. It does

this by providing an implementation of the CLIF instructions AtomicRmw, AtomicCas,
AtomicLoad, AtomicStore and Fence.

The translation is straightforward. AtomicCas is translated into x64 cmpxchg, AtomicLoad
becomes a normal load because x64-TSO provides adequate sequencing, AtomicStore becomes a
normal store followed by mfence, and Fence becomes mfence. AtomicRmw is the only
complex case: it becomes a normal load, followed by a loop which computes an updated value,
tries to cmpxchg it back to memory, and repeats if necessary.

This is a minimum-effort initial implementation. AtomicRmw could be implemented more
efficiently using LOCK-prefixed integer read-modify-write instructions in the case where the old
value in memory is not required. Subsequent work could add that, if required.

The x64 emitter has been updated to emit the new instructions, obviously. The LegacyPrefix
mechanism has been revised to handle multiple prefix bytes, not just one, since it is now
sometimes necessary to emit both 0x66 (Operand Size Override) and F0 (Lock).

In the aarch64 implementation of atomics, there has been some minor renaming for the sake of
clarity, and for consistency with this x64 implementation.

<!--

Please ensure that the following steps are all taken care of before submitting
the PR.

Please ensure all communication adheres to the code of conduct.
-->

view this post on Zulip Wasmtime GitHub notifications bot (Aug 21 2020 at 10:23):

julian-seward1 updated PR #2149 from atomics-x64-CL to main:

… on newBE/x64. It does

this by providing an implementation of the CLIF instructions AtomicRmw, AtomicCas,
AtomicLoad, AtomicStore and Fence.

The translation is straightforward. AtomicCas is translated into x64 cmpxchg, AtomicLoad
becomes a normal load because x64-TSO provides adequate sequencing, AtomicStore becomes a
normal store followed by mfence, and Fence becomes mfence. AtomicRmw is the only
complex case: it becomes a normal load, followed by a loop which computes an updated value,
tries to cmpxchg it back to memory, and repeats if necessary.

This is a minimum-effort initial implementation. AtomicRmw could be implemented more
efficiently using LOCK-prefixed integer read-modify-write instructions in the case where the old
value in memory is not required. Subsequent work could add that, if required.

The x64 emitter has been updated to emit the new instructions, obviously. The LegacyPrefix
mechanism has been revised to handle multiple prefix bytes, not just one, since it is now
sometimes necessary to emit both 0x66 (Operand Size Override) and F0 (Lock).

In the aarch64 implementation of atomics, there has been some minor renaming for the sake of
clarity, and for consistency with this x64 implementation.

<!--

Please ensure that the following steps are all taken care of before submitting
the PR.

Please ensure all communication adheres to the code of conduct.
-->

view this post on Zulip Wasmtime GitHub notifications bot (Aug 21 2020 at 10:48):

julian-seward1 submitted PR Review.

view this post on Zulip Wasmtime GitHub notifications bot (Aug 21 2020 at 10:48):

julian-seward1 created PR Review Comment:

Done.

view this post on Zulip Wasmtime GitHub notifications bot (Aug 21 2020 at 10:50):

julian-seward1 updated PR #2149 from atomics-x64-CL to main:

… on newBE/x64. It does

this by providing an implementation of the CLIF instructions AtomicRmw, AtomicCas,
AtomicLoad, AtomicStore and Fence.

The translation is straightforward. AtomicCas is translated into x64 cmpxchg, AtomicLoad
becomes a normal load because x64-TSO provides adequate sequencing, AtomicStore becomes a
normal store followed by mfence, and Fence becomes mfence. AtomicRmw is the only
complex case: it becomes a normal load, followed by a loop which computes an updated value,
tries to cmpxchg it back to memory, and repeats if necessary.

This is a minimum-effort initial implementation. AtomicRmw could be implemented more
efficiently using LOCK-prefixed integer read-modify-write instructions in the case where the old
value in memory is not required. Subsequent work could add that, if required.

The x64 emitter has been updated to emit the new instructions, obviously. The LegacyPrefix
mechanism has been revised to handle multiple prefix bytes, not just one, since it is now
sometimes necessary to emit both 0x66 (Operand Size Override) and F0 (Lock).

In the aarch64 implementation of atomics, there has been some minor renaming for the sake of
clarity, and for consistency with this x64 implementation.

<!--

Please ensure that the following steps are all taken care of before submitting
the PR.

Please ensure all communication adheres to the code of conduct.
-->

view this post on Zulip Wasmtime GitHub notifications bot (Aug 21 2020 at 11:00):

julian-seward1 updated PR #2149 from atomics-x64-CL to main:

… on newBE/x64. It does

this by providing an implementation of the CLIF instructions AtomicRmw, AtomicCas,
AtomicLoad, AtomicStore and Fence.

The translation is straightforward. AtomicCas is translated into x64 cmpxchg, AtomicLoad
becomes a normal load because x64-TSO provides adequate sequencing, AtomicStore becomes a
normal store followed by mfence, and Fence becomes mfence. AtomicRmw is the only
complex case: it becomes a normal load, followed by a loop which computes an updated value,
tries to cmpxchg it back to memory, and repeats if necessary.

This is a minimum-effort initial implementation. AtomicRmw could be implemented more
efficiently using LOCK-prefixed integer read-modify-write instructions in the case where the old
value in memory is not required. Subsequent work could add that, if required.

The x64 emitter has been updated to emit the new instructions, obviously. The LegacyPrefix
mechanism has been revised to handle multiple prefix bytes, not just one, since it is now
sometimes necessary to emit both 0x66 (Operand Size Override) and F0 (Lock).

In the aarch64 implementation of atomics, there has been some minor renaming for the sake of
clarity, and for consistency with this x64 implementation.

<!--

Please ensure that the following steps are all taken care of before submitting
the PR.

Please ensure all communication adheres to the code of conduct.
-->

view this post on Zulip Wasmtime GitHub notifications bot (Aug 21 2020 at 12:00):

julian-seward1 requested bnjbvr for a review on PR #2149.

view this post on Zulip Wasmtime GitHub notifications bot (Aug 21 2020 at 14:46):

bnjbvr submitted PR Review.

view this post on Zulip Wasmtime GitHub notifications bot (Aug 21 2020 at 14:46):

bnjbvr submitted PR Review.

view this post on Zulip Wasmtime GitHub notifications bot (Aug 21 2020 at 14:46):

bnjbvr created PR Review Comment:

With the approach suggested above for Xchg, we could spare the r10 register here, alleviating register pressure.

view this post on Zulip Wasmtime GitHub notifications bot (Aug 21 2020 at 14:46):

bnjbvr created PR Review Comment:

nit: please make this a doc comment ///

view this post on Zulip Wasmtime GitHub notifications bot (Aug 21 2020 at 14:46):

bnjbvr created PR Review Comment:

Could we not emit it when the opcode is Xchg? (That is, push it down within the else branch below)
Or even better, it seems all the moves could be avoided in the case of Xchg, since r10 could be passed as the read-only input of the cmpxchg instruction? (I know x86 chips eats moves for dinner, but seems better to not do any useless decoding work if we can avoid it!)

view this post on Zulip Wasmtime GitHub notifications bot (Aug 21 2020 at 14:46):

bnjbvr created PR Review Comment:

Just an idea: if it's generating the same thing as a load, could the lowering be commonized with the rest of the Load-related instructions?

view this post on Zulip Wasmtime GitHub notifications bot (Aug 21 2020 at 14:46):

bnjbvr created PR Review Comment:

I sympathize with the need to fix this comment, but I think this one is a bit imprecise too: can you write any virtual regs instead of just any regs?

view this post on Zulip Wasmtime GitHub notifications bot (Aug 21 2020 at 14:46):

bnjbvr created PR Review Comment:

Here, you can use let rm = input_to_reg_mem(ctx, inputs[0]); here, and remove the comment about using 0(addr).

view this post on Zulip Wasmtime GitHub notifications bot (Aug 21 2020 at 14:46):

bnjbvr created PR Review Comment:

I think you'll be able to use lower_to_amode once #2146 lands; if you happen to land before this, can you add a TODO in my PR please?

view this post on Zulip Wasmtime GitHub notifications bot (Aug 21 2020 at 14:46):

bnjbvr created PR Review Comment:

ditto for lower_to_amode

view this post on Zulip Wasmtime GitHub notifications bot (Aug 21 2020 at 14:46):

bnjbvr created PR Review Comment:

nit: can you use the usual rust camelCasing: AtomicRmwOp please?

view this post on Zulip Wasmtime GitHub notifications bot (Aug 24 2020 at 07:41):

julian-seward1 submitted PR Review.

view this post on Zulip Wasmtime GitHub notifications bot (Aug 24 2020 at 07:41):

julian-seward1 created PR Review Comment:

Well, probably yes; but I'd prefer to keep it separate as it logically belongs to the atomics group. Also there is an atomics-specific assertion and atomics-specific comments there.

view this post on Zulip Wasmtime GitHub notifications bot (Aug 24 2020 at 07:43):

julian-seward1 submitted PR Review.

view this post on Zulip Wasmtime GitHub notifications bot (Aug 24 2020 at 07:43):

julian-seward1 created PR Review Comment:

We could do that. I'd prefer to leave such improvements to the followup PR #2153 though. Also, it could be possibly fixed even better, by using lock xchg .. in this case. That's definitely PR #2153 territory, though.

view this post on Zulip Wasmtime GitHub notifications bot (Aug 24 2020 at 07:53):

julian-seward1 created PR Review Comment:

That doesn't work. It produces movzbq %v5Jb, %v6J, but the original was movzbq 0(%v5J), %v6J.

view this post on Zulip Wasmtime GitHub notifications bot (Aug 24 2020 at 07:53):

julian-seward1 submitted PR Review.

view this post on Zulip Wasmtime GitHub notifications bot (Aug 24 2020 at 07:58):

bjorn3 submitted PR Review.

view this post on Zulip Wasmtime GitHub notifications bot (Aug 24 2020 at 07:58):

bjorn3 created PR Review Comment:

lock xchg is equivalent to xchg: https://stackoverflow.com/questions/3144335/on-a-multicore-x86-is-a-lock-necessary-as-a-prefix-to-xchg

view this post on Zulip Wasmtime GitHub notifications bot (Aug 24 2020 at 09:02):

julian-seward1 updated PR #2149 from atomics-x64-CL to main:

… on newBE/x64. It does

this by providing an implementation of the CLIF instructions AtomicRmw, AtomicCas,
AtomicLoad, AtomicStore and Fence.

The translation is straightforward. AtomicCas is translated into x64 cmpxchg, AtomicLoad
becomes a normal load because x64-TSO provides adequate sequencing, AtomicStore becomes a
normal store followed by mfence, and Fence becomes mfence. AtomicRmw is the only
complex case: it becomes a normal load, followed by a loop which computes an updated value,
tries to cmpxchg it back to memory, and repeats if necessary.

This is a minimum-effort initial implementation. AtomicRmw could be implemented more
efficiently using LOCK-prefixed integer read-modify-write instructions in the case where the old
value in memory is not required. Subsequent work could add that, if required.

The x64 emitter has been updated to emit the new instructions, obviously. The LegacyPrefix
mechanism has been revised to handle multiple prefix bytes, not just one, since it is now
sometimes necessary to emit both 0x66 (Operand Size Override) and F0 (Lock).

In the aarch64 implementation of atomics, there has been some minor renaming for the sake of
clarity, and for consistency with this x64 implementation.

<!--

Please ensure that the following steps are all taken care of before submitting
the PR.

Please ensure all communication adheres to the code of conduct.
-->

view this post on Zulip Wasmtime GitHub notifications bot (Aug 24 2020 at 09:50):

julian-seward1 merged PR #2149.


Last updated: Dec 23 2024 at 12:05 UTC