AArch64 Atomic Memory Accesses · cranelift

Stream: cranelift

Topic: AArch64 Atomic Memory Accesses

Anton Kirilov (Aug 28 2020 at 21:47):

@Julian Seward I couldn't review your AArch64 atomics changes when you posted them, but I could recently and I have a couple of comments.
Most of them are based on the Wasm memory model presented in the OOSPLA 2019 paper Weakening WebAssembly that has been included in the threading proposal repository.
The main point is that even though the proposal overview states that atomic memory accesses are sequentially consistent, what's actually meant according to my understanding is that they are sequentially consistent for data race-free programs (similar to the situation in C/C++), which is a weaker statement. As a result, most of the explicit memory barriers in the Cranelift implementation are unnecessary and should be covered by load-acquires and store-releases instead. In fact the paper suggests precisely that in the table of mappings in section 7.
Ignoring that issue for a moment, the code generated for atomic loads and particularly atomic stores seems problematic - either because the barriers are not in the right places or there isn't enough of them (cf. the table in the paper). For example, in the case of the atomic store the constituent store could be observed before another one preceding it in program order.

WebAssembly/threads

Threads and Atomics in WebAssembly. Contribute to WebAssembly/threads development by creating an account on GitHub.

WebAssembly/threads

Threads and Atomics in WebAssembly. Contribute to WebAssembly/threads development by creating an account on GitHub.

Anton Kirilov (Aug 28 2020 at 21:51):

Of course, using load-acquires and store-releases will reduce the size of the generated code as well.

Anton Kirilov (Aug 28 2020 at 22:00):

There are also a couple of minor issues - one of them is that the move in the middle of the loop generated for the Xchg variant of Inst::AtomicRMW is unnecessary.

Anton Kirilov (Aug 28 2020 at 22:21):

Another, admittedly nitpick, concerns the printing of the new Inst variants - most of the time we try to follow the disassembly specified by the architecture (though there are things that are out of its scope, e.g. literals). I understand that for Inst::AtomicRMW and Inst::AtomicCAS that's quite inconvenient, but at least in the case of Inst::AtomicLoad and Inst::AtomicStore it shouldn't be a problem, especially if we switch to the proper instructions.

Anton Kirilov (Aug 28 2020 at 23:19):

I also have a question about the Inst::AtomicCAS variant - based on the comments, it was a deliberate decision to do the masking inside the loop. Why? Furthermore, I can't see the reason for the move in the 64-bit case.

Julian Seward (Sep 07 2020 at 07:33):

@Anton Kirilov thanks for the info. Sorry for delayed response; I was on vacation last week.

Julian Seward (Sep 07 2020 at 07:35):

@Anton Kirilov ignoring for now the question of efficiency, you mention that "the code generated for atomic loads and particularly atomic stores seems problematic - either because the barriers are not in the right places or there isn't enough of them (cf. the table in the paper)."
I looked at the table (I assume you mean the table at the start of section 7) but fail to understand what you mean.
Can you say more precisely what the problem is, and how you think the barrier placement should be adjusted so as to fix it?

Anton Kirilov (Sep 07 2020 at 10:21):

Hello, since the code is using explicit memory barriers right now, it is essentially equivalent to the ARMv7 entries in the table. So, the requirement for atomic stores is to have a double barrier, i.e. a barrier both before and after the store. Right now Cranelift generates only a barrier after.

Anton Kirilov (Sep 14 2020 at 13:50):

@Julian Seward Just checking if you've seen my reply.

Julian Seward (Sep 14 2020 at 18:23):

@Anton Kirilov yes, thanks. Doing at least a minimal fix is on my todo list.

Sam Parker (Jul 15 2021 at 12:07):

FYI, I've now picked this up.

Last updated: Apr 08 2025 at 22:03 UTC