afonso360 opened PR #5503 from interp-atomics
to main
:
:wave: Hey,
This PR adds the
atomic_{load,store}
andfence
instructions to the interpreter. I'm not 100% sure if i got the atomicity guarantees right.It also adds these instructions to fuzzgen. It's done about 500k iterations on both AArch64 and x86.
afonso360 updated PR #5503 from interp-atomics
to main
.
bjorn3 created PR review comment:
I don't think you need any fences in the interpreter. The interpreter is running on a single OS thread, so it is by definition sequentially consistent already. If support for running clif ir that uses multiple threads is added, I would expect it to be done by keeping the interpreter running on a single OS thread, but interleave clif instructions from multiple interpreted threads on this single OS thread.
bjorn3 submitted PR review.
afonso360 created PR review comment:
Probably not, especially since we always do our own address translation there is no way to share an address anywhere outside the interpreter.
However I'd like to eventually add a mode where we allow the interpreter to run without address translation and interact with real addresses. Maybe we should only add the fences if/when we develop that capability?
afonso360 submitted PR review.
bjorn3 submitted PR review.
bjorn3 created PR review comment:
Maybe we should only add the fences if/when we develop that capability?
I think so.
jameysharp submitted PR review.
jameysharp submitted PR review.
jameysharp created PR review comment:
Let's replace this with
match opcode.format()
. Then the cases below are, in order:InstructionFormat::LoadNoOffset
,InstructionFormat::StoreNoOffset
,InstructionFormat::Store
,InstructionFormat::Load
. Any other format should panic withunimplemented!()
. That captures what we actually wanted to know about the opcode: we don't care whether it's atomic or whether it's a store, we just need to know how to construct the rightInstructionData
.This also highlights that the
offset
value returned fromgenerate_load_store_address
is unused for these atomic ops. I expect to get slightly better fuzzing results if we avoid generating a random number that we then ignore. How about returning amax_offset
instead and only doing theint_in_range
call in theStore
andLoad
format cases?
jameysharp created PR review comment:
If we're setting the
aligned
flag on the store, should we set it on the load too?
jameysharp created PR review comment:
What would you think of returning the result of
atomic_load
(v2
) directly and letting the test harness check equality, instead of relying onicmp
to generate correct code?
jameysharp created PR review comment:
It confused me that this parameter is called
max_offset
, because the function returns a value namedoffset
, but this parameter is not a maximum for that value.How about an
aligned: bool
parameter instead? The caller can decide whether to set thealigned
memflag, and inform this function of that choice; that would be good to test anyway. On aarch64 we'd force it to set that flag on atomic ops until #5483 gets fixed.The effect of this parameter would be something like this:
// stack_slot_with_size guarantees that slot_size >= min_size let max_offset = slot_size - min_size; let offset = if aligned { self.u.int_in_range(0..=max_offset / min_size)? * min_size } else { self.u.int_in_range(0..=max_offset) };
I'm assuming that stack slots are always sufficiently aligned, although I don't actually know if that's true.
jameysharp created PR review comment:
I guess the only thing we're checking is that compiling and executing a
fence
doesn't crash, which is good to check even if it isn't very exciting. Given that, do we gain much by returning a value?; Check that the fence instruction doesn't crash. Testing anything else would ; require multiple threads, which requires a runtime like Wasmtime. function %fence() { block0: fence return } ; run: %fence()
jameysharp created PR review comment:
Pre-existing, but I believe this comment is wrong at this point: there are always
min_size
valid bytes at the base+offset returned by this function. If I remember correctly the comment was because we used to sometimes deliberately generate invalidheap_addr
instructions, right?
jameysharp created PR review comment:
I agree, let's leave multi-thread memory consistency out of the interpreter. I think it's worth a comment in each of those places noting that they aren't actually thread-safe though.
jameysharp created PR review comment:
There aren't any i128 tests here. Is the
enable_llvm_abi_extensions
flag necessary?
afonso360 submitted PR review.
afonso360 created PR review comment:
Unfortunately we don't yet support having a run test that just runs we always need a comparison operator.
jameysharp created PR review comment:
Ah, I see. Okay, I suggest taking my version of the comment, but the test itself is fine then.
jameysharp submitted PR review.
afonso360 submitted PR review.
afonso360 created PR review comment:
Yeah I think so, but we probably don't do that anymore. I'm going to remove it.
afonso360 edited PR review comment.
afonso360 created PR review comment:
Those are great suggestions, Thanks!
afonso360 submitted PR review.
afonso360 submitted PR review.
afonso360 created PR review comment:
Yeah, I wasn't a great fan of the
max_offset
thing, but wasn't able to think of anything else at the time. But with the aligned flag and returning the max offset it turned out way better! Thanks! It's also really nice that we can neatly fit in some MemFlags testing as well.I've had to change this so that we always align on 8 byte boundaries instead of
min_size
since it would cause unaligned addresses withi8
andi16
.I'm assuming that stack slots are always sufficiently aligned, although I don't actually know if that's true.
At least for AArch64 we are aligning them at 16 byte boundaries, but I'm not sure what the requirements are for other arches.
afonso360 updated PR #5503 from interp-atomics
to main
.
jameysharp submitted PR review.
jameysharp submitted PR review.
jameysharp created PR review comment:
With the
max_offset
parameter removed, the comment should go away too. Also there's a minor typo./// `aligned`: When passed as true, the resulting address is guaranteed to be aligned /// on an 8 byte boundary.
jameysharp created PR review comment:
I've had to change this so that we always align on 8 byte boundaries instead of
min_size
since it would cause unaligned addresses withi8
andi16
.Oh, why do smaller values need more than their natural alignment? I'd expect
i16
to work at any even address, for instance. And in the other direction, don'ti128
values need 16-byte alignment?
jameysharp submitted PR review.
afonso360 updated PR #5503 from interp-atomics
to main
.
afonso360 submitted PR review.
afonso360 created PR review comment:
For some reason I had the idea that we would need a minimum of 8 byte alignment. But I've been reading the alignment requirements (for AArch64 atomics) and they do mention that it should be a multiple of the access size.
afonso360 updated PR #5503 from interp-atomics
to main
.
afonso360 edited PR review comment.
jameysharp submitted PR review.
jameysharp created PR review comment:
This comment isn't quite right now but I'm going to merge this now anyway, because this PR is cool and I want it. Next time you happen to be working on this file maybe you can update the comment. I don't think it's worth a separate PR.
jameysharp merged PR #5503.
Last updated: Dec 23 2024 at 13:07 UTC