playXE opened Issue #1598:
Add Volatile flag to MemFlags.
Benefit
When an optimization pass knows whether a load or store is volatile, many more optimizations become possible. One of them is memory-to-register promotion. I tried to implement a mem2reg pass here: cranelift-mem2reg. The pass works, but in some cases it fails, e.g. on this simple C code:
```c
// compiled with rcc
typedef struct Foo {
    int x;
    int y;
} Foo;

int main() {
    Foo foo;
    foo.y = 42;
}
```

Message:

```
definition error: Compilation error: Verifier errors
note: while compiling function u0:0() -> i32 system_v {
    ss0 = explicit_slot 8

block0:
    v1 = iconst.i64 4
    v2 = iadd.i64 v0, v1
    v3 = iconst.i32 42
    store v3, v2
    v4 = iconst.i32 0
    return v4
}
```

If loads and stores can be marked as volatile, mem2reg will know to leave the marked accesses alone and will not promote `struct Foo` into registers when it must not.
A volatile flag can also help SRoA, where it is possible to replace a stack allocation with a single 32- or 64-bit register.

Implementation
Just add an additional enum case in `codegen/ir/memflags.rs` and add three simple functions to `MemFlags`: `volatile() -> Self`, `is_volatile() -> bool`, and `set_volatile(&mut self)`.
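A rough sketch of how that API could look (illustrative only: the internal representation and bit position below are invented, not Cranelift's actual `MemFlags` layout):

```rust
/// Illustrative sketch only; Cranelift's real `MemFlags` lives in
/// codegen/ir/memflags.rs and its bit layout differs.
#[derive(Clone, Copy, Debug, Default, PartialEq, Eq)]
pub struct MemFlags {
    bits: u8,
}

/// Hypothetical bit reserved for the volatile flag.
const BIT_VOLATILE: u8 = 0b0000_0100;

impl MemFlags {
    /// Flags with only `volatile` set.
    pub fn volatile() -> Self {
        let mut flags = Self::default();
        flags.set_volatile();
        flags
    }

    /// Test whether the `volatile` flag is set.
    pub fn is_volatile(self) -> bool {
        self.bits & BIT_VOLATILE != 0
    }

    /// Mark this access as volatile.
    pub fn set_volatile(&mut self) {
        self.bits |= BIT_VOLATILE;
    }
}
```

A pass like mem2reg would then simply skip any access whose flags report `is_volatile()`.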
Alternatives
I do not see any alternatives.
playXE edited Issue #1598:
UPD: This might be useful for this issue: Support volatile store/loads
bjorn3 commented on Issue #1598:
FYI I implemented something very similar to your mem2reg in cg_clif: https://github.com/bjorn3/rustc_codegen_cranelift/blob/169140506366d5aa96967a71b6af98563a9c2988/src/optimize/stack2reg.rs
> UPD: This might be useful for this issue: Support volatile store/loads
Implementing that issue would almost definitely be done by adding a volatile flag to `MemFlags`, as this issue suggests.
playXE commented on Issue #1598:
@bjorn3 your implementation looks much better than mine, but the issue still remains: it is impossible to do a mem2reg pass safely without volatile loads and stores :(
bjorn3 commented on Issue #1598:
I have been thinking about whether we should make `MemFlags` some opaque number, just like `SourceLoc`, and then pass a `MemoryModel` to the compile function that uses the number to determine what kinds of optimizations are allowed around `load` and `store`. That `MemoryModel` is then responsible for aliasing, volatility, atomic orderings, ...

Another similar way would be to pre-compute all memory dependencies and store that in the function, like @sunfishcode suggested in https://youtu.be/9OIA7DTFQWU beginning at 36:40.
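To make the shape of that idea concrete, here is a purely hypothetical sketch; neither the trait nor these names exist in Cranelift, and the opaque tag type is invented for illustration:

```rust
/// Hypothetical: an opaque tag carried by each load/store, analogous
/// to how `SourceLoc` is just an opaque number.
#[derive(Clone, Copy, PartialEq, Eq, Hash)]
pub struct MemTag(u32);

/// Hypothetical embedder-provided memory model. The optimizer would
/// consult it instead of interpreting flag bits itself, so a C model,
/// a Java model, or stacked borrows could be swapped in freely.
pub trait MemoryModel {
    /// May an access tagged `a` be reordered past an access tagged `b`?
    fn may_reorder(&self, a: MemTag, b: MemTag) -> bool;

    /// May an access tagged `tag` be deleted if its result is unused
    /// (false for anything volatile-like)?
    fn may_eliminate(&self, tag: MemTag) -> bool;
}
```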
tschneidereit labeled Issue #1598:
Add Volatile flag to MemFlags.
tschneidereit labeled Issue #1598:
Add Volatile flag to MemFlags.
github-actions[bot] commented on Issue #1598:
Subscribe to Label Action
cc @bnjbvr
This issue or pull request has been labeled: "cranelift"

Thus the following users have been cc'd because of the following labels:

- bnjbvr: cranelift

To subscribe or unsubscribe from this label, edit the `.github/subscribe-to-label.json` configuration file.
sunfishcode commented on Issue #1598:
Giving frontends a way to provide explicit memory dependencies, and using that to implement volatile semantics, would be cool. If anyone's interested in designing a system to do this, it'd be good to start with a design document so we can look at how it would impact compile times in fast-compilation use cases, and the other tradeoffs involved.
We may want to do something simpler for now that just addresses volatile, and I have a possible alternative. Traditional volatile flags on regular memory accesses turn out to be very error-prone. The recommended solution is to transform volatile into a call or a separate opcode, and optionally inline the call after all optimizations. It's nice how that systematically eliminates a common class of volatile bugs; however, it does add complexity in the backend. So instead, how about the following design:
Add two new instructions to the IR:

- `y = volatile_unary(x)` - observe the value of `x`, do side effects, and return a new value.
- `volatile_rmw(p)` - observe the value of `*p`, do side effects, and then clobber `*p`.

At a very late stage, `volatile_unary` would be translated as if it were an identity operation, and `volatile_rmw` would be translated as if it were a no-op. With those, frontends could then translate a volatile load like this:

```
volatile_rmw(p)
x0 = load(p)
x = volatile_unary(x0)
```

and a volatile store like this:

```
x0 = volatile_unary(x)
store(p, x0)
volatile_rmw(p)
```

That is, we'd use plain load and store instructions, but boxed in with volatile operations, both in control flow and data flow, to prevent any code motion or redundancy elimination. It'll get chatty, especially in the volatile-atomic case, but volatile is rare, so it seems like an ok tradeoff. This should let most of the compiler automatically do the right thing, at low cost in overall complexity. And, `volatile_rmw` and `volatile_unary` would be handy for crafting compiler unit tests besides :-).
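As a toy sketch of that late-stage translation, under the assumption of the two instructions described above (none of the types or names here are real Cranelift API):

```rust
/// Toy IR for illustration; not Cranelift's data structures.
enum Inst {
    /// y = volatile_unary(x): at the late stage, becomes an identity.
    VolatileUnary { result: u32, arg: u32 },
    /// volatile_rmw(p): at the late stage, becomes a no-op.
    VolatileRmw { ptr: u32 },
    /// Stand-in for every other instruction.
    Other,
}

/// Late-stage lowering: drop the volatile markers, remembering which
/// result values must be renamed to their arguments afterwards.
fn lower_volatile(body: &mut Vec<Inst>) -> Vec<(u32, u32)> {
    let mut renames = Vec::new();
    body.retain(|inst| match inst {
        Inst::VolatileUnary { result, arg } => {
            renames.push((*result, *arg)); // identity: result := arg
            false
        }
        Inst::VolatileRmw { .. } => false, // no-op: just delete
        Inst::Other => true,
    });
    renames // a real pass would rewrite all uses accordingly
}
```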
jyn514 commented on Issue #1598:
> The recommended solution is to transform volatile into a call or separate opcode, and optionally inline the call after all optimizations.
Sort of like a memory fence? It marks this load/store as unable to be removed? That sounds reasonable to me.
I can't quite run csmith yet, but as soon as I do I'm happy to test out the combination of mem2reg and volatile to make sure there's no misoptimization.
> make `MemFlags` some opaque number just like `SourceLoc` and then pass a `MemoryModel` to the compile function
I don't know that I like the idea of more indirection. If Cranelift uses opaque IDs for these flags, it makes the intended purpose much harder to find: it's in an entirely separate part of the documentation. I also don't see the benefit over normal `MemFlags`.
bjorn3 commented on Issue #1598:
> I also don't see the benefit over normal `MemFlags`.
The benefit is that the `MemoryModel` can assign arbitrary meanings to it, like "this load can only alias with these other loads", to prevent a stacked-borrows violation. My proposal is that all properties of loads and stores are handled by the `MemoryModel`, so that you can easily switch between a C memory model, a Java memory model, stacked borrows, and any other memory model without changing Cranelift itself.

Also note that I didn't mean to assign a property like `volatile` to a specific number, but instead to let the `MemoryModel` store a map between the number and any relevant properties.
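For illustration, that side-table idea might look like this (hypothetical names again; `AccessProps` and `ExampleMemoryModel` are invented for the sketch):

```rust
use std::collections::HashMap;

/// Hypothetical properties the model might track per opaque number.
#[derive(Clone, Copy, Default)]
struct AccessProps {
    volatile: bool,
    alias_class: Option<u32>, // e.g. a stacked-borrows tag
}

/// Hypothetical model: the opaque number on each load/store is just a
/// key into this map; the meaning lives entirely outside Cranelift.
#[derive(Default)]
struct ExampleMemoryModel {
    props: HashMap<u32, AccessProps>,
}

impl ExampleMemoryModel {
    fn props_of(&self, flags: u32) -> AccessProps {
        self.props.get(&flags).copied().unwrap_or_default()
    }
}
```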