afonso360 opened issue #5033:
:wave: Hey,
This is a follow up to a discussion we recently had about I-Cache maintenance. I'm filing this as a separate issue since I would prefer to deal with this in a different PR than that one.
That PR cleans up our interface for I-Cache maintenance in a way that's hopefully not too AArch64 specific. In the process we noticed that RISC-V does not perform any I-Cache maintenance actions, despite the architecture not guaranteeing that it is cache coherent.
For AArch64 we do a
CORE_SYNC
membarrier, but from my look at the kernel docs it looks like this operation is not yet available for RISC-V. They have a custom syscall that seems to do exactly what we want, but we should double check.
afonso360 edited issue #5033:
:wave: Hey,
This is a follow up to a discussion we recently had about I-Cache maintenance. I'm filing this as a separate issue since I would prefer to deal with this in a different PR than that one.
That PR cleans up our interface for I-Cache maintenance in a way that's hopefully not too AArch64 specific. In the process we noticed that RISC-V does not perform any I-Cache maintenance actions, despite the architecture not guaranteeing that it is cache coherent.
For AArch64 we do a
CORE_SYNC
membarrier, but from my look at the kernel docs it looks like this operation is not yet available for RISC-V. They have a custom syscall that seems to do exactly what we want, but we should double check.
afonso360 edited issue #5033:
:wave: Hey,
This is a follow up to a discussion we recently had about I-Cache maintenance. I'm filing this as a separate issue since I would prefer to deal with this in a different PR than that one.
That PR cleans up our interface for I-Cache maintenance in a way that's hopefully not too AArch64 specific. In the process we noticed that RISC-V does not perform any I-Cache maintenance actions, despite the architecture not guaranteeing that it is cache coherent.
For AArch64 we do a
CORE_SYNC
membarrier + amprotect
call (as a temporary solution), but from my look at the kernel docs it looks like this operation is not yet available for RISC-V. They have a custom syscall that seems to do exactly what we want, but we should double check.
afonso360 edited issue #5033:
:wave: Hey,
This is a follow up to a discussion we recently had about I-Cache maintenance. I'm filing this as a separate issue since I would prefer to deal with this in a different PR than that one.
That PR cleans up our interface for I-Cache maintenance in a way that's hopefully not too AArch64 specific. In the process we noticed that RISC-V does not perform any I-Cache maintenance actions, despite the architecture not guaranteeing that it is cache coherent.
For AArch64 we do a
CORE_SYNC
membarrier and aclear_cache
(some asterisks here, see the PR discussion), but from my look at the kernel docs it looks like this operation is not yet available for RISC-V. They have a custom syscall that seems to do exactly what we want, but we should double check.
afonso360 edited issue #5033:
:wave: Hey,
This is a follow up to a discussion we recently had about I-Cache maintenance. I'm filing this as a separate issue since I would prefer to deal with this in a different PR than that one.
That PR cleans up our interface for I-Cache maintenance in a way that's hopefully not too AArch64 specific. In the process we noticed that RISC-V does not perform any I-Cache maintenance actions, despite the architecture not guaranteeing that it is cache coherent.
For AArch64 we do a
CORE_SYNC
membarrier and aclear_cache
(some asterisks here, see the PR discussion), but from my look at the kernel docs it looks likeCORE_SYNC
is not yet available for RISC-V. They have a custom syscall that seems to do theclear_cache
part, but we should double check what other guarantees it provides us.
afonso360 edited issue #5033:
:wave: Hey,
This is a follow up to a discussion we recently had about I-Cache maintenance. I'm filing this as a separate issue since I would prefer to deal with this in a different PR than that one.
That PR cleans up our interface for I-Cache maintenance in a way that's hopefully not too AArch64 specific. In the process we noticed that RISC-V does not perform any I-Cache maintenance actions, despite the architecture not guaranteeing that it is cache coherent.
For AArch64 we do a
CORE_SYNC
membarrier and aclear_cache
(some asterisks here, see the PR discussion), but from my look at the kernel docs it looks likeCORE_SYNC
is not yet available for RISC-V. They have a custom syscall that seems to do the all-coreclear_cache
part, but we should double check what other guarantees it provides us.
yuyang-ok commented on issue #5033:
@afonso360
riscv64
have the instructionfence.i
which is not called right now.
yuyang-ok commented on issue #5033:
looks like if we execute
fence.i
on riscv,we are done. Right? @afonso360
yuyang-ok edited a comment on issue #5033:
looks like if we execute
fence.i
on riscv after compile to code,we are done. Right? @afonso360
yuyang-ok edited a comment on issue #5033:
looks like if we execute
fence.i
on riscv after compile the code,we are done. Right? @afonso360
yuyang-ok edited a comment on issue #5033:
looks like if we execute
fence.i
on riscv after compile the code,we are done. Right? @afonso360
afonso360 commented on issue #5033:
Well sort of, yes, maybe. At a minimum we need to use the broadcast
fence.i
that executes that instruction on all threads for the current process (i.e. that custom syscall above). Also, that syscall is only guaranteed to clear the icache for a region of memory (despite being implemented as a globalfence.i
). That maps fairly well into theclear_cache
interface that we now have after #4997.But additionally the unresolved question is, does flushing the icache guarantee that it also flushes the pipeline?
On AArch64 it does not, thus we need an additional action, but maybe on RISC-V it does?
Reading the ISA manual for
fence.i
A more complex implementation might snoop the instruction (data) cache on every data (instruction) cache miss, or use an inclusive unified private L2 cache to invalidate lines from the primary instruction cache when they are being written by a local store instruction. If instruction and data caches are kept coherent in this way, or if the memory system consists of only uncached RAMs, then just the fetch pipeline needs to be flushed at a FENCE.I.
(emphasis mine). I don't think this section is normative, but it indicates that that's the expected behaviour. That helps us with the current implementation. Calling the above syscall will almost certainly do everything that we want.
However that syscall itself does not guarantee that it flushes the pipeline since a possible future
fence.i
instruction may not behave this way. (i.e. when they introduce afence.i
for just an address range, maybe the pipeline flush won't be done there)To my reading the RVWMO Memory Consistency Model does not clarify this, although I can barely understand it so.... ¯\\\_(ツ)\_/¯
TL;DR: We will almost certainly be fine calling the custom
riscv_clear_cache
sycall for now, and perhaps future revisions to the ISA manual will clarify this and we can update our implementation then
afonso360 edited a comment on issue #5033:
Well sort of, yes, maybe. At a minimum we need to use the broadcast
fence.i
that executes that instruction on all threads for the current process (i.e. that custom syscall above). Also, that syscall is only guaranteed to clear the icache for a region of memory (despite being implemented as a globalfence.i
). That maps fairly well into theclear_cache
interface that we now have after #4997.But additionally the unresolved question is, does flushing the icache guarantee that it also flushes the pipeline?
On AArch64 it does not, thus we need an additional action, but maybe on RISC-V it does?
Reading the ISA manual for
fence.i
A more complex implementation might snoop the instruction (data) cache on every data (instruction) cache miss, or use an inclusive unified private L2 cache to invalidate lines from the primary instruction cache when they are being written by a local store instruction. If instruction and data caches are kept coherent in this way, or if the memory system consists of only uncached RAMs, then just the fetch pipeline needs to be flushed at a FENCE.I.
(emphasis mine). I don't think this section is normative, but it indicates that that's the expected behaviour. That helps us with the current implementation. Calling the above syscall will almost certainly do everything that we want.
However that syscall itself does not guarantee that it flushes the pipeline since a possible future
fence.i
instruction may not behave this way. (i.e. when they introduce afence.i
for just an address range, maybe the pipeline flush won't be done there)To fix the above we would need a guarantee that a pipeline flush would always accompany a icache flush and to my reading the RVWMO Memory Consistency Model does not clarify this, although I can barely understand it so.... ¯\\\_(ツ)\_/¯
TL;DR: We will almost certainly be fine calling the custom
riscv_clear_cache
sycall for now, and perhaps future revisions to the ISA manual will clarify this and we can update our implementation then
afonso360 edited a comment on issue #5033:
Well sort of, yes, maybe. At a minimum we need to use the broadcast
fence.i
that executes that instruction on all threads for the current process (i.e. that custom syscall above). Also, that syscall is only guaranteed to clear the icache for a region of memory (despite being implemented as a globalfence.i
). That maps fairly well onto theclear_cache
interface that we now have after #4997.But additionally the unresolved question is, does flushing the icache guarantee that it also flushes the pipeline?
On AArch64 it does not, thus we need an additional action, but maybe on RISC-V it does?
Reading the ISA manual for
fence.i
A more complex implementation might snoop the instruction (data) cache on every data (instruction) cache miss, or use an inclusive unified private L2 cache to invalidate lines from the primary instruction cache when they are being written by a local store instruction. If instruction and data caches are kept coherent in this way, or if the memory system consists of only uncached RAMs, then just the fetch pipeline needs to be flushed at a FENCE.I.
(emphasis mine). I don't think this section is normative, but it indicates that that's the expected behaviour. That helps us with the current implementation. Calling the above syscall will almost certainly do everything that we want.
However that syscall itself does not guarantee that it flushes the pipeline since a possible future
fence.i
instruction may not behave this way. (i.e. when they introduce afence.i
for just an address range, maybe the pipeline flush won't be done there)To fix the above we would need a guarantee that a pipeline flush would always accompany a icache flush and to my reading the RVWMO Memory Consistency Model does not clarify this, although I can barely understand it so.... ¯\\\_(ツ)\_/¯
TL;DR: We will almost certainly be fine calling the custom
riscv_clear_cache
sycall for now, and perhaps future revisions to the ISA manual will clarify this and we can update our implementation then
afonso360 edited a comment on issue #5033:
Well sort of, yes, maybe. At a minimum we need to use the broadcast
fence.i
that executes that instruction on all threads for the current process (i.e. that custom syscall above). Also, that syscall is only guaranteed to clear the icache for a region of memory (despite being implemented as a globalfence.i
). That maps fairly well onto theclear_cache
interface that we now have after #4997.But additionally the unresolved question is, does flushing the icache guarantee that it also flushes the pipeline?
On AArch64 it does not, thus we need an additional action, but maybe on RISC-V it does?
Reading the ISA manual for
fence.i
A more complex implementation might snoop the instruction (data) cache on every data (instruction) cache miss, or use an inclusive unified private L2 cache to invalidate lines from the primary instruction cache when they are being written by a local store instruction. If instruction and data caches are kept coherent in this way, or if the memory system consists of only uncached RAMs, then just the fetch pipeline needs to be flushed at a FENCE.I.
(emphasis mine). I don't think this section is normative, but it indicates that that's the expected behaviour. That helps us with the current implementation. Calling the above syscall will almost certainly do everything that we want.
However that syscall itself does not guarantee that it flushes the pipeline since a possible future
fence.i
instruction may not behave this way. (i.e. when they introduce afence.i
for just an address range, maybe the pipeline flush won't be done there)To fix the above we would need a guarantee that a pipeline flush would always accompany a icache flush and to my reading the RVWMO Memory Consistency Model does not clarify this, although I can barely understand it so.... ¯\\\_(ツ)\_/¯
TL;DR: We will almost certainly be fine calling the custom
riscv_flush_icache
sycall for now, and perhaps future revisions to the ISA manual will clarify this and we can update our implementation then
bjorn3 commented on issue #5033:
As I understand it the fence.i instruction will always be the correct way to handle modification of code. Depending on the cpu it may only flush the pipeline (if the instruction cache is coherent with the data cache) or also flush the instruction cache (if it isn't coherent), but the end result is the same: The instruction cache is brought in sync with the data cache and the pipeline is flushed.
As for the instruction cache flush syscall being guaranteed to flush the pipeline, I'm not sure.
afonso360 commented on issue #5033:
Yeah, you're right, it would probably be a different instruction if those semantics would change.
afonso360 edited a comment on issue #5033:
Yeah, you're right, it would probably be a different instruction if those semantics were to change.
yuyang-ok commented on issue #5033:
We will almost certainly be fine calling the custom riscv_flush_icache sycall for now, and perhaps future revisions to the ISA manual will clarify this and we can update our implementation then
I think we can trust the linux implmentation.
yuyang-ok edited a comment on issue #5033:
We will almost certainly be fine calling the custom riscv_flush_icache sycall for now, and perhaps future revisions to the ISA manual will clarify this and we can update our implementation then
I think we can trust the linux implementation.
yuyang-ok commented on issue #5033:
@afonso360 https://github.com/bytecodealliance/wasmtime/pull/4997 is not merged,I think maybe I can begin with this after merge.
afonso360 commented on issue #5033:
Yeah, sure! I'm still waiting for a re-review on that, but it shouldn't take too long.
yuyang-ok commented on issue #5033:
@afonso360 I cannot find the constant.
diff --git a/crates/jit-icache-coherence/src/libc.rs b/crates/jit-icache-coherence/src/libc.rs
index 6ea9cea08..17f37a7c5 100644
--- a/crates/jit-icache-coherence/src/libc.rs
+++ b/crates/jit-icache-coherence/src/libc.rs
@@ -60,7 +60,18 @@ pub(crate) fn pipeline_flush_mt() -> Result<()> {
// In any other case we got an actual error, so lets propagate that up
e => e?,
}
-
yuyang-ok commented on issue #5033:
@afonso360 I don't see
clear_cache
on riscv64, I thinkclear_cache
will do all the job.
afonso360 commented on issue #5033:
Right, I also think the
flush_icache
syscall will work. However, that syscall takes three arguments which we need to pass. It currently doesn't do anything with them but we still need to pass them.Another thing that it would be nice if we could do is call the vDSO implementation which would be lot faster on single core systems since it would just call the
flush.i
instruction and be done with it (I'm not sure how hard it is to do this, so feel free to disregard it).
afonso360 edited a comment on issue #5033:
Right, I also think the
flush_icache
syscall will work. However, that syscall takes three arguments which we need to pass. It currently doesn't do anything with them but we still need to pass them.It would also be nice if we could call the vDSO implementation which would be lot faster on single core systems since it would just call the
flush.i
instruction and be done with it (I'm not sure how hard it is to do this, so feel free to disregard it).
afonso360 edited a comment on issue #5033:
Right, I also think the
flush_icache
syscall will work. However, that syscall takes three arguments which we need to pass. It currently doesn't do anything with them but we still need to pass them.It would also be nice if we could call the vDSO implementation which would be lot faster on single core systems since it would just call the
fence.i
instruction and be done with it (I'm not sure how hard it is to do this, so feel free to disregard it).
yuyang-ok commented on issue #5033:
@afonso360 It is wired
SYS_RISCV_FLUSH_ICACHE_ALL
is same asSYS_RISCV_FLUSH_ICACHE_LOCAL
.https://github.com/torvalds/linux/search?q=SYS_RISCV_FLUSH_ICACHE_ALL
Last updated: Jan 24 2025 at 00:11 UTC