Stream: git-wasmtime

Topic: wasmtime / issue #5033 Perform I-Cache Maintenance on RISC-V


view this post on Zulip Wasmtime GitHub notifications bot (Oct 07 2022 at 10:25):

afonso360 opened issue #5033:

:wave: Hey,

This is a follow up to a discussion we recently had about I-Cache maintenance. I'm filing this as a separate issue since I would prefer to deal with this in a different PR than that one.

That PR cleans up our interface for I-Cache maintenance in a way that's hopefully not too AArch64 specific. In the process we noticed that RISC-V does not perform any I-Cache maintenance actions, despite the architecture not guaranteeing that it is cache coherent.

For AArch64 we do a CORE_SYNC membarrier, but from my look at the kernel docs it looks like this operation is not yet available for RISC-V. They have a custom syscall that seems to do exactly what we want, but we should double check.

view this post on Zulip Wasmtime GitHub notifications bot (Oct 07 2022 at 10:25):

afonso360 edited issue #5033:

:wave: Hey,

This is a follow up to a discussion we recently had about I-Cache maintenance. I'm filing this as a separate issue since I would prefer to deal with this in a different PR than that one.

That PR cleans up our interface for I-Cache maintenance in a way that's hopefully not too AArch64 specific. In the process we noticed that RISC-V does not perform any I-Cache maintenance actions, despite the architecture not guaranteeing that it is cache coherent.

For AArch64 we do a CORE_SYNC membarrier, but from my look at the kernel docs it looks like this operation is not yet available for RISC-V. They have a custom syscall that seems to do exactly what we want, but we should double check.

view this post on Zulip Wasmtime GitHub notifications bot (Oct 07 2022 at 10:52):

afonso360 edited issue #5033:

:wave: Hey,

This is a follow up to a discussion we recently had about I-Cache maintenance. I'm filing this as a separate issue since I would prefer to deal with this in a different PR than that one.

That PR cleans up our interface for I-Cache maintenance in a way that's hopefully not too AArch64 specific. In the process we noticed that RISC-V does not perform any I-Cache maintenance actions, despite the architecture not guaranteeing that it is cache coherent.

For AArch64 we do a CORE_SYNC membarrier + a mprotect call (as a temporary solution), but from my look at the kernel docs it looks like this operation is not yet available for RISC-V. They have a custom syscall that seems to do exactly what we want, but we should double check.

view this post on Zulip Wasmtime GitHub notifications bot (Oct 07 2022 at 10:52):

afonso360 edited issue #5033:

:wave: Hey,

This is a follow up to a discussion we recently had about I-Cache maintenance. I'm filing this as a separate issue since I would prefer to deal with this in a different PR than that one.

That PR cleans up our interface for I-Cache maintenance in a way that's hopefully not too AArch64 specific. In the process we noticed that RISC-V does not perform any I-Cache maintenance actions, despite the architecture not guaranteeing that it is cache coherent.

For AArch64 we do a CORE_SYNC membarrier and a clear_cache (some asterisks here, see the PR discussion), but from my look at the kernel docs it looks like this operation is not yet available for RISC-V. They have a custom syscall that seems to do exactly what we want, but we should double check.

view this post on Zulip Wasmtime GitHub notifications bot (Oct 07 2022 at 10:54):

afonso360 edited issue #5033:

:wave: Hey,

This is a follow up to a discussion we recently had about I-Cache maintenance. I'm filing this as a separate issue since I would prefer to deal with this in a different PR than that one.

That PR cleans up our interface for I-Cache maintenance in a way that's hopefully not too AArch64 specific. In the process we noticed that RISC-V does not perform any I-Cache maintenance actions, despite the architecture not guaranteeing that it is cache coherent.

For AArch64 we do a CORE_SYNC membarrier and a clear_cache (some asterisks here, see the PR discussion), but from my look at the kernel docs it looks like CORE_SYNC is not yet available for RISC-V. They have a custom syscall that seems to do the clear_cache part, but we should double check what other guarantees it provides us.

view this post on Zulip Wasmtime GitHub notifications bot (Oct 07 2022 at 11:41):

afonso360 edited issue #5033:

:wave: Hey,

This is a follow up to a discussion we recently had about I-Cache maintenance. I'm filing this as a separate issue since I would prefer to deal with this in a different PR than that one.

That PR cleans up our interface for I-Cache maintenance in a way that's hopefully not too AArch64 specific. In the process we noticed that RISC-V does not perform any I-Cache maintenance actions, despite the architecture not guaranteeing that it is cache coherent.

For AArch64 we do a CORE_SYNC membarrier and a clear_cache (some asterisks here, see the PR discussion), but from my look at the kernel docs it looks like CORE_SYNC is not yet available for RISC-V. They have a custom syscall that seems to do the all-core clear_cache part, but we should double check what other guarantees it provides us.

view this post on Zulip Wasmtime GitHub notifications bot (Oct 08 2022 at 02:40):

yuyang-ok commented on issue #5033:

@afonso360 riscv64 have the instruction fence.i which is not called right now.

view this post on Zulip Wasmtime GitHub notifications bot (Oct 08 2022 at 08:35):

yuyang-ok commented on issue #5033:

looks like if we execute fence.i on riscv,we are done. Right? @afonso360

view this post on Zulip Wasmtime GitHub notifications bot (Oct 08 2022 at 09:12):

yuyang-ok edited a comment on issue #5033:

looks like if we execute fence.i on riscv after compile to code,we are done. Right? @afonso360

view this post on Zulip Wasmtime GitHub notifications bot (Oct 08 2022 at 09:12):

yuyang-ok edited a comment on issue #5033:

looks like if we execute fence.i on riscv after compile the code,we are done. Right? @afonso360

view this post on Zulip Wasmtime GitHub notifications bot (Oct 08 2022 at 09:13):

yuyang-ok edited a comment on issue #5033:

looks like if we execute fence.i on riscv after compile the code,we are done. Right? @afonso360

view this post on Zulip Wasmtime GitHub notifications bot (Oct 08 2022 at 09:44):

afonso360 commented on issue #5033:

Well sort of, yes, maybe. At a minimum we need to use the broadcast fence.i that executes that instruction on all threads for the current process (i.e. that custom syscall above). Also, that syscall is only guaranteed to clear the icache for a region of memory (despite being implemented as a global fence.i). That maps fairly well into the clear_cache interface that we now have after #4997.

But additionally the unresolved question is, does flushing the icache guarantee that it also flushes the pipeline?

On AArch64 it does not, thus we need an additional action, but maybe on RISC-V it does?

Reading the ISA manual for fence.i

A more complex implementation might snoop the instruction (data) cache on every data (instruction) cache miss, or use an inclusive unified private L2 cache to invalidate lines from the primary instruction cache when they are being written by a local store instruction. If instruction and data caches are kept coherent in this way, or if the memory system consists of only uncached RAMs, then just the fetch pipeline needs to be flushed at a FENCE.I.

(emphasis mine). I don't think this section is normative, but it indicates that that's the expected behaviour. That helps us with the current implementation. Calling the above syscall will almost certainly do everything that we want.

However that syscall itself does not guarantee that it flushes the pipeline since a possible future fence.i instruction may not behave this way. (i.e. when they introduce a fence.i for just an address range, maybe the pipeline flush won't be done there)

To my reading the RVWMO Memory Consistency Model does not clarify this, although I can barely understand it so.... ¯\\\_(ツ)\_/¯


TL;DR: We will almost certainly be fine calling the custom riscv_clear_cache sycall for now, and perhaps future revisions to the ISA manual will clarify this and we can update our implementation then

view this post on Zulip Wasmtime GitHub notifications bot (Oct 08 2022 at 09:54):

afonso360 edited a comment on issue #5033:

Well sort of, yes, maybe. At a minimum we need to use the broadcast fence.i that executes that instruction on all threads for the current process (i.e. that custom syscall above). Also, that syscall is only guaranteed to clear the icache for a region of memory (despite being implemented as a global fence.i). That maps fairly well into the clear_cache interface that we now have after #4997.

But additionally the unresolved question is, does flushing the icache guarantee that it also flushes the pipeline?

On AArch64 it does not, thus we need an additional action, but maybe on RISC-V it does?

Reading the ISA manual for fence.i

A more complex implementation might snoop the instruction (data) cache on every data (instruction) cache miss, or use an inclusive unified private L2 cache to invalidate lines from the primary instruction cache when they are being written by a local store instruction. If instruction and data caches are kept coherent in this way, or if the memory system consists of only uncached RAMs, then just the fetch pipeline needs to be flushed at a FENCE.I.

(emphasis mine). I don't think this section is normative, but it indicates that that's the expected behaviour. That helps us with the current implementation. Calling the above syscall will almost certainly do everything that we want.

However that syscall itself does not guarantee that it flushes the pipeline since a possible future fence.i instruction may not behave this way. (i.e. when they introduce a fence.i for just an address range, maybe the pipeline flush won't be done there)

To fix the above we would need a guarantee that a pipeline flush would always accompany a icache flush and to my reading the RVWMO Memory Consistency Model does not clarify this, although I can barely understand it so.... ¯\\\_(ツ)\_/¯


TL;DR: We will almost certainly be fine calling the custom riscv_clear_cache sycall for now, and perhaps future revisions to the ISA manual will clarify this and we can update our implementation then

view this post on Zulip Wasmtime GitHub notifications bot (Oct 08 2022 at 10:04):

afonso360 edited a comment on issue #5033:

Well sort of, yes, maybe. At a minimum we need to use the broadcast fence.i that executes that instruction on all threads for the current process (i.e. that custom syscall above). Also, that syscall is only guaranteed to clear the icache for a region of memory (despite being implemented as a global fence.i). That maps fairly well onto the clear_cache interface that we now have after #4997.

But additionally the unresolved question is, does flushing the icache guarantee that it also flushes the pipeline?

On AArch64 it does not, thus we need an additional action, but maybe on RISC-V it does?

Reading the ISA manual for fence.i

A more complex implementation might snoop the instruction (data) cache on every data (instruction) cache miss, or use an inclusive unified private L2 cache to invalidate lines from the primary instruction cache when they are being written by a local store instruction. If instruction and data caches are kept coherent in this way, or if the memory system consists of only uncached RAMs, then just the fetch pipeline needs to be flushed at a FENCE.I.

(emphasis mine). I don't think this section is normative, but it indicates that that's the expected behaviour. That helps us with the current implementation. Calling the above syscall will almost certainly do everything that we want.

However that syscall itself does not guarantee that it flushes the pipeline since a possible future fence.i instruction may not behave this way. (i.e. when they introduce a fence.i for just an address range, maybe the pipeline flush won't be done there)

To fix the above we would need a guarantee that a pipeline flush would always accompany a icache flush and to my reading the RVWMO Memory Consistency Model does not clarify this, although I can barely understand it so.... ¯\\\_(ツ)\_/¯


TL;DR: We will almost certainly be fine calling the custom riscv_clear_cache sycall for now, and perhaps future revisions to the ISA manual will clarify this and we can update our implementation then

view this post on Zulip Wasmtime GitHub notifications bot (Oct 08 2022 at 10:08):

afonso360 edited a comment on issue #5033:

Well sort of, yes, maybe. At a minimum we need to use the broadcast fence.i that executes that instruction on all threads for the current process (i.e. that custom syscall above). Also, that syscall is only guaranteed to clear the icache for a region of memory (despite being implemented as a global fence.i). That maps fairly well onto the clear_cache interface that we now have after #4997.

But additionally the unresolved question is, does flushing the icache guarantee that it also flushes the pipeline?

On AArch64 it does not, thus we need an additional action, but maybe on RISC-V it does?

Reading the ISA manual for fence.i

A more complex implementation might snoop the instruction (data) cache on every data (instruction) cache miss, or use an inclusive unified private L2 cache to invalidate lines from the primary instruction cache when they are being written by a local store instruction. If instruction and data caches are kept coherent in this way, or if the memory system consists of only uncached RAMs, then just the fetch pipeline needs to be flushed at a FENCE.I.

(emphasis mine). I don't think this section is normative, but it indicates that that's the expected behaviour. That helps us with the current implementation. Calling the above syscall will almost certainly do everything that we want.

However that syscall itself does not guarantee that it flushes the pipeline since a possible future fence.i instruction may not behave this way. (i.e. when they introduce a fence.i for just an address range, maybe the pipeline flush won't be done there)

To fix the above we would need a guarantee that a pipeline flush would always accompany a icache flush and to my reading the RVWMO Memory Consistency Model does not clarify this, although I can barely understand it so.... ¯\\\_(ツ)\_/¯


TL;DR: We will almost certainly be fine calling the custom riscv_flush_icache sycall for now, and perhaps future revisions to the ISA manual will clarify this and we can update our implementation then

view this post on Zulip Wasmtime GitHub notifications bot (Oct 08 2022 at 10:11):

bjorn3 commented on issue #5033:

As I understand it the fence.i instruction will always be the correct way to handle modification of code. Depending on the cpu it may only flush the pipeline (if the instruction cache is coherent with the data cache) or also flush the instruction cache (if it isn't coherent), but the end result is the same: The instruction cache is brought in sync with the data cache and the pipeline is flushed.

As for the instruction cache flush syscall being guaranteed to flush the pipeline, I'm not sure.

view this post on Zulip Wasmtime GitHub notifications bot (Oct 08 2022 at 10:37):

afonso360 commented on issue #5033:

Yeah, you're right, it would probably be a different instruction if those semantics would change.

view this post on Zulip Wasmtime GitHub notifications bot (Oct 08 2022 at 10:39):

afonso360 edited a comment on issue #5033:

Yeah, you're right, it would probably be a different instruction if those semantics were to change.

view this post on Zulip Wasmtime GitHub notifications bot (Oct 08 2022 at 10:55):

yuyang-ok commented on issue #5033:

We will almost certainly be fine calling the custom riscv_flush_icache sycall for now, and perhaps future revisions to the ISA manual will clarify this and we can update our implementation then

I think we can trust the linux implmentation.

view this post on Zulip Wasmtime GitHub notifications bot (Oct 08 2022 at 11:02):

yuyang-ok edited a comment on issue #5033:

We will almost certainly be fine calling the custom riscv_flush_icache sycall for now, and perhaps future revisions to the ISA manual will clarify this and we can update our implementation then

I think we can trust the linux implementation.

view this post on Zulip Wasmtime GitHub notifications bot (Oct 12 2022 at 03:45):

yuyang-ok commented on issue #5033:

@afonso360 https://github.com/bytecodealliance/wasmtime/pull/4997 is not merged,I think maybe I can begin with this after merge.

view this post on Zulip Wasmtime GitHub notifications bot (Oct 12 2022 at 08:11):

afonso360 commented on issue #5033:

Yeah, sure! I'm still waiting for a re-review on that, but it shouldn't take too long.

view this post on Zulip Wasmtime GitHub notifications bot (Oct 18 2022 at 11:09):

yuyang-ok commented on issue #5033:

@afonso360 I cannot find the constant.

diff --git a/crates/jit-icache-coherence/src/libc.rs b/crates/jit-icache-coherence/src/libc.rs
index 6ea9cea08..17f37a7c5 100644
--- a/crates/jit-icache-coherence/src/libc.rs
+++ b/crates/jit-icache-coherence/src/libc.rs
@@ -60,7 +60,18 @@ pub(crate) fn pipeline_flush_mt() -> Result<()> {
// In any other case we got an actual error, so lets propagate that up
e => e?,
}
-

view this post on Zulip Wasmtime GitHub notifications bot (Oct 18 2022 at 11:21):

yuyang-ok commented on issue #5033:

@afonso360 I don't see clear_cache on riscv64, I think clear_cache will do all the job.

view this post on Zulip Wasmtime GitHub notifications bot (Oct 18 2022 at 11:50):

afonso360 commented on issue #5033:

Right, I also think the flush_icache syscall will work. However, that syscall takes three arguments which we need to pass. It currently doesn't do anything with them but we still need to pass them.

Another thing that it would be nice if we could do is call the vDSO implementation which would be lot faster on single core systems since it would just call the flush.i instruction and be done with it (I'm not sure how hard it is to do this, so feel free to disregard it).

view this post on Zulip Wasmtime GitHub notifications bot (Oct 18 2022 at 11:51):

afonso360 edited a comment on issue #5033:

Right, I also think the flush_icache syscall will work. However, that syscall takes three arguments which we need to pass. It currently doesn't do anything with them but we still need to pass them.

It would also be nice if we could call the vDSO implementation which would be lot faster on single core systems since it would just call the flush.i instruction and be done with it (I'm not sure how hard it is to do this, so feel free to disregard it).

view this post on Zulip Wasmtime GitHub notifications bot (Oct 18 2022 at 11:52):

afonso360 edited a comment on issue #5033:

Right, I also think the flush_icache syscall will work. However, that syscall takes three arguments which we need to pass. It currently doesn't do anything with them but we still need to pass them.

It would also be nice if we could call the vDSO implementation which would be lot faster on single core systems since it would just call the fence.i instruction and be done with it (I'm not sure how hard it is to do this, so feel free to disregard it).

view this post on Zulip Wasmtime GitHub notifications bot (Oct 19 2022 at 00:17):

yuyang-ok commented on issue #5033:

@afonso360 It is wired SYS_RISCV_FLUSH_ICACHE_ALL is same as SYS_RISCV_FLUSH_ICACHE_LOCAL.

https://github.com/torvalds/linux/search?q=SYS_RISCV_FLUSH_ICACHE_ALL


Last updated: Dec 23 2024 at 12:05 UTC