Stream: git-wasmtime

Topic: wasmtime / issue #4166 Cranelift: alias analysis: track e...


view this post on Zulip Wasmtime GitHub notifications bot (May 19 2022 at 23:17):

cfallin labeled issue #4166:

In #4163 we are introducing an alias analysis and redundant-load elimination / store-to-load-forwarding transform.

This initial implementation categorizes all memory accesses as one of four kinds: to a "heap", to a "table", to the "vmctx", or to everything else. These four categories are allowed to be optimized separately from each other; so e.g. a store to a table does not prevent a load from a heap from being merged with an earlier load, if otherwise to the same address.

This is correct, and simple, and allows us to keep just four bits in MemFlags and four u32s for the "last store" vector, per instruction. However, it is somewhat more imprecise than we would like, especially in the future when we expect multiple modules, memories, tables, etc. to become more common.

Thus, we should investigate ways of efficiently representing an arbitrary number of heaps or tables as separate categories of abstract state. This may require an extended MemFlags, or indirection of some kind, or some limit (first 16, 32, ... memories are privileged).
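
As a rough illustration of the scheme described in this issue, the sketch below keeps one "last store" slot per category and lets a load reuse an earlier load only when the category, the address, and that category's last store all match. It is a simplified model, not the code from #4163: the names `Category`, `LastStores`, `LoadKey`, and `Analysis` are hypothetical, instructions and address values are plain `u32` ids, and store-to-load forwarding is left out.

```rust
use std::collections::HashMap;

/// The four coarse categories of abstract memory state (hypothetical names).
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
enum Category {
    Heap = 0,
    Table = 1,
    Vmctx = 2,
    Other = 3,
}

/// Hypothetical instruction id; a real IR would use its own entity type.
type Inst = u32;

/// One "last store" slot per category.
#[derive(Clone, Copy, Debug, Default, PartialEq, Eq)]
struct LastStores([Option<Inst>; 4]);

impl LastStores {
    fn record_store(&mut self, cat: Category, inst: Inst) {
        self.0[cat as usize] = Some(inst);
    }

    fn get(&self, cat: Category) -> Option<Inst> {
        self.0[cat as usize]
    }
}

/// A load is identified by where it reads and by the last store that could
/// have changed that category of memory.
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
struct LoadKey {
    cat: Category,
    last_store: Option<Inst>,
    addr: u32, // assumed: id of the SSA value holding the address
    offset: i32,
}

#[derive(Default)]
struct Analysis {
    last: LastStores,
    seen: HashMap<LoadKey, Inst>,
}

impl Analysis {
    fn new() -> Self {
        Self::default()
    }

    /// A store only invalidates loads in its own category.
    fn store(&mut self, inst: Inst, cat: Category) {
        self.last.record_store(cat, inst);
    }

    /// Returns an earlier load that this load is redundant with, if any.
    fn load(&mut self, inst: Inst, cat: Category, addr: u32, offset: i32) -> Option<Inst> {
        let key = LoadKey { cat, last_store: self.last.get(cat), addr, offset };
        if let Some(&earlier) = self.seen.get(&key) {
            return Some(earlier);
        }
        self.seen.insert(key, inst);
        None
    }
}
```

In this model a store to a table leaves the heap slot untouched, so a later heap load from the same address still matches the earlier one; a store to the heap changes the key and forces a fresh load.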

view this post on Zulip Wasmtime GitHub notifications bot (May 19 2022 at 23:17):

cfallin opened issue #4166:

In #4163 we are introducing an alias analysis and redundant-load elimination / store-to-load-forwarding transform.

This initial implementation categorizes all memory accesses as one of four kinds: to a "heap", to a "table", to the "vmctx", or to everything else. These four categories are allowed to be optimized separately from each other; so e.g. a store to a table does not prevent a load from a heap from being merged with an earlier load, if otherwise to the same address.

This is correct, and simple, and allows us to keep just four bits in MemFlags and four u32s for the "last store" vector, per instruction. However, it is somewhat more imprecise than we would like, especially in the future when we expect multiple modules, memories, tables, etc. to become more common.

Thus, we should investigate ways of efficiently representing an arbitrary number of heaps or tables as separate categories of abstract state. This may require an extended MemFlags, or indirection of some kind, or some limit (first 16, 32, ... memories are privileged).

view this post on Zulip Wasmtime GitHub notifications bot (May 20 2022 at 17:58):

fitzgen commented on issue #4166:

One possibility is that we have "heap0", "heap1", "heap2", and finally "heap_other" (or even just heap0 and heap_other).

The CG has talked about using hints to mark which memories need to be fast and should get virtual memory tricks, since browsers can't use those tricks for every memory. Maybe we could use those same hints to map onto heap0/1/2 vs other.
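
One way to picture the heap0/heap1/heap2 vs. heap_other idea is the sketch below, which gives a dedicated category to the first few memories named in a hint list and lumps everything else together. All names here (`HeapCategory`, `DEDICATED_SLOTS`, `categorize`, `may_alias`) are hypothetical, and the hint list is just an assumed slice of memory indices, not an existing Wasmtime or Cranelift API.

```rust
/// Hypothetical heap categories: a few dedicated slots plus a catch-all.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum HeapCategory {
    /// heap0, heap1, heap2, ...
    Dedicated(u8),
    /// heap_other: every memory without a dedicated slot.
    Other,
}

/// Assumed number of dedicated slots (heap0..heap2).
const DEDICATED_SLOTS: usize = 3;

/// Map a memory index to a category. `hinted_fast` is an assumed list of
/// memory indices that hints mark as performance-critical; only the first
/// `DEDICATED_SLOTS` entries receive their own category.
fn categorize(memory_index: u32, hinted_fast: &[u32]) -> HeapCategory {
    hinted_fast
        .iter()
        .take(DEDICATED_SLOTS)
        .position(|&m| m == memory_index)
        .map(|slot| HeapCategory::Dedicated(slot as u8))
        .unwrap_or(HeapCategory::Other)
}

/// Accesses in different categories cannot alias. Two accesses that both
/// fall into `Other` must be treated as possibly aliasing.
fn may_alias(a: HeapCategory, b: HeapCategory) -> bool {
    a == b
}
```

With hints `[5, 0, 2, 7]`, memory 5 would become heap0 while memory 7 falls into heap_other, because only three slots are available in this sketch.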

view this post on Zulip Wasmtime GitHub notifications bot (May 20 2022 at 17:59):

fitzgen commented on issue #4166:

> or some limit (first 16, 32, ... memories are privileged).

Ah I think this is the same thing I was getting at with heap0/1/2 vs heap_other.

view this post on Zulip Wasmtime GitHub notifications bot (May 20 2022 at 19:24):

bjorn3 commented on issue #4166:

> One possibility is that we have "heap0", "heap1", "heap2", and finally "heap_other" (or even just heap0 and heap_other).

That won't help for stack slots though. Those are really important for cg_clif. Maybe we could have a side table recording for each instruction which alias set it is part of?

view this post on Zulip Wasmtime GitHub notifications bot (May 20 2022 at 19:28):

cfallin commented on issue #4166:

@bjorn3 yes, that could work, as long as it is optional (for memory-overhead reasons). The advantage of MemFlags now is that it's a u8 (or maybe extended to 16 or 32 bits) that can ride along in the InstructionData.
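
A minimal sketch of the optional side table discussed here might look like the following. It assumes a sparse map from instruction to alias set that is only allocated when a frontend asks for it; `AliasSet`, `AliasSideTable`, and `may_alias` are hypothetical names, and instructions are plain `u32` ids. When either instruction has no entry, the sketch answers conservatively; a real design would presumably fall back to the coarse categories in MemFlags instead.

```rust
use std::collections::HashMap;

/// Hypothetical instruction id.
type Inst = u32;

/// Hypothetical alias set id, e.g. one per stack slot or per heap.
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
struct AliasSet(u32);

/// Optional per-instruction alias-set table. `None` means the feature is
/// disabled and no per-instruction memory is spent on it.
struct AliasSideTable {
    sets: Option<HashMap<Inst, AliasSet>>,
}

impl AliasSideTable {
    fn disabled() -> Self {
        Self { sets: None }
    }

    fn enabled() -> Self {
        Self { sets: Some(HashMap::new()) }
    }

    /// Record the alias set of an instruction. Returns false if the table
    /// is disabled and the information was dropped.
    fn assign(&mut self, inst: Inst, set: AliasSet) -> bool {
        match self.sets.as_mut() {
            Some(map) => {
                map.insert(inst, set);
                true
            }
            None => false,
        }
    }

    fn alias_set(&self, inst: Inst) -> Option<AliasSet> {
        self.sets.as_ref()?.get(&inst).copied()
    }
}

/// Two memory instructions are known not to alias only if both have an
/// entry and the entries differ; otherwise stay conservative.
fn may_alias(table: &AliasSideTable, a: Inst, b: Inst) -> bool {
    match (table.alias_set(a), table.alias_set(b)) {
        (Some(x), Some(y)) => x == y,
        _ => true,
    }
}
```

A frontend that cares about stack slots could enable the table and give each slot its own `AliasSet`, while other users leave it disabled and pay nothing beyond the `Option`.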


Last updated: Jan 24 2025 at 00:11 UTC