glaebhoerl opened issue #4371:
(This is not really an issue but a heads-up; a discussion would've been more appropriate had those been enabled. Feel free to close for any reason.)
I was reading https://github.com/bytecodealliance/rfcs/pull/27, and was pleasantly surprised to find a "scoped hashmap" data structure being used in the wild.
A couple of years ago I wrote an optimized version of such a structure, with the motivating use case being the lexically-scoped context in a compiler: https://gist.github.com/glaebhoerl/d62d2b19365ae0d7c29102d0a5a6ab03. Feel free to take this and use it if you think it's worthwhile.
I'm not really sure of all the tradeoffs w.r.t. wasmtime's own implementation, especially as regards the particular use case, although what's visible at a glance is that the wasmtime version requires a loop when decrementing a level (leaving a scope) to restore the previous state, whereas the other doesn't (unless there is shadowing, which is N/A here), that having been the whole motivation for its existence.
The basic idea is reminiscent of a generational arena: upon popping a scope, stale entries aren't physically removed (or even touched); rather, based on what is effectively a generation ID (called a scope ID in the code), they're ignored/overwritten in subsequent gets/inserts.
(The version in the gist builds on hashbrown's RawTable to make the hash table aware of stale entries on a low level; there was also an earlier sketch without that integration which was simpler and didn't involve
unsafe
code.)
cfallin commented on issue #4371:
Thanks for letting us know @glaebhoerl!
I haven't gotten to the point in the egraph exploration yet where I'm doing detailed performance evaluation and tuning, but when I do, I'll keep this in mind.
cfallin labeled issue #4371:
(This is not really an issue but a heads-up; a discussion would've been more appropriate had those been enabled. Feel free to close for any reason.)
I was reading https://github.com/bytecodealliance/rfcs/pull/27, and was pleasantly surprised to find a "scoped hashmap" data structure being used in the wild.
A couple of years ago I wrote an optimized version of such a structure, with the motivating use case being the lexically-scoped context in a compiler: https://gist.github.com/glaebhoerl/d62d2b19365ae0d7c29102d0a5a6ab03. Feel free to take this and use it if you think it's worthwhile.
I'm not really sure of all the tradeoffs w.r.t. wasmtime's own implementation, especially as regards the particular use case, although what's visible at a glance is that the wasmtime version requires a loop when decrementing a level (leaving a scope) to restore the previous state, whereas the other doesn't (unless there is shadowing, which is N/A here), that having been the whole motivation for its existence.
The basic idea is reminiscent of a generational arena: upon popping a scope, stale entries aren't physically removed (or even touched); rather, based on what is effectively a generation ID (called a scope ID in the code), they're ignored/overwritten in subsequent gets/inserts.
(The version in the gist builds on hashbrown's RawTable to make the hash table aware of stale entries on a low level; there was also an earlier sketch without that integration which was simpler and didn't involve
unsafe
code.)
cfallin labeled issue #4371:
(This is not really an issue but a heads-up; a discussion would've been more appropriate had those been enabled. Feel free to close for any reason.)
I was reading https://github.com/bytecodealliance/rfcs/pull/27, and was pleasantly surprised to find a "scoped hashmap" data structure being used in the wild.
A couple of years ago I wrote an optimized version of such a structure, with the motivating use case being the lexically-scoped context in a compiler: https://gist.github.com/glaebhoerl/d62d2b19365ae0d7c29102d0a5a6ab03. Feel free to take this and use it if you think it's worthwhile.
I'm not really sure of all the tradeoffs w.r.t. wasmtime's own implementation, especially as regards the particular use case, although what's visible at a glance is that the wasmtime version requires a loop when decrementing a level (leaving a scope) to restore the previous state, whereas the other doesn't (unless there is shadowing, which is N/A here), that having been the whole motivation for its existence.
The basic idea is reminiscent of a generational arena: upon popping a scope, stale entries aren't physically removed (or even touched); rather, based on what is effectively a generation ID (called a scope ID in the code), they're ignored/overwritten in subsequent gets/inserts.
(The version in the gist builds on hashbrown's RawTable to make the hash table aware of stale entries on a low level; there was also an earlier sketch without that integration which was simpler and didn't involve
unsafe
code.)
cfallin labeled issue #4371:
(This is not really an issue but a heads-up; a discussion would've been more appropriate had those been enabled. Feel free to close for any reason.)
I was reading https://github.com/bytecodealliance/rfcs/pull/27, and was pleasantly surprised to find a "scoped hashmap" data structure being used in the wild.
A couple of years ago I wrote an optimized version of such a structure, with the motivating use case being the lexically-scoped context in a compiler: https://gist.github.com/glaebhoerl/d62d2b19365ae0d7c29102d0a5a6ab03. Feel free to take this and use it if you think it's worthwhile.
I'm not really sure of all the tradeoffs w.r.t. wasmtime's own implementation, especially as regards the particular use case, although what's visible at a glance is that the wasmtime version requires a loop when decrementing a level (leaving a scope) to restore the previous state, whereas the other doesn't (unless there is shadowing, which is N/A here), that having been the whole motivation for its existence.
The basic idea is reminiscent of a generational arena: upon popping a scope, stale entries aren't physically removed (or even touched); rather, based on what is effectively a generation ID (called a scope ID in the code), they're ignored/overwritten in subsequent gets/inserts.
(The version in the gist builds on hashbrown's RawTable to make the hash table aware of stale entries on a low level; there was also an earlier sketch without that integration which was simpler and didn't involve
unsafe
code.)
sunfishcode commented on issue #4371:
As the author of the current scoped hashmap implementation, I can confirm that it isn't anything fancy :smile:. It was just the simplest thing that would support the features needed by the GVN algorithm. When it comes time to tune that, it'd be cool to see someone take a look at this optimized hashmap implementation!
Amanieu commented on issue #4371:
I've found that you don't actually need a scoped hashmap for GVN: if you include the basic block of the defining instruction in the map then you can use a dominance check (2 compares) to quickly filter out "out of scope" entries.
In the end the implementation is surprisingly simple: https://gist.github.com/Amanieu/c7d43dfdeebc094494cfdfcf55bc9c7a [^1] [^2]
[^1]: You can ignore the code in
begin_block
, that's just there because my IR needs to support multiple entry points.
[^2]: On a somewhat unrelated note, this code isn't run as a separate GVN pass. It's instead used during IR lowering just before register allocation, which allows for much finer-grained instruction reuse.
Last updated: Jan 24 2025 at 00:11 UTC