jameysharp opened PR #8125 from jameysharp:table-constant-bounds
to bytecodealliance:main
:
WebAssembly tables have a minimum size, but may optionally also have a maximum size. If the maximum is set to the same number of elements as the minimum, then we can emit an
iconst
instruction for bounds-checks on tables, rather than loading the bounds from memory.That's a win in its own right, but when optimization is enabled, this also allows bounds-checks to constant-fold if the table index is provided as an integer constant.
jameysharp requested wasmtime-core-reviewers for a review on PR #8125.
jameysharp requested alexcrichton for a review on PR #8125.
jameysharp requested abrown for a review on PR #8125.
jameysharp requested wasmtime-compiler-reviewers for a review on PR #8125.
alexcrichton submitted PR review:
Looks good to me!
Before I hit the big green button though, two questions/observations:
- As this gets more complicated I fear that keeping
func_environ.rs
in sync withdummy.rs
is going to get a bit cumbersome. I know that's a longstanding issue that we want to test the clif of wasmtime's output, but I'd probably caution going too far down the route of mirroring all table stuff in the two files. Either that, or do you have plans to move some of this instead tocranelift-wasm
?- One thing I'm a little sad about here is that we get to give Cranelift a bunch of constants but because they're involved in control flow it may not end up helping a whole lot. For example in the examples you added here the
brif
can becomebr
, but I don't think we currently optimize that. In lieu of adding more optimizations, do you think that the frontend will be able to detect two constants and omit the control flow entirely? Basically doing that form of constant folding in the frontend rather than in the middle-end.To be clear I don't think either of these need to be addressed before this PR lands, so you can feel free to hit the button as well, figured this'd be a good place to ask anyway!
cfallin commented on PR #8125:
One thing I'm a little sad about here is that we get to give Cranelift a bunch of constants but because they're involved in control flow it may not end up helping a whole lot. For example in the examples you added here the brif can become br, but I don't think we currently optimize that. In lieu of adding more optimizations, do you think that the frontend will be able to detect two constants and omit the control flow entirely? Basically doing that form of constant folding in the frontend rather than in the middle-end.
Having just talked to @jameysharp about his plans my understanding is that the next steps will address this: removing the branch-on-OOB in exchange for a select-null-pointer approach, and then separately, optimizing away the select if (and only if!) the choice-input is constant. The overall motivation/arc here is that we want to make
i32.iconst N ; call_indirect ...
as fast as possible, and in particular if we have non-nullable typed funcrefs and use the typed variant ofcall_indirect
, with a bit more too (e.g. removing the null check) we should eventually have load-the-function-pointer-and-call machine code.
cfallin edited a comment on PR #8125:
One thing I'm a little sad about here is that we get to give Cranelift a bunch of constants but because they're involved in control flow it may not end up helping a whole lot. For example in the examples you added here the brif can become br, but I don't think we currently optimize that. In lieu of adding more optimizations, do you think that the frontend will be able to detect two constants and omit the control flow entirely? Basically doing that form of constant folding in the frontend rather than in the middle-end.
Having just talked to @jameysharp about his plans my understanding is that the next steps will address this: removing the branch-on-OOB in exchange for a select-null-pointer approach, and then separately, optimizing away the select if (and only if!) the choice-input is constant. The overall motivation/arc here is that we want to make
i32.iconst N ; table.get; call_ref ...
as fast as possible, and in particular if we have non-nullable typed funcrefs and use the typed variant ofcall_ref
, with a bit more too (e.g. removing the null check) we should eventually have load-the-function-pointer-and-call machine code.
jameysharp commented on PR #8125:
Chris' response accurately reflects my plans for how to address your second question, Alex. I'm currently working on getting rid of the
trapnz
, and it works on x64, but I need to do a bit more work for the rest of the backends. #8122 was the first step on that. Once that is out of the way, the existing constant-folding optimizations get us most of the way to where we want to be, and as Chris said, eliminatingselect_spectre_guard
when its condition is constant does the rest.To your first question, I've been trying to push as much as I can into cranelift-wasm shared code, but many details (like "how do I discover the current table bounds", lazy table initialization, and GC refcount management) are unfortunately specific to a particular runtime, so I haven't been able to find much opportunity to share more than this. I'd love to hear suggestions though!
alexcrichton commented on PR #8125:
Oh nice, yeah changing to select{,_spectre} sounds like a nice way to solve this without much work in the frontend. And agreed that it'd be great for
i32.const N
+call_indirect
to be load/call in the most optimal case.I'd love to hear suggestions though!
Ah yeah if all the major bits go in cranelift-wasm then I think that's fine, I think I over-rotated on the
let bound = ...
bit that's basically the same infunc_environ.rs
anddummy.rs
.AFAIK there's no use case for keeping
dummy.rs
though other than "we haven't figure out a way to test Wasmtime". I might be just so inspired by this to take a stab at migrating all the tests to using Wasmtime's codegen...
jameysharp commented on PR #8125:
I would love to have CLIF and compiled-output tests that run through Wasmtime instead of using the dummy environment, personally! My current sense is there are plenty of reasons why that's difficult, but if you get it working I'm all for it.
jameysharp merged PR #8125.
Last updated: Dec 23 2024 at 12:05 UTC