jeffcharles requested abrown for a review on PR #9950.
jeffcharles opened PR #9950 from jeffcharles:winch-add-load-extend-splat-instructions
to bytecodealliance:main
:
<!--
Please make sure you include the following information:
If this work has been discussed elsewhere, please include a link to that
conversation. If it was discussed in an issue, just mention "issue #...".Explain why this change is needed. If the details are in an issue already,
this can be brief.Our development process is documented in the Wasmtime book:
https://docs.wasmtime.dev/contributing-development-process.htmlPlease ensure all communication follows the code of conduct:
https://github.com/bytecodealliance/wasmtime/blob/main/CODE_OF_CONDUCT.md
-->
Part of #8093. Adds support on x64 with AVX2 for:
v128.load8x8_s
v128.load8x8_u
v128.load16x4_s
v128.load16x4_u
v128.load32x2_s
v128.load32x2_u
v128.load8_splat
v128.load16_splat
v128.load32_splat
v128.load64_splat
I've changed
wasm_load
on the macroassemblers to take aLoadKind
instead ofOption<ExtendKind>
to model the additional vector operations (vector extends and splats), but I'm open to changing that to a different abstraction.
jeffcharles requested wasmtime-compiler-reviewers for a review on PR #9950.
jeffcharles requested pchickey for a review on PR #9950.
jeffcharles requested wasmtime-core-reviewers for a review on PR #9950.
saulecabrera commented on PR #9950:
I can take this review.
saulecabrera requested saulecabrera for a review on PR #9950.
github-actions[bot] commented on PR #9950:
Subscribe to Label Action
cc @saulecabrera
<details>
This issue or pull request has been labeled: "winch"Thus the following users have been cc'd because of the following labels:
- saulecabrera: winch
To subscribe or unsubscribe from this label, edit the <code>.github/subscribe-to-label.json</code> configuration file.
Learn more.
</details>
jeffcharles updated PR #9950.
saulecabrera created PR review comment:
Could you return an error here instead of panicking? I've done a refactoring across the compiler to ensure that panics are minimal, particularly for unimplemented/unsupported features. This ensures that we can can successfully run spec tests even when there's partial support for Wasm proposals.
Perhaps we can introduce a new
Unimplemeted*
enum entry here https://github.com/bytecodealliance/wasmtime/blob/main/winch/codegen/src/codegen/error.rs#L7, something around the lines ofUnimplementedLoadKind
saulecabrera submitted PR review:
Looking good! Left some comments inline.
saulecabrera created PR review comment:
Nit: can we remove this enum entry and keep
Option<LoadKind>
instead? Or where you envisioning this approach to be easier to consume in the callee as opposed to using anOption
?
saulecabrera created PR review comment:
Given that we're starting with avx2 as our baseline, I wonder if we should emit the
vpmov*
variants here.
jeffcharles submitted PR review.
jeffcharles created PR review comment:
We could. I figured if they are equivalent instructions that are supported on older chipsets, we could emit those so there would be less work to support non-AVX CPUs in the future given we only need to support 128-bit vectors. One thing I'm not clear on is if
Vpmov*
instructions are faster or not. If they are faster, then it would probably make sense to emit those instead.
jeffcharles submitted PR review.
jeffcharles created PR review comment:
It's a little easier on the callee side IMO since we can avoid nesting a match inside another match or an if-let. It's possible that
None
is not the appropriate name for this variant and something likePlain
orSimple
might be more appropriate.
saulecabrera submitted PR review.
saulecabrera created PR review comment:
My expectation is that they are probably more performant, but I haven't performed any meaningful benchmarks. Aside from the potential performance benefit, my comment here is for consistency mostly: given that we aren't planning on doing any checks/emission for anything other than avx2 for SIMD at least initially, I'd prefer if we could default to emitting avx2 only to start. My expectation is that when thinking about fallbacks for anything < avx2, it will require substantial work regardless given that we'd probably want to find a way to share that implementation with what already exists in Cranelift.
alexcrichton submitted PR review.
alexcrichton created PR review comment:
It's been a bit since I last checked, and I also haven't benchmarked, but IIRC the Intel optimization guide recommended either using exclusively VEX instructions (ones with the
v*
prefix) or exclusively non-VEX instructions. It's got something subtle to do with CPU state and such. It generally has to do with the fact that VEX instructions mutate the entire register contents as opposed to SSE which only mutate the lower 128-bits. This means that SSE instructions can accidentally create false dependencies between instructions.The rule of thumb I used in Cranelift at least was to try to unconditionally prefer
v*
VEX instructions when AVX is enabled as I think that's what Intel assumes compilers are doing to make things faster
saulecabrera submitted PR review.
saulecabrera created PR review comment:
I don't want to bikeshed too much here, I think the comment is clear enough, so I'm good if you'd like to leave it this way.
jeffcharles updated PR #9950.
jeffcharles updated PR #9950.
saulecabrera submitted PR review:
Thanks!
jeffcharles updated PR #9950.
jeffcharles updated PR #9950.
jeffcharles updated PR #9950.
jeffcharles updated PR #9950.
saulecabrera submitted PR review.
saulecabrera created PR review comment:
let unsupported = ["spec_testsuite/simd_align.wast"];
This is why the PR is still failing in CI. I tested the change in my arm64 machine FWIW.
jeffcharles updated PR #9950.
jeffcharles updated PR #9950.
jeffcharles requested saulecabrera for a review on PR #9950.
saulecabrera merged PR #9950.
Last updated: Jan 24 2025 at 00:11 UTC