I was thinking about https://github.com/bytecodealliance/wasmtime/pull/6557 and https://github.com/bytecodealliance/wasmtime/pull/6531 and it got me wondering about Winch and target features (mostly for x86). At a high level, how should Winch deal with target feature support? Depending on the answer to this, there's a number of sub-questions:
To me these don't necessarily have obvious answers. A knee-jerk reaction might be to support everything in the limit of time, but that I feel like is quite a lot of duplication with Cranelift which already handles all of this sort of target-feature-specific lowerings. I feel like "support everything" in the long term is probably only viable if more is shared with Cranelift rather than duplicating the lowering rules effectively.
The other alternative of "support exactly this one feature set" seems most reasonable to me from a maintainability point of view, but it has the obvious drawback of supporting less hardware or not going out of its way to support better lowerings (but I suppose Winch isn't "all about speed" anyway).
I suppose more-or-less these PRs got me thinking about all the lowering rules we have for various supported CPU features on x86 and if there's a way it can be supported in Winch without duplication. I don't think this is trivial do to at the moment, but was curious if others had thought about this too.
cc @Saúl Cabrera @Chris Fallin
One thought that occurs to me: the way that SpiderMonkey handles some of this duplication is to put some lowering sequences into the MacroAssembler, and then share that between tiers. So one could in theory do masm.tricky_simd(regA, regB, regTemp)
in both baseline and optimizing backends. One way of adopting that idea could be to have MachInst variants that are pseudoinsts encapsulating some whole sequence. I'm not sure how much I like that but it is one direction that could work
Regarding concrete support-points in the configuration space: I do think it'd be nice to at least support base-SSE2 and SSE4.2; maybe those two are enough overall for a baseline tier
if we have to pick one, "just SSE4.2" seems OK; we already have the ability to run Wasmtime on SSE2 (or will shortly anyway) with Cranelift
At least from my experience porting all the simd stuff to SSE2 is probably not worth the effort to put in two locations. It's somewhat error-prone and was "questionably worth it" the first time around and almost certainly not worth it to do again.
I do like though that today temporary registers are internal details of the lowerings where the masm approach would have to burn a register to reserve it for temporary use or, as you mentioned, expose it as an API implementation detail that a temporary is needed. That being said I don't really see an alternative to share code here.
Those are good points Alex, thanks for bringing this up. The ideal state that I've had in my mind is one in which Winch has full parity with Cranelift, so in this case my expectations lean towards the "support everything" approach that you've described. I'm not coming at this from the speed angle necessarily, but mainly from the embedder's usability and developer experience. To me it seems that giving the embedder a guarantee that both compilers behave the same under the same set of circumstances would be the ideal one. On the other hand, I must admit that I haven't put a lot of thought on how to get there, it was not until I started exploring these instructions when I realized that my assumptions were wrong, since initially I expected that x86_64
backend's emit
module would handle any fallback implementations. All that said, I'm curious to know if you or Chris have any ideas on how we could share some of these lowerings; even though I think that the duplication introduced by the two PRs above is not as heavy today, my expectation is that with the introduction of proposals like SIMD, a more manageable solution is desirable, so I'd be open to exploring any potential avenues.
One thought that I had, which I haven't dived too deep into, but I thought it could work: would introducing pseudo instructions that handle any alternative emission of instructions make sense, something similar to what we have for Division? https://github.com/bytecodealliance/wasmtime/blob/main/cranelift/codegen/src/isa/x64/inst/emit.rs#L667
I can also add an agenda item for next week's meeting so that we can discuss this in a bit more detail if everyone is ok with that.
(I posted this without noticing Chris's answer, sorry)
Yup, glad to see we're thinking along the same lines :-)
nah that all definitely makes sense, so I guess I would phrase my thinking as we shouldn't duplicate the lowerings across both winch and cranelift while still trying to achieve full support in both
Pseudo-instructions are probably the way to go but I think we may want to explore better ways of defining and modelling them as right now you've got to edit a lot of files to add a pseudo-instruction, but if they were more ergonomic and/or easier to understand I think it'd make more sense to have more "macro instructions" like this
Last updated: Dec 23 2024 at 13:07 UTC