alexcrichton commented on issue #6224:
I'll also note that this doesn't enable the runtest for insertlane yet, because there are other lowerings that use SSE4.1 instructions, but a future commit will enable the insertlane test for testing without SSE4.1.
penzn commented on issue #6224:
Curious - is this just for testing or are there important users that only have SSE2? Even low-power x86(32/64) CPUs have had SSE4 for a very long time, and the standard was written with SSE4 in mind; web implementations don't support anything lower. If someone tries to accelerate their code using the instructions in this PR on an SSE2 machine, they are likely to be surprised.
alexcrichton commented on issue #6224:
Just for testing and completeness. We occasionally get reports about users getting an error message that their machine didn't support SSE4.2, which was the baseline required by Wasmtime originally. Part of this is on Wasmtime, since we require SSE4.2 even if the module doesn't actually use simd instructions, and that error could probably be improved by first detecting SIMD and then requiring SSE4.2 or something like that.
Otherwise though I was basically just curious if it was even possible to take the requirement from SSE4.2 to SSE2. I wanted to learn x64 a bit more and the various instructions and it ended up not being too bad and it fits well in the instruction selection framework that Cranelift has, so I figured I'd land it.
My guess though is that this'll only be here for compatibility in niche cases. I definitely have no plans myself to use work like this to shape future standards or anything like that, I think that the SSE4.2 baseline is quite reasonable and well thought out.
cfallin commented on issue #6224:
@penzn FWIW, the current discussion on this and Alex's work were spawned by a downstream user trying to run Wasmtime on a Penryn-era (Core 2) machine a month or so ago, so these users do still exist, even if rarely (I'm not finding the issue at the moment, but all the related ones are linked from #3810, and we've had pretty extensive discussions on that cluster of issues over the past 1.5 years). It's also just nice to be able to say that we support our full feature set on any x86_64 machine (i.e. only SSE2), and dynamically detect the presence of better options (almost always switch to our SSE4.2 lowerings when available), rather than the other way around where we have a dynamically-detected error path or a "hole" in our support matrix. Wasm is all about portability, after all!
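As a rough illustration of the "SSE2 baseline, dynamically detect better options" idea described above, here is a minimal Rust sketch. It is not Cranelift's actual code: `SimdLevel`, `detect_simd_level`, and `choose_lowering` are hypothetical names invented for this example, and the returned strings are just labels. The only real API used is the standard library's `std::is_x86_feature_detected!` macro.

```rust
/// Hypothetical feature tiers for this sketch (not a Cranelift type).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum SimdLevel {
    /// Guaranteed on every x86_64 machine.
    Sse2,
    Sse41,
    Sse42,
}

/// Probe the host CPU at runtime and report the best tier found.
/// On non-x86_64 targets this sketch simply reports the baseline.
fn detect_simd_level() -> SimdLevel {
    #[cfg(target_arch = "x86_64")]
    {
        if std::is_x86_feature_detected!("sse4.2") {
            return SimdLevel::Sse42;
        }
        if std::is_x86_feature_detected!("sse4.1") {
            return SimdLevel::Sse41;
        }
    }
    SimdLevel::Sse2
}

/// Pick a lowering strategy for a lane insert (hypothetical labels).
/// Newer tiers get the single-instruction path; SSE2 gets a slower
/// multi-instruction fallback instead of an error.
fn choose_lowering(level: SimdLevel) -> &'static str {
    match level {
        SimdLevel::Sse41 | SimdLevel::Sse42 => "sse4.1-or-newer",
        SimdLevel::Sse2 => "sse2-fallback",
    }
}
```

Calling `choose_lowering(detect_simd_level())` always yields some strategy, so the old machine gets a slower path rather than a "hole" in the support matrix.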
penzn commented on issue #6224:
Some instructions were added only because 4.1/4.2 support is assumed. Here is one example, also see notes from the meeting that approved it. On the other hand adding fallbacks makes sense.
The tricky part of SIMD is that it needs to be fast as well as correct; you probably don't want to get into a situation where someone accelerates their code using SIMD operations only to find out it got slower (probably even worse if the compiler quietly did it for them). Performance is pretty hard to specify precisely; the criterion for adding instructions was something along the lines of "no slowdown, speedup for at least one platform" (i.e. swizzle).
cfallin commented on issue #6224:
Right, and that came up in earlier discussions here too. From my perspective at least, it makes sense to tune the feature for newer hardware (and we do), but it kind of also sucks to just abandon users with old hardware completely. Given the choice between "user with 2007 laptop cannot run Wasmtime at all" and "user with 2007 laptop can run Wasmtime, but it's slow just like everything else on this machine" I think we'll take the latter :-)
Last updated: Nov 22 2024 at 16:03 UTC