SIMD · cranelift · Zulip Chat Archive

Cranelift code generator. Contribute to bytecodealliance/cranelift development by creating an account on GitHub.

Andrew Brown (Feb 14 2020 at 21:55):

Add initial support for EVEX encodings by abrown · Pull Request #1370 · bytecodealliance/cranelift

This has been discussed in issue #1244. A short description of what this does, why it is needed: since the Wasm SIMD spec has re-added instructions (e.g. i64x2.mul) that are optimally encoded in V...

Andrew Brown (Jun 04 2020 at 17:41):

Before @Benjamin Bouvier comes back online, I am giving anyone interested a heads up of a four PR sequence adding some remaining legalizations for Wasm SIMD instructions. The sequence is: #1765 -> #1820 -> #1821 -> #1822. The blocking issue for me to finish this PRs has been that the spec defines semantics that make it very difficult to lower efficiently on x86. To get around this in the short term, I've added some disabled-by-default flags, assert_no_nans and assert_in_bounds, that allow users to reassure Cranelift that it doesn't need to do the extra work. I'm interested in feedback so please take a look if interested!

Andrew Brown (Jun 09 2020 at 22:46):

Implement fcvt_to_sint_sat (f32x4 -> i32x4) for x86 by abrown · Pull Request #1820 · bytecodealliance/wasmtime

This is a follow-on to #1765 and should be merged after that PR. The most notable change here is the addition of the ISA-specific flags assert_no_nans and assert_in_bounds which are disabled by def...

Dan Gohman (Jun 09 2020 at 22:49):

Andrew Brown (Jun 09 2020 at 22:49):

Dan Gohman (Jun 09 2020 at 22:49):

Andrew Brown (Jun 09 2020 at 22:49):

Andrew Brown (Jun 09 2020 at 22:51):

The idea of adding assert_no_nans (or any other flag that would make runtime guarantees) is a tricky one. With #1820 I added only a first step in that direction by allowing users to assert at runtime that their program will not have NaNs--this allows Cranelift to generate faster code. Of course, this is a hammer approach to a screwdriver problem because it affects, e.g., every f32x4.max in the program. This is bad: 1) the user has to apply the guarantee to the whole program, and 2) the user usually can't make this guarantee since they don't understand all the parts of the code they are running

Andrew Brown (Jun 09 2020 at 22:53):

I thought of this flag as an experimental thing and not the end solution. The end solution would have to involve programmer guarantees through, e.g., new C attributes like __attribute__((no_nans)) which would have to propagate through a Wasm custom section to Cranelift.

Andrew Brown (Jun 09 2020 at 22:54):

So, it could be set for Wasm code... initially for experimental benchmarking with the hammer flag and eventually through a more fine-grained mechanism

Andrew Brown (Jun 09 2020 at 22:56):

Dan Gohman (Jun 09 2020 at 22:57):

Andrew Brown (Jun 09 2020 at 22:59):

Eventually... I would like to try out this experimental flag approach with some benchmarks and see if it is worth it to pursue further

Dan Gohman (Jun 09 2020 at 22:59):

Andrew Brown (Jun 09 2020 at 22:59):

If it is and there is support from other people, then we could standardize it; if not, we just remove the flags

Dan Gohman (Jun 09 2020 at 23:00):

Andrew Brown (Jun 09 2020 at 23:03):

Um... it's been a bit since I looked at it but I thought that had to do with quieting NaNs

Andrew Brown (Jun 09 2020 at 23:04):

Dan Gohman (Jun 09 2020 at 23:04):

Dan Gohman (Jun 09 2020 at 23:05):

Andrew Brown (Jun 09 2020 at 23:05):

Dan Gohman (Jun 09 2020 at 23:06):

Implement fcvt_to_sint_sat (f32x4 -> i32x4) for x86 by abrown · Pull Request #1820 · bytecodealliance/wasmtime

Andrew Brown (Jun 09 2020 at 23:07):

Dan Gohman (Jun 09 2020 at 23:09):

Dan Gohman (Jun 09 2020 at 23:17):

Andrew Brown (Jun 09 2020 at 23:20):

v8/v8

The official mirror of the V8 Git repository. Contribute to v8/v8 development by creating an account on GitHub.

Andrew Brown (Jun 09 2020 at 23:20):

Andrew Brown (Jun 09 2020 at 23:24):

Need to drop but I'll check back in tomorrow; I'm interested in hearing more opinions on this experimental flag idea... and Dan if you just want to review that PR that would be great!!!

Dan Gohman (Jun 09 2020 at 23:26):

Julian Seward (Jun 11 2020 at 10:20):

@Andrew Brown can you say some more about the difficulties in implementing fcvt_to_sint_sat (f32x4 -> i32x4)efficiently? What sequence do you have to generate?

Andrew Brown (Jun 11 2020 at 15:30):

I can't say I suffered through that sequence much since I followed what v8 does (with one misunderstanding about their use of CMPEQPS). I think the bigger problem is that this and many other instructions have inefficient lowerings on x86 compared to other architectures. I've raised the issues and measured benchmarks and what I have concluded is that to make up for the extra instructions the spec semantics require we need several optimizations (like this one) and something along the lines of the guarantees I proposed in #1820. As noted above the "experimental flag" approach is not the best long-term solution but it can reduce the number of instructions emitted for several of the most inefficient instructions.

v8/v8

The official mirror of the V8 Git repository. Contribute to v8/v8 development by creating an account on GitHub.

WebAssembly/simd

Branch of the spec repo scoped to discussion of SIMD in WebAssembly - WebAssembly/simd

Improve bitselect codegen with knowledge of operand origin · Issue #1187 · bytecodealliance/wasmtime

Feature As discussed in WebAssembly/simd#192, the Wasm SIMD bitselect instruction could be lowered to one of the x86 BLEND* family of instructions instead using 3-4 instructions. Benefit Potentiall...

Benjamin Bouvier (Jun 11 2020 at 15:41):

Re: the flag, i was trying to think briefly in terms of security behaviors. The only case in which it's fine to set the flag is when you're running Cranelift on your own hardware; otherwise, it seems that setting the flag could open the door for correctness issues in the best case, security vulnerabilities in the worst one. So it might not be applicable to the browser use case, for one thing.

Benjamin Bouvier (Jun 11 2020 at 15:44):

I hear the performance improvements it can bring, and from my memories of the wasm spec in general (around max/min), I understand there might be some nice wins from doing this. I would be tempted to have this if we had a clear understanding of what's the worst that can happen when the flag isn't respected, and what we can do for this.

Benjamin Bouvier (Jun 11 2020 at 15:45):

Or considering alternatives; maybe we want some other low-level CLIR instructions that don't have these checks, and the emitter can choose to emit one or the other, or group the unsafe ones in a code sequence guarded by a check that the inputs are not NaNs. (Mostly equivalent to hoisting the NaN checks in one place)

Benjamin Bouvier (Jun 11 2020 at 15:46):

Could the experiments be done without checking in all the code upstream (in wasmtime master)?

Andrew Brown (Jun 11 2020 at 16:27):

I agree with what you said about experimental flags, but I have convinced myself that a per-instruction attribute (the better, long-term solution) set by the programmer (not knowing the runtime hardware) would also be just as safe--I mean, the user is already trusting the programmer to write correct code so they can also trust the programmer to make a per-instruction guarantee

I don't think I understand this concern: if the user doesn't apply the flag, the code runs slower; if the user applies the flag, they may have unexpected data (if they assumed a NaN was converted to a zero but then was not).

Andrew Brown (Jun 11 2020 at 16:31):

On the alternatives: hoisting doesn't work if the operands of the guaranteed instruction are created inside a loop. And I could just keep all my optimizations local and just test locally but I guess the point of discussing this is to figure out if you all agree that Cranelift should try out this approach, from a philosophical standpoint, first as an experimental flag and possibly later with a per-instruction attribute.

Andrew Brown (Jun 11 2020 at 16:34):

(looks like we are all online if and I would be open to discussing on Zoom for a few minutes if that is quicker)

Julian Seward (Jun 11 2020 at 16:41):

I must be honest and say I am not in favour of assert_no_nans. The effect of it is to make the meaning of CLIR dependent on some external context, where it wasn't before. "Under-the-table" semantics, I think of it as.

As Dan says, we can't use it for translating wasm. What would perhaps be better is to wait for v2 of the wasm simd spec, which Lars tells me might have some variants of existing insns that have cheaper to implement on Intel. Then we'd have a cheaper implementation available exactly for those insns which are currently expensive, without assert_no_nans, and without creating the possibility of tagging arbitrary cranelift insns with a "cheap semantics please" flag in cases where that has no meaning. Specifically, we could add cheap-variant CLIR ops for exactly those which are currently problematic, and for no others.

Andrew Brown (Jun 11 2020 at 16:53):

I don't have much confidence in v2 of the spec... my experience so far has been that it is very difficult raise these types of lowering issues and get some type of resolution for x86 (hence looking at "under-the-table" semantics :slight_smile:). I agree that a solution at the spec level would be best but, in the absence of that, this approach seemed second-best and perhaps even a way to influence the spec, e.g. "if this flag is necessary to achieve near-native performance, why don't we just add some Wasm SIMD instructions?"

Andrew Brown (Jun 11 2020 at 16:56):

I think you can in certain scenarios (e.g. performance experiments) but by default this flag would be off; users would have to know what they're doing before using something like this.

Andrew Brown (Jun 12 2020 at 22:25):

Implement fcvt_to_sint_sat (f32x4 -> i32x4) for x86 by abrown · Pull Request #1876 · bytecodealliance/wasmtime

This is the same as #1820 without the runtime guarantee flags.

Andrew Brown (Jun 12 2020 at 22:27):

I tagged @Julian Seward as the reviewer since you looked at the last one--thanks for the reviews!

Stream: cranelift

Topic: SIMD

Andrew Brown (Dec 13 2019 at 22:50):

Andrew Brown (Feb 14 2020 at 21:55):

Andrew Brown (Jun 04 2020 at 17:41):

Andrew Brown (Jun 09 2020 at 22:46):

Dan Gohman (Jun 09 2020 at 22:49):

Andrew Brown (Jun 09 2020 at 22:49):

Dan Gohman (Jun 09 2020 at 22:49):

Andrew Brown (Jun 09 2020 at 22:49):

Andrew Brown (Jun 09 2020 at 22:51):

Andrew Brown (Jun 09 2020 at 22:53):

Andrew Brown (Jun 09 2020 at 22:54):

Andrew Brown (Jun 09 2020 at 22:56):

Dan Gohman (Jun 09 2020 at 22:57):

Andrew Brown (Jun 09 2020 at 22:59):

Dan Gohman (Jun 09 2020 at 22:59):

Andrew Brown (Jun 09 2020 at 22:59):

Dan Gohman (Jun 09 2020 at 23:00):

Andrew Brown (Jun 09 2020 at 23:03):

Andrew Brown (Jun 09 2020 at 23:04):

Dan Gohman (Jun 09 2020 at 23:04):

Dan Gohman (Jun 09 2020 at 23:05):

Andrew Brown (Jun 09 2020 at 23:05):

Dan Gohman (Jun 09 2020 at 23:06):

Andrew Brown (Jun 09 2020 at 23:07):

Andrew Brown (Jun 09 2020 at 23:07):

Dan Gohman (Jun 09 2020 at 23:09):

Dan Gohman (Jun 09 2020 at 23:17):

Andrew Brown (Jun 09 2020 at 23:20):

Andrew Brown (Jun 09 2020 at 23:20):

Andrew Brown (Jun 09 2020 at 23:24):

Dan Gohman (Jun 09 2020 at 23:26):

Julian Seward (Jun 11 2020 at 10:20):

Andrew Brown (Jun 11 2020 at 15:30):

Benjamin Bouvier (Jun 11 2020 at 15:41):

Benjamin Bouvier (Jun 11 2020 at 15:44):

Benjamin Bouvier (Jun 11 2020 at 15:45):

Benjamin Bouvier (Jun 11 2020 at 15:46):

Andrew Brown (Jun 11 2020 at 16:27):

Andrew Brown (Jun 11 2020 at 16:31):

Andrew Brown (Jun 11 2020 at 16:34):

Julian Seward (Jun 11 2020 at 16:41):

Andrew Brown (Jun 11 2020 at 16:53):

Andrew Brown (Jun 11 2020 at 16:56):

Andrew Brown (Jun 12 2020 at 22:25):

Andrew Brown (Jun 12 2020 at 22:27):