Stream: cranelift

Topic: SIMD


view this post on Zulip Andrew Brown (Dec 13 2019 at 22:50):

If you have a spare moment could you review some of the fixes I pushed as I was running the SIMD spec tests in wasmtime? The reviews should be rather short (<50 lines): https://github.com/bytecodealliance/cranelift/pulls/abrown.

Cranelift code generator. Contribute to bytecodealliance/cranelift development by creating an account on GitHub.

view this post on Zulip Andrew Brown (Feb 14 2020 at 21:55):

If anyone is interested in AVX-512 support in Cranelift, I just pushed https://github.com/bytecodealliance/cranelift/pull/1370 to add the ability to encode instructions using the EVEX format.

This has been discussed in issue #1244. A short description of what this does, why it is needed: since the Wasm SIMD spec has re-added instructions (e.g. i64x2.mul) that are optimally encoded in V...

view this post on Zulip Andrew Brown (Jun 04 2020 at 17:41):

Before @Benjamin Bouvier comes back online, I am giving anyone interested a heads up of a four PR sequence adding some remaining legalizations for Wasm SIMD instructions. The sequence is: #1765 -> #1820 -> #1821 -> #1822. The blocking issue for me to finish this PRs has been that the spec defines semantics that make it very difficult to lower efficiently on x86. To get around this in the short term, I've added some disabled-by-default flags, assert_no_nans and assert_in_bounds, that allow users to reassure Cranelift that it doesn't need to do the extra work. I'm interested in feedback so please take a look if interested!

view this post on Zulip Andrew Brown (Jun 09 2020 at 22:46):

Tagging some people per @Benjamin Bouvier's comment in https://github.com/bytecodealliance/wasmtime/pull/1820: @Dan Gohman, @Chris Fallin, @Julian Seward

This is a follow-on to #1765 and should be merged after that PR. The most notable change here is the addition of the ISA-specific flags assert_no_nans and assert_in_bounds which are disabled by def...

view this post on Zulip Dan Gohman (Jun 09 2020 at 22:49):

If assert_no_nans is set and a NaN happens, it's undefined behavior?

view this post on Zulip Andrew Brown (Jun 09 2020 at 22:49):

Yes

view this post on Zulip Dan Gohman (Jun 09 2020 at 22:49):

So we won't be able to set that for wasm code

view this post on Zulip Andrew Brown (Jun 09 2020 at 22:49):

(I'm writing up a more full explanation... one sec)

view this post on Zulip Andrew Brown (Jun 09 2020 at 22:51):

The idea of adding assert_no_nans (or any other flag that would make runtime guarantees) is a tricky one. With #1820 I added only a first step in that direction by allowing users to assert at runtime that their program will not have NaNs--this allows Cranelift to generate faster code. Of course, this is a hammer approach to a screwdriver problem because it affects, e.g., every f32x4.max in the program. This is bad: 1) the user has to apply the guarantee to the whole program, and 2) the user usually can't make this guarantee since they don't understand all the parts of the code they are running

view this post on Zulip Andrew Brown (Jun 09 2020 at 22:53):

I thought of this flag as an experimental thing and not the end solution. The end solution would have to involve programmer guarantees through, e.g., new C attributes like __attribute__((no_nans)) which would have to propagate through a Wasm custom section to Cranelift.

view this post on Zulip Andrew Brown (Jun 09 2020 at 22:54):

So, it could be set for Wasm code... initially for experimental benchmarking with the hammer flag and eventually through a more fine-grained mechanism

view this post on Zulip Andrew Brown (Jun 09 2020 at 22:56):

Does that explain the idea better?

view this post on Zulip Dan Gohman (Jun 09 2020 at 22:57):

Ok, so that would be something we'd coordinate with the wasm simd proposal?

view this post on Zulip Andrew Brown (Jun 09 2020 at 22:59):

Eventually... I would like to try out this experimental flag approach with some benchmarks and see if it is worth it to pursue further

view this post on Zulip Dan Gohman (Jun 09 2020 at 22:59):

Ah, ok.

view this post on Zulip Andrew Brown (Jun 09 2020 at 22:59):

If it is and there is support from other people, then we could standardize it; if not, we just remove the flags

view this post on Zulip Dan Gohman (Jun 09 2020 at 23:00):

So, looking at the code, why is there a band with an all-ones operand?

view this post on Zulip Andrew Brown (Jun 09 2020 at 23:03):

Um... it's been a bit since I looked at it but I thought that had to do with quieting NaNs

view this post on Zulip Andrew Brown (Jun 09 2020 at 23:04):

(or it could be a mistake as I was looking at V8 at the time)

view this post on Zulip Dan Gohman (Jun 09 2020 at 23:04):

We run with signaling-NaN signaling disabled, so we can probably skip that

view this post on Zulip Dan Gohman (Jun 09 2020 at 23:05):

We = all Cranelift-using wasm engines

view this post on Zulip Andrew Brown (Jun 09 2020 at 23:05):

:+1: and can you point me to where that is happening?

view this post on Zulip Dan Gohman (Jun 09 2020 at 23:06):

https://github.com/bytecodealliance/wasmtime/pull/1820/files#diff-f5aa2e18f8d875720c00f30b86cb69d4R1003

This is a follow-on to #1765 and should be merged after that PR. The most notable change here is the addition of the ISA-specific flags assert_no_nans and assert_in_bounds which are disabled by def...

view this post on Zulip Andrew Brown (Jun 09 2020 at 23:07):

No, I mean the FP configuration

view this post on Zulip Andrew Brown (Jun 09 2020 at 23:07):

...if any

view this post on Zulip Dan Gohman (Jun 09 2020 at 23:09):

oh, it's the default mxcsr setting

view this post on Zulip Dan Gohman (Jun 09 2020 at 23:17):

Also, I'm trying to figure out that bxor at the end. Should that be a band_not?

view this post on Zulip Andrew Brown (Jun 09 2020 at 23:20):

Here's what I referenced in V8: https://github.com/v8/v8/blob/master/src/compiler/backend/x64/code-generator-x64.cc#L3011-L3028

The official mirror of the V8 Git repository. Contribute to v8/v8 development by creating an account on GitHub.

view this post on Zulip Andrew Brown (Jun 09 2020 at 23:20):

I documented some of the other instructions better than this one...

view this post on Zulip Andrew Brown (Jun 09 2020 at 23:24):

Need to drop but I'll check back in tomorrow; I'm interested in hearing more opinions on this experimental flag idea... and Dan if you just want to review that PR that would be great!!!

view this post on Zulip Dan Gohman (Jun 09 2020 at 23:26):

oh, I think the cmpeqps in there isn't generating an all-ones

view this post on Zulip Julian Seward (Jun 11 2020 at 10:20):

@Andrew Brown can you say some more about the difficulties in implementing fcvt_to_sint_sat (f32x4 -> i32x4)efficiently? What sequence do you have to generate?

view this post on Zulip Andrew Brown (Jun 11 2020 at 15:30):

I can't say I suffered through that sequence much since I followed what v8 does (with one misunderstanding about their use of CMPEQPS). I think the bigger problem is that this and many other instructions have inefficient lowerings on x86 compared to other architectures. I've raised the issues and measured benchmarks and what I have concluded is that to make up for the extra instructions the spec semantics require we need several optimizations (like this one) and something along the lines of the guarantees I proposed in #1820. As noted above the "experimental flag" approach is not the best long-term solution but it can reduce the number of instructions emitted for several of the most inefficient instructions.

The official mirror of the V8 Git repository. Contribute to v8/v8 development by creating an account on GitHub.
Branch of the spec repo scoped to discussion of SIMD in WebAssembly - WebAssembly/simd
Feature As discussed in WebAssembly/simd#192, the Wasm SIMD bitselect instruction could be lowered to one of the x86 BLEND* family of instructions instead using 3-4 instructions. Benefit Potentiall...

view this post on Zulip Benjamin Bouvier (Jun 11 2020 at 15:41):

Re: the flag, i was trying to think briefly in terms of security behaviors. The only case in which it's fine to set the flag is when you're running Cranelift on your own hardware; otherwise, it seems that setting the flag could open the door for correctness issues in the best case, security vulnerabilities in the worst one. So it might not be applicable to the browser use case, for one thing.

view this post on Zulip Benjamin Bouvier (Jun 11 2020 at 15:44):

I hear the performance improvements it can bring, and from my memories of the wasm spec in general (around max/min), I understand there might be some nice wins from doing this. I would be tempted to have this if we had a clear understanding of what's the worst that can happen when the flag isn't respected, and what we can do for this.

view this post on Zulip Benjamin Bouvier (Jun 11 2020 at 15:45):

Or considering alternatives; maybe we want some other low-level CLIR instructions that don't have these checks, and the emitter can choose to emit one or the other, or group the unsafe ones in a code sequence guarded by a check that the inputs are not NaNs. (Mostly equivalent to hoisting the NaN checks in one place)

view this post on Zulip Benjamin Bouvier (Jun 11 2020 at 15:46):

Could the experiments be done without checking in all the code upstream (in wasmtime master)?

view this post on Zulip Andrew Brown (Jun 11 2020 at 16:27):

The only case in which it's fine to set the flag is when you're running Cranelift on your own hardware

I agree with what you said about experimental flags, but I have convinced myself that a per-instruction attribute (the better, long-term solution) set by the programmer (not knowing the runtime hardware) would also be just as safe--I mean, the user is already trusting the programmer to write correct code so they can also trust the programmer to make a per-instruction guarantee

I would be tempted to have this if we had a clear understanding of what's the worst that can happen when the flag isn't respected

I don't think I understand this concern: if the user doesn't apply the flag, the code runs slower; if the user applies the flag, they may have unexpected data (if they assumed a NaN was converted to a zero but then was not).

view this post on Zulip Andrew Brown (Jun 11 2020 at 16:31):

On the alternatives: hoisting doesn't work if the operands of the guaranteed instruction are created inside a loop. And I could just keep all my optimizations local and just test locally but I guess the point of discussing this is to figure out if you all agree that Cranelift should try out this approach, from a philosophical standpoint, first as an experimental flag and possibly later with a per-instruction attribute.

view this post on Zulip Andrew Brown (Jun 11 2020 at 16:34):

(looks like we are all online if and I would be open to discussing on Zoom for a few minutes if that is quicker)

view this post on Zulip Julian Seward (Jun 11 2020 at 16:41):

I must be honest and say I am not in favour of assert_no_nans. The effect of it is to make the meaning of CLIR dependent on some external context, where it wasn't before. "Under-the-table" semantics, I think of it as.

As Dan says, we can't use it for translating wasm. What would perhaps be better is to wait for v2 of the wasm simd spec, which Lars tells me might have some variants of existing insns that have cheaper to implement on Intel. Then we'd have a cheaper implementation available exactly for those insns which are currently expensive, without assert_no_nans, and without creating the possibility of tagging arbitrary cranelift insns with a "cheap semantics please" flag in cases where that has no meaning. Specifically, we could add cheap-variant CLIR ops for exactly those which are currently problematic, and for no others.

view this post on Zulip Andrew Brown (Jun 11 2020 at 16:53):

I don't have much confidence in v2 of the spec... my experience so far has been that it is very difficult raise these types of lowering issues and get some type of resolution for x86 (hence looking at "under-the-table" semantics :slight_smile:). I agree that a solution at the spec level would be best but, in the absence of that, this approach seemed second-best and perhaps even a way to influence the spec, e.g. "if this flag is necessary to achieve near-native performance, why don't we just add some Wasm SIMD instructions?"

view this post on Zulip Andrew Brown (Jun 11 2020 at 16:56):

As Dan says, we can't use it for translating Wasm

I think you can in certain scenarios (e.g. performance experiments) but by default this flag would be off; users would have to know what they're doing before using something like this.

view this post on Zulip Andrew Brown (Jun 12 2020 at 22:25):

Ok, to resolve this discussion: I closed #1820 (which had guarantee flags) and re-opened it without flags as https://github.com/bytecodealliance/wasmtime/pull/1876.

This is the same as #1820 without the runtime guarantee flags.

view this post on Zulip Andrew Brown (Jun 12 2020 at 22:27):

I tagged @Julian Seward as the reviewer since you looked at the last one--thanks for the reviews!


Last updated: Jan 24 2025 at 00:11 UTC