Stream: git-wasmtime

Topic: wasmtime / PR #2146 Enhance machinst x64 instruction sele...


view this post on Zulip Wasmtime GitHub notifications bot (Aug 19 2020 at 14:02):

bnjbvr opened PR #2146 from x64-isel to main:

This implements a few instruction selection optimizations for x64:

On embenchen in Spidermonkey, this is between a 3 to 7% speedup in compilation time, and i see generated code runtime speedups up to 25%.

view this post on Zulip Wasmtime GitHub notifications bot (Aug 19 2020 at 14:02):

bnjbvr requested cfallin for a review on PR #2146.

view this post on Zulip Wasmtime GitHub notifications bot (Aug 20 2020 at 00:14):

cfallin submitted PR Review.

view this post on Zulip Wasmtime GitHub notifications bot (Aug 20 2020 at 00:14):

cfallin submitted PR Review.

view this post on Zulip Wasmtime GitHub notifications bot (Aug 20 2020 at 00:14):

cfallin created PR Review Comment:

Idle thought (not necessary for this PR): we should consolidate these tree-matching helpers among the backends (i.e. with aarch64) as we're doing for the ABI support; there's nothing machine-specific in the function below.

view this post on Zulip Wasmtime GitHub notifications bot (Aug 20 2020 at 00:14):

cfallin created PR Review Comment:

I think I'd prefer to lift the op != Opcode::GetPinnedReg condition out of this helper and to its callsite; otherwise it's an oddly specific predicate to have as a general-purpose helper, IMHO.

view this post on Zulip Wasmtime GitHub notifications bot (Aug 20 2020 at 00:14):

cfallin created PR Review Comment:

small_cst => small_constant for clarity?

view this post on Zulip Wasmtime GitHub notifications bot (Aug 20 2020 at 00:14):

cfallin created PR Review Comment:

code-style nit, but match swap_operands { FcmpOperands::Swap => ..., FcmpOperands::DontSwap => ... } would be a bit more idiomatic I think.

view this post on Zulip Wasmtime GitHub notifications bot (Aug 20 2020 at 00:14):

cfallin created PR Review Comment:

This helper works but will call ctx.get_input() multiple times; could we factor this a bit more so that we pass one LowerInput around? Maybe lowerinput_to_*() and then wrap these with input_to_*()?

view this post on Zulip Wasmtime GitHub notifications bot (Aug 20 2020 at 00:14):

cfallin created PR Review Comment:

match input_to_imm(...) { Some(shift_amt) if shift_amt <= 3 => Some(...), _ => None }

view this post on Zulip Wasmtime GitHub notifications bot (Aug 20 2020 at 00:14):

cfallin created PR Review Comment:

I'm not sure exactly what the right factoring is, but there seems to be a lot of duplication across all the points at which an fcmp / icmp is lowered. I think it'd be much better to centralize the common bits (e.g. the match on FloatCC below) into a helper.

At a high level, it seems what you want is, for any input that's a bool, "lower this somehow to the machine's condition flags and give me one or more conditions that indicate true". This would allow you to factor the input side (icmp or fcmp) and output side (csel for int or float, cond branch) into separate pieces.

view this post on Zulip Wasmtime GitHub notifications bot (Aug 20 2020 at 00:14):

cfallin created PR Review Comment:

We should clarify an important rule/invariant here: hte target of the branch must also be a successor of the actual terminator (otherwise the CFG is incorrect, which would confuse the RA). Also perhaps we should note when this instruction might be used (sequence of multiple cond branches deriving form one comparison) and pointing to an example (fcmp lowering).

view this post on Zulip Wasmtime GitHub notifications bot (Aug 20 2020 at 00:14):

cfallin created PR Review Comment:

Comment here for why the reg may not be virtual: we may directly return a machine register when we know that register holds the result of an opcode (by definition).

view this post on Zulip Wasmtime GitHub notifications bot (Aug 20 2020 at 00:14):

cfallin created PR Review Comment:

Why only size == 1? Actually, I'm not sure I understand the need for any sort of extension: surely the output of a select, having the same type as the inputs, has the same property of "upper bits don't matter"? So we can always do a full-register cmove, no matter the type of value held by the register.

view this post on Zulip Wasmtime GitHub notifications bot (Aug 20 2020 at 15:48):

bnjbvr submitted PR Review.

view this post on Zulip Wasmtime GitHub notifications bot (Aug 20 2020 at 15:49):

bnjbvr created PR Review Comment:

I'm ok to do this, but since the helper is unused in other contexts that don't want to check for get_pinned_reg, this would be over-generalizing the helper without any clear benefits. Should we instead move this helper to the machinst directory, and rename it has_lowering_side_effect or something like this instead?

view this post on Zulip Wasmtime GitHub notifications bot (Aug 20 2020 at 15:55):

bnjbvr submitted PR Review.

view this post on Zulip Wasmtime GitHub notifications bot (Aug 20 2020 at 15:55):

bnjbvr created PR Review Comment:

Yeah, I've thought about this and realized that such an helper would have to return a condition code, or a conjunction (AND, for Equal), or disjunction (OR, for NotEqual) of those. And then, if you look at select, we can't handle properly the result for Equal, so we'd have to use something else. This is one of these cases where the code duplication seemed better than having overly complicated helpers that would have to shoehorn many edge cases. I'll try something...

view this post on Zulip Wasmtime GitHub notifications bot (Aug 20 2020 at 15:57):

bnjbvr submitted PR Review.

view this post on Zulip Wasmtime GitHub notifications bot (Aug 20 2020 at 15:57):

bnjbvr created PR Review Comment:

Because cmove can handle 16, 32 and 64-bits operands, but no 8-bits operands. But you're probably right about upper bits not mattering here.

view this post on Zulip Wasmtime GitHub notifications bot (Aug 20 2020 at 18:43):

cfallin submitted PR Review.

view this post on Zulip Wasmtime GitHub notifications bot (Aug 20 2020 at 18:43):

cfallin created PR Review Comment:

Hmm, yeah, that sounds fine. I think the thing that stuck out the most to me was the the function name essentially had the same complexity as its body :-) "Lowering side effect" is much better.

view this post on Zulip Wasmtime GitHub notifications bot (Aug 20 2020 at 18:46):

cfallin submitted PR Review.

view this post on Zulip Wasmtime GitHub notifications bot (Aug 20 2020 at 18:46):

cfallin created PR Review Comment:

Hmm, you would definitely have a better handle on this than I would -- it just seemed to want some factoring somewhere. I'm OK with landing this bit as-is, for what it's worth, if nothing easy occurs to us; we can always come back and refactor later.

view this post on Zulip Wasmtime GitHub notifications bot (Aug 20 2020 at 18:47):

bjorn3 submitted PR Review.

view this post on Zulip Wasmtime GitHub notifications bot (Aug 20 2020 at 18:47):

bjorn3 created PR Review Comment:

You can pretend that it is a 32bit int, as all bits above the size of the value are undefined, right? Also doing 32bit cmov for 16bit ints would safe a prefix byte.

view this post on Zulip Wasmtime GitHub notifications bot (Aug 21 2020 at 05:28):

julian-seward1 submitted PR Review.

view this post on Zulip Wasmtime GitHub notifications bot (Aug 21 2020 at 05:28):

julian-seward1 created PR Review Comment:

+1 for Chris' initial comment (produce a sequence of insns, the last one of which sets the condition codes somehow, and return the required test (Z, NZ, C, NB, etc) as the result of the lowering. However you refactor this, it might be helpful to have a design that makes it easy to take advantage of the fact that most integer insns set the condition codes, at least the Z and S flags, in some useful way. So for example then you could do the transformations

GCC and LLVM both do this kind of trickery, nowadays quite extensively.

That would seem to be easy(er) if the only thing the helper returns is the final condition code -- in particular, that the helper doesn't assume that the next instruction will be a cmp. Also [maybe obvious] once you have such a helper, you can use it also to generate the insns + test for a CLIF conditional move (select?) and simply emit a cmov<test> rather than j<test>.

Maybe having different helpers for conditions derived from FP and non-FP inputs might be clearer? The set of tricks you can do in the integer-input case seems quite different from the FP-input case.

view this post on Zulip Wasmtime GitHub notifications bot (Aug 21 2020 at 05:29):

julian-seward1 edited PR Review Comment.

view this post on Zulip Wasmtime GitHub notifications bot (Aug 21 2020 at 05:30):

julian-seward1 edited PR Review Comment.

view this post on Zulip Wasmtime GitHub notifications bot (Aug 21 2020 at 05:46):

julian-seward1 submitted PR Review.

view this post on Zulip Wasmtime GitHub notifications bot (Aug 21 2020 at 05:46):

julian-seward1 created PR Review Comment:

Not to mention it would avoid a potential partial register use stall. Basically 16-bit integer insns of any kind are bad. +1 for @bjorn3's proposal of doing the cmov at 32 bits even for the 8- and 16-bit sizes.

view this post on Zulip Wasmtime GitHub notifications bot (Aug 21 2020 at 05:51):

julian-seward1 submitted PR Review.

view this post on Zulip Wasmtime GitHub notifications bot (Aug 21 2020 at 05:51):

julian-seward1 created PR Review Comment:

Clearer might be lower_to_amode ?

view this post on Zulip Wasmtime GitHub notifications bot (Aug 21 2020 at 05:59):

julian-seward1 submitted PR Review.

view this post on Zulip Wasmtime GitHub notifications bot (Aug 21 2020 at 05:59):

julian-seward1 created PR Review Comment:

All of the above assuming of course that neither operand is in memory. If that is the case then we'd have to do it at the specified size else risk having unexpected traps at page boundaries.

view this post on Zulip Wasmtime GitHub notifications bot (Aug 21 2020 at 06:26):

bjorn3 submitted PR Review.

view this post on Zulip Wasmtime GitHub notifications bot (Aug 21 2020 at 06:26):

bjorn3 submitted PR Review.

view this post on Zulip Wasmtime GitHub notifications bot (Aug 21 2020 at 06:26):

bjorn3 created PR Review Comment:

Maybe also add the expected index of the return param used for input?

view this post on Zulip Wasmtime GitHub notifications bot (Aug 21 2020 at 09:27):

bnjbvr submitted PR Review.

view this post on Zulip Wasmtime GitHub notifications bot (Aug 21 2020 at 09:27):

bnjbvr created PR Review Comment:

Good idea, added to todo list.

view this post on Zulip Wasmtime GitHub notifications bot (Aug 21 2020 at 09:51):

bnjbvr submitted PR Review.

view this post on Zulip Wasmtime GitHub notifications bot (Aug 21 2020 at 09:51):

bnjbvr created PR Review Comment:

Good point, added a note!

view this post on Zulip Wasmtime GitHub notifications bot (Aug 21 2020 at 09:52):

bnjbvr submitted PR Review.

view this post on Zulip Wasmtime GitHub notifications bot (Aug 21 2020 at 09:52):

bnjbvr created PR Review Comment:

I'm not sure what do you mean by this?

view this post on Zulip Wasmtime GitHub notifications bot (Aug 21 2020 at 10:30):

bjorn3 submitted PR Review.

view this post on Zulip Wasmtime GitHub notifications bot (Aug 21 2020 at 10:30):

bjorn3 created PR Review Comment:

You are currently ignoring the return param index part of inputs.inst. It would be better to check it against a value given by the caller and return None if they don't match.

view this post on Zulip Wasmtime GitHub notifications bot (Aug 21 2020 at 10:40):

bnjbvr submitted PR Review.

view this post on Zulip Wasmtime GitHub notifications bot (Aug 21 2020 at 10:40):

bnjbvr created PR Review Comment:

Ah, great point; the aarch64 backend suffers from the same issue. I'll fix it when refactoring helpers.

view this post on Zulip Wasmtime GitHub notifications bot (Aug 21 2020 at 10:40):

bnjbvr submitted PR Review.

view this post on Zulip Wasmtime GitHub notifications bot (Aug 21 2020 at 10:40):

bnjbvr created PR Review Comment:

Nice, it reduces the amount of code even for regular select.

view this post on Zulip Wasmtime GitHub notifications bot (Aug 21 2020 at 13:41):

bnjbvr submitted PR Review.

view this post on Zulip Wasmtime GitHub notifications bot (Aug 21 2020 at 13:41):

bnjbvr created PR Review Comment:

Actually, there might be a bug somewhere else, because i'm seeing a Icmpif with two results, and this seems to cause failures. I wonder if sometimes a result is replaced and the result index isn't getting updated correctly. Saving this for another yak shaving episode.

view this post on Zulip Wasmtime GitHub notifications bot (Aug 21 2020 at 13:59):

bnjbvr submitted PR Review.

view this post on Zulip Wasmtime GitHub notifications bot (Aug 21 2020 at 13:59):

bnjbvr created PR Review Comment:

Found a way to factor this out that's not too terrible, and also implemented it correctly for Trapff lowering (probably not tested through Spidermonkey). Thanks for insisting on this!

view this post on Zulip Wasmtime GitHub notifications bot (Aug 21 2020 at 13:59):

bnjbvr updated PR #2146 from x64-isel to main:

This implements a few instruction selection optimizations for x64:

On embenchen in Spidermonkey, this is between a 3 to 7% speedup in compilation time, and i see generated code runtime speedups up to 25%.

view this post on Zulip Wasmtime GitHub notifications bot (Aug 21 2020 at 14:02):

bnjbvr requested cfallin for a review on PR #2146.

view this post on Zulip Wasmtime GitHub notifications bot (Aug 21 2020 at 16:17):

cfallin submitted PR Review.

view this post on Zulip Wasmtime GitHub notifications bot (Aug 24 2020 at 14:21):

bnjbvr updated PR #2146 from x64-isel to main:

This implements a few instruction selection optimizations for x64:

On embenchen in Spidermonkey, this is between a 3 to 7% speedup in compilation time, and i see generated code runtime speedups up to 25%.

view this post on Zulip Wasmtime GitHub notifications bot (Aug 24 2020 at 14:21):

bnjbvr updated PR #2146 from x64-isel to main:

This implements a few instruction selection optimizations for x64:

On embenchen in Spidermonkey, this is between a 3 to 7% speedup in compilation time, and i see generated code runtime speedups up to 25%.

view this post on Zulip Wasmtime GitHub notifications bot (Aug 24 2020 at 15:00):

bnjbvr merged PR #2146.


Last updated: Oct 23 2024 at 20:03 UTC