wasmtime / PR #12006 Cranelift: Tweak cost function to pe... · git-wasmtime

Stream: git-wasmtime

Topic: wasmtime / PR #12006 Cranelift: Tweak cost function to pe...

Wasmtime GitHub notifications bot (Nov 07 2025 at 18:08):

fitzgen opened PR #12006 from fitzgen:dont-select-select to bytecodealliance:main:

selects cannot be speculated through on some of our targets (e.g. x64) so strongly prefer not choosing them.

Using a target-specific cost function, so that we only pessimize select when it makes sense, is left for follow-up work and is tracked in #12005.

Wasmtime GitHub notifications bot (Nov 07 2025 at 18:08):

fitzgen requested alexcrichton for a review on PR #12006.

Wasmtime GitHub notifications bot (Nov 07 2025 at 18:08):

fitzgen requested wasmtime-compiler-reviewers for a review on PR #12006.

Wasmtime GitHub notifications bot (Nov 07 2025 at 18:10):

fitzgen edited PR #12006:

selects cannot be speculated through on some of our targets (e.g. x64) so strongly prefer not choosing them.

Using a target-specific cost function, so that we only pessimize select when it makes sense for the target, is left for follow-up work and is tracked in #12005. FWIW, no CPUs on the market do value speculation today, as far as I am aware, so this particular case is somewhat hypothetical (but the larger goal of target-specific cost functions would still be useful).

Wasmtime GitHub notifications bot (Nov 07 2025 at 18:15):

cfallin commented on PR #12006:

> selects cannot be speculated through on some of our targets (e.g. x64) so strongly prefer not choosing them.

FWIW, I think this is losing a little nuance and the reality doesn't merit a cost of 50 (!). In more detail, select (CMOVcc on x64) works like any other multi-input operator on modern out-of-order CPUs; the main difference is that it has three inputs (flags/condition, source for the CMOVcc, old register value for the CMOVcc). When all three inputs are ready, the instruction will execute.

When folks say that "select isn't speculated through" what they mean is that the CPU doesn't have a condition predictor (like the branch predictor) that will allow the instruction to speculatively go without the condition input ready. It doesn't mean, however, that the instruction blocks all speculation or serves as a pipeline barrier/flush. That would be far worse!

Supporting source: Agner Fog's instruction latency tables show that on a reasonable x86-64 baseline (Skylake, circa 2015), the reg/reg form of CMOVcc is 1 uop with a latency of 1 cycle and a reciprocal throughput of 0.5 (so two CMOVccs can complete per cycle).
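
(Editorial sketch, not part of the thread.) The distinction drawn above can be illustrated with a toy timing model, assuming each value is described only by the cycle at which it becomes ready. The function names `select_ready` and `predicted_branch_ready` are hypothetical, and the model ignores misprediction penalties, port contention, and everything else a real core does; it only shows that a select waits on its condition like any other data input, while a correctly predicted branch does not.

```rust
/// Toy model: cycle at which a select (CMOVcc) result becomes available.
/// The instruction issues once the condition and both data inputs are ready,
/// then takes `latency` cycles (1 on the Skylake baseline cited above).
fn select_ready(cond: u64, if_true: u64, if_false: u64, latency: u64) -> u64 {
    cond.max(if_true).max(if_false) + latency
}

/// Toy model: cycle at which consumers can use the chosen value when a branch
/// is predicted correctly. Execution continues down the predicted path without
/// waiting for the condition, so only the chosen value's readiness matters.
/// (Assumption: the prediction is correct and costs nothing.)
fn predicted_branch_ready(chosen_value: u64) -> u64 {
    chosen_value
}
```

For example, with a condition that depends on a slow load (ready at cycle 100) and data inputs ready at cycles 3 and 5, `select_ready(100, 3, 5, 1)` gives 101, while `predicted_branch_ready(3)` gives 3. The select is delayed by its slowest input but does not stall unrelated instructions.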

Wasmtime GitHub notifications bot (Nov 07 2025 at 18:20):

alexcrichton requested cfallin for a review on PR #12006.

Wasmtime GitHub notifications bot (Nov 07 2025 at 18:48):

fitzgen updated PR #12006.

Wasmtime GitHub notifications bot (Nov 07 2025 at 18:51):

fitzgen commented on PR #12006:

@cfallin I was indeed not thinking in a nuanced way about this, thanks for the clarifications/reality check.

I removed the select special case, so it will hit the default cost, which will be slightly greater than the cost of adds/etc.

I left imul at cost 10 though.
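
(Editorial sketch, not part of the thread.) The outcome described above might look roughly like the following per-opcode cost table. This is not Cranelift's actual code: the `Op` enum, `op_cost`, and `expr_cost` are hypothetical names, and every number except the `imul` cost of 10 is an assumption chosen only so that the default is slightly greater than the cost of adds, as the comment says. Summing operand costs is likewise an assumed way of composing a total.

```rust
/// Hypothetical opcode set for illustration; not Cranelift's real `Opcode` enum.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum Op {
    Iconst,
    Iadd,
    Isub,
    Select,
    Imul,
}

/// Cost of a single operation, ignoring its operands.
fn op_cost(op: Op) -> u32 {
    match op {
        Op::Iconst => 1,          // assumed: constants are cheapest
        Op::Iadd | Op::Isub => 2, // assumed: simple ALU ops
        Op::Imul => 10,           // from the thread: imul stays at cost 10
        // No special case for `Select`: it falls through to the default,
        // which is slightly greater than the cost of adds (value assumed).
        _ => 3,
    }
}

/// Assumed composition rule: an expression costs its own opcode cost plus the
/// costs of its operands, saturating instead of overflowing.
fn expr_cost(op: Op, operand_costs: &[u32]) -> u32 {
    operand_costs
        .iter()
        .fold(op_cost(op), |acc, c| acc.saturating_add(*c))
}
```

Under these assumed numbers, `op_cost(Op::Select)` is 3, only one more than `op_cost(Op::Iadd)`, so an extraction pass comparing costs would no longer avoid select nearly as strongly as it would at a cost of 50.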

Last updated: Feb 24 2026 at 05:28 UTC