alexcrichton opened issue #12789:
I have vague recollections of discussing this in the past but I don't see a dedicated issue for this. Currently in the aarch64 backend the JTSequence instruction, a jump table which br_table in wasm uses, includes a csdb instruction for Spectre mitigation, to prevent speculation. The introduction of this in https://github.com/bytecodealliance/wasmtime/pull/4555 ran some benchmarks and found this to have little impact, but I've been made aware locally that this can have a much larger impact on macOS. IIRC this is macOS specific, but I forget.

To reproduce this I was using a coremark.wasm (such as this one) and I found that it prints a score of ~15k by default with wasmtime. I commented out the csdb, re-built wasmtime, and the score jumped up to ~38k. Effectively, this instruction definitely has a noticeable cost on at least macOS.

Do others remember any historical discussion we've had about this? Is this a macOS "bug" fixed in some future version of Apple silicon? Is this something fundamental that we stand by? (In comparison, v8 performs over 2x better than Wasmtime on this same benchmark, presumably because it doesn't use csdb, though I can't easily confirm that; it'd be at least one data point.)
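For readers unfamiliar with the mitigation, here is a simplified model of what a Spectre-hardened br_table dispatch does. It assumes the common pattern of an architectural bounds check, then a conditional select that forces the index to 0 when out of bounds, then a csdb barrier; it is not Cranelift's JTSequence lowering, and the function names are hypothetical. CSDB sits in the aarch64 HINT space (HINT #20), so the sketch spells it that way for the inline-assembly barrier.

```rust
/// Hypothetical model of a Spectre-hardened `br_table`; not Cranelift code.
/// `targets` stands in for the jump table and `default` is the fallback target.
fn br_table_dispatch(index: u32, targets: &[usize], default: usize) -> usize {
    let len = targets.len() as u32;
    if index >= len {
        // Architectural bounds check: out-of-range indices take the default.
        return default;
    }
    // Hardening step mirroring the machine-code pattern: force the index to 0
    // if it is out of bounds, so a mispredicted branch above cannot steer the
    // table load out of bounds. In portable Rust the optimizer will likely
    // fold this away, so it models the idea rather than providing protection.
    let clamped = if index < len { index } else { 0 };
    // On aarch64 a `csdb` follows the select so later instructions cannot
    // consume a speculatively predicted value of `clamped`.
    speculation_barrier();
    targets[clamped as usize]
}

#[cfg(target_arch = "aarch64")]
#[inline(always)]
fn speculation_barrier() {
    // SAFETY: CSDB (HINT #20) touches no memory, stack, or flags.
    unsafe { core::arch::asm!("hint #20", options(nomem, nostack, preserves_flags)) }
}

#[cfg(not(target_arch = "aarch64"))]
#[inline(always)]
fn speculation_barrier() {}
```

The barrier is the part under discussion in this issue: the select alone is cheap, while the csdb is what was measured to cost so much on macOS.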
alexcrichton added the cranelift:area:aarch64 label to Issue #12789.
alexcrichton commented on issue #12789:
Another data point: that same repository hosting coremark.wasm contains a lua.wasm with a few assorted "benchmarks" where Wasmtime/Cranelift are 6x slower than v8, and one smoke test locally shows that this is explained by csdb as well. My branch of Wasmtime with csdb removed performs on par with v8, whereas with the stock CLI I can reproduce the original results.
cfallin commented on issue #12789:
There's an extensive Zulip thread here about this very topic! I was also chasing this down at the time and hoping to be able to justify removing csdb.

Unfortunately it seems that no one who really knows whether it's necessary (as in "actually necessary on known microarchitectures") can comment straightforwardly, and the architecture docs say that we must do it if we want to be Spectre-safe. I was trying to argue from a place of "I'm not aware of value speculation that would actually cause issues in our br_table implementation", but reading between the lines, it seems that wasn't fully supported.

It's pretty unfortunate that others can do benchmarking and show huge gains relative to Wasmtime+Cranelift on the basis of our Spectre safety mitigations that the arch docs say we must have!
IIRC, one workaround discussed at the time was that wasmtime-cli could potentially disable Spectre mitigations, on the basis that only one instance was running (or, I guess, only do so for a single-instance component). Alternatively, we could add a -C i-like-to-live-dangerously-and-spectre-cannot-hurt-me=true option.
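The workaround above boils down to a small decision rule. The sketch below assumes a hypothetical function that takes the number of instances in the process and an optional explicit user override (standing in for a -C style flag); it is not a Wasmtime API, just one way the "single instance means mitigations off unless told otherwise" idea could be expressed.

```rust
/// Hypothetical policy for the CLI workaround; not a Wasmtime API.
/// `user_override` models an explicit flag: `Some(true)` forces mitigations on,
/// `Some(false)` forces them off, `None` falls back to the heuristic.
fn spectre_mitigations_enabled(instances_in_process: usize, user_override: Option<bool>) -> bool {
    // Heuristic from the thread: with only one instance running, keep
    // mitigations off; with several co-located instances, keep them on.
    user_override.unwrap_or(instances_in_process > 1)
}
```

An explicit override always wins, so a user who does not accept the single-instance reasoning can still opt back in.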
alexcrichton commented on issue #12789:
I posted #12798 to at least provide the option to test this out locally disabled.
I also feel like we've talked about this before, but do we feel that we have a bullet-proof-enough Spectre story that it's worth taking a 2x performance hit, by default, on a popular platform that people more often than not benchmark on? I realize that's a bit of a loaded question, but I'm hesitant to take such a large performance hit for hand-wavy reasons that none of us understand.
cfallin commented on issue #12789:
Yeah, I'm with you on this one honestly: surveys of peer engines, and various benchmarking, have shown that we're alone on this one and I think we should consider changing the default to avoid such a large unilateral penalty.
Taking the view "what if we didn't have this mitigation today and someone proposed it", I would have asked for benchmarks, I would see the 2x penalty, and I would require extraordinary evidence that this is actually necessary to mitigate a real security issue. We don't have that, only handwavy docs-say-we-should-do-this, and I don't think I would have seen that as sufficient for on-by-default.
So I guess I'm saying: I approved #12798 and I'm happy to approve another PR that flips the default if you want. We could also discuss further at the next Wasmtime meeting if you think that's needed.
cfallin closed issue #12789.
cfallin reopened issue #12789.
tschneidereit commented on issue #12789:
I agree with this: we wouldn't be likely to accept this as a regression enabled by default.
If we do end up switching the default, we should make sure to highlight that fact very prominently in the change log at least, though. Alternatively, we could change the default for the cli and not set a default for embedders, forcing them to make a choice on this.
And relatedly, should that choice be about this particular mitigation, or about all of them, with the ability to be more fine-grained if desired, but the obvious path being to either enable or disable all of them?
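One way to picture the proposal above is a configuration with a single global switch plus per-mitigation overrides, where the CLI picks a default but embedders get an error until they choose. Every name in this sketch (SpectreConfig, Mitigation and its variants, cli_default) is hypothetical and not part of Wasmtime's API; the CLI default of "off" assumes the default were flipped as discussed.

```rust
/// Hypothetical mitigation kinds; variant names are illustrative only.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Mitigation {
    HeapAccess,
    TableAccess,
    BrTable,
}

/// Hypothetical config: a global switch plus fine-grained overrides.
#[derive(Debug, Default)]
struct SpectreConfig {
    /// Global switch; `None` means the embedder has not chosen yet.
    all: Option<bool>,
    /// Per-mitigation overrides that take precedence over `all`.
    overrides: Vec<(Mitigation, bool)>,
}

impl SpectreConfig {
    /// A concrete default the CLI could pick on the user's behalf
    /// (assumes the default were flipped to "off").
    fn cli_default() -> Self {
        SpectreConfig { all: Some(false), overrides: Vec::new() }
    }

    /// The "obvious path": enable or disable everything at once.
    fn set_all(&mut self, enabled: bool) -> &mut Self {
        self.all = Some(enabled);
        self
    }

    /// Fine-grained override for a single mitigation.
    fn set(&mut self, m: Mitigation, enabled: bool) -> &mut Self {
        self.overrides.retain(|(k, _)| *k != m);
        self.overrides.push((m, enabled));
        self
    }

    /// Resolve one mitigation; an embedder that never chose gets an error
    /// instead of a silent default.
    fn enabled(&self, m: Mitigation) -> Result<bool, String> {
        if let Some(&(_, v)) = self.overrides.iter().find(|(k, _)| *k == m) {
            return Ok(v);
        }
        self.all.ok_or_else(|| format!("no Spectre mitigation choice made for {m:?}"))
    }
}
```

For example, an embedder could call set_all(true) and then set(Mitigation::BrTable, false) to keep every mitigation except the jump-table barrier.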
Last updated: Mar 23 2026 at 16:19 UTC