tlb shootdowns (followup from discussion in weekly) · wasmtime

Stream: wasmtime

Topic: tlb shootdowns (followup from discussion in weekly)

Chris Fallin (Feb 28 2024 at 17:24):

@Andrew Brown @Alex Crichton -- following up on the discussion of TLB shootdowns and a new Intel ISA feature to do this without IPIs -- for reference, here's the single instruction in AArch64 that does this (from the macOS kernel): https://github.com/apple/darwin-xnu/blob/2ff845c2e033bd0ff64b5b6aa6063a1f8f65aa32/osfmk/arm64/tlb.h#L207

Chris Fallin (Feb 28 2024 at 17:25):

AFAICT, this is standardized Arm, too, not an Apple extension

Alex Crichton (Feb 28 2024 at 17:33):

oh nice, I'm trying to profile locally if I can get good scaling but I'm seeing it level off around 4 cores (ish) again, sort of hard to test though b/c there's no taskset on macos

Alex Crichton (Feb 28 2024 at 17:33):

I know though that on aarch64 linux I don't see great scaling

Chris Fallin (Feb 28 2024 at 17:34):

I seem to remember trying this once on aarch64 and finding good scaling (or at least, not seeing IPIs); I think aarch64 linux uses that instruction too; but maybe it's a more recent ISA level?

Alex Crichton (Feb 28 2024 at 17:38):

on the aarch64 ba server in a perf profile I see __flush_tlb_range at the top and a nop instruction after tlbi vale1is, x1 has some samples

Alex Crichton (Feb 28 2024 at 17:39):

the hottest instruction in that function though is a dsb ish

Chris Fallin (Feb 28 2024 at 17:40):

ah, so that's probably a fence to give synchronous semantics (other threads observe the new mappings as soon as this syscall returns); still expensive on some cores, darn

L. Pereira (Feb 28 2024 at 22:13):

yeah, it waits (data sync barrier) for every pending operation related to cache, tlb, branch predictors, and that kind of stuff. "ish" means inner shareable: the barrier applies across the cores in the same inner shareable domain (typically all the cores the OS runs on), but not necessarily other devices in the same SoC
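
For anyone who wants to reproduce the scaling measurements mentioned earlier in the thread, here is a minimal sketch in C. It is not the benchmark anyone in the thread ran. It assumes that repeatedly mapping, touching, and unmapping a page from several threads of one process is a reasonable proxy for the unmap traffic under discussion. The names `churn`, `churn_seconds`, and `ITERS` are made up for illustration.

```c
#define _DEFAULT_SOURCE
#include <pthread.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <time.h>
#include <unistd.h>

#define ITERS 20000 /* map/unmap cycles per thread (arbitrary choice) */

/* Worker: map one anonymous page, touch it, unmap it, and count the cycle. */
static void *churn(void *arg) {
    long *done = arg;
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    for (int i = 0; i < ITERS; i++) {
        char *p = mmap(NULL, page, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANON, -1, 0);
        if (p == MAP_FAILED)
            break;
        p[0] = 1;        /* fault the page in so a translation exists */
        munmap(p, page); /* assumption: this is where the kernel has to
                            invalidate the translation on other cores */
        (*done)++;
    }
    return NULL;
}

/* Runs `nthreads` workers in this process. Returns elapsed wall-clock
 * seconds (or -1.0 on allocation failure) and stores the number of
 * completed map/unmap cycles in `*total`. */
double churn_seconds(int nthreads, long *total) {
    pthread_t *tids = calloc((size_t)nthreads, sizeof *tids);
    long *counts = calloc((size_t)nthreads, sizeof *counts);
    struct timespec t0, t1;
    int started = 0;

    *total = 0;
    if (!tids || !counts) {
        free(tids);
        free(counts);
        return -1.0;
    }

    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (; started < nthreads; started++) {
        if (pthread_create(&tids[started], NULL, churn, &counts[started]) != 0)
            break;
    }
    for (int i = 0; i < started; i++) {
        pthread_join(tids[i], NULL);
        *total += counts[i];
    }
    clock_gettime(CLOCK_MONOTONIC, &t1);

    free(tids);
    free(counts);
    return (double)(t1.tv_sec - t0.tv_sec) +
           (double)(t1.tv_nsec - t0.tv_nsec) / 1e9;
}
```

Calling `churn_seconds` from a small `main` with 1, 2, 4, 8, ... threads and comparing `total / seconds` gives a rough throughput curve. On Linux the process can be pinned with `taskset` to control the core count, and `perf` should show whether `__flush_tlb_range` dominates. Absolute numbers will depend heavily on the kernel and the core.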
