Does anyone here run wasmtime in a multi-tenant fashion? I would love to hear about the security precautions folks take. Do folks run multiple customers' code in the same process? Or is process/node isolation common?
Same process is usually okay imo because the design of wasm itself provides enough security. You can limit access to the system through things like preopened files and whitelisting certain IPs. I think the bigger issue in this regard might be resource tracking, which wasmtime provides enough support for.
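As a rough illustration of that kind of setup, here is a hedged sketch of a per-tenant sandbox with one read-only preopened directory plus memory/instance limits. The paths, sizes, and store shape are made up, and the wasmtime-wasi builder signatures (preopened_dir in particular) have shifted across releases, so treat this as a shape rather than an exact API:

```rust
// Sketch assuming a recent wasmtime / wasmtime-wasi; exact builder APIs vary by version.
use anyhow::Result;
use wasmtime::{Engine, Store, StoreLimits, StoreLimitsBuilder};
use wasmtime_wasi::{DirPerms, FilePerms, WasiCtx, WasiCtxBuilder};

// Illustrative per-tenant store data: a WASI context plus resource limits.
struct TenantState {
    wasi: WasiCtx,
    limits: StoreLimits,
}

fn tenant_store(engine: &Engine) -> Result<Store<TenantState>> {
    // Only expose one explicitly preopened host directory, read-only,
    // mapped to "/data" inside the guest.
    let wasi = WasiCtxBuilder::new()
        .preopened_dir("/srv/tenant-data", "/data", DirPerms::READ, FilePerms::READ)?
        .build();

    // Cap how much this tenant's instance may grow.
    let limits = StoreLimitsBuilder::new()
        .memory_size(64 << 20) // 64 MiB of linear memory
        .instances(1)
        .build();

    let mut store = Store::new(engine, TenantState { wasi, limits });
    store.limiter(|state| &mut state.limits);
    Ok(store)
}
```

On top of that, Config::consume_fuel or Config::epoch_interruption are the usual knobs for bounding CPU time per request.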
Yeah, I'm aware; I'm thinking more on the side of defense in depth and what others have committed to running in a production environment. For example, with regard to Spectre attacks, I know Wasmtime has mitigations, but I'd be interested to see whether others using Wasmtime are concerned about this and are employing additional safeguards in a production setting. Cloudflare has some interesting posts about using V8 in Workers: https://developers.cloudflare.com/workers/reference/security-model/. Wondering if anyone else has anything to share about doing the same thing with Wasmtime.
Same process is usually okay
Usually okay is not a satisfying answer when it comes to security :smile:
I am not sure how much I can say specifically about the security details of what fastly does here
but in general:
regarding spectre in particular, it is hard to say anything super concrete because spectre is so context/application specific and we're talking about probabilities, likelihoods, and leakage bit rates. I should also mention that I am not a spectre expert here, and there are other folks who know more than me, but I will do my best to share what I know/believe.
first, most people using wasmtime are doing the "disposable instance" paradigm where they create a new wasm instance per eg http request and then throw the instance away after the http request is handled. the wasm isn't generally running long enough to exfiltrate anything. so you'd have to do this across multiple wasm instances, but you can't guarantee you're getting the same memory/table slot each time, so the subsequent instances aren't necessarily leaking the same data you were originally accessing. this makes spectre attacks much harder in practice.
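To make the disposable-instance pattern concrete, here's a minimal sketch against the wasmtime Rust API; the "run" export name and the unit store data are placeholders for whatever a real host would use:

```rust
use anyhow::Result;
use wasmtime::{Engine, Linker, Module, Store};

// Compile the module once (e.g. at deploy time), then do this per request.
fn handle_request(engine: &Engine, module: &Module, linker: &Linker<()>) -> Result<()> {
    // Fresh Store per request: no memories, tables, or globals survive
    // from one request to the next.
    let mut store = Store::new(engine, ());
    let instance = linker.instantiate(&mut store, module)?;

    // "run" is a placeholder for whatever entry point the guest exports.
    let run = instance.get_typed_func::<(), ()>(&mut store, "run")?;
    run.call(&mut store, ())?;

    Ok(())
    // Store and Instance drop here; the next request starts from a clean slate.
}
```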
second, let's also think about how a spectre attack has to do things like prime the branch predictor with a bunch of the same branches (eg the same indirect call target repeatedly) and then take a branch that it shouldn't be able to, one that would otherwise trap (eg a call_indirect to an out-of-bounds table index) but speculatively doesn't, to get just a tiny bit of leaked information. that operation potentially gets them a tiny bit of information via speculative execution leakage through side channels, but the operation also traps, and the trap kills the wasm program, so how do they even observe the data via the micro-arch side channel when they've already been killed? in JS, for example, you can just catch an exception or you get an undefined result for an out-of-bounds array access or whatever, but the important thing is that you can keep running. in wasm, your whole instance gets killed. so you'd have to instantiate another instance immediately after the first one got killed and have it read the speculated data via the micro-arch side channel, but it seems really difficult and unlikely that you could time the eg http requests so precisely that you could observe these micro-arch side effects. and you have to repeat this kind of thing a ton of times to actually mount an effective attack in practice. and, again, you have no guarantee that your subsequent instances are getting the same table/memory slot, or even the same host process in the case of a multi-process host, so you have to overcome that impediment to your leakage bit rate as well.
and then on top of all that, we have various spectre mitigations in place
finally, hosts can also do things like
max_unused_warm_slots = 0
to effectively remove affinity between modules and memory slots, making it even more likely that subsequent instantiations of the same guest module end up with different memory slots, which hurts the attacker's potential leakage bit rate, as discussed above.
there's also some general security information in https://docs.wasmtime.dev/security.html and some information that is mostly focused on wasmtime's development practices (rather than defense-in-depth approaches for the host) in https://bytecodealliance.org/articles/security-and-correctness-in-wasmtime
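For reference, a sketch of wiring that up via the pooling allocator; method names follow recent wasmtime releases and may differ slightly in older ones:

```rust
use anyhow::Result;
use wasmtime::{Config, Engine, InstanceAllocationStrategy, PoolingAllocationConfig};

fn engine_without_warm_slot_affinity() -> Result<Engine> {
    let mut pooling = PoolingAllocationConfig::default();
    // Keep zero unused warm slots around, so re-instantiating the same module
    // is unlikely to land in the memory/table slot it occupied last time.
    pooling.max_unused_warm_slots(0);

    let mut config = Config::new();
    config.allocation_strategy(InstanceAllocationStrategy::Pooling(pooling));
    Engine::new(&config)
}
```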
Thank you @fitzgen (he/him) for all this, I am still digesting it but it is a bunch of great information. Really appreciate your time writing this up! This gives me a great deal of good information and additional potential safeguards (i.e. we don't currently use the disposable instance paradigm, but could in our multi-tenant environment).
To add just a tiny bit to @fitzgen's excellent summary (I'm on eclipse PTO all week but this topic is catnip to me...):
Thanks Chris for these notes as well! I've been following veri-isle and PCC from the grandstands :)
This is why wizer exists
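(For anyone unfamiliar: Wizer pre-initializes a wasm module at build time by running its wizer.initialize export and snapshotting the initialized state, so instantiation at request time skips that startup work. A minimal, hedged sketch with the wizer crate; the builder has more options than shown here:)

```rust
use anyhow::Result;
use wizer::Wizer;

// Run the guest's `wizer.initialize` export at build time and snapshot the
// initialized state into a new, pre-initialized module.
fn preinitialize(input_wasm: &[u8]) -> Result<Vec<u8>> {
    Wizer::new().run(input_wasm)
}
```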
I am aware of Wizer, but unfortunately, for the binaries our customers write, it blows up the binary size too much for our system (we have to cap binaries at 10MB, and some of the binaries we've seen are pushed over that limit by Wizer). We're also very latency sensitive, which is the current reasoning for keeping our VMs warm. But that is on the table to be relaxed in our multi-tenant environment.
and is also the impetus for some "AOT JS compilation"
I'm pretty interested in seeing where this goes, we are currently looking at QuickJS with precompiled bytecode, but SpiderMonkey does look appealing with all of these performance investments you've been making.
speaking for Azure at Microsoft, we do not trust ourselves (let alone the community here, which we think is great). So we invested in moving hypervisor technology forward. We built hyperlight, which creates new VMs for wasm protection in roughly 100 microseconds per request, enabling us to host things in hostile environments. The "Microsoft Build 2023: Inside Azure Innovations" video on YouTube is a year-old example of Hyperlight.
currently, we use wamr in hyperlight; we would like to host wasmtime as well, and that work is ongoing. Our objective here is to have microVMs that support wasm components out of the box, but also to enable anyone to use hyperlight to create guest microVMs for any workload.
this quite literally has nothing to do with any particular vuln in any of the BCA runtimes! It has to do with the fact that humans make mistakes. So we had to move to a technology that made it possible to host a runtime per request in its own VM, but at a speed that lets us take advantage of wasm's cold start speed AND the portability and language agnosticism of wasm components, specifically wasi:http and the other capabilities in wasi:cloud-core.
the subtleties of what @Chris Fallin and @fitzgen (he/him) have laid out above are in addition to the protection hyperlight gives us. In fact, some of those features we can integrate in interesting ways once we bring wasmtime into the guest.
the wamr team assures us that we'll get component support late this calendar year or early next, so we feel good there. Otherwise, we'll invest in the wasmtime no-deps work with wasmtime-min that has recently appeared. Components are our objective there.
we intend to make this publicly available as a technology sometime this summer; we were delayed in doing so by some suggested design changes that bring components closer to us than we had originally imagined. So at the moment we're thinking of publishing this work as oss in late summer.
finally: I personally expect that we'll get to a point in the future where wasmtime or wamr or other runtimes ARE trusted as much as the current versions of hypervisors are; but we're not there yet.
Thanks for sharing Ralph! That is indeed very interesting and amazing - would love to see it when it's open sourced.
there will definitely be an announcement here when we do. It's built to protect wasm components, so this forum would be one of the first to hear it.