Does anyone here run wasmtime in a multi-tenant fashion? I would love to hear about the security precautions folks take. Do folks run multiple customers' code in the same process? Or is process/node isolation common?
Same process is usually okay imo because the design of wasm itself provides enough security. You can limit access to the system through things like preopened files and whitelisting certain IPs. I think the bigger issue in this regard might be resource tracking, which wasmtime provides enough support for.
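As a rough illustration of that kind of setup, here is a hedged sketch of a per-tenant sandbox with one read-only preopened directory plus memory/instance limits. The paths, sizes, and store shape are made up, and the wasmtime-wasi builder signatures (preopened_dir in particular) have shifted across releases, so treat this as a shape rather than an exact API:

```rust
// Sketch assuming a recent wasmtime / wasmtime-wasi; exact builder APIs vary by version.
use anyhow::Result;
use wasmtime::{Engine, Store, StoreLimits, StoreLimitsBuilder};
use wasmtime_wasi::{DirPerms, FilePerms, WasiCtx, WasiCtxBuilder};

// Illustrative per-tenant store data: a WASI context plus resource limits.
struct TenantState {
    wasi: WasiCtx,
    limits: StoreLimits,
}

fn tenant_store(engine: &Engine) -> Result<Store<TenantState>> {
    // Only expose one explicitly preopened host directory, read-only,
    // mapped to "/data" inside the guest.
    let wasi = WasiCtxBuilder::new()
        .preopened_dir("/srv/tenant-data", "/data", DirPerms::READ, FilePerms::READ)?
        .build();

    // Cap how much this tenant's instance may grow.
    let limits = StoreLimitsBuilder::new()
        .memory_size(64 << 20) // 64 MiB of linear memory
        .instances(1)
        .build();

    let mut store = Store::new(engine, TenantState { wasi, limits });
    store.limiter(|state| &mut state.limits);
    Ok(store)
}
```

On top of that, Config::consume_fuel or Config::epoch_interruption are the usual knobs for bounding CPU time per request.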
Yeah, I'm aware; I'm thinking more on the side of defense in depth and what others have committed to running in a production environment. For example, with regard to Spectre attacks, I know Wasmtime has mitigations, but I'd be interested to see whether others using Wasmtime are concerned about this and are employing additional safeguards in a production setting. Cloudflare has some interesting posts about using V8 in Workers: https://developers.cloudflare.com/workers/reference/security-model/. Wondering if anyone else has anything to share about doing the same thing with Wasmtime.
Same process is usually okay
Usually okay is not a satisfying answer when it comes to security :smile:
I am not sure how much I can say specifically about the security details of what fastly does here
but in general:
regarding spectre in particular, it is hard to say anything super concrete because spectre is so context/application specific and we're talking about probabilities, likelihoods, and leakage bit rates. I should also mention that I am not a spectre expert here, and there are other folks who know more than me, but I will do my best to share what I know/believe.
first, most people using wasmtime are doing the "disposable instance" paradigm where they create a new wasm instance per eg http request and then throw the instance away after the http request is handled. the wasm isn't generally running long enough to exfiltrate anything. so you'd have to do this across multiple wasm instances, but you can't guarantee you're getting the same memory/table slot each time, so the subsequent instances aren't necessarily leaking the same data you were originally accessing. this makes spectre attacks much harder in practice.
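To make the disposable-instance pattern concrete, here's a minimal sketch against the wasmtime Rust API; the "run" export name and the unit store data are placeholders for whatever a real host would use:

```rust
use anyhow::Result;
use wasmtime::{Engine, Linker, Module, Store};

// Compile the module once (e.g. at deploy time), then do this per request.
fn handle_request(engine: &Engine, module: &Module, linker: &Linker<()>) -> Result<()> {
    // Fresh Store per request: no memories, tables, or globals survive
    // from one request to the next.
    let mut store = Store::new(engine, ());
    let instance = linker.instantiate(&mut store, module)?;

    // "run" is a placeholder for whatever entry point the guest exports.
    let run = instance.get_typed_func::<(), ()>(&mut store, "run")?;
    run.call(&mut store, ())?;

    Ok(())
    // Store and Instance drop here; the next request starts from a clean slate.
}
```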
second, let's also think about how a spectre attack has to do things like prime the branch predictor with a bunch of the same branches (eg the same indirect call target repeatedly) and then take a branch that it shouldn't be able to, one that would otherwise trap (eg a call_indirect to an out-of-bounds table index) but speculatively doesn't, to get just a tiny bit of leaked information. that operation potentially gets them a tiny bit of information via speculative execution leakage through side channels, but the operation also traps, and the trap kills the wasm program, so how do they even observe the data via the micro-arch side channel when they've already been killed? in JS, for example, you can just catch an exception or you get an undefined result for an out-of-bounds array access or whatever, but the important thing is that you can keep running. in wasm, your whole instance gets killed. so you'd have to instantiate another instance immediately after the first one got killed and have it read the speculated data via the micro-arch side channel, but it seems really difficult and unlikely that you could time the eg http requests so precisely that you could observe these micro-arch side effects. and you have to repeat this kind of thing a ton of times to actually mount an effective attack in practice. and, again, you have no guarantee that your subsequent instances are getting the same table/memory slot, or even the same host process in the case of a multi-process host, so you have to overcome that impediment to your leakage bit rate as well.
and then on top of all that, we have various spectre mitigations in place
finally, hosts can also do things like
max_unused_warm_slots = 0
to effectively remove affinity between modules and memory slots, making it even more likely that subsequent instantiations of the same guest module end up with different memory slots, which hurts the attacker's potential leakage bit rate, as discussed above.
there's also some general security information in https://docs.wasmtime.dev/security.html and some information that is mostly focused on wasmtime's development practices (rather than defense-in-depth approaches for the host) in https://bytecodealliance.org/articles/security-and-correctness-in-wasmtime
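For reference, a sketch of wiring that up via the pooling allocator; method names follow recent wasmtime releases and may differ slightly in older ones:

```rust
use anyhow::Result;
use wasmtime::{Config, Engine, InstanceAllocationStrategy, PoolingAllocationConfig};

fn engine_without_warm_slot_affinity() -> Result<Engine> {
    let mut pooling = PoolingAllocationConfig::default();
    // Keep zero unused warm slots around, so re-instantiating the same module
    // is unlikely to land in the memory/table slot it occupied last time.
    pooling.max_unused_warm_slots(0);

    let mut config = Config::new();
    config.allocation_strategy(InstanceAllocationStrategy::Pooling(pooling));
    Engine::new(&config)
}
```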
Thank you @fitzgen (he/him) for all this, I am still digesting it but it is a bunch of great information. Really appreciate your time writing this up! This gives me a great deal of good information and additional potential safeguards (i.e. we don't currently use the disposable instance paradigm, but could in our multi-tenant environment).
To add just a tiny bit to @fitzgen's excellent summary (I'm on eclipse PTO all week but this topic is catnip to me...):
Thanks Chris for these notes as well! I've been following veri-isle and PCC from the grandstands :)
This is why wizer exists
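(For anyone unfamiliar: Wizer pre-initializes a wasm module at build time by running its wizer.initialize export and snapshotting the initialized state, so instantiation at request time skips that startup work. A minimal, hedged sketch with the wizer crate; the builder has more options than shown here:)

```rust
use anyhow::Result;
use wizer::Wizer;

// Run the guest's `wizer.initialize` export at build time and snapshot the
// initialized state into a new, pre-initialized module.
fn preinitialize(input_wasm: &[u8]) -> Result<Vec<u8>> {
    Wizer::new().run(input_wasm)
}
```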
I am aware of Wizer, but unfortunately, for the binaries our customers write, it blows up the binary size too much for our system (we have to cap binaries at 10MB, and some of the binaries we've seen are pushed over that limit by Wizer). We're also very latency sensitive, which is the current reasoning for keeping our VMs warm. But that is on the table to be relaxed in our multi-tenant environment.
and is also the impetus for some "AOT JS compilation"
I'm pretty interested in seeing where this goes, we are currently looking at QuickJS with precompiled bytecode, but SpiderMonkey does look appealing with all of these performance investments you've been making.
speaking for Azure at Microsoft, we do not trust ourselves (let alone the community here, which we think is great). So we invested in moving hypervisor technology forward. We built hyperlight, which creates new VMs for wasm protection in roughly 100 microseconds per request, enabling us to host things in hostile environments. The "Microsoft Build 2023: Inside Azure Innovations" video on YouTube is a year-old example of Hyperlight.
currently, we use wamr in hyperlight; we would like to host wasmtime as well, and that work is ongoing. Our objective here is to have microVMs that support wasm components out of the box, but also to enable anyone to use hyperlight to create guest microVMs for any workload.
this quite literally has nothing to do with any particular vuln in any of the BCA runtimes! It has to do with the fact that humans make mistakes. So we had to move to a technology that made it possible to host a runtime per request in its own VM, but at a speed that lets us take advantage of wasm's cold start speed AND the portability and language agnosticism of wasm components, specifically wasi:http and the other capabilities in wasi:cloud-core.
the subtleties of what @Chris Fallin and @fitzgen (he/him) have laid out above are in addition to the protection hyperlight gives us. In fact, some of those features we can integrate in interesting ways once we bring wasmtime into the guest.
the wamr team assures us that we'll get component support late this calendar year or early next, so we feel good there. Otherwise, we'll invest in the wasmtime no-deps work with wasmtime-min that has recently appeared. Components are our objective there.
we intend to make this publicly available as a technology sometime this summer; we were delayed in doing so by some suggested design changes that bring components closer to us than we had originally imagined. So at the moment we're thinking of publishing this work as oss in late summer.
finally: I personally expect that we'll get to a point in the future where wasmtime or wamr or other runtimes ARE trusted as much as the current versions of hypervisors are; but we're not there yet.
Thanks for sharing Ralph! That is indeed very interesting and amazing - would love to see it when it's open sourced.
there will definitely be an announcement here when we do. It's built to protect wasm components, so this forum would be one of the first to hear it.