We have a custom processor that we use deep inside most of our ASICs. It is dual stack based with 3 auto-increment address registers. Somewhat reminiscent of, I hate to say it, Forth. We have a new ASIC close to fabrication that has 256 of these processors and we are looking for a better software solution. It seems Webassembly might be the answer. We are looking for insight on how to do this. Comments please
One negative to Wasm for an use-case like this is that its instruction set is not quite minimal (so you end up having to have implementations for instructions like popcnt
, as compilers generally won’t provide a way to avoid them) and also because wasm is not really wasm without all of the runtime bits around it (e,g, entities like memories)
In general adding custom instructions to our architecture is what we do. Indeed we already have popcount and adding clz and clt are trivial. Embedded systems generally have limited and fixed memories so that might be a bit of a challenge, but you never know until you try.
Does anyone know if this has been done successfully before? We might learn from their agony
You could certainly use Wasm as part of a compiler pipeline; if the machine is Turing-complete and has enough memory, then it should be possible to AOT-compile Wasm bytecode to your custom ISA. Two things though:
any resemblance between Wasm's stack and a true hardware operand stack is IMHO at best an interesting parallel, and shouldn't be leveraged; the semantics with block exits and around unreachable code, as well as the multivalue extension, are weird enough that you probably want a full semantic transformation rather than a direct stack-code-to-stack-code rewrite.
the real meat of this question probably revolves around engineering cost and timeline details, details of your custom ISA, details of your application domain and performance requirements, and hundreds of other details that no one here is really able to evaluate properly, as I suspect you already know; it's sort of a quantitative scale between "sure technically possible but you shouldn't try to fit this into that" and "excellent fit". I personally imagine Wasm as in the same tier as sort of an idealized 32-bit RISC (modulo the stackiness), with all the operations you'd expect (full FPU, 32 and 64 bit values, etc) with a minimum of 64k data memory up to O(megabytes), and existing toolchains will happily generate code that uses a few megabytes, so evaluate what you will with that
Just as I thought, WASM's stackiness is very primitive so its only a handy way to describe things.
The real issue for us is not to run general applications but to be able to run high-level code that is targeted for our embedded device. We see the security as an asset to our systems. Memory is an issue as 64k is not far off maximum.
Should we just look to Rust?
Wasm has a pretty specific security guarantee: it gives a strong sandbox boundary. Is that what your use-case needs (e.g. downloading and running untrusted code)?
I ask because at least in the embedded realm, security could mean “against malicious firmware” or it could mean “against memory unsafety vulnerabilities”; for the latter note that within the sandbox the Wasm code can still corrupt its heap if written in an unsafe language, that would just be contained to the sandbox is all. Writing the Wasm module itself in rust protects against that. They’re really orthogonal dimensions.
It’d be useful to know your threat model. Is all the software within the Wasm module or is there something beneath it that the sandbox guards?
Our ASICs are secure in that they use PUFs for security keys etc., and our code and data within the ASIC needs to be secure BUT we need to allow 3rd party applications to run as well. So assuming that the application to be downloaded is from a secure source we are still vulnerable to 'ourselves and our customers', be they malicious or not. So we are thinking about only accepting WASM as input but we still seem to have security holes. We currently have a supervisor processor that receives incoming encrypted apps into a sandbox. We decrypt it and signature check it prior to loading it into a code memory anywhere on the ASIC. Even that may not be enough. We are certainly memory constrained but could we leave the code in the sandbox and run a lightweight interpreter (or something) over it to ensure that the code cannot sneek out of its bounds. The sandbox should then have been converted to our ISA and moved to code memory.
I guess we don't care if a 3rd party app corrupts its own heap as long as we can detect it, trap it, delete it or something.
It seems that if we start with a simple WASM interpreter that takes 3rd party apps and runs them regardless of optimization then we at least have another layer of security.
We are not afraid of stack machines. They can be made to run faster than RISC machines and have superscalar capabilities. Our current pipelines are running at over 10GHz, so a few extra cycles for a client app is no big deal!
We can take a leaf out of Elon's book and start simple and optimize.
Last updated: Dec 23 2024 at 12:05 UTC