I have a question related to the static analysis of components in the WebAssembly component model.
Let's say I have a wasm component (#1) with an export A.
I have another component #2 that imports A and B.
Now I compose a third component out of #1 and #2, and therefore it only imports B (since the A import is internally satisfied).
Can someone looking at component #3 tell that it contains functionality from A? Is it guaranteed that the WIT file extracted from component #3 will reveal the presence of A? Or could that information be obfuscated?
The exports of component 3 won't inherently reveal anything about components 1 or 2. The actual encoded bytes of component 3 will (with existing tooling) reveal everything about the composition.
You could conceptually have a tool that "compiles" the composition of 1 and 2 into a core module that behaves the same as the composition, but I'm not aware of anyone doing that (and arguably that wouldn't be "composing" in the component model sense any more).
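To make the composition in the question concrete, here is a toy Python model. It is not real component-model tooling; the `Component` class and the `outer_imports` helper are made up for illustration and only track interface names. Under those assumptions, it shows why component #3 ends up importing just B:

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Component:
    """Toy stand-in for a component: just the names it imports and exports."""
    name: str
    imports: frozenset = field(default_factory=frozenset)
    exports: frozenset = field(default_factory=frozenset)


def outer_imports(parts):
    """Imports left unsatisfied after wiring the parts together.

    Any import that some part exports is satisfied internally; everything
    else must still be provided from outside the composition.
    """
    all_imports = frozenset().union(*(p.imports for p in parts))
    all_exports = frozenset().union(*(p.exports for p in parts))
    return all_imports - all_exports


one = Component("#1", exports=frozenset({"A"}))
two = Component("#2", imports=frozenset({"A", "B"}))

# Component #3 is the composition of #1 and #2.
print(sorted(outer_imports([one, two])))  # ['B']
```

The import of A disappears from the outer signature because it is wired up internally, which is why the outer imports and exports alone say nothing about A.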
Thanks for the reply, Lann. Consider a component that by its imports and exports appears only to do screen rendering, but in fact it also "secretly" makes HTTP requests to a server? I know if I parse the bytecode or monitor the network I may discover the functionality, but if I understand your answer, then I can't necessarily "trust" the declared imports and exports tell the full story.
If this is true, I don't really understand the claims I've heard about static analyzability in the component model, for example in some of the recent WasmCon talks. I suppose if I'm the author of all of the components then I can statically analyze the final product, but I can't really statically analyze a composition if it uses any registry components.
James Mart said:
Consider a component that by its imports and exports appears only to do screen rendering, but in fact it also "secretly" makes HTTP requests to a server? I know if I parse the bytecode or monitor the network I may discover the functionality, but if I understand your answer, then I can't necessarily "trust" the declared imports and exports tell the full story.
No, that is absolutely not a thing a component can do. Regardless of how it is structured (or obfuscated) internally, the only way it can perform any kind of IO is through its imports and exports.
I might be mistaken, but I think the source of confusion might have been what Lann said about hypothetically converting a composed component into a core Wasm module. At that point any guarantees about internal component structure would be destroyed, but the resulting core Wasm module still couldn't just decide to interact with the outside world arbitrarily; it can still only call the outside world via its imports and be called from the outside world via its exports.
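A rough way to picture the point above is a toy host in Python. This is only a sketch, not the API of any real runtime: `instantiate`, `MissingImport`, `render_guest`, and the `draw` / `http-request` names are all hypothetical, and real Python code is of course not sandboxed the way a Wasm guest is.

```python
class MissingImport(Exception):
    """Raised when a guest declares an import the host did not provide."""


def instantiate(declared_imports, host_functions):
    """Toy instantiation: hand the guest exactly the imports it declared,
    and refuse if the host does not provide one of them."""
    missing = [name for name in declared_imports if name not in host_functions]
    if missing:
        raise MissingImport(f"host does not provide: {missing}")
    return {name: host_functions[name] for name in declared_imports}


def render_guest(imports):
    """Hypothetical guest: in this model its only route to the outside
    world is the `imports` mapping it was handed."""
    imports["draw"]("hello")
    # An HTTP request would have to be one of the declared imports.
    return "http-request" in imports


drawn = []
host = {"draw": drawn.append}  # the host offers rendering only

granted = instantiate(["draw"], host)
can_do_http = render_guest(granted)

# A guest that declares an HTTP import cannot even be instantiated here,
# because this host never provided one.
try:
    instantiate(["draw", "http-request"], host)
    refused = False
except MissingImport:
    refused = True

print(drawn, can_do_http, refused)
```

In this model the guest either declares the HTTP import (and so it is visible, and the host can decline to provide it) or it has no way to make the request at all.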
I think there's another thing here: the use of the word "secret" implies some wasm module that can't be understood. So far as I can tell, there's no such thing as an opaque module -- it's a spec, after all, so it can be examined internally easily enough.
Where @James Mart says, "by its imports and exports appears only to do screen rendering, but in fact it also "secretly" makes HTTP requests to a server"
-- another way to say this is that the wasm runtime controls any interaction between the module inside the sandbox and the outside world. As a result, with a compliant wasm runtime, that "secret" http call will fail miserably because it's not a declared export it can call.
I'm still, after this thread, not really clear about your objective, @James Mart, or is it merely curiosity? You said earlier that you can't necessarily "trust" the declared imports and exports tell the full story.
That seems to imply you are not convinced that the inner code can't call the http endpoint secretly. And the answer to that is yes, you can trust them, because the runtime will not permit any outbound calls from any module without permitting it explicitly. Any such call will bonk. It doesn't matter whether you've parsed the actual wasm or merely the exports: without a) the export declared and b) the runtime permitting the use of that call, the network request will bonk.
if you GIVE modules full OS-style permissions AND if they therefore invoke a host export http request function AND you didn't scan the module beforehand, then you might be surprised to find the "secret" call will work.
do I understand things correctly? This is, by the way, precisely the same guarantee that core wasm gives you: that you can't make calls across the sandbox boundary without the host runtime giving that permission, whether your exported http api is WIT or whether it's a custom declared api (or js bindings for that matter).
the module or component cannot, of itself, make that outbound call actually work. Only the runtime can permit that, and only for the specific exported api it presents to the module to call.
Someone correct me here, because it's a fairly important set of points, if I'm misunderstanding something.
Typically, this is the https://www.ibm.com/topics/log4j scenario, in which a dependency of a dependency used log4j without the JNDI fix, enabling an inner dependency to maliciously execute remote code inside an environment's secure boundary.
this just won't work unless you give that component permission to do this.
is that the kind of situation you're thinking about?
@James Mart I think I understand your question better now. A more-concrete example may help:
Component A imports a "database" interface.
All interaction with the "real world" must ultimately happen through the outermost component's imports and exports. You might not be able to tell that the composed components are interacting with a database (as in scenarios 2 and 3), but you can see all of the ways it is permitted to interact with the outside world.
and more, even without that, you can examine the core wasm in or linked by the component itself and examine what each one tries to do. Right now that's hard but all the tools exist; I'm quite sure that we'll have this tooling very soon that will be mostly point-and-click.
One of the very neat things about the component model is that you can accidentally ship log4j and without complete OS-style exports to use, absolutely nothing will happen.
@Lann Martin, just to close your examples, in Scenario 3 you can only SEE the networking import, but without the host's database export support, component C's database usage will fail. Yes?
you can't "see" it (without looking at the assembly codepath itself, which you can do) but the inner also can't USE it without the host's permission/export implementation.
Ah, to clarify (can't edit :rolling_eyes:): in scenario 3, component C is an "adapter": its database export is used to fulfill component A's import.
we do need Zulip to support Mermaid diagrams here, it would help
:-)
The two points are:
and 4. off-by-two errors
Thanks for the replies, everyone.
@Ralph I know the internal structure can be examined with various tooling so nothing is ultimately "secret." But in terms of being able to trustlessly execute third-party components, I don't want to have to examine the bytecode of each to ensure they aren't misbehaving. That was the perspective I was coming from.
But I understand better now, thanks to all of your replies. What I'm realizing is that, even if you create a composition of multiple internal components, you can't embed IO (like an HTTP request) into a module in such a way that it hides that capability from the final component's imports (the way that you can embed a non-persistent in-memory database as in @Lann Martin's Scenario 2 above). Ultimately, IO is definitionally an interaction with the outside world and would therefore need to be imported by the final component.
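For the "trustless execution" angle, that conclusion suggests a simple pre-flight check: compare the outermost component's imports against an allow-list before running it, without inspecting the bytecode. A minimal Python sketch, assuming the import names have already been extracted by some tool (the `audit` helper and the `rendering` / `networking` names are hypothetical):

```python
def audit(outer_imports, allowed):
    """Return the imports that fall outside the allow-list.

    An empty result means the component cannot reach anything the
    policy did not anticipate, whatever it contains internally.
    """
    return set(outer_imports) - set(allowed)


allowed = {"rendering"}

# A composition with an embedded in-memory database: nothing extra shows
# up in the outer imports, and nothing extra can reach the outside world.
print(sorted(audit({"rendering"}, allowed)))                # []

# A composition that talks to a server: the capability must surface.
print(sorted(audit({"rendering", "networking"}, allowed)))  # ['networking']
```

This only says what the component is able to reach, not what it does with an import it has been granted.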
Yup!
James Mart has marked this topic as resolved.
Last updated: Nov 22 2024 at 17:03 UTC