CM question: transfer the canonical ABI data to Wasm stack · general

Hello, I have a question about how the runtime transfers canonical ABI data to the Wasm stack when handling a call into a canon-lift function. When a component is loaded, the runtime can obtain the IDL info from the component, so it can understand the canonical ABI data format according to the spec. It then needs to transfer the ABI data onto the Wasm stack to call the associated core function. Since the core function type is also given by the core module, the runtime must have some knowledge of how to map the canonical ABI data to the core function arguments. Does that imply the core function type must be predefined (even standardized) rather than variable? If not, how can the runtime complete this step? Currently wit-bindgen creates the core function with its type, and it looks like it generates different core function types for C/C++ and Java. I would appreciate it if anyone could help explain this. Thanks!

Dan Gohman (Aug 24 2023 at 02:58):

Hosts can have a fixed world, or set of worlds, that they support. That way, they can have statically-generated host bindings for the worlds they'll support.

Dan Gohman (Aug 24 2023 at 03:02):

The core-wasm code on the inside of a component may have different types, and the inner core types may depend on the programming language. Source languages can use the options on the canon lift and canon lower to select different behaviors.

Wang Xin (Aug 24 2023 at 03:54):

@Dan Gohman Thank you! The host side only needs to read the data in the format defined by the canonical ABI, process it in any host-side native function, and return the results in canonical ABI format, right? However, my question is about how the runtime handles the Wasm-side canon lift call. Before the runtime calls into the inner core function, how can it map the canonical ABI data to the Wasm stack? I assume the runtime is responsible for preparing the Wasm stack for calling the inner core function, right?

Luke Wagner (Aug 24 2023 at 16:50):

Great questions, thanks for asking. From a host perspective, once I have fixed the world that I am supporting, the Canonical ABI defines a fixed core function type for each import and export in the world as well as the runtime contract for how data is copied into and out of linear memory. Thus, with a fixed world, the host can mostly just statically follow these rules for each import/export. There is a little runtime dynamism, though, that requires reflecting on the component's internal canon lift and canon lower definitions to see:

The host needs to extract these bits of information from the component binary at load time, record it in metadata associated with the component, and then use this metadata at runtime to do what the component said to do.

Overall, a host implementation for a fixed world is mostly similar to a host for a fixed core module signature, just with a few additional points of component-determined dynamic behavior. I hope that helps, happy to discuss further!

Wang Xin (Aug 24 2023 at 23:37):

@Luke Wagner Thank you very much! For the "fixed core function type", it seems the proposal doesn't specify how to determine the exact type. Currently we rely on how wit-bindgen generates the core function to infer the rule for the core function type of a given IDL function. If a runtime uses the rule observed from wit-bindgen output, is it safe?

Luke Wagner (Aug 25 2023 at 00:34):

@Wang Xin Great question! It's a bit buried in the middle of CanonicalABI.md (see flatten_functype in this section), but the Component Model does indeed define this mapping. wit-bindgen should conform to flatten_functype, so it's valid to use wit-bindgen to derive the core function type as a reference. Alternatively, you could run flatten_functype in canonical-abi/definitions.py directly (it's tested here).
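To make the mapping mentioned above concrete, here is a minimal sketch of how value types flatten into core wasm value types. It is not the spec's `definitions.py`: the tuple-based type encoding and the name `flatten_type` as written here are assumptions of this sketch, and only a few types are covered. It uses the `coord` and `monster` records that appear later in this thread.

```python
# Simplified sketch of Canonical ABI type flattening (not the spec's code).
# Types are modeled as plain Python values, which is an assumption of this sketch:
#   "u32", "bool", "string", ...      primitive type names
#   ("list", elem)                    list<elem>
#   ("record", [(label, type), ...])  record with labeled fields

def flatten_type(t):
    """Return the core wasm value types that carry one value of type t."""
    if t in ("bool", "u8", "u16", "u32", "s8", "s16", "s32", "char"):
        return ["i32"]
    if t in ("u64", "s64"):
        return ["i64"]
    if t == "string":
        return ["i32", "i32"]  # pointer and length into linear memory
    if isinstance(t, tuple) and t[0] == "list":
        return ["i32", "i32"]  # pointer and length into linear memory
    if isinstance(t, tuple) and t[0] == "record":
        flat = []
        for _label, field_type in t[1]:
            flat += flatten_type(field_type)  # fields are concatenated in order
        return flat
    raise ValueError(f"type not covered by this sketch: {t!r}")


coord = ("record", [("x", "u32"), ("y", "u32")])
monster = ("record", [("name", "string"), ("hp", "u32"),
                      ("pos", coord), ("elite", "bool")])

print(flatten_type(coord))              # two i32 values
print(flatten_type(monster))            # six i32 values
print(flatten_type(("list", monster)))  # pointer and length
```

Note that a `list<monster>` always flattens to a pointer and a length, no matter how large the element type is; the elements themselves live in linear memory.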

lum1n0us (Aug 25 2023 at 01:03):

We've observed that, for one function in WIT, bind-gen generates different function signatures for C and Java, which leads to different core functypes.

interface types {
  record coord {
    x: u32,
    y: u32,
  }

  record monster {
    name: string,
    hp: u32,
    pos: coord,
    elite: bool
  }

  type error = u32
}

world caller {
  use types.{coord, error, monster}

  import get-positions: func(m: list<monster>) -> list<coord>
  export run: func()
}

world callee {
  use types.{coord, error, monster}

  export get-positions: func(m: list<monster>) -> list<coord>
}

__attribute__((__import_module__("$root"), __import_name__("get-positions")))
void __wasm_import_caller_get_positions(int32_t, int32_t, int32_t);

@Export(name = "get-positions")
private static int wasmExportGetPositions(int p0, int p1) { }

That actually yields two different core functypes, (import "get-positions" (func (param i32 i32 i32))) and (export "get-positions" (func (param i32 i32) (result i32))), both targeting the same component type get-positions: func(m: list<monster>) -> list<coord>.

We are not sure whether that is an issue with "bind-gen" or flexibility allowed for the "core module"?

Dan Gohman (Aug 25 2023 at 01:14):

Those caller and callee ABIs aren't directly linked to each other. These are the ABIs that toolchains use to talk to the canonical ABI on the inside of a component. Effectively, these are the ABIs that are used to talk to the canon lift and canon lower constructs, which allows linkers to generate adapters that translate from one side to the other.

lum1n0us (Aug 25 2023 at 02:17):

Yes. From the angle of "aren't directly linked to each other", it is understandable. But from the angle of "the Canonical ABI defines a fixed core function type for each import and export in a fixed world", how should we think about it?

Dan Gohman (Aug 25 2023 at 13:43):

Luke Wagner (Aug 25 2023 at 18:42):

@lum1n0us I think the way to think of it is that, from a host perspective, you are always running a single root component whose imports are supplied by the host and whose exports are called by the host. Multi-component scenarios work by linking N components into a single root component (that encapsulates the N components, describing how they are linked together).

If the root component only contains a single core module, then the core module's imports and exports are derived via the Canonical ABI from the world as discussed above. (Dan makes a good point that I forgot to mention earlier which is that the Canonical ABI produces different core function types for the same Wit-level function type depending on whether the function type is imported vs exported; that is the context parameter to flatten_functype.)

If instead the root component contains multiple nested components or modules, it is possible to "fuse" them all together to produce a single core module (using multi-memory) whose imports and exports are defined by the Canonical ABI (just like the single-module case). It is this "fusion" process that will synthesize core function "adapters" that sit in-between one component's core exports and another component's core imports, doing the copying between, and that is why import and export core function types don't have to be the exact same -- because when two components are linked together, there is always a generated adapter function that sits in-between.

In any case, at runtime, the host can always treat the running component as a single (possibly fused) core module. It is also possible, to enable better code-sharing between components and to preserve offsets in custom sections, to not fuse a component into a single core module but instead produce a list of core modules that are linked together according to some engine-internal metadata produced at compile-time (iirc, Wasmtime does this).
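The import/export asymmetry described above can be sketched as follows. This is a simplified rendering of the rule, not the spec's `flatten_functype`: it takes already-flattened parameter and result lists as input, and the limits of 16 flat parameters and 1 flat result are the values I understand the Canonical ABI to use, so treat them as assumptions to check against CanonicalABI.md.

```python
# Simplified sketch of how the lift/lower context changes the core function type.
# Inputs are already-flattened core value types; the limits below are assumed
# to match the Canonical ABI and should be verified against the spec.
MAX_FLAT_PARAMS = 16
MAX_FLAT_RESULTS = 1

def flatten_functype(flat_params, flat_results, context):
    """Return (params, results) of the core function for the given context."""
    assert context in ("lift", "lower")
    params = list(flat_params)
    results = list(flat_results)
    if len(params) > MAX_FLAT_PARAMS:
        params = ["i32"]  # too many params: pass a pointer to a tuple in memory
    if len(results) > MAX_FLAT_RESULTS:
        if context == "lift":
            results = ["i32"]          # export returns a pointer to the results
        else:
            params = params + ["i32"]  # import takes a caller-provided out-pointer
            results = []
    return params, results


# get-positions: func(m: list<monster>) -> list<coord>
# Both the parameter list and the result list flatten to (pointer, length).
flat_params = ["i32", "i32"]
flat_results = ["i32", "i32"]

print(flatten_functype(flat_params, flat_results, "lift"))   # exported (callee) side
print(flatten_functype(flat_params, flat_results, "lower"))  # imported (caller) side
```

Under these assumptions, the export comes out as `(param i32 i32) (result i32)` and the import as `(param i32 i32 i32)`, which matches the two signatures lum1n0us observed from the Java and C bindings.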

Wang Xin (Aug 27 2023 at 02:55):

Hi @Luke Wagner I would like to confirm that the proposal doesn't specify the in-memory data structure for the canonical ABI. For example, in lift_flat_record, the Python code only defines a Python dict for the record:
def lift_flat_record(cx, vi, fields):
  record = {}
  for f in fields:
    record[f.label] = lift_flat(cx, vi, f.t)
  return record
Since the same runtime takes care of both serialization and deserialization (it may even be just a runtime-internal data structure), it seems fine to leave the in-memory data structure to the runtime implementation. Is my understanding right?
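For reference, a runnable miniature of the quoted function might look like the sketch below. The `ValueIter` class, the tuple-based type encoding, and the reduced `lift_flat` are hypothetical stand-ins for the spec's helpers, and the `cx` argument is dropped because no linear memory access is needed for these field types.

```python
# Runnable miniature of lift_flat_record. ValueIter and the type encoding are
# hypothetical simplifications, not the definitions used in the spec's Python.

class ValueIter:
    """Hands out flat core values one at a time, in order."""
    def __init__(self, values):
        self.values = list(values)
        self.i = 0

    def next(self):
        v = self.values[self.i]
        self.i += 1
        return v

def lift_flat(vi, t):
    if t == "u32":
        return vi.next()
    if t == "bool":
        return bool(vi.next())
    if isinstance(t, tuple) and t[0] == "record":
        return lift_flat_record(vi, t[1])
    raise ValueError(f"type not covered by this sketch: {t!r}")

def lift_flat_record(vi, fields):
    # The lifted record is an ordinary dict: a host-internal choice that is
    # never visible to wasm code.
    record = {}
    for label, field_type in fields:
        record[label] = lift_flat(vi, field_type)
    return record


coord = ("record", [("x", "u32"), ("y", "u32")])
fields = [("hp", "u32"), ("pos", coord), ("elite", "bool")]
lifted = lift_flat_record(ValueIter([100, 3, 4, 1]), fields)
print(lifted)
```

A runtime written in another language could equally lift into a struct or skip the intermediate value entirely, since only the flat and in-memory layouts are observable from wasm.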

Luke Wagner (Aug 27 2023 at 03:02):

@Wang Xin Yes, that's right, the lifted values are abstract and their representation is not exposed to wasm. Moreover, the spec ensures that when passing values between components, the lifted values don't have to be materialized; the fused adapter can copy directly from one linear memory to the other.
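As an illustration of copying directly between linear memories, the sketch below moves a `list<coord>` from one memory to another without building any lifted value in between. The `Instance` class, its bump-allocating `realloc`, and `adapt_list_of_coord` are hypothetical names for this sketch; a real fused adapter would be generated wasm code, and element types containing pointers (such as strings) would need a deep copy rather than a single byte copy.

```python
import struct

COORD_SIZE = 8   # record coord { x: u32, y: u32 }: two 4-byte fields
COORD_ALIGN = 4

class Instance:
    """Hypothetical stand-in for a component instance: a linear memory plus
    a trivial bump allocator playing the role of its realloc export."""
    def __init__(self, size=1024):
        self.memory = bytearray(size)
        self.next_free = 8  # keep low addresses unused

    def realloc(self, size, align):
        ptr = (self.next_free + align - 1) & ~(align - 1)
        if ptr + size > len(self.memory):
            raise MemoryError("out of linear memory")
        self.next_free = ptr + size
        return ptr

def adapt_list_of_coord(src, src_ptr, length, dst):
    """Copy a list<coord> from src's memory into dst's memory.
    Returns the (pointer, length) pair valid in dst."""
    nbytes = length * COORD_SIZE
    if src_ptr + nbytes > len(src.memory):
        raise IndexError("source list is out of bounds")
    dst_ptr = dst.realloc(nbytes, COORD_ALIGN)
    # coord holds no pointers, so one flat byte copy is enough.
    dst.memory[dst_ptr:dst_ptr + nbytes] = src.memory[src_ptr:src_ptr + nbytes]
    return dst_ptr, length


caller, callee = Instance(), Instance()
src_ptr = caller.realloc(2 * COORD_SIZE, COORD_ALIGN)
struct.pack_into("<IIII", caller.memory, src_ptr, 1, 2, 3, 4)  # coords (1,2), (3,4)

dst_ptr, n = adapt_list_of_coord(caller, src_ptr, 2, callee)
print(struct.unpack_from("<IIII", callee.memory, dst_ptr))
```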

Wang Xin (Aug 27 2023 at 03:10):

@Luke Wagner all my questions are around how to handle the static linking requirements in the component model. It seems we are close to enabling static linking between the importing and exporting core modules (since there is only a slight difference in the function types, we may think about solving it). The "fuse" you mentioned sounds a lot like static linking with some adapter wasm bytecode (or a function?) to bridge the function type difference?

Wang Xin (Aug 27 2023 at 03:27):

Forgot to mention: static linking of the core modules also requires the modules to follow the wasm-ld convention.

Luke Wagner (Aug 27 2023 at 05:00):

@Wang Xin yes, it could be implemented via static linking. The adapter functions mentioned above do have to be generated/compiled into new wasm functions, but those can then be statically linked with all the original core modules in the components. (It is also possible to dynamically link compiled module code in order to share code (like libc or language runtime) between components, like a DLL, as an optimization.)

Luke Wagner (Aug 27 2023 at 05:02):

Oh, but one difference to mention from traditional static linking is that there will be multiple linear memories. This can be supported using wasm's multi-memory feature, though.

Wang Xin (Aug 24 2023 at 00:33):