Stream: wit-bindgen

Topic: Possible Canonical ABI issue


Gordon Smith (Mar 13 2024 at 18:42):

Just looking at: https://github.com/WebAssembly/component-model/blob/main/design/mvp/canonical-abi/definitions.py#L405-L406

case Own()          : return lift_own(cx, load_int(cx.opts, ptr, 4), t)
case Borrow()       : return lift_borrow(cx, load_int(cx.opts, ptr, 4), t)

Looks like it should be:

case Own()          : return lift_own(cx, load_int(cx, ptr, 4), t)
case Borrow()       : return lift_borrow(cx, load_int(cx, ptr, 4), t)
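
A minimal sketch of why the argument matters, assuming load_int reaches linear memory through cx.opts.memory (as the surrounding code in definitions.py appears to do). The CanonicalOptions and CallContext classes here are simplified, hypothetical stand-ins, not the real definitions:

```python
from dataclasses import dataclass


@dataclass
class CanonicalOptions:  # simplified stand-in for the real options object
    memory: bytearray


@dataclass
class CallContext:  # simplified stand-in for the real call context
    opts: CanonicalOptions


def load_int(cx, ptr, nbytes, signed=False):
    # Assumed shape: read a little-endian integer out of cx.opts.memory.
    memory = cx.opts.memory
    if ptr + nbytes > len(memory):
        raise IndexError("out-of-bounds load")
    return int.from_bytes(memory[ptr:ptr + nbytes], "little", signed=signed)


cx = CallContext(CanonicalOptions(bytearray([0x2A, 0x00, 0x00, 0x00])))

# Passing the call context works: the 4-byte handle index is read from memory.
handle_index = load_int(cx, 0, 4)

# Passing cx.opts fails, because the options object has no `.opts` attribute.
try:
    load_int(cx.opts, 0, 4)
    wrong_call_failed = False
except AttributeError:
    wrong_call_failed = True
```

Under that assumption, the cx.opts variant cannot work for Own or Borrow, which is consistent with the fix proposed above.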

Dan Gohman (Mar 13 2024 at 20:55):

Indeed, that looks like a bug. Would you mind filing an issue in the component-model repo?

Gordon Smith (Mar 14 2024 at 10:00):

I will do. While I have your attention: should the despecialize function convert a string to a list<char>?
(Based on the docs here: https://github.com/WebAssembly/component-model/blob/main/design/mvp/Explainer.md#specialized-value-types)

Lann Martin (Mar 14 2024 at 12:26):

Specialized types are not (necessarily) despecialized in the Canonical ABI. For strings in particular, multiple Unicode encodings are supported.

Dan Gohman (Mar 14 2024 at 12:26):

That despecialize function implements the Canonical ABI's definition of despecialization. In the Canonical ABI, list<char> is represented like list<u32>, while string is represented like list<u8> or list<u16>, where the u8s or u16s are Unicode code units, or in the latin1+utf16 representation.
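
To make the distinction concrete, here is a small sketch that uses only Python's built-in codecs (not the Canonical ABI code itself) to show the same three-character string as UTF-8 code units, as UTF-16 code units, and as a list of Unicode scalar values, which is how a list<char> would look:

```python
# Three characters: 'h', U+00E9 (e with acute accent), U+1F600 (an emoji).
s = "h\u00e9\U0001F600"

# string as list<u8>: UTF-8 code units.
utf8_units = list(s.encode("utf-8"))

# string as list<u16>: UTF-16 code units, decoded from little-endian bytes.
raw = s.encode("utf-16-le")
utf16_units = [int.from_bytes(raw[i:i + 2], "little") for i in range(0, len(raw), 2)]

# list<char> as list<u32>: one Unicode scalar value per character.
scalar_values = [ord(c) for c in s]

print([hex(u) for u in utf8_units])     # 7 code units
print([hex(u) for u in utf16_units])    # 4 code units (one surrogate pair)
print([hex(v) for v in scalar_values])  # 3 scalar values
```

The element counts differ (7, 4 and 3), which is why a string cannot simply be treated as a list<char> at the ABI level.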

Dan Gohman (Mar 14 2024 at 12:26):

There's a mention of this here, although we should document this subtlety more clearly.

Lann Martin (Mar 14 2024 at 13:01):

Conceptually, the Canonical ABI isn't really despecializing strings: that list<u8> is still constrained to be valid UTF-8.

Gordon Smith (Mar 14 2024 at 13:16):

Speaking of Unicode: having the various conversions done inside the ABI didn't really sit well with me, as folks tend to have different preferences as to which implementation to use (certainly in the C world). I would have preferred them to be treated like the realloc function and left up to the consumer.

Lann Martin (Mar 14 2024 at 13:37):

Unicode strings are ubiquitous. If the component model didn't have this functionality it would have been reinvented everywhere.

Joel Dice (Mar 14 2024 at 13:37):

The ABI needs to be aware of encodings so that the host can automatically convert between them. For example, if you have a component that expects UTF-16 composed with another component that expects UTF-8, it's up to the host to convert them. Even if you were to leave it up to the consumer to do the conversion, the consumer would at least need to know what encoding they were converting from.

Dan Gohman (Mar 14 2024 at 15:26):

Also worth noting is that it doesn't need to do any "interesting" conversions, like normalization, case conversion, anything that needs to be aware of locales, non-Unicode encodings, or anything requiring codepoint tables. It's just translating Unicode scalar values from one encoding to another, which needs a lot less code than, say, realloc.
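
As a rough illustration of how little is involved, the following sketch encodes a single Unicode scalar value as UTF-8 using nothing but range checks and bit arithmetic. The function name encode_utf8 is hypothetical and this is not the Canonical ABI's own code:

```python
def encode_utf8(scalar):
    """Encode one Unicode scalar value as UTF-8 using only bit arithmetic."""
    # Scalar values exclude the surrogate range and stop at U+10FFFF.
    if scalar < 0 or scalar > 0x10FFFF or 0xD800 <= scalar <= 0xDFFF:
        raise ValueError("not a Unicode scalar value")
    if scalar < 0x80:  # 1 byte: 0xxxxxxx
        return bytes([scalar])
    if scalar < 0x800:  # 2 bytes: 110xxxxx 10xxxxxx
        return bytes([0xC0 | (scalar >> 6), 0x80 | (scalar & 0x3F)])
    if scalar < 0x10000:  # 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([
            0xE0 | (scalar >> 12),
            0x80 | ((scalar >> 6) & 0x3F),
            0x80 | (scalar & 0x3F),
        ])
    # 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return bytes([
        0xF0 | (scalar >> 18),
        0x80 | ((scalar >> 12) & 0x3F),
        0x80 | ((scalar >> 6) & 0x3F),
        0x80 | (scalar & 0x3F),
    ])


# Cross-check a few values against Python's built-in encoder.
samples = [0x41, 0xE9, 0x20AC, 0x1F600]
matches_builtin = all(encode_utf8(cp) == chr(cp).encode("utf-8") for cp in samples)
```

No normalization data, locale information or code point tables are needed for this kind of re-encoding.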

Gordon Smith (Mar 14 2024 at 16:01):

@Dan Gohman That's a fair point. I am trying to create a C++ ABI implementation and didn't want to have a dependency on ICU!
@Lann Martin I wasn't suggesting removing the Unicode support, just relocating the "encode" function to be a part of CallContext.opts.

Till Schneidereit (Mar 14 2024 at 21:02):

Besides the "you'd have to be able to convert to any other representation" point Joel made, another difference from realloc is that Unicode handling is part of the guarantees the component model gives: if you receive a string, you're guaranteed that it's well-formed. If the conversion happened in-content, it couldn't be combined with a validation pass, so it'd be strictly more expensive.
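
One way to picture validation and conversion sharing a pass is the sketch below, which walks UTF-16 code units once, rejects unpaired surrogates, and emits UTF-8 as it goes. The function name utf16_units_to_utf8 is hypothetical, and this illustrates the idea rather than the algorithm any particular host uses:

```python
def utf16_units_to_utf8(units):
    """Validate UTF-16 code units and transcode them to UTF-8 in one pass."""
    out = bytearray()
    i = 0
    while i < len(units):
        unit = units[i]
        if 0xD800 <= unit <= 0xDBFF:
            # A high surrogate must be followed by a low surrogate.
            if i + 1 >= len(units) or not (0xDC00 <= units[i + 1] <= 0xDFFF):
                raise ValueError("unpaired high surrogate")
            scalar = 0x10000 + ((unit - 0xD800) << 10) + (units[i + 1] - 0xDC00)
            i += 2
        elif 0xDC00 <= unit <= 0xDFFF:
            # A low surrogate with no preceding high surrogate is malformed.
            raise ValueError("unpaired low surrogate")
        else:
            scalar = unit
            i += 1
        # scalar is now a valid Unicode scalar value; emit it as UTF-8.
        out += chr(scalar).encode("utf-8")
    return bytes(out)


well_formed = utf16_units_to_utf8([0x68, 0xD83D, 0xDE00])

try:
    utf16_units_to_utf8([0x68, 0xD83D])  # truncated surrogate pair
    rejected = False
except ValueError:
    rejected = True
```

The receiver of the output never needs to re-validate it, because malformed input is rejected in the same loop that does the conversion.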


Last updated: Nov 22 2024 at 16:03 UTC