Hey folks, I'd like to have your opinions on https://github.com/AssemblyScript/universal-strings and what your thoughts are about the preceding discussion in https://github.com/WebAssembly/gc/issues/145. In particular I'd like to understand the potential strategic conflicts leading to tough discussions like these over something that might well be a generally superior solution benefiting the entire ecosystem of (managed) languages and developers. Feel free to PM me if there's non-public intel you'd like to share. Really want to understand the big picture better.
1. It doesn't support UTF-8, which Rust mandates for strings. While the proposal suggests sanitizing the string at the boundary, this would require an unnecessary validation step when both sides are guaranteed to produce a UTF-8 string.
2. "Avoid alloc+copy->garbage at the boundary in between two Wasm GC-enabled languages and/or JavaScript": while this is the case for GC-enabled languages, for non-GC-enabled languages it will require two copies instead of a single one.
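To make the UTF-8 validation concern raised above concrete, here is a rough sketch (not taken from the proposal) of what sanitizing at the boundary could look like when a string arrives as raw UTF-16 code units. The function names are hypothetical, and replacing lone surrogates with U+FFFD is an assumption about the sanitization policy.

```python
REPLACEMENT = "\ufffd"  # U+FFFD REPLACEMENT CHARACTER

def sanitize_utf16_to_utf8(units: list[int]) -> bytes:
    """Hypothetical boundary step: turn possibly ill-formed UTF-16 code
    units into well-formed UTF-8, replacing lone surrogates."""
    out = []
    i = 0
    while i < len(units):
        u = units[i]
        is_high = 0xD800 <= u <= 0xDBFF
        has_low = i + 1 < len(units) and 0xDC00 <= units[i + 1] <= 0xDFFF
        if is_high and has_low:
            # Valid surrogate pair: combine into one code point.
            cp = 0x10000 + ((u - 0xD800) << 10) + (units[i + 1] - 0xDC00)
            out.append(chr(cp))
            i += 2
        elif 0xD800 <= u <= 0xDFFF:
            # Lone surrogate: not representable in UTF-8.
            out.append(REPLACEMENT)
            i += 1
        else:
            out.append(chr(u))
            i += 1
    return "".join(out).encode("utf-8")

def accept_utf8(data: bytes) -> str:
    """Receiver-side check: raises UnicodeDecodeError on malformed UTF-8."""
    return data.decode("utf-8")
```

When both sides already guarantee UTF-8, the scan in `sanitize_utf16_to_utf8` is exactly the extra work being objected to.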
Thanks! There is indeed a validation step necessary in this case, with https://github.com/AssemblyScript/universal-strings/pull/2 proposing an idea to avoid redundant validation steps or validating implicitly. Also note that even with any alternative there will be a validation step somewhere iff the binding specified either UTF-8 or UTF-16, i.e. enforcing well-formedness, because one has to guarantee the invariant somehow. Also open to further ideas! Regarding 2. I don't quite see where there are two copies, and how it differs from let's say UTF-16 to UTF-16. Can you elaborate where you are seeing the issue, and how it is less efficient than interface types for example?
There are two copies as you first have to copy the string from linear memory to a stringref and then back to linear memory in the other module.
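A toy model of the two-copy argument, assuming each module's linear memory is a plain byte buffer and the engine-managed string is an immutable bytes object. Both function names are made up for illustration, and the copy counts are bookkeeping for the argument, not measurements of any engine.

```python
def pass_via_stringref(src: bytearray, dst: bytearray, length: int) -> int:
    """Linear memory -> managed string -> linear memory. Returns copy count."""
    copies = 0
    # Stand-in for string.new: copy bytes out of the sender's memory
    # into an engine-managed string.
    managed = bytes(memoryview(src)[:length])
    copies += 1
    # Stand-in for string.lower: copy the managed string into the
    # receiver's memory.
    dst[:length] = managed
    copies += 1
    return copies

def pass_direct(src: bytearray, dst: bytearray, length: int) -> int:
    """Linear memory -> linear memory in one step. Returns copy count."""
    dst[:length] = memoryview(src)[:length]
    return 1
```

Both paths leave the same bytes in the receiver's memory; the first one also leaves behind an intermediate string that becomes garbage.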
I see now, interesting. Very good point, agree that this should be avoided! Have ideas already :)
One implicit mechanism I can imagine is if there is a string.new at one side of the boundary, and a string.lower immediately at the other, the engine can copy from the source to the target. Anything more formalized than that achieving the same will do as well.
An interesting observation there is that a module does not even have to fully support GC for this to work, which is the common case in systems languages using linear memory exclusively. One may just legalize string.new and string.lower at the boundary independently of GC.
Here you go: https://github.com/AssemblyScript/universal-strings#integration-with-linear-memory-based-languages
@dcode Modules are typically compiled separately. When the compiler sees a string.new being passed to an import, it doesn't know whether the export will do a string.lower. It'd have to emit code to create a GC object to pass, because that might be what the export needs.
Good point, yeah. Perhaps the engine may create both entry points upon compilation, and use the optimized one where it sees fit?
In order to let the linker pick which version to use at link time, while avoiding duplicating the entire function, the compiler would presumably split the code which makes the call into a separate function.
Suppose we call these split-out functions the "adapter functions".
Heh, nice, fair point, just that these are taken care of by the engine, i.e. one does not have to author, ship, publish or install adapter functions (per environment), and there is zero size overhead in modules.
Except that tools auto-generate these so developers don't author them manually, and there's no extra work to "ship, publish, or install", and they're not per-environment.
The code size part does get to an interesting design question -- should wasm define a fixed set of supported string formats, or should it let source languages define their own formats?
Can you elaborate what the expected process of adding IT to and later using it with a module is? For instance, will it require creating multiple binaries depending on what other modules or hosts a module integrates with? Or just the adapters that wrap a module? How does it behave when for example a dependency is switched out with a compatible one written in another language?
This is the "fusion" part of the IT proposal. You produce a module with adapters that translate between your concrete types and the abstract IT types, and the host / other module has adapters that translate from the abstract IT types to its concrete types
These two halves are fused at link time to produce the complete adapter function. So you only ship the code for your half.
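As a very loose sketch of the fusion idea described above, assume one module stores strings as UTF-8 bytes in its memory and the other wants UTF-16 code units. The helper names (`lift_utf8`, `lower_utf16`, `fuse`) are invented here and are not part of the IT proposal; a real engine would fuse at the instruction level rather than compose closures.

```python
from typing import Callable

def lift_utf8(mem: bytes, ptr: int, length: int) -> str:
    """Producer's half: concrete UTF-8 bytes -> abstract string."""
    return mem[ptr:ptr + length].decode("utf-8")

def lower_utf16(s: str) -> list[int]:
    """Consumer's half: abstract string -> concrete UTF-16 code units."""
    data = s.encode("utf-16-le")
    return [int.from_bytes(data[i:i + 2], "little") for i in range(0, len(data), 2)]

def fuse(lift: Callable[..., str], lower: Callable[[str], list[int]]) -> Callable[..., list[int]]:
    """Link-time step: join the two halves into one complete adapter."""
    def adapter(*args):
        return lower(lift(*args))
    return adapter
```

Each module ships only its own half; `fuse(lift_utf8, lower_utf16)` stands in for what the host does once it sees both sides.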
I see, thanks! Yeah, only shipping the code for your half is crucial there.
yeah, that part is one of the most important aspects of ITs: building on the whole-system view the runtime has, we can have a system where content modules don't have to agree on a serialization format as a least-common denominator, as RPC mechanisms usually have to
Last updated: Jan 24 2025 at 00:11 UTC