WebAssembly Strings · general · Zulip Chat Archive

Hey folks, I'd like to hear your opinions on https://github.com/AssemblyScript/universal-strings and your thoughts on the preceding discussion in https://github.com/WebAssembly/gc/issues/145. In particular, I'd like to understand the potential strategic conflicts that lead to tough discussions like these over something that might well be a generally superior solution benefiting the entire ecosystem of (managed) languages and developers. Feel free to PM me if there's non-public intel you'd like to share. I really want to understand the big picture better.

AssemblyScript/universal-strings

Document scoped to discussion of Universal Strings in WebAssembly - AssemblyScript/universal-strings

GC story for strings? · Issue #145 · WebAssembly/gc

As of the MVP document, strings can be expressed as either an (array i8) or (array i16) per a language's string encoding, but with only one character at a time being accessible with array.get a...

bjorn3 (Oct 11 2020 at 08:09):

Deleted (Oct 11 2020 at 16:23):

Thanks! A validation step is indeed necessary in this case, and https://github.com/AssemblyScript/universal-strings/pull/2 proposes an idea for avoiding redundant validation steps or validating implicitly. Also note that with any alternative there will still be a validation step somewhere if and only if the binding specifies either UTF-8 or UTF-16, i.e. enforces well-formedness, because one has to guarantee the invariant somehow. Also open to further ideas! Regarding 2., I don't quite see where there are two copies, or how it differs from, say, UTF-16 to UTF-16. Can you elaborate on where you see the issue, and how it is less efficient than interface types, for example?
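To make the "is_valid" idea from PR #2 concrete, here is a minimal sketch in Python. It assumes the engine keeps a cached well-formedness bit next to the string bytes; `StringRef`, `ensure_valid`, and the `validations` counter are hypothetical names for illustration, not part of any proposal or engine.

```python
class StringRef:
    """Hypothetical stand-in for an engine-managed string (not a real WebAssembly API)."""

    def __init__(self, data: bytes, is_valid: bool = False):
        self.data = data
        self.is_valid = is_valid  # assumed cached "known well-formed UTF-8" bit
        self.validations = 0      # counts how often the bytes are actually scanned

    def ensure_valid(self) -> None:
        """Validate at most once; later calls return early because the bit is cached."""
        if self.is_valid:
            return
        self.validations += 1
        self.data.decode("utf-8")  # raises UnicodeDecodeError on malformed input
        self.is_valid = True


s = StringRef("héllo".encode("utf-8"))
s.ensure_valid()  # first boundary that requires well-formed UTF-8: scans the bytes
s.ensure_valid()  # second such boundary: skipped thanks to the cached bit
```

A producer that already knows its output is well-formed could construct the string with `is_valid=True`, in which case no scan happens at all.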

Mention an "is_valid" bit can save work by kripken · Pull Request #2 · AssemblyScript/universal-strings


bjorn3 (Oct 11 2020 at 16:27):

There are two copies as you first have to copy the string from linear memory to a stringref and then back to linear memory in the other module.

Deleted (Oct 11 2020 at 16:44):

I see now, interesting. Very good point, agree that this should be avoided! Have ideas already :)

Deleted (Oct 11 2020 at 16:49):

One implicit mechanism I can imagine: if there is a string.new on one side of the boundary and a string.lower immediately on the other, the engine can copy directly from the source to the target. Anything more formalized that achieves the same would do as well.
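A toy model of the two paths discussed here, with linear memories modeled as Python `bytearray`s. `string_new` and `string_lower` borrow the instruction names from the messages above, but their signatures, the `fused_copy` helper, and the copy counter are assumptions made for illustration only.

```python
copies = {"count": 0}  # how many times the string bytes were copied


def string_new(memory: bytearray, ptr: int, length: int) -> bytes:
    """Copy bytes out of a module's linear memory into a managed string."""
    copies["count"] += 1
    return bytes(memory[ptr:ptr + length])


def string_lower(s: bytes, memory: bytearray, ptr: int) -> None:
    """Copy a managed string into another module's linear memory."""
    copies["count"] += 1
    memory[ptr:ptr + len(s)] = s


def fused_copy(src: bytearray, src_ptr: int, length: int,
               dst: bytearray, dst_ptr: int) -> None:
    """Hypothetical engine shortcut: one logical copy straight from source to target."""
    copies["count"] += 1
    dst[dst_ptr:dst_ptr + length] = src[src_ptr:src_ptr + length]


producer = bytearray(32)
consumer = bytearray(32)
producer[0:5] = b"hello"

# Unfused path: linear memory -> managed string -> linear memory.
ref = string_new(producer, 0, 5)
string_lower(ref, consumer, 8)
two_step = copies["count"]

# Fused path: the engine pairs string.new with string.lower across the boundary.
copies["count"] = 0
fused_copy(producer, 0, 5, consumer, 16)
one_step = copies["count"]
```

The fused path only applies when the engine can see both sides of the call, which is the limitation raised later in the thread for separately compiled modules.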

Deleted (Oct 11 2020 at 17:10):

An interesting observation there is that a module does not even have to fully support GC for this to work, which is the common case for systems languages that use linear memory exclusively. One could simply legalize string.new and string.lower at the boundary independently of GC.

Deleted (Oct 11 2020 at 18:06):

AssemblyScript/universal-strings


Dan Gohman (Oct 11 2020 at 19:15):

@dcode Modules are typically compiled separately. When the compiler sees a string.new being passed to an import, it doesn't know whether the export will do a string.lower. It'd have to emit code to create a GC object to pass, because that might be what the export needs.

Deleted (Oct 11 2020 at 19:31):

Good point, yeah. Perhaps the engine may create both entry points upon compilation, and use the optimized one where it sees fit?

Dan Gohman (Oct 11 2020 at 20:05):

In order to let the linker pick which version to use at link time, while avoiding duplicating the entire function, the compiler would presumably split the code which makes the call into a separate function.

Dan Gohman (Oct 11 2020 at 20:05):

Deleted (Oct 11 2020 at 20:14):

Heh, nice, fair point, just that these are taken care of by the engine, i.e. one does not have to author, ship, publish or install adapter functions (per environment), and there is zero size overhead in modules.

Dan Gohman (Oct 12 2020 at 16:01):

Except that tools auto-generate these so developers don't author them manually, and there's no extra work to "ship, publish, or install", and they're not per-environment.

Dan Gohman (Oct 12 2020 at 16:04):

The code size part does get to an interesting design question -- should wasm define a fixed set of supported string formats, or should it let source languages define their own formats?

Deleted (Oct 12 2020 at 16:06):

Can you elaborate on what the expected process is for adding IT to a module and later using it? For instance, will it require creating multiple binaries depending on which other modules or hosts a module integrates with? Or just the adapters that wrap a module? How does it behave when, for example, a dependency is swapped out for a compatible one written in another language?

Dan Gohman (Oct 12 2020 at 16:07):

This is the "fusion" part of the IT proposal. You produce a module with adapters that translate between your concrete types and the abstract IT types, and the host / other module has adapters that translate from the abstract IT types to its concrete types.

Dan Gohman (Oct 12 2020 at 16:08):

These two halves are fused at link time to produce the complete adapter function. So you only ship the code for your half.
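A rough sketch of the fusion idea in Python, assuming a producer that stores UTF-16LE in its linear memory and a consumer that wants UTF-8. The names `lift_utf16`, `lower_utf8`, and `fuse` are hypothetical, and the abstract string type is modeled as a stream of code points; none of this is the actual Interface Types adapter syntax.

```python
from typing import Callable, Iterator


def lift_utf16(memory: bytearray, ptr: int, length: int) -> Iterator[int]:
    """Producer's half: concrete UTF-16LE bytes -> abstract stream of code points."""
    text = bytes(memory[ptr:ptr + length]).decode("utf-16-le")
    for ch in text:
        yield ord(ch)


def lower_utf8(code_points: Iterator[int], memory: bytearray, ptr: int) -> int:
    """Consumer's half: abstract code points -> concrete UTF-8 bytes; returns bytes written."""
    data = "".join(chr(cp) for cp in code_points).encode("utf-8")
    memory[ptr:ptr + len(data)] = data
    return len(data)


def fuse(lift: Callable, lower: Callable) -> Callable:
    """Stand-in for link-time fusion: compose the two shipped halves into one adapter."""
    def adapter(src: bytearray, src_ptr: int, src_len: int,
                dst: bytearray, dst_ptr: int) -> int:
        return lower(lift(src, src_ptr, src_len), dst, dst_ptr)
    return adapter


src = bytearray(64)
dst = bytearray(64)
payload = "héllo".encode("utf-16-le")
src[0:len(payload)] = payload

call = fuse(lift_utf16, lower_utf8)
written = call(src, 0, len(payload), dst, 8)
```

Each side ships only its own half (`lift_utf16` or `lower_utf8`), and neither needs to know the other's encoding. Swapping the consumer for one that uses a different encoding would only change the lowering half.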

Deleted (Oct 12 2020 at 16:10):

Till Schneidereit (Oct 14 2020 at 11:07):

yeah, that part is one of the most important aspects of ITs: building on the whole-system view the runtime has, we can have a system where content modules don't have to agree on a serialization format as a least-common denominator, as RPC mechanisms usually have to

Stream: general

Topic: WebAssembly Strings

Deleted (Oct 10 2020 at 17:50):
