Stream: git-wasmtime

Topic: wasmtime / issue #4309 Implement support for utf16+latin1...

Wasmtime GitHub notifications bot (Jun 23 2022 at 21:46):

alexcrichton opened issue #4309:

Currently support for utf16+latin1 is not implemented in Wasmtime, but we'll need to finish this and test it before the component model is considered done.

In general I'd expect that this would use the encoding_rs crate for the internal details of latin1 to avoid open-coding that in Wasmtime itself.

Lowering

Lowering a string into wasm is currently unimplemented. I think that this is required to implement the store_string_to_latin1_or_utf16 function in the canonical ABI explainer. My current understanding is that even if we could implement something more optimal in Rust we can't do that because the semantics of lowering are already specified.

I believe the pseudo-code there does most of the fiddly bits but some small helpers in encoding_rs are probably going to be required.
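To make the encoding choice above concrete, here is a minimal std-only sketch of the decision that lowering has to make: use latin1 when every character fits in one byte, otherwise fall back to utf16. The `Lowered` enum and `lower_latin1_or_utf16` are hypothetical names for illustration. This is not the store_string_to_latin1_or_utf16 algorithm itself, which also specifies how memory is allocated and grown through realloc, and that part is deliberately left out.

```rust
/// Hypothetical result type: which encoding was picked for the lowered string.
#[derive(Debug, PartialEq)]
enum Lowered {
    /// One byte per char; every char was <= U+00FF.
    Latin1(Vec<u8>),
    /// UTF-16 code units, to be written little-endian into linear memory.
    Utf16(Vec<u16>),
}

/// Hypothetical helper: choose latin1 when possible, otherwise utf16.
/// Assumption: this only models the final encoding choice, not the
/// incremental realloc behavior described in the canonical ABI explainer.
fn lower_latin1_or_utf16(s: &str) -> Lowered {
    if s.chars().all(|c| (c as u32) <= 0xFF) {
        // Safe to truncate: every code point was checked to fit in a byte.
        Lowered::Latin1(s.chars().map(|c| c as u8).collect())
    } else {
        Lowered::Utf16(s.encode_utf16().collect())
    }
}
```

For example, `lower_latin1_or_utf16("café")` yields the `Latin1` variant with four bytes, while a string containing `日本` falls back to `Utf16`.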

Lifting

Calculation of the byte length and actually getting the string are unimplemented. I think that we're free to use encoding_rs here however we see fit. Probably the decode_latin1 function will be useful here.
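A rough sketch of the two lifting steps mentioned above (byte length, then decoding), using only the standard library. It assumes the length word carries a tag bit, `UTF16_TAG = 1 << 31`, as I recall it from the canonical ABI explainer, so treat that constant as an assumption to verify. `byte_length` and `lift_latin1_or_utf16` are hypothetical helper names, and an actual implementation might well use encoding_rs's decode_latin1 for the latin1 branch instead of the manual mapping here.

```rust
/// Tag bit in the length word; set when the payload is UTF-16.
/// Assumption: value taken from the canonical ABI explainer.
const UTF16_TAG: u32 = 1 << 31;

/// Hypothetical helper: byte length of the string payload.
fn byte_length(tagged_code_units: u32) -> usize {
    if (tagged_code_units & UTF16_TAG) != 0 {
        // UTF-16: two bytes per code unit, with the tag bit cleared.
        2 * ((tagged_code_units ^ UTF16_TAG) as usize)
    } else {
        // Latin1: one byte per code unit.
        tagged_code_units as usize
    }
}

/// Hypothetical helper: decode the payload at the start of `bytes`.
/// Returns None if `bytes` is too short or the UTF-16 data is invalid.
fn lift_latin1_or_utf16(bytes: &[u8], tagged_code_units: u32) -> Option<String> {
    let len = byte_length(tagged_code_units);
    let payload = bytes.get(..len)?;
    if (tagged_code_units & UTF16_TAG) != 0 {
        let units: Vec<u16> = payload
            .chunks_exact(2)
            .map(|pair| u16::from_le_bytes([pair[0], pair[1]]))
            .collect();
        String::from_utf16(&units).ok()
    } else {
        // Each latin1 byte maps to the code point with the same value.
        Some(payload.iter().map(|&b| b as char).collect())
    }
}
```

Only the utf16 branch can fail on the contents of the payload; the latin1 branch fails only when the buffer is shorter than the declared length.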

Other notes

I am personally unfamiliar with latin1 as an encoding. I don't know if an arbitrary list of bytes is guaranteed to be valid latin1 or not (the infallibility of decode_latin1 seems odd to me).
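On the validity question: as far as I know, latin1 in the sense encoding_rs uses it maps each byte value 0x00..=0xFF directly to the Unicode scalar value with the same number, so every byte sequence decodes and only the encoding direction can fail. The std-only sketch below illustrates that asymmetry; `latin1_to_string` and `string_to_latin1` are hypothetical names, not encoding_rs APIs.

```rust
use std::convert::TryFrom;

/// Decode latin1 by mapping each byte to the code point of the same value.
/// This cannot fail: all 256 byte values are valid.
fn latin1_to_string(bytes: &[u8]) -> String {
    bytes.iter().map(|&b| char::from(b)).collect()
}

/// Encode to latin1; returns None if any char is above U+00FF.
fn string_to_latin1(s: &str) -> Option<Vec<u8>> {
    s.chars()
        .map(|c| u8::try_from(u32::from(c)).ok())
        .collect()
}
```

Round-tripping all 256 byte values through these two functions gives back the original bytes, while a string such as `日` has no latin1 representation.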

Using encoding_rs may be a better option for the utf16 decoding we currently do (and maybe even utf8, since encoding_rs can probably do simd things that the standard library can't). If someone's intrepid it might be interesting to benchmark this and see if it's beneficial to use encoding_rs for almost everything.

Wasmtime GitHub notifications bot (Jun 23 2022 at 21:46):

alexcrichton labeled issue #4309.

Wasmtime GitHub notifications bot (Aug 08 2022 at 16:01):

alexcrichton closed issue #4309.

Last updated: Apr 17 2025 at 13:10 UTC