Stream: git-wasmtime

Topic: wasmtime / issue #6824 Stdout unicode output is not shown...


view this post on Zulip Wasmtime GitHub notifications bot (Aug 09 2023 at 00:32):

mush42 opened issue #6824:

The problem

Given the following Rust app that prints some Arabic text to the terminal:

use std::io::{self, Write};


fn main() {
    let text = "مرحبا بكم".to_string();

    let mut stdout = io::stdout().lock();
    stdout.write_all(text.as_bytes()).unwrap();
}

Compiling the app and running the .wasm module using wasmtime gives the following garbled output:

مرحبا بكم

Expected

The app should print the given text to the terminal:

مرحبا بكم

Things work as expected on an alternative runtime like wasmer

Extra information

Platform: Windows 11 64-bit
wasmtime invoked from cmd.exe

Best
Musharraf

view this post on Zulip Wasmtime GitHub notifications bot (Aug 09 2023 at 15:17):

kpreisser commented on issue #6824:

Hi,
without knowing the details of Rust and the WASI APIs, this looks like the WASM module is using UTF-8 to print the string to stdout, resulting in this byte sequence:
D9 85 D8 B1 D8 AD D8 A8 D8 A7 20 D8 A8 D9 83 D9 85

Wasmtime on Windows probably doesn't set the console encoding using SetConsoleOutputCP, so with default Windows settings it gets interpreted using the OEM codepage encoding, e.g. 437. With CP437, this byte sequence results in the string:
┘à╪▒╪¡╪¿╪º ╪¿┘â┘à.

For example, when using conhost.exe as default console application, the current console encoding is shown in the properties (the set console encoding will be kept even after the child process exits):
![grafik](https://github.com/bytecodealliance/wasmtime/assets/13289184/190f07a9-c0c0-441b-99be-966a526ac5cd)

AFAIK, recent Windows 10 versions (Version 1909 and higher) support using UTF-8 as input and output console encoding (see these docs for more details). (But I'm not familiar enough with WASI to know whether it specifies/assumes that strings written to stdout/stderr should use a specific encoding like UTF-8.)

For example, when starting an app that calls SetConsoleOutputCP(65001) and then starting wasmtime with the example module again, the string should be printed correctly:
![grafik](https://github.com/bytecodealliance/wasmtime/assets/13289184/69b40bd5-ab7c-4f43-8d39-ad1b191957d3)
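To make the byte-level explanation above concrete, here is a minimal sketch that prints the UTF-8 bytes of the string and then reinterprets them as CP437. The helper names `to_hex` and `decode_cp437_subset` are made up for illustration, and the lookup table is an assumption-laden shortcut: it covers only the handful of high bytes that occur in this one string, not the full code page.

```rust
/// Format bytes as space-separated uppercase hex.
fn to_hex(bytes: &[u8]) -> String {
    bytes
        .iter()
        .map(|b| format!("{:02X}", b))
        .collect::<Vec<_>>()
        .join(" ")
}

/// Reinterpret bytes as CP437. Hypothetical partial table: only the high
/// bytes needed for this example are mapped, everything else becomes '?'.
fn decode_cp437_subset(bytes: &[u8]) -> String {
    bytes
        .iter()
        .map(|&b| match b {
            0x20..=0x7E => b as char, // printable ASCII is the same in CP437
            0x83 => 'â',
            0x85 => 'à',
            0xA7 => 'º',
            0xA8 => '¿',
            0xAD => '¡',
            0xB1 => '▒',
            0xD8 => '╪',
            0xD9 => '┘',
            _ => '?', // not covered by this sketch
        })
        .collect()
}

fn main() {
    let text = "مرحبا بكم";
    // The bytes the wasm module hands to stdout.
    println!("{}", to_hex(text.as_bytes()));
    // What a console using CP437 makes of those same bytes.
    println!("{}", decode_cp437_subset(text.as_bytes()));
}
```

Every Arabic letter here is two bytes in UTF-8, and a single-byte code page renders each of those bytes as its own glyph, which is why the output is roughly twice as long as the input.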

view this post on Zulip Wasmtime GitHub notifications bot (Aug 09 2023 at 15:21):

kpreisser edited a comment on issue #6824:

Hi,
without knowing the details of Rust and the WASI APIs, this looks like the WASM module is using UTF-8 to print the string to stdout, resulting in this byte sequence:
D9 85 D8 B1 D8 AD D8 A8 D8 A7 20 D8 A8 D9 83 D9 85

Wasmtime on Windows probably doesn't set the console encoding using SetConsoleOutputCP, so with default Windows settings it gets interpreted using the OEM codepage encoding, e.g. 437 on US English. With CP437, this byte sequence results in the string:
┘à╪▒╪¡╪¿╪º ╪¿┘â┘à.

For example, when using conhost.exe as default console application, the current console encoding is shown in the properties (the set console encoding will be kept even after the child process exits):
![grafik](https://github.com/bytecodealliance/wasmtime/assets/13289184/190f07a9-c0c0-441b-99be-966a526ac5cd)

AFAIK, recent Windows 10 versions (Version 1909 and higher) support using UTF-8 as input and output console encoding (see these docs for more details). (But I'm not familiar enough with WASI to know whether it specifies/assumes that strings written to stdout/stderr should use a specific encoding like UTF-8.)

For example, when starting an app that calls SetConsoleOutputCP(65001) and then starting wasmtime with the example module again, the string should be printed correctly:
![grafik](https://github.com/bytecodealliance/wasmtime/assets/13289184/69b40bd5-ab7c-4f43-8d39-ad1b191957d3)

view this post on Zulip Wasmtime GitHub notifications bot (Aug 09 2023 at 15:23):

bjorn3 commented on issue #6824:

If stdout is a console, libstd will use WriteConsoleW for writing which accepts UTF-16. Is SetConsoleOutputCP also necessary for the UTF-16 apis? Code pages should only matter for the *A api's, not the *W api's, right?

view this post on Zulip Wasmtime GitHub notifications bot (Aug 09 2023 at 15:25):

cfallin commented on issue #6824:

cc @sunfishcode or @peterhuene perhaps?

view this post on Zulip Wasmtime GitHub notifications bot (Aug 09 2023 at 15:32):

kpreisser commented on issue #6824:

> Code pages should only matter for the *A api's, not the *W api's, right?

Yes, as I understand the docs the current console page should only matter for WriteFile and WriteConsoleA, but not for WriteConsoleW (but I haven't tested it).
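As a rough illustration of the UTF-16 path mentioned here: a wide-character API such as WriteConsoleW takes UTF-16 code units, so the UTF-8 bytes have to be converted first. The sketch below only shows that conversion with the standard library's `str::encode_utf16`; the helper name `to_utf16` is made up, and nothing here calls the Windows API itself.

```rust
/// Convert UTF-8 text into the UTF-16 code units a wide-character API expects.
fn to_utf16(text: &str) -> Vec<u16> {
    text.encode_utf16().collect()
}

fn main() {
    let text = "مرحبا بكم";
    let units = to_utf16(text);

    // These Arabic letters sit in the Basic Multilingual Plane: two bytes
    // each in UTF-8, but a single code unit each in UTF-16.
    println!("UTF-8 bytes: {}", text.len());
    println!("UTF-16 code units: {}", units.len());

    // The conversion is lossless, so decoding gives back the original text.
    let round_trip = String::from_utf16(&units).expect("valid UTF-16");
    assert_eq!(round_trip, text);
}
```

Because the code units carry the characters themselves rather than code-page-dependent bytes, the console code page should not come into play on this path.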

view this post on Zulip Wasmtime GitHub notifications bot (Aug 09 2023 at 15:33):

kpreisser edited a comment on issue #6824:

Hi,
without knowing the details of Rust and the WASI APIs, this looks like the WASM module is using UTF-8 to print the string to stdout, resulting in this byte sequence:
D9 85 D8 B1 D8 AD D8 A8 D8 A7 20 D8 A8 D9 83 D9 85

Wasmtime on Windows probably doesn't set the console encoding using SetConsoleOutputCP, so with default Windows settings it gets interpreted using the OEM codepage encoding, e.g. 437 on US English (edit: when using APIs accepting char *, like WriteFile or WriteConsoleA). With CP437, this byte sequence results in the string:
┘à╪▒╪¡╪¿╪º ╪¿┘â┘à.

For example, when using conhost.exe as default console application, the current console encoding is shown in the properties (the set console encoding will be kept even after the child process exits):
![grafik](https://github.com/bytecodealliance/wasmtime/assets/13289184/190f07a9-c0c0-441b-99be-966a526ac5cd)

AFAIK, recent Windows 10 versions (Version 1909 and higher) support using UTF-8 as input and output console encoding (see these docs for more details). (But I'm not familiar enough with WASI to know whether it specifies/assumes that strings written to stdout/stderr should use a specific encoding like UTF-8.)

For example, when starting an app that calls SetConsoleOutputCP(65001) and then starting wasmtime with the example module again, the string should be printed correctly:
![grafik](https://github.com/bytecodealliance/wasmtime/assets/13289184/69b40bd5-ab7c-4f43-8d39-ad1b191957d3)

view this post on Zulip Wasmtime GitHub notifications bot (Aug 09 2023 at 15:56):

kpreisser edited a comment on issue #6824:

Hi,
without knowing the details of Rust and the WASI APIs, this looks like the WASM module is using UTF-8 to print the string to stdout, resulting in this byte sequence:
D9 85 D8 B1 D8 AD D8 A8 D8 A7 20 D8 A8 D9 83 D9 85

Wasmtime on Windows probably doesn't set the console encoding using SetConsoleOutputCP, so with default Windows settings it gets interpreted using the OEM codepage encoding, e.g. 437 on US English (edit: when using APIs accepting char */void *, like WriteFile or WriteConsoleA). With CP437, this byte sequence results in the string:
┘à╪▒╪¡╪¿╪º ╪¿┘â┘à.

For example, when using conhost.exe as default console application, the current console encoding is shown in the properties (the set console encoding will be kept even after the child process exits):
![grafik](https://github.com/bytecodealliance/wasmtime/assets/13289184/190f07a9-c0c0-441b-99be-966a526ac5cd)

AFAIK, recent Windows 10 versions (Version 1909 and higher) support using UTF-8 as input and output console encoding (see these docs for more details). (But I'm not familiar enough with WASI to know whether it specifies/assumes that strings written to stdout/stderr should use a specific encoding like UTF-8.)

For example, when starting an app that calls SetConsoleOutputCP(65001) and then starting wasmtime with the example module again, the string should be printed correctly:
![grafik](https://github.com/bytecodealliance/wasmtime/assets/13289184/69b40bd5-ab7c-4f43-8d39-ad1b191957d3)

view this post on Zulip Wasmtime GitHub notifications bot (Aug 09 2023 at 15:56):

kpreisser edited a comment on issue #6824:

Hi,
without knowing the details of Rust and the WASI APIs, this looks like the WASM module is using UTF-8 to print the string to stdout, resulting in this byte sequence:
D9 85 D8 B1 D8 AD D8 A8 D8 A7 20 D8 A8 D9 83 D9 85

Wasmtime on Windows probably doesn't set the console encoding using SetConsoleOutputCP, so with default Windows settings it gets interpreted using the OEM codepage encoding, e.g. 437 on US English (edit: when using APIs like WriteFile or WriteConsoleA). With CP437, this byte sequence results in the string:
┘à╪▒╪¡╪¿╪º ╪¿┘â┘à.

For example, when using conhost.exe as default console application, the current console encoding is shown in the properties (the set console encoding will be kept even after the child process exits):
![grafik](https://github.com/bytecodealliance/wasmtime/assets/13289184/190f07a9-c0c0-441b-99be-966a526ac5cd)

AFAIK, recent Windows 10 versions (Version 1909 and higher) support using UTF-8 as input and output console encoding (see these docs for more details). (But I'm not familiar enough with WASI to know whether it specifies/assumes that strings written to stdout/stderr should use a specific encoding like UTF-8.)

For example, when starting an app that calls SetConsoleOutputCP(65001) and then starting wasmtime with the example module again, the string should be printed correctly:
![grafik](https://github.com/bytecodealliance/wasmtime/assets/13289184/69b40bd5-ab7c-4f43-8d39-ad1b191957d3)

view this post on Zulip Wasmtime GitHub notifications bot (Aug 09 2023 at 16:00):

kpreisser edited a comment on issue #6824:

Hi,
without knowing the details of Rust and the WASI APIs, this looks like the WASM module is using UTF-8 to print the string to stdout, resulting in this byte sequence:
D9 85 D8 B1 D8 AD D8 A8 D8 A7 20 D8 A8 D9 83 D9 85

Wasmtime on Windows probably doesn't set the console encoding using SetConsoleOutputCP, so with default Windows settings it gets interpreted using the OEM codepage encoding, e.g. 437 on US English (edit: when using APIs like WriteFile or WriteConsoleA). With CP437, this byte sequence results in the string:
┘à╪▒╪¡╪¿╪º ╪¿┘â┘à.

For example, when using conhost.exe as default console application, the current console encoding is shown in the properties (the set console encoding will be kept even after the child process exits):
![grafik](https://github.com/bytecodealliance/wasmtime/assets/13289184/190f07a9-c0c0-441b-99be-966a526ac5cd)

AFAIK, recent Windows 10 versions (Version 1909 and higher) support using UTF-8 as input and output console encoding (see these docs for more details). (But I'm not familiar enough with WASI to know whether it specifies/assumes that strings written to stdout/stderr should use a specific encoding like UTF-8.)

For example, when starting an app that calls SetConsoleOutputCP(65001) and then starting wasmtime with the example module again, the string should be printed correctly:
![grafik](https://github.com/bytecodealliance/wasmtime/assets/13289184/f310d006-71e8-4ef3-a592-11e050207540)

view this post on Zulip Wasmtime GitHub notifications bot (Aug 09 2023 at 16:01):

kpreisser edited a comment on issue #6824:

Hi,
without knowing the details of Rust and the WASI APIs, this looks like the WASM module is using UTF-8 to print the string to stdout, resulting in this byte sequence:
D9 85 D8 B1 D8 AD D8 A8 D8 A7 20 D8 A8 D9 83 D9 85

Wasmtime on Windows probably doesn't set the console encoding using SetConsoleOutputCP, so with default Windows settings it gets interpreted using the OEM codepage encoding, e.g. 437 on US English (edit: when using APIs like WriteFile or WriteConsoleA). With CP437, this byte sequence results in the string:
┘à╪▒╪¡╪¿╪º ╪¿┘â┘à.

For example, when using conhost.exe as default console application, the current console encoding is shown in the properties (the set console encoding will be kept even after the child process exits):
![grafik](https://github.com/bytecodealliance/wasmtime/assets/13289184/190f07a9-c0c0-441b-99be-966a526ac5cd)

AFAIK, recent Windows 10 versions (Version 1909 and higher) support using UTF-8 as input and output console encoding (see these docs for more details). (But I'm not familiar enough with WASI to know whether it specifies/assumes that strings written to stdout/stderr should use a specific encoding like UTF-8.)

When starting an app that calls SetConsoleOutputCP(65001) (e.g. by running chcp 65001 on the command-line) and then starting wasmtime with the example module again, the string should be printed correctly:
![grafik](https://github.com/bytecodealliance/wasmtime/assets/13289184/f310d006-71e8-4ef3-a592-11e050207540)

view this post on Zulip Wasmtime GitHub notifications bot (Aug 09 2023 at 16:30):

peterhuene commented on issue #6824:

I don't have access to my Windows machine this week as I'm working away from home, but I can try to reproduce and investigate on a VM.

From what I can tell of the various async wrappers (wasi's, tokio's, etc), we should still be ending up in a call to the write_all impl of std::io::Stdout, which should detect the handle as a console (as we do not call SetStdHandle in Wasmtime), convert the UTF-8 bytes to UTF-16, and use WriteConsoleW to print.

Thus I'd expect the wasm Rust program would behave the same as it would as a native Rust program, but obviously something is amiss.

view this post on Zulip Wasmtime GitHub notifications bot (Aug 09 2023 at 18:09):

peterhuene commented on issue #6824:

I was able to reproduce this on a VM (after some battling with the fact that I was using a Windows ARM VM, since I have an M2 Mac).

I'll investigate shortly.

view this post on Zulip Wasmtime GitHub notifications bot (Aug 09 2023 at 19:16):

pchickey commented on issue #6824:

Thanks for this report and discussion, all - I'm pretty ignorant on this topic.

I don't have a Windows machine, so I tried to reproduce the issue using the Windows runners in CI, and it appears that in CI the encoding is handled correctly (https://github.com/bytecodealliance/wasmtime/actions/runs/5811016470/job/15753287837#step:16:929) - but, from my reading above, that could be due to the way GitHub CI captures output with different defaults than cmd.exe.

I looked at the implementations. The wasi-common implementations (the wasi-cap-std-sync and wasi-tokio tests) bypass Rust test output capture (showing the Arabic welcome above) because they use std::fs::File's Write::write_vectored on the stdout/stderr files directly (https://github.com/bytecodealliance/wasmtime/blob/pch/sync_wasi_cli/crates/wasi-common/cap-std-sync/src/stdio.rs#L131), whereas the preview2 implementations write by way of tokio::io::std{out,err}'s AsyncWrite impls, which in turn internally use std::io::std{out,err}'s Write impl.

Unfortunately, the preview2 implementations don't actually produce output due to another issue. I'll fix that in parallel, but this might shed some light...

view this post on Zulip Wasmtime GitHub notifications bot (Aug 09 2023 at 19:46):

peterhuene commented on issue #6824:

I'm having a difficult time debugging from an emulated x86-64 debugger running on a Windows ARM VM; it seems like I can hit a breakpoint inside the if !is_console(...) check in the stdlib for seeing if it needs to convert the output to UTF-16 and use WriteConsoleW, which would imply it is writing the raw UTF-8 bytes to the console (presumably with WriteFile), hence the console code page issue.

Unfortunately, it appears the debugger or the cross-compiled Wasmtime is doing things that make it impossible to actually inspect locals or step reliably through (misaligned memory access, AVs, etc); the joys of debugging a x86-64 Windows process running on an M2 Mac.

Therefore, I can't reliably trust what I'm seeing.

Unless someone else debugs this, this will have to wait until next week when I am back in front of a proper Windows machine.

view this post on Zulip Wasmtime GitHub notifications bot (Aug 09 2023 at 19:48):

peterhuene edited a comment on issue #6824:

I'm having a difficult time debugging from an emulated x86-64 debugger running on a Windows ARM VM; it seems like I can hit a breakpoint inside the if !is_console(...) check in the stdlib for seeing if it needs to convert the output to UTF-16 and use WriteConsoleW; this would imply it is writing the raw UTF-8 bytes to the console (presumably with WriteFile), hence the console code page issue.

Unfortunately, it appears the debugger or the cross-compiled Wasmtime is doing things that make it impossible to actually inspect locals or step reliably through (misaligned memory access, AVs, etc); the joys of debugging a x86-64 Windows process running on an M2 Mac.

Therefore, I can't reliably trust what I'm seeing.

Unless someone else debugs this, this will have to wait until next week when I am back in front of a proper Windows machine.

view this post on Zulip Wasmtime GitHub notifications bot (Aug 09 2023 at 20:00):

peterhuene edited a comment on issue #6824:

I don't have access to my Windows machine this week as I'm working away from home, but I can try to reproduce and investigate on a VM.

From what I can tell of the various async wrappers (wasi's, tokio's, etc), we should still be ending up in a call to the Write impl of std::io::Stdout, which should detect the handle as a console (as we do not call SetStdHandle in Wasmtime), convert the UTF-8 bytes to UTF-16, and use WriteConsoleW to print.

Thus I'd expect the wasm Rust program would behave the same as it would as a native Rust program, but obviously something is amiss.

view this post on Zulip Wasmtime GitHub notifications bot (Aug 09 2023 at 21:14):

peterhuene edited a comment on issue #6824:

I'm having a difficult time debugging from an emulated x86-64 debugger running on a Windows ARM VM; it seems like I can hit a breakpoint inside the if !is_console(...) check in the stdlib for seeing if it needs to convert the output to UTF-16 and use WriteConsoleW; this would imply it is writing the raw UTF-8 bytes to the console (presumably with WriteFile), hence the console code page issue.

Unfortunately, it appears the debugger or the cross-compiled Wasmtime is doing things that make it impossible to actually inspect locals or step reliably through (misaligned memory access, AVs, etc); the joys of debugging a x86-64 Windows process running on an M2 Mac.

Therefore, I can't reliably trust what I'm seeing.

Unless someone else debugs this, this will have to wait until next week when I am back in front of a proper Windows machine.

Update: apparently an ARM windbg incorrectly sets symbol breakpoints (hence I can't easily break on kernel32!WriteConsoleW or ntdll!NtWriteFile) and can't correctly display the disassembly of a x64 user mode debuggee. I'll try later with a x64 windbg under emulation and see if that works.

view this post on Zulip Wasmtime GitHub notifications bot (Aug 09 2023 at 21:16):

peterhuene edited a comment on issue #6824:

I'm having a difficult time debugging from an emulated x86-64 debugger running on a Windows ARM VM; it seems like I can hit a breakpoint inside the if !is_console(...) check in the stdlib for seeing if it needs to convert the output to UTF-16 and use WriteConsoleW; this would imply it is writing the raw UTF-8 bytes to the console (presumably with WriteFile), hence the console code page issue.

Unfortunately, it appears the debugger from the "C/C++" VS code extension is doing things that make it impossible to actually inspect locals or step reliably through (misaligned memory access, AVs, etc); the joys of debugging a x86-64 Windows process running on an M2 Mac.

Therefore, I can't reliably trust what I'm seeing.

Unless someone else debugs this, this will have to wait until next week when I am back in front of a proper Windows machine.

Update: apparently an ARM windbg incorrectly sets symbol breakpoints (hence I can't easily break on kernel32!WriteConsoleW or ntdll!NtWriteFile) and can't correctly display the disassembly of a x64 user mode debuggee. I'll try later with a x64 windbg under emulation and see if that works.

view this post on Zulip Wasmtime GitHub notifications bot (Aug 09 2023 at 23:32):

peterhuene edited a comment on issue #6824:

I'm having a difficult time debugging from an emulated x86-64 debugger running on a Windows ARM VM; it seems like I can hit a breakpoint inside the if !is_console(...) check in the stdlib for seeing if it needs to convert the output to UTF-16 and use WriteConsoleW; this would imply it is writing the raw UTF-8 bytes to the console (presumably with WriteFile), hence the console code page issue.

Unfortunately, it appears the debugger from the "C/C++" VS code extension is doing things that make it impossible to actually inspect locals or step reliably through (misaligned memory access, AVs, etc); the joys of debugging a x86-64 Windows process running on an M2 Mac.

Therefore, I can't reliably trust what I'm seeing.

Unless someone else debugs this, this will have to wait until next week when I am back in front of a proper Windows machine.

view this post on Zulip Wasmtime GitHub notifications bot (Aug 09 2023 at 23:33):

peterhuene edited a comment on issue #6824:

I'm having a difficult time debugging from an ARM debugger with a x86-64 Wasmtime running under emulation.

Unless someone else debugs this, this will have to wait until next week when I am back in front of a proper Windows machine.

view this post on Zulip Wasmtime GitHub notifications bot (Aug 10 2023 at 00:06):

peterhuene commented on issue #6824:

Indeed, as Pat mentioned in the link above, the problem lies with treating the stdout/stderr handles as a File with (&*self.0.as_filelike_view::<File>()).write_vectored(bufs).

This redirects the Write implementation to File which ultimately writes the UTF-8 bytes directly to the console _without_ letting the stdlib do the conversion to UTF-16 and call WriteConsoleW.

@sunfishcode, as the cap-std author and the last person to touch that line in the WASI preview1 code, is there a reason for not wanting to simply call the Stdout/Stderr impls of write_vectored?

view this post on Zulip Wasmtime GitHub notifications bot (Aug 10 2023 at 00:15):

peterhuene edited a comment on issue #6824:

Indeed, as Pat mentioned in the link above, the problem lies with treating the stdout/stderr handles as a File with (&*self.0.as_filelike_view::<File>()).write_vectored(bufs).

This redirects to the Write implementation of File which ultimately writes the UTF-8 bytes directly to the console _without_ letting the stdlib do the conversion to UTF-16 and call WriteConsoleW.

@sunfishcode, as the cap-std author and the last person to touch that line in the WASI preview1 code, is there a reason for not wanting to simply call the Stdout/Stderr impls of write_vectored?

view this post on Zulip Wasmtime GitHub notifications bot (Aug 10 2023 at 00:16):

peterhuene edited a comment on issue #6824:

Indeed, as Pat mentioned in the link above, the problem lies with treating the stdout/stderr handles as a File with (&*self.0.as_filelike_view::<File>()).write_vectored(bufs).

This redirects to the Write implementation of File which ultimately writes the UTF-8 bytes directly to the console _without_ letting the stdlib do the conversion to UTF-16 and then call WriteConsoleW.

@sunfishcode, as the cap-std author and the last person to touch that line in the WASI preview1 code, is there a reason for not wanting to simply call the Stdout/Stderr impls of write_vectored?
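The difference described above comes down to which `Write` implementation ends up receiving the buffers. Below is a minimal sketch of routing a vectored write through `std::io::Stdout` instead of a `File` view of the handle. The helper `write_bufs` is hypothetical, and this is not the code that was changed in wasmtime, only an illustration of the idea under the assumption that the caller can pick the writer.

```rust
use std::io::{self, IoSlice, Write};

/// Hand a list of buffers to any `Write` implementation in one vectored call.
/// Returns the number of bytes the writer accepted, which may be fewer than
/// the total for some writers (a partial write).
fn write_bufs<W: Write>(mut out: W, bufs: &[&[u8]]) -> io::Result<usize> {
    let slices: Vec<IoSlice<'_>> = bufs.iter().map(|b| IoSlice::new(b)).collect();
    out.write_vectored(&slices)
}

fn main() -> io::Result<()> {
    let text = "مرحبا بكم\n";

    // Going through `io::stdout()` keeps the standard library's stdout
    // handling in the path, instead of writing raw bytes via a `File`.
    let written = write_bufs(io::stdout().lock(), &[text.as_bytes()])?;
    io::stdout().flush()?;

    eprintln!("wrote {written} of {} bytes", text.len());
    Ok(())
}
```

The function is generic over `Write`, so the same call works against an in-memory `Vec<u8>` for testing, while the choice of writer at the call site decides whether the console-aware path is used.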

view this post on Zulip Wasmtime GitHub notifications bot (Aug 10 2023 at 02:09):

pchickey commented on issue #6824:

I expect the answer is that we had no idea it had a different impl. Let's switch it!

view this post on Zulip Wasmtime GitHub notifications bot (Aug 10 2023 at 08:05):

kpreisser edited a comment on issue #6824:

> Code pages should only matter for the *A api's, not the *W api's, right?

Yes, as I understand the docs the current console code page should only matter for WriteFile and WriteConsoleA, but not for WriteConsoleW (but I haven't tested it).

view this post on Zulip Wasmtime GitHub notifications bot (Sep 12 2023 at 21:33):

pchickey commented on issue #6824:

@mush42 sorry, this fell off my radar - I caught up with @peterhuene in person the other day and we each thought the other was taking care of it.

We just hit merge on https://github.com/bytecodealliance/wasmtime/pull/6825. Can you build the latest wasmtime main and try to reproduce again? It should be fixed, but we don't have any windows users handy who can check.

view this post on Zulip Wasmtime GitHub notifications bot (Sep 12 2023 at 22:27):

alexcrichton closed issue #6824:

view this post on Zulip Wasmtime GitHub notifications bot (Sep 13 2023 at 03:12):

mush42 commented on issue #6824:

> @mush42 sorry, this fell off my radar - I caught up with @peterhuene in person the other day and we each thought the other was taking care of it.
>
> We just hit merge on #6825. Can you build the latest wasmtime main and try to reproduce again? It should be fixed, but we don't have any windows users handy who can check.

I can confirm that the issue has been fixed in #6825. Thanks!


Last updated: Jan 24 2025 at 00:11 UTC