mush42 opened issue #6824:
The problem
Given the following Rust app that prints some Arabic text to the terminal:
    use std::io::{self, Write};
    fn main() {
        let text = "مرحبا بكم".to_string();
        let mut stdout = io::stdout().lock();
        stdout.write_all(text.as_bytes()).unwrap();
    }
Compiling the app and running the `.wasm` module using `wasmtime` gives the following garbled output: ┘à╪▒╪¡╪¿╪º ╪¿┘â┘à
Expected
The app should print the given text to the terminal:
مرحبا بكم
Things work as expected on an alternative runtime like wasmer.
Extra information
Platform: Windows 11 64-bit
`wasmtime` invoked from `cmd.exe`
Best
Musharraf
kpreisser commented on issue #6824:
Hi,
without knowing the details of Rust and the WASI APIs, this looks like the WASM module is using UTF-8 to print the string to stdout, resulting in this byte sequence:
D9 85 D8 B1 D8 AD D8 A8 D8 A7 20 D8 A8 D9 83 D9 85
Wasmtime on Windows probably doesn't set the console encoding using `SetConsoleOutputCP`, so with default Windows settings it gets interpreted using the OEM codepage encoding, e.g. 437. With CP437, this byte sequence results in the string: ┘à╪▒╪¡╪¿╪º ╪¿┘â┘à.
For example, when using `conhost.exe` as default console application, the current console encoding is shown in the properties (the set console encoding will be kept even after the child process exits):
![grafik](https://github.com/bytecodealliance/wasmtime/assets/13289184/190f07a9-c0c0-441b-99be-966a526ac5cd)
AFAIK, recent Windows 10 versions (Version 1909 and higher) support using UTF-8 as input and output console encoding (see these docs for more details). (But I'm not familiar enough with WASI to know whether it specifies/assumes that strings written to stdout/stderr should use a specific encoding like UTF-8.)
For example, when starting an app that calls `SetConsoleOutputCP(65001)` and then starting wasmtime with the example module again, the string should be printed correctly:
![grafik](https://github.com/bytecodealliance/wasmtime/assets/13289184/69b40bd5-ab7c-4f43-8d39-ad1b191957d3)
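The byte sequence and the CP437 misreading described above can be reproduced with a small sketch. The helper names (`to_hex`, `cp437_partial`, `misread_as_cp437`) are hypothetical, and the CP437 table is deliberately partial: it covers only the bytes that occur in this example, so it is an illustration rather than a real decoder.

```rust
/// Format bytes as space-separated uppercase hex, e.g. "D9 85".
fn to_hex(bytes: &[u8]) -> String {
    bytes
        .iter()
        .map(|b| format!("{:02X}", b))
        .collect::<Vec<_>>()
        .join(" ")
}

/// Decode one byte as CP437. PARTIAL table (assumption for illustration):
/// only the bytes needed for this example are mapped, everything else is '?'.
fn cp437_partial(byte: u8) -> char {
    match byte {
        0x20..=0x7E => byte as char, // printable ASCII is identical in CP437
        0x83 => 'â',
        0x85 => 'à',
        0xA7 => 'º',
        0xA8 => '¿',
        0xAD => '¡',
        0xB1 => '▒',
        0xD8 => '╪',
        0xD9 => '┘',
        _ => '?',
    }
}

/// Show what a CP437 console displays when handed the UTF-8 bytes of `text`.
fn misread_as_cp437(text: &str) -> String {
    text.bytes().map(cp437_partial).collect()
}

fn main() {
    let text = "مرحبا بكم";
    println!("{}", to_hex(text.as_bytes()));
    println!("{}", misread_as_cp437(text));
}
```

Each Arabic letter occupies two bytes in UTF-8, and a single-byte code page shows each of those bytes as its own character, which is why the garbled string is roughly twice as long as the original.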
bjorn3 commented on issue #6824:
If stdout is a console, libstd will use `WriteConsoleW` for writing, which accepts UTF-16. Is `SetConsoleOutputCP` also necessary for the UTF-16 APIs? Code pages should only matter for the `*A` APIs, not the `*W` APIs, right?
cfallin commented on issue #6824:
cc @sunfishcode or @peterhuene perhaps?
kpreisser commented on issue #6824:
> Code pages should only matter for the `*A` APIs, not the `*W` APIs, right?
Yes, as I understand the docs, the current console code page should only matter for `WriteFile` and `WriteConsoleA`, but not for `WriteConsoleW` (but I haven't tested it).
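To make the `*A` versus `*W` distinction concrete, here is a minimal sketch of the UTF-8 to UTF-16 conversion that a caller would perform before handing text to a wide-character API such as `WriteConsoleW`. The helper name `to_utf16_units` is hypothetical, and the sketch only performs the conversion; it does not call any Windows API.

```rust
/// Convert a Rust string (always UTF-8) into UTF-16 code units,
/// the representation that wide-character (`*W`) APIs accept.
fn to_utf16_units(text: &str) -> Vec<u16> {
    text.encode_utf16().collect()
}

fn main() {
    let units = to_utf16_units("مرحبا بكم");
    let hex: Vec<String> = units.iter().map(|u| format!("{:04X}", u)).collect();
    // Each Arabic letter becomes one 16-bit unit, so no code page is
    // needed to interpret the data.
    println!("{}", hex.join(" "));
}
```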
kpreisser edited a comment on issue #6824:
Hi,
without knowing the details of Rust and the WASI APIs, this looks like the WASM module is using UTF-8 to print the string to stdout, resulting in this byte sequence:
D9 85 D8 B1 D8 AD D8 A8 D8 A7 20 D8 A8 D9 83 D9 85
Wasmtime on Windows probably doesn't set the console encoding using `SetConsoleOutputCP`, so with default Windows settings it gets interpreted using the OEM codepage encoding, e.g. 437 on US English (edit: when using APIs like `WriteFile` or `WriteConsoleA`). With CP437, this byte sequence results in the string: ┘à╪▒╪¡╪¿╪º ╪¿┘â┘à.
For example, when using `conhost.exe` as default console application, the current console encoding is shown in the properties (the set console encoding will be kept even after the child process exits):
![grafik](https://github.com/bytecodealliance/wasmtime/assets/13289184/190f07a9-c0c0-441b-99be-966a526ac5cd)
AFAIK, recent Windows 10 versions (Version 1909 and higher) support using UTF-8 as input and output console encoding (see these docs for more details). (But I'm not familiar enough with WASI to know whether it specifies/assumes that strings written to stdout/stderr should use a specific encoding like UTF-8.)
When starting an app that calls `SetConsoleOutputCP(65001)` (e.g. by running `chcp 65001` on the command-line) and then starting wasmtime with the example module again, the string should be printed correctly:
![grafik](https://github.com/bytecodealliance/wasmtime/assets/13289184/f310d006-71e8-4ef3-a592-11e050207540)
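As an illustration of the workaround above, a native launcher could switch the console output code page before starting anything else. This is only a sketch: the function name `use_utf8_console_output` is hypothetical, the FFI declaration is written by hand from the documented Win32 signature `BOOL SetConsoleOutputCP(UINT wCodePageID)`, and nothing here is taken from Wasmtime's code.

```rust
/// Windows code page identifier for UTF-8 (the value used with `chcp 65001`).
const CP_UTF8: u32 = 65001;

// Hand-written declaration of the Win32 function (assumption: linked from
// kernel32). On the 2024 edition this block would be `unsafe extern`.
#[cfg(windows)]
#[link(name = "kernel32")]
extern "system" {
    fn SetConsoleOutputCP(code_page_id: u32) -> i32;
}

/// Try to switch the console output code page to UTF-8.
/// Returns true on success.
#[cfg(windows)]
fn use_utf8_console_output() -> bool {
    // SAFETY: plain Win32 call that takes an integer and no pointers.
    unsafe { SetConsoleOutputCP(CP_UTF8) != 0 }
}

/// On other platforms there is no console code page, so this is a no-op.
#[cfg(not(windows))]
fn use_utf8_console_output() -> bool {
    true
}

fn main() {
    if !use_utf8_console_output() {
        eprintln!("could not switch the console to code page {}", CP_UTF8);
    }
    println!("مرحبا بكم");
}
```

As noted in the comment above, the code page setting stays in effect for the console after the process exits, so a launcher like this changes state that later programs in the same console will see.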
peterhuene commented on issue #6824:
I don't have access to my Windows machine this week as I'm working away from home, but I can try to reproduce and investigate on a VM.
From what I can tell of the various async wrappers (wasi's, tokio's, etc), we should still be ending up in a call to the `Write` impl of `std::io::Stdio`, which should detect the handle as a console (as we do not call `SetStdHandle` in Wasmtime), convert the UTF-8 bytes to UTF-16, and use `WriteConsoleW` to print.
Thus I'd expect the wasm Rust program would behave the same as it would as a native Rust program, but obviously something is amiss.
peterhuene commented on issue #6824:
I was able to reproduce this on a VM (after some battling with the fact that I was using a Windows ARM VM, since I have an M2 Mac).
I'll investigate shortly.
pchickey commented on issue #6824:
Thanks for this report and discussion, all - I'm pretty ignorant on this topic.
I don't have a windows machine, so I tried to reproduce the issue using the windows runners in CI, and it appears that in CI the encoding is handled correctly (https://github.com/bytecodealliance/wasmtime/actions/runs/5811016470/job/15753287837#step:16:929) - but, from my reading above, that could be due to the way GitHub CI captures output with different defaults than cmd.exe.
I looked at the implementations and the wasi-common implementations (the `wasi-cap-std-sync` and `wasi-tokio` tests) bypass rust test output capture (showing the arabic `welcome` above) because they use `std::fs::File`'s `Write::write_vectored` on the stdout/stderr files directly https://github.com/bytecodealliance/wasmtime/blob/pch/sync_wasi_cli/crates/wasi-common/cap-std-sync/src/stdio.rs#L131 whereas the preview2 implementations write by way of `tokio::io::std{out,err}`'s `AsyncWrite` impls, which in turn internally use `std::io::std{out,err}`'s `Write` impl.
Unfortunately the preview 2 implementations don't actually produce output due to another issue. I'll fix that in parallel, but this might shed some light...
peterhuene commented on issue #6824:
I'm having a difficult time debugging from an emulated x86-64 debugger running on a Windows ARM VM; it seems like I can hit a breakpoint inside the `if !is_console(...)` check in the stdlib for seeing if it needs to convert the output to UTF-16 and use `WriteConsoleW`; this would imply it is writing the raw UTF-8 bytes to the console (presumably with `WriteFile`), hence the console code page issue.
Unfortunately, it appears the debugger or the cross-compiled Wasmtime is doing things that make it impossible to actually inspect locals or step reliably through (misaligned memory access, AVs, etc); the joys of debugging a x86-64 Windows process running on an M2 Mac.
Therefore, I can't reliably trust what I'm seeing.
Unless someone else debugs this, this will have to wait until next week when I am back in front of a proper Windows machine.
peterhuene edited a comment on issue #6824:
I'm having a difficult time debugging from an emulated x86-64 debugger running on a Windows ARM VM; it seems like I can hit a breakpoint inside the `if !is_console(...)` check in the stdlib for seeing if it needs to convert the output to UTF-16 and use `WriteConsoleW`; this would imply it is writing the raw UTF-8 bytes to the console (presumably with `WriteFile`), hence the console code page issue.
Unfortunately, it appears the debugger from the "C/C++" VS Code extension is doing things that make it impossible to actually inspect locals or step reliably through (misaligned memory access, AVs, etc); the joys of debugging a x86-64 Windows process running on an M2 Mac.
Therefore, I can't reliably trust what I'm seeing.
Unless someone else debugs this, this will have to wait until next week when I am back in front of a proper Windows machine.
Update: apparently an ARM windbg incorrectly sets symbol breakpoints (hence I can't easily break on `kernel32!WriteConsoleW` or `ntdll!NtWriteFile`) and can't correctly display the disassembly of a x64 user mode debuggee. I'll try later with a x64 windbg under emulation and see if that works.
peterhuene edited a comment on issue #6824:
I'm having a difficult time debugging from an ARM debugger with a x86-64 Wasmtime running under emulation.
Unless someone else debugs this, this will have to wait until next week when I am back in front of a proper Windows machine.
peterhuene commented on issue #6824:
Indeed, as Pat mentioned in the link above, the problem lies with treating the stdout/stderr handles as a `File` with `(&*self.0.as_filelike_view::<File>()).write_vectored(bufs)`.
This redirects to the `Write` implementation of `File`, which ultimately writes the UTF-8 bytes directly to the console _without_ letting the stdlib do the conversion to UTF-16 and then call `WriteConsoleW`.
@sunfishcode, as the cap-std author and the last person to touch that line in the WASI preview1 code, is there a reason for not wanting to simply call the `Stdout`/`Stderr` impls of `write_vectored`?
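The alternative proposed here, calling `write_vectored` on `Stdout` itself rather than on a `File` view of the handle, might look roughly like the sketch below. The helper `write_bufs` is hypothetical and is not Wasmtime's code; it is generic over `Write` so the same call works for `Stdout`, `Stderr`, or an in-memory sink.

```rust
use std::io::{self, IoSlice, Write};

/// Issue one `write_vectored` call on any `Write` sink and return how many
/// bytes the sink accepted (which may be fewer than the total provided).
fn write_bufs<W: Write>(sink: &mut W, bufs: &[IoSlice<'_>]) -> io::Result<usize> {
    sink.write_vectored(bufs)
}

fn main() -> io::Result<()> {
    let text = "مرحبا بكم\n";
    let bufs = [IoSlice::new(text.as_bytes())];

    // Going through `Stdout` keeps the standard library's own handling of
    // the handle in the path, instead of writing raw bytes as a `File` would.
    let mut out = io::stdout();
    write_bufs(&mut out, &bufs)?;
    out.flush()
}
```

Because the helper only depends on the `Write` trait, swapping the sink does not change the calling code, which is what makes the switch discussed in this thread a small change.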
pchickey commented on issue #6824:
I expect the answer is that we had no idea it had a different impl. Let's switch it!
pchickey commented on issue #6824:
@mush42 sorry, this fell off my radar - I caught up with @peterhuene in person the other day and we each thought the other was taking care of it.
We just hit merge on https://github.com/bytecodealliance/wasmtime/pull/6825. Can you build the latest wasmtime `main` and try to reproduce again? It should be fixed, but we don't have any Windows users handy who can check.
alexcrichton closed issue #6824.
mush42 commented on issue #6824:
> @mush42 sorry, this fell off my radar - I caught up with @peterhuene in person the other day and we each thought the other was taking care of it.
> We just hit merge on #6825. Can you build the latest wasmtime `main` and try to reproduce again? It should be fixed, but we don't have any windows users handy who can check.
I can confirm that the issue has been fixed in #6825. Thanks!
Last updated: Nov 22 2024 at 16:03 UTC