yurydelendik opened issue #4315:
While working on the SIMD tests at https://github.com/bytecodealliance/sightglass/pull/189, I found that this file is not executed properly. The app informs:
Size: 400x400 ojph error 0x000300A1 at ojph_codestream.cpp:4067: Error decoding a codeblock
Though expected something like:
Size: 400x400 Line 100: [0]: 37 36 35 36 37....
It appears to be running correctly on Node and SM, so I assume it is an issue of correctness. Attaching a compiled benchmark (minus bench_start/_end):
yurydelendik edited issue #4315:
While working on the SIMD tests at https://github.com/bytecodealliance/sightglass/pull/189, I found that this file is not executed properly. The app informs:
Size: 400x400 ojph error 0x000300A1 at ojph_codestream.cpp:4067: Error decoding a codeblock
Though expected something like:
Size: 400x400 Line 100: [0]: 37 36 35 36 37....
It appears to be running on Node and SM, so I assume it is an issue of correctness. Attaching the compiled benchmark (minus bench_start/_end):
yurydelendik edited issue #4315:
While working on the SIMD tests at https://github.com/bytecodealliance/sightglass/pull/189, I found that this file is not executed properly. The app informs:
Size: 400x400 ojph error 0x000300A1 at ojph_codestream.cpp:4067: Error decoding a codeblock
Though expected something like:
Size: 400x400 Line 100: [0]: 37 36 35 36 37....
It appears to be running on Node and SM, so I assume it is an issue of correctness. Attaching the compiled benchmark (minus bench_start/_end):
P.S. on Intel CPU x64
yurydelendik commented on issue #4315:
Also, works on wasmtime for ARM64
alexcrichton commented on issue #4315:
I've done some work to narrow this down but unfortunately I'm not being super successful at the moment. My strategy for narrowing this down has been:
- Modify the CLI to print the sha256 of memory after running the wasm
- Modify fuel to get checked before every single instruction instead of just loop headers and function entries
Next I have a script which runs the binary on aarch64 and x86_64 with a binary-search-of-sorts. This pins down the precise fuel count where with one more fuel the execution differs. My goal here is to find the precise wasm instruction at which execution diverges here. This quickly-ish found results and I started sprinkling "debug stores" within the
*.wat
. Basically I'd take a value of the stack, shove it in a temporary global, store the global to linear memory address 0, and then load the global back onto the stack. The goal here was a quick-and-dirty way of debugging the wasm value stack.In doing this I've narrowed this down to function 40 in the input file above. Specifically these two instructions:
0x4942 | fd 0c 00 01 | V128Const { value: V128([0, 1, 0, 1, 0, 1, 0, 1, 4, 5, 4, 5, 4, 5, 4, 5]) } | 00 01 00 01 | 00 01 04 05 | 04 05 04 05 | 04 05 0x4954 | fd 0e | I8x16Swizzle
Unfortunately this
v128.const
value is not actually going intoi8x16.swizzle
. With my "debug store" inserted between these two instructions it turns out that the value stored in memory is indeed not thev128.const
here. Instead some other bit-pattern gets stored in memory and naturally when that gets fed intoi8x16.swizzle
everything goes awry.Other than that though I have now hit a wall and am unable to make progress debugging this. Function 40 is huge and it's tough to explore the disassembly. Using
rr
I can find the portion of the function that actually creates the value which eventually gets fed intoi8x16.swizzle
. This _should_ feed in thev128.const
value but it is not. Unfortunately I can't figure out how to translate this random snippet of assembly back to the source file.I have confirmed, though, that 0.35.0 reproduces this issue so I don't believe this is a problem with recent developments like regalloc2, recent ISLE migrations, or the alias analysis pass added.
Others who are more familiar with the backend may know more about how to debug this though. IIRC there's some optimizations around constants, constant pools, and trying to not always reify something into a register or something like that, and something may be going awry there.
... aaaand as I am writing this up I took another look at the definition of the
Swizzle
instruction. This looks suspicious, specifically treating the mask as a writable reg since I think in this case it's the mask that's getting corrupted going into the swizzle instruction.I added a move from the input to a temporary register and that didn't actually fix the original program but it did surprisingly allow it to make more progress. I think that means there's actually at least two bugs here!
Doing my bisection again I think I found a much simpler reproduction, namely:
(module (func (param v128 v128 i32) (result v128) local.get 0 local.get 1 local.get 2 select ))
I don't think that our codegen for the
select
instruction is correct for v128 values. Namely this produces:0000000000000000 <_wasm_function_0>: 0: 55 push %rbp 1: 48 89 e5 mov %rsp,%rbp 4: 85 d2 test %edx,%edx 6: 0f 84 04 00 00 00 je 10 <_wasm_function_0+0x10> c: f3 0f 10 c8 movss %xmm0,%xmm1 10: 66 0f 6f c1 movdqa %xmm1,%xmm0 14: 48 89 ec mov %rbp,%rsp 17: 5d pop %rbp 18: c3 retq
where the
movss
I don't believe is correct, I think that needs to bemovdqa
or something sized larger.I'm running out of steam for tonight so I think this is as far as I'll get today. I'm actually quite worried that fuzzing hasn't discovered anything in this area. These are somewhat trivial bugs which in theory should be found almost immediately by any halfway decent fuzzing.
alexcrichton commented on issue #4315:
Ok I can confirm that with https://github.com/bytecodealliance/wasmtime/pull/4317 and https://github.com/bytecodealliance/wasmtime/pull/4318 the original test case now has the same behavior on arm64 and on x86_64
alexcrichton commented on issue #4315:
I realize now I forgot to say this earlier but thanks @yurydelendik for taking the time to file this! We'll be making an 0.38.1 release with these fixes soon and a few other fixes for Cranelift (unrelated to this).
We also plan on filing a CVE about these micompiles because while they don't affect hosts themselves this could affect in-wasm execution which is often just as important in some environments.
alexcrichton commented on issue #4315:
I believe everything is now merged so I'm going to close this.
alexcrichton closed issue #4315:
While working on the SIMD tests at https://github.com/bytecodealliance/sightglass/pull/189, I found that this file is not executed properly. The app informs:
Size: 400x400 ojph error 0x000300A1 at ojph_codestream.cpp:4067: Error decoding a codeblock
Though expected something like:
Size: 400x400 Line 100: [0]: 37 36 35 36 37....
It appears to be running on Node and SM, so I assume it is an issue of correctness. Attaching the compiled benchmark (minus bench_start/_end):
P.S. on Intel CPU x64
Last updated: Jan 24 2025 at 00:11 UTC