Stream: C#/.net-collaboration

Topic: return pointer not aligned


view this post on Zulip Joel Dice (Jun 03 2024 at 15:05):

@Scott Waye we're seeing "return pointer not aligned" failures in wit-bindgen CI again: https://github.com/bytecodealliance/wit-bindgen/actions/runs/9321335334/job/25669483093?pr=960. I think you tried to address that in https://github.com/bytecodealliance/wit-bindgen/commit/bcd136e583c51cb6819876478bdfd6bf96e1ea2d, but looks like maybe it needs more work?

I'm going to try to reproduce it and see if I can learn anything.

A language binding generator for WebAssembly interface types - full qualify Address to avoid collision with wit record of same name · bytecodealliance/wit-bindgen@bd057f2
* uncomment to force crash * fix GC hole with return areas * remove vscode files * add comment remove sln from git * format * Close all fixed blocks * correct tmp variable name use * correc...

view this post on Zulip Joel Dice (Jun 07 2024 at 00:26):

One of the tricks I used when working on the Java bindings was to allocate extra space in the return area (e.g. up to 8 extra bytes for 8-byte alignment) and then align the pointer explicitly before passing it to the host, i.e. so that it points to the first byte in the array that has the alignment we need. Shall we do the same for C#? If there's not a better fix on the horizon, I think we need to do something, because keeps popping up (e.g. here)

This addresses part of #964. For WIT functions returning result<T, E>, we now generate methods which return T and throw WitException such that the E value may be retrieved using WitException.Value....

view this post on Zulip Scott Waye (Jun 07 2024 at 11:25):

I'd like to try this locally first, give me until Monday please.

view this post on Zulip Scott Waye (Jun 07 2024 at 20:28):

good news is that it does repro locally for me.

view this post on Zulip Scott Waye (Jun 07 2024 at 21:31):

Is it possible to run the runtime tests with wasmtime? I.e. get a single wasm file so I can run wasmtime under lldb ?

view this post on Zulip Joel Dice (Jun 07 2024 at 21:36):

You can find the components using e.g. find target/ -name *.component.wasm, but they all use custom worlds, so you can't treat them as wasi:cli commands and run them using wasmtime run, unfortunately.

However, you could edit the run_test_from_dir function in tests/runtime/main.rs and add a config.debug_info(true); line near the top. Then run e.g. cargo test --no-default-features --features csharp-naot, which will print the binary name used (e.g. target/debug/deps/runtime-448f003ef70967db). You can run that under LLDB.

view this post on Zulip Joel Dice (Jun 07 2024 at 21:39):

From what I've heard there are issues with Wasmtime preserving debug info for components, but maybe those have been resolved?

view this post on Zulip Scott Waye (Jun 07 2024 at 21:43):

thanks, I will have a try

view this post on Zulip Scott Waye (Jun 07 2024 at 21:57):

hmmm, doesn't seem to want to resolve the breakpoints which is what you said was possibly still a problem

view this post on Zulip Joel Dice (Jun 07 2024 at 21:58):

Yeah, that's what happened last time I tried it, unfortunately. @Ralph might know the status of debugging components.

view this post on Zulip Scott Waye (Jun 07 2024 at 22:05):

The rust breakpoints work which might give a clue

view this post on Zulip Scott Waye (Jun 07 2024 at 22:08):

I think this is telling me it has failed in the first call_enum_error

(lldb) thread step-over
Process 45200 stopped
* thread #5, stop reason = step over
    frame #0: 0x00007ff7c04b31a6 runtime-f95e984830c7562b.exe`static union enum2$<core::result::Result<tuple$<>,anyhow::Error> > runtime::results::run_test(results=0x000000df38cff2a0, results=0x000000df38cff2a0, results=0x000000df38cff2a0, results=0x000000df38cff2a0, results=0x000000df38cff2a0, results=0x000000df38cff2a0, results=0x000000df38cff2a0, results=0x000000df38cff2a0, store=0x000000df38cff090) at results.rs:101
   98       );
   99
   100      assert_eq!(
-> 101          results.interface0.call_enum_error(&mut *store, 0.0)?,
   102          Err(E::A)
   103      );
   104      assert_eq!(
(lldb) thread step-over
Process 45200 stopped
* thread #5, stop reason = step over
    frame #0: 0x00007ff7c04b4101 runtime-f95e984830c7562b.exe`static union enum2$<core::result::Result<tuple$<>,anyhow::Error> > runtime::results::run_test(results=0x000000df38cff2a0, results=0x000000df38cff2a0, results=0x000000df38cff2a0, results=0x000000df38cff2a0, results=0x000000df38cff2a0, results=0x000000df38cff2a0, results=0x000000df38cff2a0, results=0x00007ff7c4bc7d48, store=0x000000df38cff090) at results.rs:166
   163      );
   164
   165      Ok(())
-> 166  }

view this post on Zulip Joel Dice (Jun 07 2024 at 22:16):

Yeah, I think that's just telling us what we already know: Wasmtime is trapping with a return pointer not aligned error, which gets returned to the call_enum_error caller as a Result::Err(...), which leads to an early return due to the ? operator.

view this post on Zulip Joel Dice (Jun 07 2024 at 22:16):

Basically you're only able to debug the host code, which isn't terribly useful here. It's the guest code we're interested in.

view this post on Zulip Joel Dice (Jun 07 2024 at 22:17):

And until we get the debug info issue sorted, we're stuck with e.g. Console.WriteLine-based debugging, AFAICT.

view this post on Zulip Scott Waye (Jun 07 2024 at 22:18):

Which has the annoying habit of moving heap related bugs around

view this post on Zulip Ralph (Jun 07 2024 at 22:18):

Scott, how are you debugging this component?

view this post on Zulip Scott Waye (Jun 07 2024 at 22:18):

Slightly interesting that the string Err test is ok

view this post on Zulip Ralph (Jun 07 2024 at 22:18):

Two months ago, components did not catch any breakpoints

view this post on Zulip Scott Waye (Jun 07 2024 at 22:18):

WIth lldb

view this post on Zulip Scott Waye (Jun 07 2024 at 22:19):

breakpoint set --file results.rs --line 91

view this post on Zulip Ralph (Jun 07 2024 at 22:19):

Right, I'll need the compile steps and the invocation

view this post on Zulip Scott Waye (Jun 07 2024 at 22:19):

And settings set target.run-args results::run

view this post on Zulip Ralph (Jun 07 2024 at 22:19):

Can llvm-dwarfdump pick up the dwarf in the component?

view this post on Zulip Scott Waye (Jun 07 2024 at 22:20):

And getting the exe name from

C:\github\wit-bindgen>cargo test --package wit-bindgen-cli --test runtime --features csharp-naot --no-default-features -- results::run --exact
   Compiling test-artifacts v0.0.0 (C:\github\wit-bindgen\crates\test-rust-wasm\artifacts)
   Compiling wit-bindgen-cli v0.26.0 (C:\github\wit-bindgen)
    Finished `test` profile [unoptimized + debuginfo] target(s) in 1m 16s
     Running tests\runtime\main.rs (target\debug\deps\runtime-f95e984830c7562b.exe)

view this post on Zulip Scott Waye (Jun 07 2024 at 22:21):

lldb invocation

lldb target\debug\deps\runtime-f95e984830c7562b.exe

view this post on Zulip Ralph (Jun 07 2024 at 22:21):

Trying to see where the symbols are inserted into the component

view this post on Zulip Joel Dice (Jun 07 2024 at 22:21):

Regarding the bug itself: the pattern I've seen so far is that ulong[] array bodies are only sometimes 8-byte aligned, which might mean they're really 4-byte aligned and 8-byte alignment is only coincidental. Not sure if that's a C# ABI thing or what.

view this post on Zulip Ralph (Jun 07 2024 at 22:21):

We can ask for sure

view this post on Zulip Ralph (Jun 07 2024 at 22:22):

Had a lovely lunch with the crew yesterday about this in fact

view this post on Zulip Joel Dice (Jun 07 2024 at 22:22):

I love chatting about alignment at lunch.

view this post on Zulip Ralph (Jun 07 2024 at 22:22):

Oh shit now I'm going to have to compile this and test the debugging

view this post on Zulip Ralph (Jun 07 2024 at 22:22):

What llvm version are you using?

view this post on Zulip Ralph (Jun 07 2024 at 22:22):

I know what's happening during my weekend!

view this post on Zulip Joel Dice (Jun 07 2024 at 22:23):

I'm using LLVM 18, but I can't reproduce this locally, so that might be a data point. Not sure what Scott's using ATM.

view this post on Zulip Scott Waye (Jun 07 2024 at 22:24):

17.0.1 Win x64

view this post on Zulip Joel Dice (Jun 07 2024 at 22:24):

AFAIK, this has only ever happened on Windows (i.e. components built on Windows), which seems bizarre. Wit-bindgen CI has been updated to use WASI-SDK 22 (and thus LLVM 18), and it's happening there, but we only test on Windows currently in CI.

view this post on Zulip Scott Waye (Jun 07 2024 at 22:25):

For debugging that is, for the build whatever is in Joel's branch which I assume is wasi-sdk 22

view this post on Zulip Joel Dice (Jun 07 2024 at 22:26):

technically there's nothing in wit-bindgen that forces you to use a specific wasi-sdk -- it really just controls what CI uses

view this post on Zulip Scott Waye (Jun 07 2024 at 22:27):

oh wait, we use WASI_SDK_PATH I think so 21 is what I seem to have, curious

view this post on Zulip Scott Waye (Jun 07 2024 at 22:30):

scott@surface5:~$ llvm-dwarfdump /mnt/c/github/wit-bindgen/target/runtime-tests/results/csharp-results/csharp-wasm.component.wasm
error: /mnt/c/github/wit-bindgen/target/runtime-tests/results/csharp-results/csharp-wasm.component.wasm: invalid version number: 65549
scott@surface5:~$ llvm-dwarfdump --version
LLVM (http://llvm.org/):
  LLVM version 17.0.4
  Optimized build.
scott@surface5:~$

Not terribly interesting

view this post on Zulip Ralph (Jun 07 2024 at 22:31):

Yup

view this post on Zulip Scott Waye (Jun 07 2024 at 22:31):

I used linux for that as I don't have a windows version of llvm-dwarfdump installed

view this post on Zulip Ralph (Jun 07 2024 at 22:31):

That's standard. Dwarf dump doesn't know the file type

view this post on Zulip Ralph (Jun 07 2024 at 22:31):

Ah, yes

view this post on Zulip Ralph (Jun 07 2024 at 22:31):

OK, so as a complete Linux guy, this windows thing could well be quite different

view this post on Zulip Ralph (Jun 07 2024 at 22:32):

Which means I'm likely going to have to set up a compete window build

view this post on Zulip Ralph (Jun 07 2024 at 22:32):

Boooooooooo

view this post on Zulip Joel Dice (Jun 07 2024 at 22:33):

@Scott Waye would you mind posting your copy of target/runtime-tests/results/csharp-results/csharp-wasm.component.wasm here? I'd like to run it on Linux to see what happens. Also, I'll post my Linux-built copy.

view this post on Zulip Scott Waye (Jun 07 2024 at 22:33):

Unless we can repro with a host program that uses the rust code, and link with wasm-tools or the new thing if that is ready now

view this post on Zulip Joel Dice (Jun 07 2024 at 22:34):

csharp-wasm.component.wasm

view this post on Zulip Scott Waye (Jun 07 2024 at 22:35):

csharp-wasm.component.wasm
Sure thing

view this post on Zulip Joel Dice (Jun 07 2024 at 22:36):

I'm going to hack my tests/runtime/main.rs to run the component without trying to rebuild it.

view this post on Zulip Joel Dice (Jun 07 2024 at 22:44):

ok, so I am able to reproduce it with Scott's build. So the runtime behavior is consistent, at least.

view this post on Zulip Scott Waye (Jun 07 2024 at 22:45):

ok, that's good

view this post on Zulip Scott Waye (Jun 07 2024 at 22:47):

sorry its late here. I'm out tomorrow but will have a look again on Sunday. Have a good evening

view this post on Zulip Joel Dice (Jun 07 2024 at 22:49):

You too!

When you have a chance, please try copying my build to target/runtime-tests/results/csharp-results/csharp-wasm.component.wasm, applying the hacky patch below, and running cargo test --no-default-features --features csharp-naot results::run to verify that my build works.

diff --git a/tests/runtime/main.rs b/tests/runtime/main.rs
index fd89888..504b03c 100644
--- a/tests/runtime/main.rs
+++ b/tests/runtime/main.rs
@@ -145,7 +145,7 @@ fn tests(name: &str, dir_name: &str) -> Result<Vec<PathBuf>> {
     let wasi_adapter =
         std::fs::read(&test_artifacts::ADAPTER).context("failed to read the wasi adapter")?;

-    drop(std::fs::remove_dir_all(&out_dir));
+    //drop(std::fs::remove_dir_all(&out_dir));
     std::fs::create_dir_all(&out_dir)?;

     if cfg!(feature = "rust") && !rust.is_empty() {
@@ -664,7 +664,7 @@ fn tests(name: &str, dir_name: &str) -> Result<Vec<PathBuf>> {
         for path in c_sharp.iter() {
             let world_name = &resolve.worlds[world].name;
             let out_dir = out_dir.join(format!("csharp-{}", world_name));
-            drop(fs::remove_dir_all(&out_dir));
+            //drop(fs::remove_dir_all(&out_dir));
             fs::create_dir_all(&out_dir).unwrap();

             for csharp_impl in &c_sharp {
@@ -765,7 +765,7 @@ fn tests(name: &str, dir_name: &str) -> Result<Vec<PathBuf>> {
                     out_wasm
                 ));
             let component_path = out_wasm.with_extension("component.wasm");
-            fs::write(&component_path, component).expect("write component to disk");
+            //fs::write(&component_path, component).expect("write component to disk");

             result.push(component_path);
         }

view this post on Zulip Scott Waye (Jun 08 2024 at 08:43):

Ralph said:

That's standard. Dwarf dump doesn't know the file type

However https://wa2.dev does identify that the sections are present
image.png

view this post on Zulip Scott Waye (Jun 08 2024 at 08:44):

Joel Dice said:

ok, so I am able to reproduce it with Scott's build. So the runtime behavior is consistent, at least.

And I confirm that your wasm runs successfully on windows.

view this post on Zulip Scott Waye (Jun 09 2024 at 21:45):

https://github.com/dotnet/runtimelab/issues/2606 I created this question because I started to try aligning the return area correctly, but I may be missing something simple. As an alternative we could drop the thread static for now, just leak the buffers, just to pass the tests, obviously we need a correct solution, there is another cleanup that is missing also while we are talking about missing bits.

WIth this code public static class InteropReturnArea { [InlineArray(3)] internal struct ReturnArea { private ulong buffer; internal unsafe nint AddressOfReturnArea() { return (nint)Unsafe.AsPointer...

view this post on Zulip Joel Dice (Jun 13 2024 at 13:44):

@Scott Waye I saw https://github.com/dotnet/runtimelab/pull/2609 has been merged (thanks so much for doing that!), so I tried re-running the failed jobs for https://github.com/bytecodealliance/wit-bindgen/pull/968, expecting it to pick up the new NuGet package, but I'm still getting a "return pointer not aligned". Is there something I need to do to make sure the latest release is used?

This PR is a workaround to align all thread statics to 8 byte boundary to ensure those statics that do require 8 byte alignment are aligned. Follows #2606
This addresses part of #964. For WIT functions returning result<T, E>, we now generate methods which return T and throw WitException such that the E value may be retrieved using WitException.Value....

view this post on Zulip Ralph (Jun 13 2024 at 15:55):

Yup, good test

view this post on Zulip Scott Waye (Jun 19 2024 at 19:31):

The packages are published, I will try a bit later. A little confused as https://github.com/bytecodealliance/wit-bindgen/pull/968/commits/5d0455a1ecd97fe2488374a8fc09001c3936a43a was green a few days ago. Did you comment out the test?

This addresses part of #964. For WIT functions returning result<T, E>, we now generate methods which return T and throw WitException such that the E value may be retrieved using WitException....

view this post on Zulip Joel Dice (Jun 19 2024 at 19:37):

Oh yeah, I updated the test I added in that PR to only use 32-bit values as a workaround, since I'd didn't want it blocked on an unrelated issue.

view this post on Zulip Scott Waye (Jun 19 2024 at 19:37):

got it, thanks


Last updated: Dec 13 2025 at 20:04 UTC