Hi. I want to work on the C# bindgen a little bit. I notice a lot of very strange generated code, that I suspect can be shortened and optimized a fair amount.
I'm (for now) mainly interested in string-related binding generation.
Here's a sample generated from ./tests/codegen/strings.wit.
Exports:
[UnmanagedCallersOnly(EntryPoint = "foo:foo/strings#b")]
public static unsafe nint wasmExportB() {
    string ret;
    ret = StringsImpl.B();
    var ptr = InteropReturnArea.returnArea.AddressOfReturnArea();
    var stringSpan = MemoryExtensions.AsSpan(ret);
    var length = Encoding.UTF8.GetByteCount(stringSpan);
    var strPtr = NativeMemory.Alloc((nuint)length);
    Encoding.UTF8.GetBytes(stringSpan, new Span<byte>(strPtr, length));
    BitConverter.TryWriteBytes(new Span<byte>((void*)(ptr + 4), 4), length);
    BitConverter.TryWriteBytes(new Span<byte>((void*)(ptr + 0), 4), (int)strPtr);
    return ptr;
}
Imports:
internal static class CWasmInterop
{
    [DllImport("foo:foo/strings", EntryPoint = "c"), WasmImportLinkage]
    internal static extern void wasmImportC(nint p0, int p1, nint p2, int p3, nint p4);
}
public static unsafe string C(string a, string b)
{
    var cleanups = new List<global::System.Action>();
    var utf8Bytes = Encoding.UTF8.GetBytes(a);
    var length = utf8Bytes.Length;
    var gcHandle = GCHandle.Alloc(utf8Bytes, GCHandleType.Pinned);
    var strPtr = gcHandle.AddrOfPinnedObject();
    cleanups.Add(()=> gcHandle.Free());
    var utf8Bytes1 = Encoding.UTF8.GetBytes(b);
    var length2 = utf8Bytes1.Length;
    var gcHandle3 = GCHandle.Alloc(utf8Bytes1, GCHandleType.Pinned);
    var strPtr0 = gcHandle3.AddrOfPinnedObject();
    cleanups.Add(()=> gcHandle3.Free());
    var retArea = stackalloc uint[2+1];
    var ptr = ((int)retArea) + (4 - 1) & -4;
    CWasmInterop.wasmImportC(strPtr.ToInt32(), length, strPtr0.ToInt32(), length2, ptr);
    foreach (var cleanup in cleanups)
    {
        cleanup();
    }
    return Encoding.UTF8.GetString((byte*)BitConverter.ToInt32(new Span<byte>((void*)(ptr + 0), 4)), BitConverter.ToInt32(new Span<byte>((void*)(ptr + 4), 4)));
}
I'm confused about a few things:
What is the "return area"? What does it look like in native code? I imagine it must be either some native struct/some other kind of buffer containing addresses/pointers in WASM space? The generated code looks really strange to me in that regard; do I understand correctly that pointers are 4 bytes in size in WASM? Why is a stackalloc being used? Why is it 1 byte larger than necessary?
Is there a particular reason behind the use of GCHandle over fixed? Eliminating the use of GCHandle would eliminate all the need for cleanup.
A lot of the uses of Spans seem very redundant to me, but I have a hard time parsing why they were used and what exactly they do.
If someone has the time to explain the flow of data and what parts represent what, I'd greatly appreciate it.
What is the "return area"? What does it look like in native code? I imagine it must be either some native struct/some other kind of buffer containing addresses/pointers in WASM space?
Yes, it's a buffer. It can contain pointers or actual data, depending on the type of the ABI call for the data being passed.
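For the string-returning imports shown above, the generated code reads a 32-bit address at offset 0 of the return area and a 32-bit length at offset 4. A minimal sketch of that layout, assuming a plain byte[] as a stand-in for the guest's linear memory (the class and method names here are hypothetical, not part of wit-bindgen):

```csharp
using System;
using System.Buffers.Binary;
using System.Text;

static class ReturnAreaDemo
{
    // Reads a string described by an 8-byte return area located at retAreaOffset.
    // The byte span stands in for linear memory; real bindings dereference raw pointers.
    public static string ReadString(ReadOnlySpan<byte> memory, int retAreaOffset)
    {
        // Offset +0: 32-bit address of the UTF-8 bytes (wasm is little-endian).
        int address = BinaryPrimitives.ReadInt32LittleEndian(memory.Slice(retAreaOffset, 4));
        // Offset +4: 32-bit length in bytes.
        int length = BinaryPrimitives.ReadInt32LittleEndian(memory.Slice(retAreaOffset + 4, 4));
        return Encoding.UTF8.GetString(memory.Slice(address, length));
    }

    // Builds a 64-byte fake memory with the string data at address 32
    // and the return area (address, length) at offset 8.
    public static byte[] BuildExample()
    {
        var memory = new byte[64];
        byte[] text = Encoding.UTF8.GetBytes("hello");
        text.CopyTo(memory, 32);
        BinaryPrimitives.WriteInt32LittleEndian(memory.AsSpan(8, 4), 32);
        BinaryPrimitives.WriteInt32LittleEndian(memory.AsSpan(12, 4), text.Length);
        return memory;
    }
}
```

Calling ReturnAreaDemo.ReadString(ReturnAreaDemo.BuildExample(), 8) follows the same two reads that the generated Encoding.UTF8.GetString(...) line performs with BitConverter.ToInt32.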
do I understand correctly that pointers are 4 bytes in size in WASM?
Most of the time, though it will be possible to do 64-bit in the future.
Why is a stackalloc being used? Why is it 1 byte larger than necessary?
When the memory doesn't need to be kept around, we can use stackalloc to avoid allocating on the heap. The extra byte is because .NET's stackalloc doesn't always align on the wasm memory boundaries, so we need to over-allocate and shift the start of the buffer to align on a 32-bit boundary.
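The alignment step can be sketched in isolation. Note that stackalloc uint[2+1] reserves three 4-byte elements (12 bytes): the return area needs 8, and rounding the start up to a multiple of 4 can skip at most 3, so the aligned 8 bytes always fit. The helper name below is hypothetical; the expression is the one from the generated code with explicit parentheses:

```csharp
static class AlignDemo
{
    // Rounds address up to the next multiple of alignment (which must be a power of two).
    // In C#, + binds tighter than &, so the generated `x + (4 - 1) & -4`
    // parses as `(x + 3) & -4`, the same computation as below with alignment 4.
    public static int AlignUp(int address, int alignment) =>
        (address + (alignment - 1)) & -alignment;
}
```

An already aligned address is returned unchanged, and any other address moves forward by 1 to 3 bytes.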
Is there a particular reason behind the use of GCHandle over fixed? Eliminating the use of GCHandle would eliminate all the need for cleanup.
fixed won't work in edge cases involving variants. We could switch to using fixed in all the cases where it does work and fall back to GCHandle when it doesn't, but that would add some complexity to code generation.
A lot of the uses of Spans seem very redundant to me, but I have a hard time parsing why they were used and what exactly they do.
Yes, I think they are probably overused. There were several of us trying to get this up and running, with various skill levels. I, for one, was new to low-level C#, so I could be to blame for some of it 🙃. We have a few tracking issues for cleaning this and other things up:
https://github.com/bytecodealliance/wit-bindgen/issues/1150
https://github.com/bytecodealliance/wit-bindgen/issues/1143
@James Sturtevant Can you explain what's really going on with the retArea?
var retArea = stackalloc uint[2+1];
var ptr = ((int)retArea) + (4 - 1) & -4;
Why is this truncating a 64-bit address (uint* retArea) to a 32-bit integer? It must lose some crucial bits of the address. How is the WASM runtime able to write to such an incomplete pointer, one that does not exist in C# space?
There's a lot of difficulty in what I actually want to do.
Let's take a sample string import:
internal static class SampleWasmInterop
{
    [DllImport("foo:foo/strings", EntryPoint = "sample"), WasmImportLinkage]
    internal static extern void wasmImportSample(nint p0, int p1, nint p2);
}
public static unsafe string Sample(string value)
{
    var cleanups = new List<Action>();
    var utf8Bytes = Encoding.UTF8.GetBytes(value);
    var length = utf8Bytes.Length;
    var gcHandle = GCHandle.Alloc(utf8Bytes, GCHandleType.Pinned);
    var strPtr = gcHandle.AddrOfPinnedObject();
    cleanups.Add(()=> gcHandle.Free());
    var retArea = stackalloc uint[2+1];
    var ptr = ((int)retArea) + (4 - 1) & -4;
    SampleWasmInterop.wasmImportSample(strPtr.ToInt32(), length, ptr);
    foreach (var cleanup in cleanups)
    {
        cleanup();
    }
    return Encoding.UTF8.GetString((byte*)BitConverter.ToInt32(new Span<byte>((void*)(ptr + 0), 4)), BitConverter.ToInt32(new Span<byte>((void*)(ptr + 4), 4)));
}
What I want to achieve is the following:
file struct StringReturnArea
{
    public int Address;
    public int Length;
}
[GeneratedCode("wit-bindgen", "0.42.1")]
[SkipLocalsInit]
public static unsafe string Sample(string value)
{
    StringReturnArea __retArea = default;
    string __retVal = default;
    byte* __valueNative = default;
    scoped Utf8StringMarshaller.ManagedToUnmanagedIn __valueNativeMarshaller = new();
    try
    {
        __valueNativeMarshaller.FromManaged(value, stackalloc byte[Utf8StringMarshaller.ManagedToUnmanagedIn.BufferSize]);
        {
            __valueNative = __valueNativeMarshaller.ToUnmanaged();
            __wasmImport(__valueNative, value.Length, &__retArea);
        }
        __retVal = Encoding.UTF8.GetString((byte*)__retArea.Address, __retArea.Length);
    }
    finally
    {
        __valueNativeMarshaller.Free();
    }
    return __retVal;
    [DllImport("foo:foo/strings", EntryPoint = "sample")]
    [WasmImportLinkage]
    static extern void __wasmImport(byte* valueNative, int valueLength, StringReturnArea* retArea);
}
Now, there may be some issues with the alignment you mentioned on __valueNative and &__retArea. That should be easy to handle.
My problem is my lack of Rust knowledge. I wouldn't know where to begin working on this.
An extra layer of difficulty is added by the fact that we'd need to check all input parameters and make sure they don't overlap with any of the generated locals.
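One detail worth checking in a sketch like this: the currently generated code passes utf8Bytes.Length, the UTF-8 byte count, as the string length, whereas value.Length counts UTF-16 code units. The two only agree for ASCII text. A small illustration (the helper name is hypothetical):

```csharp
using System.Text;

static class Utf8LengthDemo
{
    // Number of bytes the string occupies once encoded as UTF-8.
    public static int ByteLength(string value) => Encoding.UTF8.GetByteCount(value);
}
```

For "h\u00e9llo", Length is 5 while ByteLength is 6, because U+00E9 takes two bytes in UTF-8. Presumably a rewritten binding would need to pass the byte count rather than value.Length.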
We don't support memory64 (and neither does Wasmtime yet, I think); all our addresses are 32-bit, so there is no truncation.
Wasmtime does support memory64 for core modules, but not for components since the Component Model spec does not yet include support for memory64.
I.e. as far as wit-bindgen is concerned, you're correct that we're wasm32-only for now.
Thanks for the clarification!
@Scott Waye So when my C# project targets wasi-wasm, it's always going to be 32-bit (wasm32?) by default? I was concerned the host app could be 64-bit, which would cause an issue.
Does the component model support interaction between different bitness of host and component? I've not come across that before. @Joel Dice any idea?
But the answer is yes on the C# side: we only support compiling to 32-bit wasm.
Yeah, the host knows that wasm32 guest pointers are 32-bit even if host pointers are 64-bit, and there's never any ambiguity about whether a pointer is a guest pointer or a host pointer. Bitness aside, confusing guest pointers (which are offsets into the guest's linear memory) and host pointers (which are offsets into the host's virtual address space) would be catastrophic in any case.
Furthermore, components can be composed, meaning more than one linear memory may be in play, in which case a guest pointer must always be relative to a specific linear memory.
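To make the "relative to a specific linear memory" point concrete, here is a toy sketch that models each linear memory as a byte[] and a guest pointer as a plain 32-bit offset. The names are hypothetical and this is not how a runtime represents memories internally:

```csharp
static class LinearMemoryDemo
{
    // A guest pointer is just an offset; it only has meaning
    // together with the linear memory it is resolved against.
    public static byte Load(byte[] linearMemory, int guestPointer) =>
        linearMemory[guestPointer];

    // Resolves the same guest pointer against two separate memories
    // and reports whether it names the same byte value in both.
    public static bool SameValueInBoth(byte[] memoryA, byte[] memoryB, int guestPointer) =>
        Load(memoryA, guestPointer) == Load(memoryB, guestPointer);
}
```

With two memories holding different data, the same offset resolves to different bytes, which is why a bare guest pointer without its memory is ambiguous.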
Why is this truncating a 64-bit address (uint* retArea) to a 32-bit integer?
I am not sure if I understand this statement. uint* is a 32-bit pointer; the type only says that it points to a uint, and it is otherwise the same as void*. Here we are casting that pointer to an int. Maybe it's safer to do something like new IntPtr(ptr).ToInt32().
Thanks, @ero. Seems like we are saying it's all good, nothing can go wrong :-)
Joel Dice said:
Furthermore, components can be composed, meaning more than one linear memory may be in play, in which case a guest pointer must always be relative to a specific linear memory.
Does the component model support multiple memories at this point? I thought it didn't.
James Sturtevant said:
Does the component model support multiple memories at this point? I thought it didn't.
Sure; you can use wasm-tools compose or wac to take two (or more) components, each with their own linear memory, and create a new component that bundles the original two (or more) as subcomponents.
You can also componentize a core module that has more than one linear memory, if desired.
Although creating such a core module using e.g. LLVM might be a challenge. It's not hard to do by writing WAT directly, though.
Sure; you can use wasm-tools compose or wac to take two (or more) components, each with their own linear memory, and create a new component that bundles the original two (or more) as subcomponents.
Yea this makes sense.
You can also componentize a core module that has more than one linear memory, if desired.
It was this scenario I didn't think would necessarily work. Good to know!
Fun fact: Wasmtime synthesizes core modules internally to "glue together" composed components, one of which imports a function exported by the other. Each such module imports two linear memories: one from the importing component and the other from the exporting component. This allows it to "lift" parameters from the caller and "lower" them to the callee, and then vice-versa for the returned result.
@James Sturtevant
I am not sure if I understand this statement. uint* is a 32-bit pointer
Yes, I wasn't sure whether this was the case. I thought potentially the app calling the WASM imports or exports can be 64-bit. I see now that this doesn't really work.
I think the C# crate needs some major changes to accommodate my suggestions. The snippet I showed above is how the .NET base library does it, and I would feel much more at ease using wit-bindgen if this was the code generated.
I unfortunately can't assist much in making these adjustments.
Last updated: Dec 06 2025 at 06:05 UTC