SoniEx2 opened issue #6813:
Thanks for filing a feature request! Please fill out the TODOs below.
Feature
TODO: Brief description of the feature/improvement you'd like to see in Cranelift/Wasmtime.
We want to see wasmtime use/support the leonbe addressing convention as implemented in wasm2c instead of the current lemulation addressing convention.
Benefit
TODO: What is the value of adding this in Cranelift/Wasmtime? What problems does it solve?
Consistency with another wasm implementation. Presumably easier for the JIT to optimize. Allegedly improved performance on older BE platforms without fused byteswap opcodes. (This could all be benchmarked.)
Implementation
TODO: Do you have an implementation plan, and/or ideas for data structures or algorithms to use?
It shouldn't be too hard, mostly changing memory instructions and some other stuff.
Alternatives
TODO: What are the alternative implementation approaches or alternative ways to solve the problem that this feature would solve? How do these alternatives compare to this proposal?
The main alternative is to get rid of wasm2c's leonbe and make it use lemulation, or maybe even go the extra length and make it portable to mixed-endian architectures. Unfortunately the compiler isn't allowed to entirely eliminate byteswaps with that approach.
bjorn3 commented on issue #6813:
What does wasm2c's implementation have to do with wasmtime's? They are entirely independent wasm runtimes which can't be used together on a single wasm module.
Also what is the difference between leonbe and lemulation? I can't find anything about it on the internet.
SoniEx2 commented on issue #6813:
lemulation (emulated LE reads on a BE host - memory values (byte reads) are LE, memory ordering (relative to `position`) is LE):

```rust
fn read_u32(memory: &[u8], position: u32) -> Option<u32> {
    // slices are indexed by usize, so convert once; checked_add guards 32-bit hosts
    let p = position as usize;
    let b0 = *memory.get(p)?;
    let b1 = *memory.get(p.checked_add(1)?)?;
    let b2 = *memory.get(p.checked_add(2)?)?;
    let b3 = *memory.get(p.checked_add(3)?)?;
    // generally optimized out on little-endian to a single 32-bit unaligned read
    Some(u32::from(b3) << 24 | u32::from(b2) << 16 | u32::from(b1) << 8 | u32::from(b0))
}
```

leonbe (observationally little-endian memory on a BE host - memory values (byte reads) are BE, memory ordering (relative to `position`) is LE):

```rust
fn read_u32(memory: &[u8], position: u32) -> Option<u32> {
    // `end` is one past the host index of wasm byte `position`;
    // checked_sub returns None instead of underflowing when out of bounds
    let end = memory.len().checked_sub(position as usize)?;
    let b0 = *memory.get(end.checked_sub(1)?)?;
    let b1 = *memory.get(end.checked_sub(2)?)?;
    let b2 = *memory.get(end.checked_sub(3)?)?;
    let b3 = *memory.get(end.checked_sub(4)?)?;
    // generally optimized out on big-endian to a single 32-bit unaligned read
    Some(u32::from(b3) << 24 | u32::from(b2) << 16 | u32::from(b1) << 8 | u32::from(b0))
}
```
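To make "observationally little-endian" concrete, here is a minimal sketch of a leonbe store/load pair. The helper names (`leonbe_range`, `store_u32`, `load_u32`, `load_u8`) are hypothetical and are not taken from wasm2c or Wasmtime. It assumes wasm byte `q` lives at host index `len - 1 - q`, so a 4-byte value occupies a contiguous host range that reads as big-endian, while single-byte loads at increasing wasm addresses still return the least significant byte first.

```rust
use std::ops::Range;

/// Hypothetical helper: host index range holding wasm bytes
/// `position..position + width` under the leonbe layout.
fn leonbe_range(len: usize, position: u32, width: usize) -> Option<Range<usize>> {
    // One past the host index of wasm byte `position`.
    let end = len.checked_sub(position as usize)?;
    // Lowest host index touched; None if the access runs off the memory.
    let start = end.checked_sub(width)?;
    Some(start..end)
}

/// Store a u32 so that the host bytes are big-endian within the range.
fn store_u32(memory: &mut [u8], position: u32, value: u32) -> Option<()> {
    let range = leonbe_range(memory.len(), position, 4)?;
    memory[range].copy_from_slice(&value.to_be_bytes());
    Some(())
}

/// Load a u32 with a plain big-endian read of the host range.
fn load_u32(memory: &[u8], position: u32) -> Option<u32> {
    let range = leonbe_range(memory.len(), position, 4)?;
    let mut bytes = [0u8; 4];
    bytes.copy_from_slice(&memory[range]);
    Some(u32::from_be_bytes(bytes))
}

/// Load a single wasm byte.
fn load_u8(memory: &[u8], position: u32) -> Option<u8> {
    let range = leonbe_range(memory.len(), position, 1)?;
    Some(memory[range][0])
}
```

Storing `0x11223344` at wasm address 4 and then calling `load_u8` at addresses 4 through 7 yields `0x44`, `0x33`, `0x22`, `0x11`, which is what a wasm program expects from little-endian memory, even though the host range itself holds `11 22 33 44`.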
SoniEx2 edited a comment on issue #6813:
the pain point is that wasm2c/wabt cannot implement the wasm-c-api with leonbe since wasm-c-api doesn't expect to be used this way. having more leonbe wasm engines would make standardizing a leonbe wasm-c-api easier.
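The mismatch with wasm-c-api can be illustrated with a small sketch. The API exposes linear memory to the embedder as a raw byte pointer (`wasm_memory_data`), which embedders index forward from wasm address 0. The code below assumes the same leonbe layout as above; `leonbe_store_u8` and `forward_view` are hypothetical names, and the copying shim is only an illustration of the problem, not something wasm2c or Wasmtime provides.

```rust
/// Store one wasm byte under leonbe: wasm byte `position` lives at
/// host index `len - 1 - position`.
fn leonbe_store_u8(raw: &mut [u8], position: usize, value: u8) -> Option<()> {
    let index = raw.len().checked_sub(position)?.checked_sub(1)?;
    raw[index] = value;
    Some(())
}

/// Hypothetical shim: a copy of the memory in the layout a
/// forward-indexing embedder expects (wasm byte 0 at index 0).
fn forward_view(raw: &[u8]) -> Vec<u8> {
    raw.iter().rev().copied().collect()
}
```

An embedder that indexes the raw buffer directly would find wasm byte 0 at the last host index, so either the API contract or the embedder has to account for the reversal.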
bjorn3 commented on issue #6813:
leonbe (observationally little-endian memory on a BE host - memory values (byte reads) are BE, memory ordering (relative to position) is LE):
That is a clever technique! I think for dynamic memories (which don't reserve a 4GB chunk all at once even if the actual size of the memory is still smaller) this will require copying the memory when trying to grow it much more often, as you'd have to grow it towards lower addresses rather than higher addresses. The fact that you have a subtraction on the critical path of computing the address may also be bad for performance, but I don't know how much effect it will have exactly.
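The growth concern can be sketched as follows, assuming a `Vec<u8>`-backed memory and the 64 KiB wasm page size. The function names are hypothetical. With forward addressing, growing appends zeroed pages at the high end and can sometimes extend the allocation in place. With leonbe, the new (higher) wasm addresses map to lower host indices, so the existing contents have to end up at the tail of the larger buffer.

```rust
const PAGE_SIZE: usize = 65536;

/// Forward layout: new pages are appended; the allocator may be able
/// to extend the buffer without moving the existing bytes.
fn grow_forward(memory: &mut Vec<u8>, delta_pages: usize) {
    memory.resize(memory.len() + delta_pages * PAGE_SIZE, 0);
}

/// leonbe layout: wasm address 0 is the last host byte, so the old
/// contents must sit at the end of the grown buffer. This sketch
/// always allocates a fresh zeroed buffer and copies into its tail.
fn grow_leonbe(memory: &mut Vec<u8>, delta_pages: usize) {
    let extra = delta_pages * PAGE_SIZE;
    let mut grown = vec![0u8; memory.len() + extra];
    grown[extra..].copy_from_slice(memory.as_slice());
    *memory = grown;
}
```

A runtime that reserves address space up front could instead place the memory at the top of the reservation and commit pages downward, which would avoid the copy; that is a design option, not something this sketch implements.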
fitzgen edited issue #6813:
Thanks for filing a feature request! Please fill out the TODOs below.
Feature
We want to see wasmtime use/support the leonbe addressing convention as implemented in wasm2c instead of the current lemulation addressing convention.
Benefit
Consistency with another wasm implementation. Presumably easier for the JIT to optimize. Allegedly improved performance on older BE platforms without fused byteswap opcodes. (This could all be benchmarked.)
Implementation
It shouldn't be too hard, mostly changing memory instructions and some other stuff.
Alternatives
The main alternative is to get rid of wasm2c's leonbe and make it use lemulation, or maybe even go the extra length and make it portable to mixed-endian architectures. Unfortunately the compiler isn't allowed to entirely eliminate byteswaps with that approach.
SoniEx2 commented on issue #6813:
it's true that allocators are generally designed to enable right growth but not so much left growth, but then there are many which aren't - as an example of a real system that does this, the JVM doesn't bother, so a JVM program (like a JVM wasm runtime) has to do full copies every time - so we don't expect it to be much of an issue in practice, especially if you can move whole pages at a time (page-oriented allocators?).
but the extra subtractions can generally be optimized out. (manually if the compiler can't do it for you.)
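One way to read "optimized out manually" is to hoist the constant part of the address computation. The sketch below is an assumption about what such an optimization could look like, not a description of wasm2c or Cranelift; `LeonbeMemory` and `top_u32` are hypothetical names. It caches `len - 4` once, so each 32-bit access performs a single checked subtraction that also serves as the bounds check.

```rust
struct LeonbeMemory {
    bytes: Vec<u8>,
    /// Cached `bytes.len() - 4`: host start index of a u32 at wasm address 0.
    /// Would need refreshing whenever the memory grows.
    top_u32: usize,
}

impl LeonbeMemory {
    fn new(len: usize) -> Self {
        assert!(len >= 4, "memory must hold at least one u32");
        LeonbeMemory { bytes: vec![0u8; len], top_u32: len - 4 }
    }

    /// One subtraction per access; `None` means out of bounds.
    fn load_u32(&self, position: u32) -> Option<u32> {
        let start = self.top_u32.checked_sub(position as usize)?;
        let mut b = [0u8; 4];
        // start <= len - 4, so start + 4 is always in range.
        b.copy_from_slice(&self.bytes[start..start + 4]);
        Some(u32::from_be_bytes(b))
    }

    fn store_u32(&mut self, position: u32, value: u32) -> Option<()> {
        let start = self.top_u32.checked_sub(position as usize)?;
        self.bytes[start..start + 4].copy_from_slice(&value.to_be_bytes());
        Some(())
    }
}
```

Compared with forward addressing (`base + position`), this replaces the addition with a subtraction from a precomputed value; whether that matters in practice would need benchmarking, as the issue itself notes.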
bjorn3 commented on issue #6813:
A JVM wasm runtime always has to copy, but Wasmtime could avoid the copy. I just looked at the impl, though, and it seems like it always copies too.
Last updated: Jan 24 2025 at 00:11 UTC