Stream: git-wasmtime

Topic: wasmtime / issue #6813 s390x, etc: Support leonbe?


Wasmtime GitHub notifications bot (Aug 07 2023 at 14:57):

SoniEx2 opened issue #6813:

Thanks for filing a feature request! Please fill out the TODOs below.

Feature

TODO: Brief description of the feature/improvement you'd like to see in
Cranelift/Wasmtime.

We want to see wasmtime use/support the leonbe addressing convention as implemented in wasm2c instead of the current lemulation addressing convention.

Benefit

TODO: What is the value of adding this in Cranelift/Wasmtime? What problems does
it solve?

Consistency with another wasm implementation. Presumably easier for the JIT to optimize. Allegedly improved performance on older BE platforms without fused byteswap opcodes. (This could all be benchmarked.)

Implementation

TODO: Do you have an implementation plan, and/or ideas for data structures or
algorithms to use?

It shouldn't be too hard, mostly changing memory instructions and some other stuff.

Alternatives

TODO: What are the alternative implementation approaches or alternative ways to
solve the problem that this feature would solve? How do these alternatives
compare to this proposal?

The main alternative is to get rid of wasm2c's leonbe and make it use lemulation, or maybe even go the extra length and make it portable to mixed-endian architectures. Unfortunately the compiler isn't allowed to entirely eliminate byteswaps with that approach.

Wasmtime GitHub notifications bot (Aug 07 2023 at 18:47):

bjorn3 commented on issue #6813:

What does wasm2c's implementation have to do with wasmtime's? They are entirely independent wasm runtimes which can't be used together on a single wasm module.

Also what is the difference between leonbe and lemulation? I can't find anything about it on the internet.

Wasmtime GitHub notifications bot (Aug 07 2023 at 19:25):

SoniEx2 commented on issue #6813:

lemulation (emulated LE reads on a BE host - memory values (byte reads) are LE, memory ordering (relative to position) is LE):

fn read_u32(memory: &[u8], position: u32) -> Option<u32> {
  // slices are indexed by usize, so convert the wasm address first
  let position = position as usize;
  let b0 = *memory.get(position)?;
  let b1 = *memory.get(position.checked_add(1)?)?;
  let b2 = *memory.get(position.checked_add(2)?)?;
  let b3 = *memory.get(position.checked_add(3)?)?;
  // generally optimized out on little-endian to a single 32-bit unaligned read
  Some(u32::from(b3) << 24 | u32::from(b2) << 16 | u32::from(b1) << 8 | u32::from(b0))
}

leonbe (observationally little-endian memory on a BE host - memory values (byte reads) are BE, memory ordering (relative to position) is LE):

fn read_u32(memory: &[u8], position: u32) -> Option<u32> {
  // index backwards from the end of the buffer; checked_sub returns None
  // instead of underflowing when position is out of bounds
  let end = memory.len().checked_sub(position as usize)?;
  let b0 = *memory.get(end.checked_sub(1)?)?;
  let b1 = *memory.get(end.checked_sub(2)?)?;
  let b2 = *memory.get(end.checked_sub(3)?)?;
  let b3 = *memory.get(end.checked_sub(4)?)?;
  // generally optimized out on big-endian to a single 32-bit unaligned read
  Some(u32::from(b3) << 24 | u32::from(b2) << 16 | u32::from(b1) << 8 | u32::from(b0))
}

Wasmtime GitHub notifications bot (Aug 07 2023 at 19:27):

SoniEx2 edited a comment on issue #6813:

the pain point is that wasm2c/wabt cannot implement the wasm-c-api with leonbe since wasm-c-api doesn't expect to be used this way. having more leonbe wasm engines would make standardizing a leonbe wasm-c-api easier.
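
A minimal sketch of how the two conventions described above relate, under stated assumptions: the names `read_u32_lemulation` and `read_u32_leonbe` are hypothetical, and the buffers are plain byte slices rather than anything from wasm2c or Wasmtime. It illustrates that a leonbe buffer is the byte-reversed image of a lemulation buffer, so a wasm-visible little-endian value becomes a plain big-endian read at `len - position - 4`.

```rust
/// lemulation: bytes sit in wasm (little-endian) order starting at `position`.
fn read_u32_lemulation(memory: &[u8], position: u32) -> Option<u32> {
    let start = position as usize;
    let b = memory.get(start..start.checked_add(4)?)?;
    Some(u32::from_le_bytes([b[0], b[1], b[2], b[3]]))
}

/// leonbe: the whole memory is stored reversed, so the same value is a
/// big-endian read that ends `position` bytes before the end of the buffer.
fn read_u32_leonbe(memory: &[u8], position: u32) -> Option<u32> {
    let end = memory.len().checked_sub(position as usize)?;
    let start = end.checked_sub(4)?;
    let b = &memory[start..end];
    Some(u32::from_be_bytes([b[0], b[1], b[2], b[3]]))
}

fn main() {
    // Illustrative contents only: wasm address 0 holds 0x78, address 1 holds 0x56, ...
    let lemulation: Vec<u8> = vec![0x78, 0x56, 0x34, 0x12, 0xAA, 0xBB, 0xCC, 0xDD];
    // The leonbe image of the same wasm memory is the reversed buffer.
    let leonbe: Vec<u8> = lemulation.iter().rev().copied().collect();

    // Every in-bounds and out-of-bounds position gives the same answer.
    for position in 0..=5 {
        assert_eq!(
            read_u32_lemulation(&lemulation, position),
            read_u32_leonbe(&leonbe, position)
        );
    }
    println!("{:#010x?}", read_u32_leonbe(&leonbe, 0));
}
```

Both functions use `from_le_bytes` / `from_be_bytes` so the sketch gives the same results on any host; on a big-endian host the leonbe read is the one that needs no byte swap.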

Wasmtime GitHub notifications bot (Aug 08 2023 at 17:07):

bjorn3 commented on issue #6813:

leonbe (observationally little-endian memory on a BE host - memory values (byte reads) are BE, memory ordering (relative to position) is LE):

That is a clever technique! I think for dynamic memories (which don't reserve a 4GB chunk all at once even if the actual size of the memory is still smaller) this will require copying the memory much more often when growing it, as you'd have to grow it towards lower addresses rather than higher addresses. The fact that you have a subtraction on the critical path of computing the address may also be bad for performance, but I don't know how much effect it will have exactly.
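
The growth concern can be sketched as follows, assuming a memory backed by a plain `Vec<u8>`; `grow_leonbe` and `read_u8_leonbe` are hypothetical helpers, not Wasmtime or wasm2c functions. Because wasm address 0 is the last byte of a leonbe buffer, new zeroed space has to appear at the low end, which means every existing byte moves.

```rust
/// Read one byte at wasm address `position` from a leonbe-style buffer.
fn read_u8_leonbe(memory: &[u8], position: u32) -> Option<u8> {
    let idx = memory.len().checked_sub(1)?.checked_sub(position as usize)?;
    memory.get(idx).copied()
}

/// Grow a leonbe-style memory by `extra` zeroed bytes.
/// The old contents are shifted to the high end of the buffer so that
/// existing wasm addresses keep their values.
fn grow_leonbe(memory: &mut Vec<u8>, extra: usize) {
    let old_len = memory.len();
    memory.resize(old_len + extra, 0);
    // Move every existing byte `extra` places up: this is the full copy.
    memory.copy_within(0..old_len, extra);
    // The low end is the newly added (highest wasm addresses) region.
    memory[..extra].fill(0);
}

fn main() {
    // wasm addresses 0, 1, 2 hold 1, 2, 3 (stored reversed).
    let mut memory = vec![3u8, 2, 1];
    grow_leonbe(&mut memory, 2);
    assert_eq!(read_u8_leonbe(&memory, 0), Some(1));
    assert_eq!(read_u8_leonbe(&memory, 4), Some(0));
    println!("{:?}", memory);
}
```

A lemulation memory would only need the `resize` call here, which can often extend the allocation in place.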

Wasmtime GitHub notifications bot (Aug 08 2023 at 17:27):

fitzgen edited issue #6813:

Thanks for filing a feature request! Please fill out the TODOs below.

Feature

We want to see wasmtime use/support the leonbe addressing convention as implemented in wasm2c instead of the current lemulation addressing convention.

Benefit

Consistency with another wasm implementation. Presumably easier for the JIT to optimize. Allegedly improved performance on older BE platforms without fused byteswap opcodes. (This could all be benchmarked.)

Implementation

It shouldn't be too hard, mostly changing memory instructions and some other stuff.

Alternatives

The main alternative is to get rid of wasm2c's leonbe and make it use lemulation, or maybe even go the extra length and make it portable to mixed-endian architectures. Unfortunately the compiler isn't allowed to entirely eliminate byteswaps with that approach.

Wasmtime GitHub notifications bot (Aug 08 2023 at 17:49):

SoniEx2 commented on issue #6813:

it's true that allocators are generally designed to enable right growth but not so much left growth, but then there are many which aren't - as an example of a real system that does this, the JVM doesn't bother, so a JVM program (like a JVM wasm runtime) has to do full copies every time - so we don't expect it to be much of an issue in practice, especially if you can move whole pages at a time (page-oriented allocators?).

but the extra subtractions can be generally optimized out. (manually if the compiler can't do it for you.)
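
One possible reading of this claim, as a sketch under assumptions rather than a description of what wasm2c or any compiler actually does: the constant part of `memory.len() - position - 4` can be computed once per memory, leaving a single subtraction per access in place of the single addition a lemulation access needs. `LeonbeView` and `top_u32` are hypothetical names.

```rust
/// Hypothetical view over a leonbe-style buffer with the constant part of
/// the address computation folded in ahead of time.
struct LeonbeView<'a> {
    memory: &'a [u8],
    /// `memory.len() - 4`, or None if the memory is shorter than 4 bytes.
    top_u32: Option<usize>,
}

impl<'a> LeonbeView<'a> {
    fn new(memory: &'a [u8]) -> Self {
        LeonbeView { memory, top_u32: memory.len().checked_sub(4) }
    }

    fn read_u32(&self, position: u32) -> Option<u32> {
        // One subtraction per access; it also serves as the bounds check.
        let start = self.top_u32?.checked_sub(position as usize)?;
        let b = &self.memory[start..start + 4];
        Some(u32::from_be_bytes([b[0], b[1], b[2], b[3]]))
    }
}

fn main() {
    // Reversed image of wasm bytes 0x78, 0x56, 0x34, 0x12, 0xAA, 0xBB, 0xCC, 0xDD.
    let memory = [0xDDu8, 0xCC, 0xBB, 0xAA, 0x12, 0x34, 0x56, 0x78];
    let view = LeonbeView::new(&memory);
    assert_eq!(view.read_u32(0), Some(0x12345678));
    assert_eq!(view.read_u32(5), None);
}
```

Whether this matches the cost of a forward-growing memory in a real JIT would still need the benchmarking mentioned in the issue.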

Wasmtime GitHub notifications bot (Aug 08 2023 at 18:38):

bjorn3 commented on issue #6813:

A JVM wasm runtime always has to copy, but Wasmtime could avoid the copy. I just looked at the impl, and it seems like it always copies too, though.


Last updated: Oct 23 2024 at 20:03 UTC