wasmtime / Issue #2186 Wasm on a 16-bit-ish big-endian pl... · git-wasmtime

Wasmtime GitHub notifications bot (Sep 07 2020 at 02:52):

Hey, so we kinda would like to put wasm on a Sega Genesis, so that we can put Rust on a Sega Genesis. We think maintaining a wasm backend would be less effort than maintaining an LLVM backend, but from what we've been reading, the following might be an issue:

It's big-endian.

There are no floats. We don't think this is a big deal because we can just not use floats in Rust.

There's only 64k of RAM, and we somehow need to fit everything into it. It probably wouldn't be a big deal to expose the real stack to wasm, but Rust may not be able to deal with it.

A lot of stuff needs to be stored in ROM. However, we haven't been able to figure out how wasm (or well, Rust) handles strings and bytestrings in code? This could be a real problem, especially if bank switching is involved. (There's also a restriction that VDP DMA transfers can't cross 128kB(?) boundaries.)

How bad of an idea is this? It wouldn't really be in-spec, but we don't expect it to run arbitrary wasm either.

Wasmtime GitHub notifications bot (Sep 07 2020 at 09:03):

bjorn3 commented on Issue #2186:

> It's big-endian.

You will need to do a byte swap when writing to the wasm memory. This is not necessary if you are writing to the wasm stack.

> There are no floats. We don't think this is a big deal because we can just not use floats in Rust.

As long as the wasm module doesn't use floats itself, this is not a problem.

> There's only 64k of RAM, and we somehow need to fit everything into it. It probably wouldn't be a big deal to expose the real stack to wasm, but Rust may not be able to deal with it.

Wasm pages are 64k.

> A lot of stuff needs to be stored in ROM. However, we haven't been able to figure out how wasm (or well, Rust) handles strings and bytestrings in code? This could be a real problem, especially if bank switching is involved. (There's also a restriction that VDP DMA transfers can't cross 128kB(?) boundaries.)

Wasm modules copy all the data they contain into memory on startup. Wasm has no concept of read-only data.

Because wasm pages are 64k, the amount of data to copy as stated is always a multiple of 64k. You may be able to hack your way around this if you know the exact size of the useful data.
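To make the "copied on startup" step concrete, here is a minimal sketch of what a runtime might do at instantiation. `DataSegment` and `apply_data_segments` are hypothetical names, not wasmtime API. In the binary format each active data segment carries its own offset and byte length, so a runtime that walks the segments directly only needs to copy the bytes each one declares; the source slices could, in principle, sit in ROM.

```rust
/// Hypothetical, simplified view of one active data segment from a module.
struct DataSegment<'a> {
    /// Byte offset into linear memory where the segment is placed.
    offset: usize,
    /// The initial bytes; on the target these could live in ROM.
    bytes: &'a [u8],
}

/// Copies every segment into linear memory, as instantiation would.
/// Returns an error message if a segment does not fit.
fn apply_data_segments(
    memory: &mut [u8],
    segments: &[DataSegment<'_>],
) -> Result<(), &'static str> {
    for seg in segments {
        let end = seg
            .offset
            .checked_add(seg.bytes.len())
            .ok_or("offset overflow")?;
        let dest = memory
            .get_mut(seg.offset..end)
            .ok_or("data segment out of bounds")?;
        dest.copy_from_slice(seg.bytes);
    }
    Ok(())
}

fn main() {
    // A deliberately tiny "linear memory" for illustration; real Wasm
    // memory is sized in 64 KiB pages.
    let mut memory = [0u8; 32];
    let segments = [
        DataSegment { offset: 4, bytes: b"SEGA" },
        // A little-endian encoding of 0x12345678, as a compiler would emit it.
        DataSegment { offset: 16, bytes: &[0x78, 0x56, 0x34, 0x12] },
    ];
    apply_data_segments(&mut memory, &segments).unwrap();
    assert_eq!(&memory[4..8], b"SEGA");
    println!("{:?}", &memory[16..20]);
}
```

Note that any multi-byte integers inside such a segment are already laid out little-endian by the compiler, which is why the endianness question comes up for constant data as well.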

Wasmtime GitHub notifications bot (Sep 07 2020 at 11:19):

SoniEx2 commented on Issue #2186:

> You will need to do a byte swap when writing to the wasm memory.

We won't.

> Wasm modules copy all the data they contain to the memory on startup. Wasm doesn't have such a concept as read-only data.

How does this work? Does the module come with a predefined memory section that gets copied over or is there some generated startup code that we need to intercept?

Wasmtime GitHub notifications bot (Sep 07 2020 at 11:22):

bjorn3 edited a comment on Issue #2186:

> We won't.

That will only work if you tell the WASM compiler that it needs to emit big-endian code. I don't think LLVM has an option for this.

> Does the module come with a predefined memory section that gets copied over or is there some generated startup code that we need to intercept?

It comes with a predefined memory section that gets copied over on module instantiation.

Wasmtime GitHub notifications bot (Sep 07 2020 at 11:38):

SoniEx2 commented on Issue #2186:

> That will only work if you tell the WASM compiler that it needs to emit big-endian code. I don't think LLVM has an option for this.

That still sounds like an easier patch than maintaining an LLVM backend. Or we might try to use some heuristics, or get LLVM to give us a symbol table with types and just swap those in the predefined memory section. This is assuming LLVM doesn't pull an aliasing on us.

> It comes with a predefined memory section that gets copied over on module instantiation.

That's good. Sounds like we might be able to get away with using a few tricks.

It also seems like a WASM implementation is allowed to trap on unaligned memory access as well, if told to do so. So we don't have to worry about that one, at least.

Wasmtime GitHub notifications bot (Sep 07 2020 at 23:27):

cfallin edited a comment on Issue #2186:

This sounds like a really interesting project and a working wasm-on-Genesis runtime would be a fantastic hack!

Just wanted to add a bit of input on the endianness issue. It seems to me that inventing a "big-endian Wasm" platform, and modifying the Wasm generator(s) you care about to produce this, is likely to cause headaches on several levels. For one, the .wasm files you'd be operating on would not be runnable on any other Wasm platform -- so you immediately lose e.g. the ability to test something locally. Any .wasm module you might want to import from elsewhere and link to would also cause issues. So you lose (i) the benefits of the standard, deterministic, cross-platform VM, and (ii) the ability to take advantage of the ecosystem of reusable modules. Finally, you lose the ability to use any unmodified compiler that generates standard Wasm; you have to add a special hack to each compiler you use.

Basically, you bifurcate the universe into "Wasm-LE" and "Wasm-BE" and lose most of the benefits of standardization. You're likely to get lots of pushback from the Wasm community on something like that, as well (if upstreaming this is a consideration in the future): there's a strong emphasis on precise, fully-defined execution semantics, standardized everywhere.

It seems that on the m68k (which Wikipedia tells me the Genesis uses?), it's possible to endian-swap a 32-bit word in three instructions (from Stack Overflow):
ROL.W #8,D0
SWAP D0
ROL.W #8,D0
This would certainly add overhead, but seems (to me) at least feasible?
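To check that the three-instruction sequence really reverses all four bytes, here is a small Rust model of it. The helper names are made up for illustration, and the sketch assumes the usual m68k semantics: a word-sized rotate touches only the low 16 bits of the register, and the middle instruction exchanges the two 16-bit halves.

```rust
/// Models `ROL.W #8,Dn`: rotates only the low 16 bits of the register by 8,
/// leaving the upper 16 bits untouched.
fn rol_w_8(d: u32) -> u32 {
    let low = (d as u16).rotate_left(8);
    (d & 0xFFFF_0000) | u32::from(low)
}

/// Models the register swap: exchanges the upper and lower 16-bit halves.
fn swap_halves(d: u32) -> u32 {
    d.rotate_left(16)
}

/// The full sequence: rotate, swap halves, rotate.
fn bswap32_m68k_style(d: u32) -> u32 {
    rol_w_8(swap_halves(rol_w_8(d)))
}

fn main() {
    let x = 0x1234_5678u32;
    // The result should match Rust's built-in byte reversal.
    assert_eq!(bswap32_m68k_style(x), x.swap_bytes());
    println!("{:#010x}", bswap32_m68k_style(x));
}
```

Tracing `0x12345678` by hand gives `0x12347856`, then `0x78561234`, then `0x78563412`.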

Wasmtime GitHub notifications bot (Sep 07 2020 at 23:45):

SoniEx2 commented on Issue #2186:

what about turning off type aliasing?

Wasmtime GitHub notifications bot (Sep 08 2020 at 16:27):

cfallin commented on Issue #2186:

I don't think that would help; type aliasing analysis in LLVM just informs its optimization passes about what assumptions they may make, but doesn't change how data layout is computed or code is generated. Fundamentally, the endianness is still visible in lots of different ways, e.g. in struct layout, and constant data included in the initial memory contents. The endianness is a deeply-baked-in assumption and I don't think one can predict all the breakage that might happen if you change the Wasm host's endianness without byte-swapping.

That said, the right place for a discussion on "should there be a big-endian Wasm" is probably the Wasm spec community, rather than Cranelift/wasmtime; others on e.g. https://github.com/WebAssembly/spec or https://github.com/WebAssembly/proposals might have ideas or thoughts on this!

Wasmtime GitHub notifications bot (Sep 08 2020 at 16:39):

SoniEx2 commented on Issue #2186:

for comparison, Java doesn't leak endianness (although it does expose it here and there, in deliberate locations), and we believe the only issue would be in constant data - which can hopefully be solved by simply tagging the constant data with the relevant types and optionally enforcing it at runtime.

Wasmtime GitHub notifications bot (Sep 08 2020 at 17:27):

cfallin commented on Issue #2186:

Right, the issue is that Wasm's heap model is a bit lower-level than Java's: the linear memory is byte-addressable, and it is permitted to load bytes that were stored by a 16-/32-/64-bit store instruction (and vice versa). So for the endian-switch to work, you would need to ensure that memory stored as a larger integer is only ever accessed that way. I don't know if Rust-on-LLVM-to-Wasm guarantees that, and I'd be suspicious around e.g. enum tags, packed structs in general, widening or narrowing ops combined with loads or stores, etc. Basically, you would need to audit the whole compiler for such assumptions based on endianness. Not saying this is impossible, but it seems like a very brittle solution, compared to swapping bytes on loads/stores, IMHO.
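The aliasing described above can be shown with a toy linear memory. This is only a sketch: `LinearMemory` and its methods are hypothetical, and out-of-bounds accesses simply panic where a real runtime would trap. Loads and stores go through `to_le_bytes` / `from_le_bytes`, which is where the byte swap would happen on a big-endian host.

```rust
/// Toy linear memory that always stores integers little-endian,
/// regardless of the host's byte order.
struct LinearMemory {
    bytes: Vec<u8>,
}

impl LinearMemory {
    fn new(size: usize) -> Self {
        Self { bytes: vec![0; size] }
    }

    /// Models `i32.store`. On a big-endian host, `to_le_bytes` performs
    /// the byte swap; on a little-endian host it is free.
    fn store_u32(&mut self, addr: usize, value: u32) {
        self.bytes[addr..addr + 4].copy_from_slice(&value.to_le_bytes());
    }

    /// Models `i32.load`.
    fn load_u32(&self, addr: usize) -> u32 {
        let mut buf = [0u8; 4];
        buf.copy_from_slice(&self.bytes[addr..addr + 4]);
        u32::from_le_bytes(buf)
    }

    /// Models `i32.load8_u`: a narrow load from the same memory.
    fn load_u8(&self, addr: usize) -> u8 {
        self.bytes[addr]
    }
}

fn main() {
    let mut mem = LinearMemory::new(16);
    mem.store_u32(0, 0x1234_5678);
    // Code compiled for standard Wasm may rely on this: the byte at the
    // lowest address is the least significant one.
    assert_eq!(mem.load_u8(0), 0x78);
    assert_eq!(mem.load_u32(0), 0x1234_5678);

    // Storing in big-endian order instead (no swap on an m68k) changes
    // what the same narrow load would observe.
    let be = 0x1234_5678u32.to_be_bytes();
    assert_eq!(be[0], 0x12);
}
```

Because the accessors work byte by byte, they also do not depend on the address being aligned.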

Wasmtime GitHub notifications bot (Sep 08 2020 at 17:39):

bjorn3 commented on Issue #2186:

> Basically, you would need to audit the whole compiler for such assumptions based on endianness.

Not only that. For example, u32::to_le_bytes is implemented by writing the value to memory as-is on little-endian systems, and by doing a byte swap first and then writing it to memory on big-endian systems.

Wasmtime GitHub notifications bot (Sep 08 2020 at 18:31):

SoniEx2 commented on Issue #2186:

Additional question: can we just shove rust's pseudo stack in the middle of the real stack somehow? or would we have to rely on implementation details? are there opcodes (or maybe a global called "stack") specifically for shadow stack management?

Wasmtime GitHub notifications bot (Sep 08 2020 at 18:35):

bjorn3 commented on Issue #2186:

The Rust stack is an LLVM concept. There are no opcodes for it. LLVM uses a global called __stack_pointer for the Rust stack.
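For readers unfamiliar with the shadow stack, here is a rough model of how generated code uses that global. `Instance`, `alloc_frame`, and `free_frame` are illustrative names only; the sketch assumes the stack grows downward inside linear memory and does no overflow checking.

```rust
/// Toy model of an instance with an LLVM-managed shadow stack.
struct Instance {
    /// Linear memory; the shadow stack lives inside it.
    memory: Vec<u8>,
    /// Stands in for the mutable `__stack_pointer` global.
    stack_pointer: u32,
}

impl Instance {
    /// What a function prologue does: reserve `size` bytes by moving the
    /// pointer down, and return the address of the new frame.
    fn alloc_frame(&mut self, size: u32) -> u32 {
        self.stack_pointer -= size;
        self.stack_pointer
    }

    /// What a function epilogue does: release the frame again.
    fn free_frame(&mut self, size: u32) {
        self.stack_pointer += size;
    }
}

fn main() {
    // Hypothetical layout: 64 bytes of memory, stack starting at the top.
    let mut inst = Instance { memory: vec![0; 64], stack_pointer: 64 };

    let frame = inst.alloc_frame(16);
    // A local whose address is taken is just a plain linear-memory address,
    // reachable by any ordinary load or store.
    inst.memory[frame as usize] = 42;
    assert_eq!(inst.memory[48], 42);

    inst.free_frame(16);
    assert_eq!(inst.stack_pointer, 64);
}
```

The frame address is an ordinary index into linear memory, which is why the stack cannot easily be moved somewhere else without the rest of the program noticing.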

Wasmtime GitHub notifications bot (Sep 08 2020 at 18:50):

cfallin commented on Issue #2186:

And to add a bit more: you might try to pattern-match LLVM's pseudo-stack somehow (and, say, translate pushes / pops / accesses to native stack ops in your own private opcode space), but the real problem is that the program can take addresses of objects on the stack, and such an address could reach any other load/store in the program (unless you do interprocedural alias or escape analysis to prove otherwise). So now every load/store has an "is this a special stack address" test, effectively software-level address translation, which probably costs more than you would save by treating the stack specially.
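As a sketch of what that per-access test would look like, assume (hypothetically) that the shadow stack occupies a fixed window of Wasm addresses that the runtime redirects to the native stack. `STACK_BASE`, `STACK_END`, `Target`, and `translate` are invented for this example.

```rust
/// Hypothetical window of Wasm addresses reserved for the shadow stack.
const STACK_BASE: u32 = 0xF000;
const STACK_END: u32 = 0x1_0000;

/// Where a Wasm address ends up after translation.
#[derive(Debug, PartialEq)]
enum Target {
    /// Offset into the native stack region.
    NativeStack(u32),
    /// Ordinary linear-memory address, used unchanged.
    Heap(u32),
}

/// The check that would have to run on every load and store, because any
/// pointer in the program might point into the stack window.
fn translate(addr: u32) -> Target {
    if (STACK_BASE..STACK_END).contains(&addr) {
        Target::NativeStack(addr - STACK_BASE)
    } else {
        Target::Heap(addr)
    }
}

fn main() {
    assert_eq!(translate(0xF010), Target::NativeStack(0x10));
    assert_eq!(translate(0x0100), Target::Heap(0x0100));
    println!("{:?}", translate(0xFFFF));
}
```

Even in this simplest form, the range check adds a compare and a branch to every memory access, which is the overhead being described.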

Wasmtime GitHub notifications bot (Sep 08 2020 at 18:56):

SoniEx2 commented on Issue #2186:

ideally we can get away with just adjusting the global, which is something LLVM doesn't expect but we're hoping it'll be fine.

Wasmtime GitHub notifications bot (Sep 08 2020 at 19:53):

SoniEx2 edited a comment on Issue #2186:

> Basically, you would need to audit the whole compiler for such assumptions based on endianness.

> Not only that. For example, u32::to_le_bytes is implemented by writing the value to memory as-is on little-endian systems, and by doing a byte swap first and then writing it to memory on big-endian systems.

Rust lets us override that. LLVM on the other hand... (which is fine, but really, we need to be able to identify ints in the data so we can swap them.)

Last updated: Apr 18 2025 at 08:04 UTC