bjorn3 commented on issue #3153:
Heaps are always precisely sized. This means that every call to
memory.grow will incur a memcpy of memory from the old heap to the
new. We probably want to at least look into mremap on Linux and
otherwise try to implement schemes where dynamic heaps have some
reserved pages to grow into to help amortize the cost of
memory.grow.I believe mmap has a flag that prevents overwriting already mapped memory. Combining this with MAP_FIXED would allow growing the heap without moving part of the time.
github-actions[bot] commented on issue #3153:
Subscribe to Label Action
cc @fitzgen, @peterhuene
<details>
This issue or pull request has been labeled: "cranelift", "cranelift:meta", "cranelift:wasm", "fuzzing", "wasmtime:c-api"Thus the following users have been cc'd because of the following labels:
- fitzgen: fuzzing
- peterhuene: wasmtime:c-api
To subscribe or unsubscribe from this label, edit the <code>.github/subscribe-to-label.json</code> configuration file.
Learn more.
</details>
alexcrichton commented on issue #3153:
Hm ok so I may need some assistance with that error. Currently we have:
enum InstructionData { HeapData(Opcode /* u16 */, Value /* u32 */, ir::Heap /* u32 */, imm: ir::immediates::Uimm32 /* u32 */), // ... }
This results in a 16-byte enum size in total because the discriminant can be packed into the space preceding
Opcode
. This PR changes theUimm32
there toUimm64
, which inflates the size of the enum to 24 bytes:struct Before(u8, u16, u32, u32, u32); // .. vs .. struct After(u8, u16, u32, u32, u64);
and there's a test asserting that this enum is 16-bytes in size.
I'm having trouble figuring out how best to shrink this back to 16-bytes. Some things I've tried:
- Technically
Opcode
can be removed here since there's only one ever used withHeapAddr
. That doesn't free up enough space alone.- I'm not sure if there's much use case for having lots of heap indices, so the
struct Heap(u32)
may be overkill. Chanaging that tou8
I think would reduce the size of this enum back to 16 bytes. Changing tou16
and removingOpcode
would also reduce the enum back to 16 bytes.- I was also looking at using
ir::Constant
instead ofUimm64
but that felt somewhat heavyweight for this use case because it seems like it would frequently allocate in the constant pool. There also didn't appear to be many helper functions going from aConstant
to au64
(or some other integral type) so this felt like I was too far off the beaten path.Do others have alternative ideas that could be implemented, or guidance on which of the above paths is the way to go here?
bjorn3 commented on issue #3153:
load, store, etc also use 32bit offsets even though they can already be used with 64bit pointers. In cg_clif if the offset overflows the
Offset32
, I emit aiadd_imm ptr, offset
before loading or storing. The same could be done forheap_addr
to keep a 32bit offset.
cfallin commented on issue #3153:
I may be misunderstanding the use of that immediate, but it looks to me like the field is actually used to denote the size of the access, not of the underlying heap (link), on the
heap_addr
instruction (which appears to be the only one that uses that instruction format). GIven that individual memory accesses definitely don't access more than 4GiB-sized values, I think 32 bits is still OK here?
alexcrichton commented on issue #3153:
Heh unfortunately I think you've fallen into the same trap I've fallen into a number of times. The
heap_addr
instruction does indeed document its immediate as an access size, but that's not technically how the wasm backend is using it. When translating, assuming there's a guard page, we feign the access size as the fixed offset in the memory instruction. We don't technically actually access all those bytes, but for the purposes of validation it's sufficient.Put another way, because wasm memory offsets can be 64-bits (with memory64), and because we use this offset as the "access size", that's the motivation for increasing the bit-width of the access size.
cfallin commented on issue #3153:
Hmm, I see; unexpected but I guess that usage makes sense.
IMHO the best option here is probably a hybrid approach during translation: if the offset fits in 32 bits (which will be the case the overwhelming majority of the time, hopefully), do as we presently do; if larger, then emit an explicit add instruction to compute the
addr+offset
, as @bjorn3 suggests above. The reason I'm hesitant to just accept the 64-bit immediate and correspondingInstructionData
growth is that I think the latter will have a surprisingly large impact on compile-time performance; allocating the space only when needed (via another instruction) should be more efficient, even though it's less theoretically clean.
alexcrichton commented on issue #3153:
I considered doing the
iadd_imm
trick but the issue is that overflow needs to be detected (otherwise the 65-bit effective offset isn't correctly calculated). That would involve doing some sort of trap-if-overflow check, but that means that every memory access would have 2 branches instead of one as with this PR. I was hoping to avoid that because I'm assuming the perf hit is already quite large and making it even larger could be worse.I think it would work for the time being, but I wanted to explore possibilities to update the cranelift instruction. I think it makes sense to not inflate 16 bytes to 24 bytes, but I wasn't having much luck brainstorming possibilities of how to keep it under 16 bytes (unless we want to change heap indexes to u8 and disallow 256+ heaps (which I'm not sure why we'd want anyway)
cfallin commented on issue #3153:
It would only be two checks for offsets >4GiB, no? I'd expect that most Wasm producers use offsets for things like structs and it would be very strange to have an offset that large, so in my thinking at least, it makes sense to treat this as a "needed for correctness but can be slow" corner. I may be misunderstanding the issue though?
Re: heap count, I'd be hesitant about limiting to 256 heaps; we might bump our heads on that in the future multi-module / component world (e.g. heap per Wasm component, with 300 components in a very large dependency graph -- at least, that's not totally implausible).
alexcrichton commented on issue #3153:
Hm nah that's a good point. We already incur a real bounds check on wasm32 for 2gb+ offsets so having an extra bounds check on wasm64 for 4gb+ bounds makes sense. I'll go ahead and implement that doing an addition with a overflow check if the offset doesn't fit in a u32
alexcrichton commented on issue #3153:
For reviewers, extra care when reviewing
cranelift/wasm/src/code_translator.rs
is likely in order. That's some tricky code that's pretty meticulously set up. It has large effects on performance and is the lynchpin of wasm heap correctness, so just want to triple-check I'm right there.
alexcrichton commented on issue #3153:
@peterhuene oh I realize now I tagged you here without much context. Would you be ok reviewing the Wasmtime-specific pieces of this? If not no worries, I'm sure I can find another unwilling
victimcandidate.
alexcrichton edited a comment on issue #3153:
@peterhuene oh I realize now I tagged you here without much context. Would you be ok reviewing the Wasmtime-specific pieces of this? If not no worries, I'm sure I can find another
unwilling victimcandidate.
peterhuene commented on issue #3153:
:+1: I'll review it now.
alexcrichton commented on issue #3153:
Thanks @peterhuene!
Last updated: Jan 24 2025 at 00:11 UTC