Stream: wasmtime

Topic: Resources for learning about linear memory


view this post on Zulip Fritz Rehde (Jul 18 2023 at 18:00):

I am interested in learning more about the wasmtime linear memory for a project. I have been going through memory.rs and mmap.rs a little, but I think I am lacking a bigger picture. Are there any online resources you can recommend (documentation going into more detail)?
A more specific question: What is the difference between the accessible_size and mapping_size (called for instance in accessible_reserved)? As far as I understand, the accessible size represents memory that is guaranteed to be accessible/usable, so it should always be allowed to dereference the pointer to the regions that are accessible. I am not sure what mapping_size is for.

view this post on Zulip Chris Fallin (Jul 18 2023 at 18:30):

Unfortunately there's not really documentation besides what's in the code, so the best way to know how things work is to study the implementation -- though we're happy to answer questions here too

view this post on Zulip Chris Fallin (Jul 18 2023 at 18:30):

accessible_size is indeed the size of memory that is legal to access -- it corresponds to the size of the Wasm heap (which can grow during runtime)

view this post on Zulip Chris Fallin (Jul 18 2023 at 18:30):

mapping_size is, as the name suggests, the size of the total memory region we reserve. This can be larger than accessible_size because we want to allow growth without relocating the heap
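The relationship between the two sizes can be illustrated with a small toy model. This is only a sketch: the `Reservation` type, `GrowError`, and the method names are hypothetical and are not Wasmtime's actual types. It assumes that growth only ever enlarges the accessible prefix of a region whose total reserved size is fixed up front.

```rust
/// Toy model of a linear-memory reservation.
/// All names here are illustrative assumptions, not Wasmtime's API.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Reservation {
    /// Bytes the guest may currently read and write.
    pub accessible_size: usize,
    /// Total bytes of address space reserved up front.
    pub mapping_size: usize,
}

#[derive(Debug, PartialEq, Eq)]
pub enum GrowError {
    /// The request does not fit in the reservation, so growing in place
    /// is impossible.
    ExceedsMapping,
}

impl Reservation {
    pub fn new(accessible_size: usize, mapping_size: usize) -> Self {
        assert!(accessible_size <= mapping_size);
        Self { accessible_size, mapping_size }
    }

    /// Grow in place: only the accessible prefix changes, so the base
    /// address of the heap would stay the same.
    pub fn grow_to(&mut self, new_accessible: usize) -> Result<(), GrowError> {
        if new_accessible > self.mapping_size {
            return Err(GrowError::ExceedsMapping);
        }
        // Wasm memories only grow, so never shrink the accessible prefix.
        self.accessible_size = self.accessible_size.max(new_accessible);
        Ok(())
    }

    /// Whether a byte offset from the heap base is legal to access.
    pub fn is_accessible(&self, offset: usize) -> bool {
        offset < self.accessible_size
    }
}
```

For example, a reservation created with `Reservation::new(65536, 4 * 65536)` can grow to two pages in place, while a request for five pages is rejected because it exceeds the mapping.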

view this post on Zulip Fritz Rehde (Jul 18 2023 at 19:37):

Ok, thanks. What is the purpose/use of the pre_guard_bytes?

view this post on Zulip Alex Crichton (Jul 18 2023 at 19:51):

It's a configuration for guaranteed-unmapped memory before the start of linear memory. It's a small defense-in-depth mechanism against possible compiler bugs that accidentally access memory before linear memory; otherwise it has no runtime effect

view this post on Zulip Fritz Rehde (Jul 19 2023 at 21:11):

Thanks! Another question: The bounds checking that ensures nothing is accessed beyond the bounds of the linear memory is done both at compile time and runtime, right? So far, I found validate_bounds in runtime/src/instance.rs (runtime) and bounds_check_and_compute_addr in cranelift/wasm/src/code_translator/bounds_checks.rs (compile-time). Are there any more locations relevant to bounds checking?

view this post on Zulip Chris Fallin (Jul 19 2023 at 21:33):

Runtime bounds are really validated by code generated by what you've listed as "compile-time" checks. Since addresses aren't known until runtime, it's not really possible to do bounds-checking purely at compile-time. In more detail, we have two kinds of bounds-checking, "dynamic" and "static" (the names may not be the best, but that's what they are). Static bounds-checking is implemented by mapping a virtual-memory region to be accessible only as far as the memory's length, and a "guard region" after, so if the guest accesses out-of-bounds, we get (and catch) a SIGSEGV. The guard needs to be as large as the 32-bit offset can allow

view this post on Zulip Chris Fallin (Jul 19 2023 at 21:33):

and then dynamic is what you describe, with actual comparison operators
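A minimal sketch of what a "dynamic" check with actual comparison operators amounts to, written as plain Rust rather than as generated machine code. The function name and parameters are hypothetical and do not mirror `bounds_check_and_compute_addr`; the sketch assumes the check is "the end of the access must not exceed the current heap length" and that overflow counts as out of bounds.

```rust
/// Explicit bounds check for one memory access.
///
/// Returns the byte offset into the heap when the whole access fits, or
/// `None` where real generated code would trap.
/// Name and signature are illustrative assumptions.
pub fn dynamic_bounds_check(
    index: u64,         // dynamic index supplied by the guest
    static_offset: u32, // constant offset encoded in the load/store
    access_size: u8,    // width of the access in bytes
    heap_len: u64,      // current accessible size of the heap
) -> Option<u64> {
    // Any arithmetic overflow is treated as out of bounds.
    let start = index.checked_add(u64::from(static_offset))?;
    let end = start.checked_add(u64::from(access_size))?;
    if end <= heap_len {
        Some(start)
    } else {
        None
    }
}
```

With a one-page heap (`heap_len = 65536`), a 4-byte access at index 65532 passes, while the same access at index 65533 fails because its last byte would fall outside the heap.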

view this post on Zulip Fritz Rehde (Jul 20 2023 at 16:45):

Where in the code is this "guard region" implemented? Is it all those special cases in bounds_checks.rs?

view this post on Zulip Chris Fallin (Jul 20 2023 at 16:46):

it's not really a single line of code we can point at; it's an overall design

view this post on Zulip Chris Fallin (Jul 20 2023 at 16:46):

the memory map creates the guard region

view this post on Zulip Chris Fallin (Jul 20 2023 at 16:46):

and then we compile in a way that has no dynamic bounds check, but rather adds a 32-bit offset
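The arithmetic behind the "static" scheme can be sketched as follows. This is a simplified model under stated assumptions: a wasm32 access uses a zero-extended 32-bit index plus a 32-bit constant offset, and everything inside the mapping but past the accessible size faults when touched. The names `effective_address`, `classify`, and `Landing` are hypothetical, and the sizes used are illustrative rather than Wasmtime's actual defaults.

```rust
/// Where an access would land relative to the reserved mapping.
#[derive(Debug, PartialEq, Eq)]
pub enum Landing {
    /// Entirely inside the accessible heap: the access succeeds.
    Accessible,
    /// Touches reserved but inaccessible memory: the hardware faults
    /// and the runtime turns that into a trap.
    Guard,
    /// Past the end of the mapping: the design must rule this out.
    OutsideMapping,
}

/// Address computed without any comparison, only additions.
/// The 32-bit index is zero-extended, so it can never be negative.
pub fn effective_address(heap_base: u64, index: u32, offset: u32) -> u64 {
    heap_base + u64::from(index) + u64::from(offset)
}

/// Classify an access by its end offset from the heap base.
pub fn classify(
    index: u32,
    offset: u32,
    access_size: u64,
    accessible_size: u64,
    mapping_size: u64,
) -> Landing {
    // At most (2^32 - 1) + (2^32 - 1) + access_size, so no u64 overflow
    // for any realistic access width.
    let end = u64::from(index) + u64::from(offset) + access_size;
    if end <= accessible_size {
        Landing::Accessible
    } else if end <= mapping_size {
        Landing::Guard
    } else {
        Landing::OutsideMapping
    }
}
```

In this model, a mapping of `(1 << 33) + 16` bytes is large enough that even the largest index combined with the largest offset lands in the guard, which is why no explicit comparison is needed. With a mapping of only `1 << 32` bytes, the same access would escape the mapping.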

view this post on Zulip Fritz Rehde (Jul 20 2023 at 17:36):

Maybe some more context on why I am asking these questions: In a previous thread, I already mentioned that we are working on a prototype for adding MTE to wasm(time) for increased memory safety (it's part of a software stack that also involves/requires llvm to do some analysis). MTE requires aarch64 and 64 bit pointers, so we had to adapt wasmtime to use that, and that worked well. Since we no longer only have 32 bit addresses, we had to insert 64 bit runtime out of bounds checks. Now, we were thinking of a way to remove the overhead of these bounds checks. We came up with the idea of replacing these runtime bounds checks by using MTE (MTE adds tag bits to the upper bits of addresses) as well, by tagging the entire linear memory itself (with the stg instruction) and all pointers to the linear memory. Then, MTE would trap at a tag mismatch at runtime. We already realized that we should only be tagging the accessible linear memory (our changes here were mostly made to mmap.rs and memory.rs). We've also already modified the bounds checks in the functions/files that I mentioned in the previous message. However, it seems like we have missed adjusting some code related to the bounds checks (not sure if it's the "dynamic" or "static" bounds-checking you mentioned), since I still get an out of bounds exception when running a simple test program, probably because we haven't masked out the tag bits somewhere.
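To illustrate why unmasked tag bits could make an explicit comparison fail, here is a small sketch of tag handling on plain integers. It assumes the MTE logical tag occupies bits 59:56 of a 64-bit address; the helper names are hypothetical and nothing here comes from Wasmtime or from the prototype described above. It is not a diagnosis of the reported failure, only a demonstration of the suspected effect.

```rust
/// Bit position of the 4-bit logical tag (assumed to be bits 59:56).
const TAG_SHIFT: u32 = 56;
const TAG_MASK: u64 = 0xF << TAG_SHIFT;

/// Return `addr` with its tag bits replaced by `tag` (low 4 bits used).
pub fn with_tag(addr: u64, tag: u8) -> u64 {
    (addr & !TAG_MASK) | (u64::from(tag & 0xF) << TAG_SHIFT)
}

/// Extract the tag bits from an address.
pub fn tag_of(addr: u64) -> u8 {
    ((addr & TAG_MASK) >> TAG_SHIFT) as u8
}

/// Clear the tag bits so the address can be compared numerically.
pub fn strip_tag(addr: u64) -> u64 {
    addr & !TAG_MASK
}

/// Range check that ignores tags on both the pointer and the base.
pub fn in_bounds_ignoring_tag(addr: u64, base: u64, len: u64) -> bool {
    let a = strip_tag(addr);
    let b = strip_tag(base);
    a >= b && a - b < len
}
```

Comparing a tagged pointer against an untagged base without stripping the tag yields a huge difference, so a naive `addr - base < len` check reports out of bounds even though the untagged address is inside the heap.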

view this post on Zulip fitzgen (he/him) (Jul 20 2023 at 17:40):

you may be interested in this issue and its discussion as well: https://github.com/bytecodealliance/wasmtime/issues/6094

I've been digging into Cranelift's and Wasmtime's code quality and Wasm execution throughput when "dynamic memories" with explicit bounds checks are used to implement Wasm linear memories. Here I'm...

view this post on Zulip fitzgen (he/him) (Jul 20 2023 at 17:41):

when you say you had to insert 64 bit addresses, do you mean you switched to using wasm64?

view this post on Zulip fitzgen (he/him) (Jul 20 2023 at 17:41):

because wasm32 loads/stores already end up as 64 bit addresses after translation, so I am a bit confused

view this post on Zulip Fritz Rehde (Jul 20 2023 at 17:54):

Yes, we switched to wasm64.


Last updated: Oct 23 2024 at 20:03 UTC