Stream: general

Topic: memory.grow and physical memory consumption


view this post on Zulip Coulson Liang (Jun 20 2024 at 23:46):

Hi guys, I want to ask about the relation between memory.grow and physical memory consumption. On Linux, when we call memory.grow, the runtime calls mmap to reserve the virtual memory space, but Linux shouldn't really reserve physical memory for this operation until the new pages are accessed, right?

But if the above is true, then what is the downside of just initializing the linear memory to be 4GB (the entire 32-bit space) at the beginning?

view this post on Zulip Jacob Lifshay (Jun 21 2024 at 00:09):

using an entire 4GB section of address space can quickly run into limitations, such as on common x86-64 systems where userspace only has 47 bits of total address space available, which would mean you can't have more than 32768 wasm instances at once (2^47 / 2^32 = 2^15), which is a pretty low limit

view this post on Zulip Jacob Lifshay (Jun 21 2024 at 00:15):

somewhat older versions of Windows limit that to 43 bits, meaning you're limited to just 2048 wasm instances (2^43 / 2^32 = 2^11)
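
To make the arithmetic in the two messages above concrete, here is a minimal sketch. It assumes every instance reserves a fixed power-of-two chunk of address space and that nothing else in the process uses any; `max_instances` is a name invented for this illustration, not something from Wasmtime.

```rust
/// Upper bound on how many fixed-size reservations fit into a user-space
/// address space of `address_bits` bits, when each reservation occupies
/// `reservation_bits` bits of address space (32 bits = 4 GiB).
fn max_instances(address_bits: u32, reservation_bits: u32) -> u64 {
    assert!(address_bits < 64, "sketch only handles address spaces below 64 bits");
    match address_bits.checked_sub(reservation_bits) {
        // 2^address_bits / 2^reservation_bits = 2^(address_bits - reservation_bits)
        Some(shift) => 1u64 << shift,
        // A single reservation is already larger than the address space.
        None => 0,
    }
}
```

`max_instances(47, 32)` gives 32768 and `max_instances(43, 32)` gives 2048, matching the figures above. If each memory reserves 8 GiB including guard regions (the figure quoted later in this thread), `max_instances(47, 33)` drops to 16384.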

view this post on Zulip Coulson Liang (Jun 21 2024 at 00:17):

I see, this makes sense thank you!

view this post on Zulip Chris Fallin (Jun 21 2024 at 00:21):

@Coulson Liang I think there may be several inaccuracies in your description of how the opcodes work:

What @Jacob Lifshay says about address space limits is also true, and something we worry about, but in practice we still typically choose the virtual-memory-based (VM-based) approach, because the alternative is explicit bounds checks, which have a nontrivial perf impact and so are usually a worse option

view this post on Zulip Chris Fallin (Jun 21 2024 at 00:22):

finally -- I'm not sure what you mean by "initialize the linear memory to be 4GB at the beginning" but we're constrained by Wasm semantics -- the linear memory size is defined by the initial size and any grow operations, we must trap if accesses happen that are out of bounds, so we have to actually adjust memory permissions as it grows

view this post on Zulip Alex Crichton (Jun 21 2024 at 01:02):

A downside to an embedder for starting linear memories at 4G is that the embedder can't easily keep track of what's paged in and what isn't. The memory.grow instruction can fail, and it provides a hook for the embedder to reject a requested growth as being too big, or possibly to block the instance entirely until there is more memory available. If everything is mapped in, then there's no easy way to hook the event of memory being paged in

view this post on Zulip Nicholas Renner (Jun 21 2024 at 19:25):

Hey all, I'm working with Coulson. Really appreciate all the replies here!

@Chris Fallin Can you possibly direct me to where virtual memory space is reserved? I'm assuming this is for the 4GB region. This makes a lot of sense since I was concerned about how remap operations would work.

view this post on Zulip Chris Fallin (Jun 21 2024 at 19:28):

Grep for mmap in the wasmtime codebase, you'll find the abstractions in the runtime crate

view this post on Zulip Chris Fallin (Jun 21 2024 at 19:29):

(I hope that doesn't sound like a glib answer, but it really is that simple: we use mmap to reserve the space, so you can trace backward from there :-) )

view this post on Zulip Chris Fallin (Jun 21 2024 at 19:29):

in particular you might want to study the "on-demand allocator" first, it's simpler than the pooling allocator

view this post on Zulip Coulson Liang (Jun 21 2024 at 19:52):

Yeah yeah, here's the unified abstraction https://github.com/bytecodealliance/wasmtime/blob/main/crates/wasmtime/src/runtime/vm/mmap.rs
And here's the underlying implementation for unix systems
https://github.com/bytecodealliance/wasmtime/blob/main/crates/wasmtime/src/runtime/vm/sys/unix/mmap.rs

view this post on Zulip Alex Crichton (Jun 21 2024 at 20:34):

With default settings the mmap happens here

view this post on Zulip Alex Crichton (Jun 21 2024 at 20:34):

The default settings for various knobs there are here

view this post on Zulip Alex Crichton (Jun 21 2024 at 20:35):

if you look at strace for a single linear memory you should see a mmap of anonymous PROT_NONE memory which is 8G large (2G guard before, 4G region in the middle, 2G guard after)

view this post on Zulip Alex Crichton (Jun 21 2024 at 20:35):

you'll then see mprotect calls to PROT_READ | PROT_WRITE to make things read/write as memory becomes accessible
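
The reserve-then-mprotect pattern described in the last two messages can be sketched as below. This is an illustration under stated assumptions, not Wasmtime's code: it declares the libc functions by hand, hard-codes the constant values used on 64-bit Linux, and omits the guard region in front of the memory. `Reservation` and its methods are hypothetical names.

```rust
use std::ffi::c_void;
use std::os::raw::c_int;
use std::ptr;

// Hand-written declarations of the libc calls (assumption: 64-bit Linux).
unsafe extern "C" {
    fn mmap(addr: *mut c_void, len: usize, prot: c_int, flags: c_int, fd: c_int, offset: i64) -> *mut c_void;
    fn mprotect(addr: *mut c_void, len: usize, prot: c_int) -> c_int;
    fn munmap(addr: *mut c_void, len: usize) -> c_int;
}

// Constant values as defined on Linux.
const PROT_NONE: c_int = 0x0;
const PROT_READ: c_int = 0x1;
const PROT_WRITE: c_int = 0x2;
const MAP_PRIVATE: c_int = 0x02;
const MAP_ANONYMOUS: c_int = 0x20;

/// A span of address space that starts out entirely inaccessible.
struct Reservation {
    base: *mut u8,
    len: usize,
}

impl Reservation {
    /// Reserve `len` bytes of anonymous PROT_NONE memory.
    fn new(len: usize) -> Option<Self> {
        let p = unsafe { mmap(ptr::null_mut(), len, PROT_NONE, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0) };
        // mmap signals failure with MAP_FAILED, which is (void *)-1.
        if p as isize == -1 {
            None
        } else {
            Some(Reservation { base: p as *mut u8, len })
        }
    }

    /// Make the first `accessible_len` bytes readable and writable,
    /// which is what a successful grow amounts to in this model.
    fn make_accessible(&self, accessible_len: usize) -> bool {
        if accessible_len > self.len {
            return false; // cannot grow past the reservation
        }
        unsafe { mprotect(self.base as *mut c_void, accessible_len, PROT_READ | PROT_WRITE) == 0 }
    }

    fn base(&self) -> *mut u8 {
        self.base
    }
}

impl Drop for Reservation {
    fn drop(&mut self) {
        // Give the whole reservation back to the OS.
        unsafe { munmap(self.base as *mut c_void, self.len) };
    }
}
```

Under these assumptions, `Reservation::new(8 << 30)` would correspond to the 8G mapping mentioned above, and each call to `make_accessible` with a larger length corresponds to one of the mprotect calls visible in strace. The base address never changes in this scheme, because only permissions are adjusted.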

view this post on Zulip Alex Crichton (Jun 21 2024 at 20:36):

Wasmtime has no bindings to mremap right now if that's what you're looking for

view this post on Zulip Alex Crichton (Jun 21 2024 at 20:36):

memory growth which moves linear memory (which again doesn't happen by default, requires non-default settings), happens here with a simple memcpy

view this post on Zulip Nicholas Renner (Jun 21 2024 at 20:57):

Cool, thank you both so much. This all makes a lot of sense now.

view this post on Zulip Nicholas Renner (Jun 21 2024 at 21:00):

So as linear memory grows, that's essentially just expanding what's accessible via mprotect. Are there any other checks being done to make sure addresses are valid? Or is that just handled by the underlying OS?

I think Coulson asked the same question in a different way, but I just want to be sure. I believe I saw in a doc somewhere that address checking similar to how NaCl handles SFI doesn't exist because of the overhead?

view this post on Zulip Chris Fallin (Jun 21 2024 at 21:03):

https://github.com/bytecodealliance/wasmtime/blob/main/cranelift/wasm/src/code_translator/bounds_checks.rs should answer your questions re: bounds-checks -- study the cases in there. We do have modes where we use dynamic checks instead of virtual memory permissions, it's configurable, but VM-based (we call it a "static memory") is the default

view this post on Zulip Nicholas Renner (Jun 21 2024 at 21:10):

ok thank you, I will try to go through this, it's quite a lot of code. What I think you're saying though is that the default VM-based static memory just makes sure addresses are within the 4GB reserved region?

view this post on Zulip Chris Fallin (Jun 21 2024 at 21:11):

It's a little more subtle than that, I'd recommend reading the code

view this post on Zulip Chris Fallin (Jun 21 2024 at 21:12):

there are details to do with offsets on the loads/stores for example, and a "guard region"

view this post on Zulip Chris Fallin (Jun 21 2024 at 21:12):

the main bit is the seven cases with comments that show inequalities; that shouldn't be too bad to read through
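
As a rough illustration of why the load/store offset and the guard region matter, here is a heavily simplified sketch of one such inequality. It is not taken from bounds_checks.rs, which distinguishes more cases; the function name and parameters are hypothetical. The assumption is a 32-bit index and a reservation in which every byte beyond the current memory size is inaccessible, so a stray access faults and the runtime can turn the fault into a trap.

```rust
/// Largest value a 32-bit wasm index can take.
const WASM32_MAX_INDEX: u64 = u32::MAX as u64;

/// Returns true if an access of `access_size` bytes at `index + offset`
/// is guaranteed to land inside `reserved_bytes + guard_bytes` for every
/// possible 32-bit `index`, so that page permissions alone can catch
/// out-of-bounds accesses and no explicit comparison is needed.
fn can_elide_bounds_check(offset: u32, access_size: u8, reserved_bytes: u64, guard_bytes: u64) -> bool {
    // Exclusive end of the access for the worst-case index.
    let worst_case_end = WASM32_MAX_INDEX + offset as u64 + access_size as u64;
    worst_case_end <= reserved_bytes + guard_bytes
}
```

With a 4 GiB region followed by a 2 GiB guard, a 4-byte load with a small static offset needs no explicit check in this model, while a static offset of 2 GiB or more could reach past the guard and would need one.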

view this post on Zulip Alex Crichton (Jun 21 2024 at 22:04):

You can also check out online some various settings -- https://godbolt.org/z/sxzjTsMMG

(module
  (memory 1)
  (func (param i32) (result i32)
    local.get 0
    i32.load))

view this post on Zulip Alex Crichton (Jun 21 2024 at 22:04):

you can see there how the CLI settings affect the codegen for loads/stores

view this post on Zulip Alex Crichton (Jun 21 2024 at 22:04):

although you have to sort of manually reassemble things in your head due to lack of mapping there

view this post on Zulip Alex Crichton (Jun 21 2024 at 22:04):

Locally you can use wasmtime explore to take a look at what wasm corresponds to what asm

view this post on Zulip Xinyu Zeng (Sep 11 2024 at 09:57):

Alex Crichton said:

memory growth which moves linear memory (which again doesn't happen by default, requires non-default settings), happens here with a simple memcpy

That is what I saw: the doc says memory.grow() will relocate the base ptr, but actually it does not (on 64-bit Linux). Can I safely assume in this case that I can have a "long lived" pointer into memory, since the base ptr will not change? Thanks a lot.

Long context: I want to avoid copying the output data from guest to host, so I tried to store a pointer to the guest's memory (the output data) in the host's "MyBuffer". When this "MyBuffer" drops, it will call the dealloc func in the guest to free the memory. That instance/store may also be reused after the pointer is saved in this "MyBuffer", meaning that memory.grow() will be called.

view this post on Zulip Alex Crichton (Sep 11 2024 at 14:52):

In general it's not safe to assume the pointer won't change. If possible I'd operate under the assumption that the pointer can change whenever wasm is called. Otherwise, though, it is possible to configure wasmtime specifically to ensure that the base pointer never changes.
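
One way to live with a base pointer that may move is to keep a guest offset and length on the host side and re-derive the slice on every use. The sketch below shows the idea with a plain `Vec<u8>` standing in for the linear memory; `LinearMemory` and `GuestBuffer` are hypothetical types invented for this example, not Wasmtime APIs.

```rust
/// Stand-in for a guest linear memory whose backing storage may move when it grows.
struct LinearMemory {
    bytes: Vec<u8>,
}

impl LinearMemory {
    fn new(len: usize) -> Self {
        LinearMemory { bytes: vec![0; len] }
    }

    /// Growing may reallocate, which changes the base address of `bytes`.
    fn grow(&mut self, delta: usize) {
        let new_len = self.bytes.len() + delta;
        self.bytes.resize(new_len, 0);
    }

    fn data(&self) -> &[u8] {
        &self.bytes
    }

    fn data_mut(&mut self) -> &mut [u8] {
        &mut self.bytes
    }
}

/// Host-side handle that remembers where the data lives inside the guest
/// memory (offset and length) instead of holding a raw host pointer.
#[derive(Clone, Copy)]
struct GuestBuffer {
    offset: usize,
    len: usize,
}

impl GuestBuffer {
    /// Re-derive the slice from the memory's current base on every access.
    /// Returns None if the range is out of bounds.
    fn get<'a>(&self, mem: &'a LinearMemory) -> Option<&'a [u8]> {
        let end = self.offset.checked_add(self.len)?;
        mem.data().get(self.offset..end)
    }
}
```

The borrow checker then rules out holding the slice across a call that needs `&mut LinearMemory`, such as `grow`, which is exactly the situation where a raw pointer could dangle.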

view this post on Zulip Xinyu Zeng (Sep 12 2024 at 02:37):

Alex Crichton said:

In general it's not safe to assume the pointer won't change. If possible I'd operate under the assumption that the pointer can change whenever wasm is called. Otherwise, though, it is possible to configure wasmtime specifically to ensure that the base pointer never changes.

I see. Thanks. I think if we ensure that after we get the pointer no wasm calls will be made to that instance and store (and the store won't be dropped), that pointer also won't change? A follow-up question: would this be the canonical way to avoid copying output data from wasm? In my use case, I would like the output to be zero-copy since it is large. That is the reason for my hacking above.

view this post on Zulip Xinyu Zeng (Sep 12 2024 at 04:02):

Sorry to add more to this thread, one more question though: When the wasm code contains malloc and free, will those be compiled to mprotect under default setting?

view this post on Zulip Alex Crichton (Sep 12 2024 at 04:22):

I think if we ensure that after we get the pointer no wasm calls will be made to that instance and store (and the store won't be dropped), that pointer also won't change?

Correct yeah. You'll also need to avoid growing memory. If possible it's recommended to use the safe Memory::data API or Memory::data_mut so you don't have to worry about these concerns, but that may also not be applicable in all situations.

And yes, it's expected that embedders should be able to borrow data directly from wasm, and that's what Memory::data enables (or raw access too).

When the wasm code contains malloc and free, will those be compiled to mprotect under default setting?

If I understand your question right, I believe the answer is "sort of". The wasm code itself probably has a malloc/free, for example from wasi-libc. This is not implemented with mprotect at all, since it's a pure wasm-level abstraction. The wasm code for malloc, though, probably calls the memory.grow wasm instruction at some point to allocate more memory (that's a WebAssembly-level primitive). That is implemented with mprotect in Wasmtime.

There is, however, no equivalent to freeing memory in WebAssembly. Once an instance has memory it has no means of releasing it until the entire instance is destroyed.
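
The split between the guest allocator and memory.grow can be sketched with a toy model. This is not wasi-libc's allocator; it is a deliberately naive first-fit free list, and `ToyHeap` with its fields is invented for this illustration. It only does bookkeeping and never touches real memory.

```rust
/// Size of one WebAssembly page in bytes.
const WASM_PAGE: usize = 64 * 1024;

/// Toy guest-side heap that tracks offsets only.
struct ToyHeap {
    /// Current linear memory size in bytes. It only ever increases,
    /// mirroring the fact that wasm has no way to shrink memory.
    memory_size: usize,
    /// End of the region handed out so far.
    brk: usize,
    /// Freed blocks as (offset, len), available for reuse.
    free_list: Vec<(usize, usize)>,
}

impl ToyHeap {
    fn new() -> Self {
        ToyHeap { memory_size: 0, brk: 0, free_list: Vec::new() }
    }

    /// Allocate `len` bytes and return the offset of the block.
    fn alloc(&mut self, len: usize) -> usize {
        // First fit: reuse a freed block if one is large enough.
        if let Some(i) = self.free_list.iter().position(|&(_, block_len)| block_len >= len) {
            let (offset, block_len) = self.free_list.swap_remove(i);
            if block_len > len {
                // Keep the unused tail of the block on the free list.
                self.free_list.push((offset + len, block_len - len));
            }
            return offset;
        }
        // Otherwise extend the heap, growing linear memory if needed.
        let offset = self.brk;
        self.brk += len;
        if self.brk > self.memory_size {
            // Stand-in for the memory.grow instruction: whole pages only.
            let needed = self.brk - self.memory_size;
            let pages = (needed + WASM_PAGE - 1) / WASM_PAGE;
            self.memory_size += pages * WASM_PAGE;
        }
        offset
    }

    /// Freeing only returns the block to the allocator; linear memory stays as is.
    fn dealloc(&mut self, offset: usize, len: usize) {
        self.free_list.push((offset, len));
    }
}
```

In this model, freeing a block leaves `memory_size` unchanged, and a later allocation of the same size reuses the freed block without growing memory again.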

view this post on Zulip Xinyu Zeng (Sep 12 2024 at 04:27):

but that may also not be applicable in all situations.

Yes that is my case... Thank you so much

view this post on Zulip Xinyu Zeng (Sep 12 2024 at 04:43):

Understood the memory can only be released after instance dropped. Thank you.
But I am thinking about whether we can reuse the grown linear memory. Say we have:

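fn alloc_and_return_ptr() -> *mut u8 {
    let mut buf = vec![0u8; 1024 * 1024 * 1024];
    let ptr = buf.as_mut_ptr();
    // Leak the buffer so it stays allocated after this function returns.
    std::mem::forget(buf);
    // Return the buffer's pointer to the host.
    ptr
}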
pub unsafe extern "C" fn dealloc(ptr: *mut u8, len: usize, align: usize) {
    std::alloc::dealloc(
        ptr,
        std::alloc::Layout::from_size_align_unchecked(len, align),
    );
}

Both functions are compiled to WASM and exported to the host. After calling alloc_and_return_ptr, say the physical memory grows by 1GB, but then calling dealloc on the pointer does not free the physical memory. But when we then call alloc_and_return_ptr again, I assume that 1GB will be reused and physical memory will not grow? My test shows it grows a little, but I don't know where that growth comes from.

I am also wondering why we cannot release the 1GB of physical memory after calling dealloc, since it is mmap'd under the hood. I guess the answer is wasm's linear memory requirement.


Last updated: Oct 23 2024 at 20:03 UTC