Stream: git-wasmtime

Topic: wasmtime / issue #3754 Pooling allocator: investigate com...


view this post on Zulip Wasmtime GitHub notifications bot (Feb 01 2022 at 20:07):

cfallin opened issue #3754:

After some discussion with @alexcrichton just now, the question arose of why wasmtime's pooling allocator maintains separate pools for memories, tables, and instance structs (and stacks, but those are special on Windows at least).

Especially if we're considering using madvise to flash-clear state and make instantiation faster, it might make sense to investigate whether we can "transpose" the arrangement and put all of an instance's resources in one contiguous region (with appropriate guards of course).

For example, one instance slot could contain all of that instance's tables and memories, and the Instance struct, and perhaps the stack on non-Windows platforms.
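
For illustration, a minimal sketch of how one such slot's regions might be laid out; the type, field names, and sizes below are hypothetical, not Wasmtime's actual configuration:

```rust
/// Hypothetical description of one "transposed" instance slot. The names,
/// fields, and sizes here are illustrative only, not Wasmtime's actual types.
struct SlotLayout {
    /// Offset of the Instance struct within the slot.
    instance_offset: usize,
    /// Offset of the table-element storage.
    tables_offset: usize,
    /// Offset of the first linear memory; each memory reservation would
    /// include its trailing guard region.
    memories_offset: usize,
    /// Total slot size, including guard regions.
    total_size: usize,
}

/// Compute page-aligned offsets for one slot so that guard regions and any
/// madvise-based decommit stay page-granular.
fn slot_layout(
    page_size: usize,
    instance_size: usize,
    tables_size: usize,
    memory_reservation: usize,
    num_memories: usize,
) -> SlotLayout {
    let round_up = |n: usize| (n + page_size - 1) / page_size * page_size;
    let instance_offset = 0;
    let tables_offset = instance_offset + round_up(instance_size);
    let memories_offset = tables_offset + round_up(tables_size);
    let total_size = memories_offset + num_memories * round_up(memory_reservation);
    SlotLayout { instance_offset, tables_offset, memories_offset, total_size }
}

fn main() {
    // Example: 4 KiB pages, a 4 KiB Instance, 128 KiB of tables, one memory
    // with an 8 MiB reservation (real guard regions would be far larger).
    let layout = slot_layout(4096, 4096, 128 * 1024, 8 << 20, 1);
    println!(
        "instance @ {:#x}, tables @ {:#x}, memories @ {:#x}, slot size {:#x}",
        layout.instance_offset, layout.tables_offset, layout.memories_offset, layout.total_size
    );
}
```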

The major advantage of this is that we can decommit (flash-clear) the whole region with one madvise syscall. In contrast, today we need a separate madvise for each resource. This cost is somewhat hidden today because we write ~every byte of the Instance and the tables (with eager init), but if we lazily initialize state in tables and/or Instances (e.g. lazy anyfunc init), this could matter more.
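
A minimal sketch of that single-syscall decommit, assuming a Linux host, the `libc` crate, one private anonymous mapping per slot, and a hypothetical `decommit_slot` helper:

```rust
use std::io;

/// Reset one contiguous instance slot to zeroed, non-resident pages with a
/// single syscall. On Linux, MADV_DONTNEED on a private anonymous mapping
/// makes the next access to each page fault in a fresh zero page, so this one
/// call stands in for the separate per-memory, per-table, and per-Instance
/// madvise calls that disjoint pools require.
///
/// Safety: `slot_base` must be the page-aligned start of a private anonymous
/// mapping at least `slot_size` bytes long, and no live instance may still be
/// using the slot.
unsafe fn decommit_slot(slot_base: *mut u8, slot_size: usize) -> io::Result<()> {
    if libc::madvise(slot_base.cast::<libc::c_void>(), slot_size, libc::MADV_DONTNEED) != 0 {
        return Err(io::Error::last_os_error());
    }
    Ok(())
}
```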

view this post on Zulip Wasmtime GitHub notifications bot (Feb 01 2022 at 20:13):

alexcrichton commented on issue #3754:

cc @peterhuene, you probably know the reason for separate mmaps here.

One idea is that we could possibly have less fragmentation by putting everything in one region, since we don't technically need guards between the instance and the tables (in theory at least), although we do for sure need guards on stacks and on memories.
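
For illustration, a minimal sketch of "guards only where they're needed" inside one slot, again assuming Linux and the `libc` crate; the helper name and the choice of guarded regions are hypothetical:

```rust
use std::io;

/// Make a guard region inside an instance slot inaccessible so stray accesses
/// trap. Only regions that need trapping get this treatment (e.g. the pages
/// past a linear memory's accessible size, or a page below the stack); the
/// Instance struct and table storage can sit adjacent with no guard between.
///
/// Safety: `guard_base` must be page-aligned and lie inside a mapping owned
/// by the caller, and `guard_len` must be a multiple of the page size.
unsafe fn protect_guard(guard_base: *mut u8, guard_len: usize) -> io::Result<()> {
    if libc::mprotect(guard_base.cast::<libc::c_void>(), guard_len, libc::PROT_NONE) != 0 {
        return Err(io::Error::last_os_error());
    }
    Ok(())
}
```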

view this post on Zulip Wasmtime GitHub notifications bot (Feb 01 2022 at 20:26):

peterhuene commented on issue #3754:

The primary reason for the disjoint instance resources is that it greatly simplified the uffd implementation as we could just monitor a single range for all linear memories rather than discrete parts of a range representing a single instance (or have to worry about faults from things like tables or stack pages).

Indeed, the cost of that simplicity was additional madvise calls. Lucet has the design proposed here, where all instance resources remain in a contiguous slot (and thus a single madvise call suffices), but its uffd handler has to handle faults in pages not meant for linear memory too.

A single madvise call is definitely preferable, and if we're replacing the uffd implementation, I think this makes sense.

view this post on Zulip Wasmtime GitHub notifications bot (Feb 01 2022 at 20:28):

peterhuene edited a comment on issue #3754:

The primary reason for the disjoint instance resources is that it greatly simplified the uffd implementation as we could just monitor a single range for all linear memories rather than discrete parts of a range representing a single instance's linear memory pages (or, alternatively, monitor pages meant for tables or stacks).

Indeed, the cost of that simplicity was additional madvise calls. Lucet has the design proposed here, where all instance resources remain in a contiguous slot (and thus a single madvise call suffices), but its uffd handler has to handle faults in pages not meant for linear memory too.

A single madvise call is definitely preferable, and if we're replacing the uffd implementation, I think this makes sense.

view this post on Zulip Wasmtime GitHub notifications bot (Feb 01 2022 at 20:39):

peterhuene edited a comment on issue #3754:

The primary reason for the disjoint instance resources is that it greatly simplified the uffd implementation as we could just monitor a single range for all linear memories rather than multiple discrete parts of a range representing an instance's linear memory pages (or, alternatively, monitor pages meant for tables or stacks).

Indeed, the cost of that simplicity was additional madvise calls. Lucet has the design proposed here, where all instance resources remain in a contiguous slot (and thus a single madvise call suffices), but its uffd handler has to handle faults in pages not meant for linear memory too.

A single madvise call is definitely preferable, and if we're replacing the uffd implementation, I think this makes sense.
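
To illustrate the extra work such a handler takes on with a contiguous slot, here is a small hypothetical classification of fault addresses; the layout parameters are made up, not Wasmtime's or Lucet's actual slot description:

```rust
/// Kinds of pages a fault can land on inside one contiguous slot. A handler
/// registered only on a dedicated memory pool never sees anything but the
/// linear-memory case; with one slot per instance it must tell these apart.
#[derive(Debug, PartialEq)]
enum FaultRegion {
    Instance,
    Tables,
    LinearMemory { memory_index: usize, offset: usize },
    OutOfSlot,
}

/// Classify a faulting address against a slot's (hypothetical) layout, where
/// tables follow the Instance struct and the memories follow the tables, each
/// memory occupying `memory_reservation` bytes.
fn classify_fault(
    fault_addr: usize,
    slot_base: usize,
    tables_offset: usize,
    memories_offset: usize,
    memory_reservation: usize,
    slot_size: usize,
) -> FaultRegion {
    if fault_addr < slot_base || fault_addr >= slot_base + slot_size {
        return FaultRegion::OutOfSlot;
    }
    let offset_in_slot = fault_addr - slot_base;
    if offset_in_slot < tables_offset {
        FaultRegion::Instance
    } else if offset_in_slot < memories_offset {
        FaultRegion::Tables
    } else {
        let rel = offset_in_slot - memories_offset;
        FaultRegion::LinearMemory {
            memory_index: rel / memory_reservation,
            offset: rel % memory_reservation,
        }
    }
}
```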

view this post on Zulip Wasmtime GitHub notifications bot (Feb 02 2022 at 14:29):

bjorn3 commented on issue #3754:

Using separate pools would also make it harder to leak pointers (or even cause code to interpret attacker-controlled bytes as a pointer) in case of bugs, I would guess.

view this post on Zulip Wasmtime GitHub notifications bot (Feb 02 2022 at 14:29):

bjorn3 edited a comment on issue #3754:

Using separate pools would also make it harder to leak pointers (or even cause code to interpret attacker-controlled bytes as a pointer) in case of bugs, I would guess.

view this post on Zulip Wasmtime GitHub notifications bot (Mar 23 2022 at 20:13):

alexcrichton labeled issue #3754.

view this post on Zulip Wasmtime GitHub notifications bot (May 20 2022 at 15:31):

alexcrichton commented on issue #3754:

A bit belated, but a month or so ago I did some investigation into this to see what would happen. My finding at the time was that the cost of an madvise call was proportional to the size of the memory being madvise'd, even if that memory wasn't present. My conclusion at the time was that it was better to do the small madvise calls we do today rather than lump everything into one giant allocation where a single madvise call is all that's needed to blow it away.

I wasn't super scientific in my findings, though. Additionally, it was a while ago that I collected the data and I probably don't have the branch around any more. I did want to note, though, that I don't think this is necessarily a slam dunk, and before trying to proceed with this (if ever) we'll want to carefully evaluate performance.
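
For anyone who wants to re-check that observation, here is a rough (and similarly unscientific) sketch of such a measurement on Linux with the `libc` crate; the sizes are arbitrary and this is not the benchmark that was actually run:

```rust
use std::time::Instant;

fn main() {
    const RESERVATION: usize = 4 << 30; // 4 GiB reserved
    const TOUCHED: usize = 1 << 20; // only 1 MiB ever made resident

    unsafe {
        // Reserve a large private anonymous mapping, like a pooling-allocator slot.
        let base = libc::mmap(
            std::ptr::null_mut(),
            RESERVATION,
            libc::PROT_READ | libc::PROT_WRITE,
            libc::MAP_PRIVATE | libc::MAP_ANONYMOUS | libc::MAP_NORESERVE,
            -1,
            0,
        );
        assert_ne!(base, libc::MAP_FAILED);

        // Touch a small prefix, then decommit only the touched pages.
        std::ptr::write_bytes(base as *mut u8, 1, TOUCHED);
        let start = Instant::now();
        assert_eq!(libc::madvise(base, TOUCHED, libc::MADV_DONTNEED), 0);
        println!("madvise over touched pages only: {:?}", start.elapsed());

        // Touch the same prefix again, then decommit the entire reservation,
        // even though almost none of it is resident.
        std::ptr::write_bytes(base as *mut u8, 1, TOUCHED);
        let start = Instant::now();
        assert_eq!(libc::madvise(base, RESERVATION, libc::MADV_DONTNEED), 0);
        println!("madvise over full reservation:   {:?}", start.elapsed());

        libc::munmap(base, RESERVATION);
    }
}
```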


Last updated: Dec 23 2024 at 12:05 UTC