Stream: git-wasmtime

Topic: wasmtime / PR #8245 Module creation from premapped images


view this post on Zulip Wasmtime GitHub notifications bot (Mar 26 2024 at 21:33):

Milek7 opened PR #8245 from Milek7:premapped-image to bytecodealliance:main:

As discussed previously in #7777, for some platforms it is useful to allow loading modules using platform-specific methods without mmapping memory as executable. This attempts to sidestep defining a completely user-implementable CodeMemory trait by requiring that the precompiled cwasm file, headers and all, is mapped beforehand using platform-specific methods. This way only the host memory range needs to be passed to the new Module::from_premapped_image method, which will then parse the Wasmtime-specific ELF header as usual.
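Because the proposed entry point receives only a raw host memory range and relies on the ELF header at its start, an embedder may want a cheap sanity check before calling it. Below is a minimal sketch. It assumes only that a cwasm image begins with the standard 4-byte ELF magic. The helper name looks_like_elf_image is hypothetical and is not part of Wasmtime, which does its own full header parsing.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical pre-flight check, not part of Wasmtime: returns true if the
// premapped range is large enough to hold an ELF identification block
// (e_ident, 16 bytes) and starts with the ELF magic 0x7f 'E' 'L' 'F'.
// Wasmtime would still validate the full header itself.
bool looks_like_elf_image(const uint8_t* ptr, size_t size) {
    const size_t kEiNident = 16;  // size of e_ident
    if (ptr == nullptr || size < kEiNident) {
        return false;
    }
    static const uint8_t kMagic[4] = {0x7f, 'E', 'L', 'F'};
    return std::memcmp(ptr, kMagic, sizeof(kMagic)) == 0;
}
```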

For testing, I hacked together this tool for packing cwasm files into Windows DLLs or Linux DSOs: https://gist.github.com/Milek7/e8c1a9c284dc82c60cf48637f753b102

It can be used as follows:
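```sh
wasmtime compile wasmmodule.wasm
cwasm2so pe wasmmodule wasmmodule.cwasm wasmmodule.dll
```

```cpp
#include <windows.h>
#include <wasmtime.h>

// Load the DLL produced by cwasm2so; the cwasm image is mapped as part of it.
HMODULE hmodule = LoadLibraryW(L"wasmmodule.dll");
// The exported symbols hold the address and size of the mapped cwasm image.
uint8_t** ptr = (uint8_t**)GetProcAddress(hmodule, "elf_ptr");
size_t* size = (size_t*)GetProcAddress(hmodule, "elf_size");

wasmtime_module_t* module = nullptr;
// The finalizer unloads the DLL once the module no longer needs the image.
wasmtime_error_t* error = wasmtime_module_from_premapped_image(engine, *ptr, *size, hmodule, [](void* data) {
    FreeLibrary((HMODULE)data);
}, &module);
```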

or
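```sh
wasmtime compile wasmmodule.wasm
cwasm2so elf wasmmodule wasmmodule.cwasm wasmmodule.so
```

```cpp
#include <dlfcn.h>
#include <wasmtime.h>

// Load the shared object produced by cwasm2so. The "./" prefix makes dlopen
// look in the current directory instead of only the library search path.
void* hmodule = dlopen("./wasmmodule.so", RTLD_NOW);
// The exported symbols hold the address and size of the mapped cwasm image.
uint8_t** ptr = (uint8_t**)dlsym(hmodule, "wasmmodule_elf_ptr");
size_t* size = (size_t*)dlsym(hmodule, "wasmmodule_elf_size");

wasmtime_module_t* module = nullptr;
// The finalizer closes the shared object once the module no longer needs the image.
wasmtime_error_t* error = wasmtime_module_from_premapped_image(engine, *ptr, *size, hmodule, [](void* data) {
    dlclose(data);
}, &module);
```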
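Both examples hand ownership of the loaded library to Wasmtime through an opaque data pointer plus a finalizer. The sketch below models that contract. It assumes, as the FreeLibrary/dlclose callbacks suggest, that the finalizer should run exactly once when the image is released. The class PremappedImageOwner is hypothetical and is not Wasmtime's implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical model of the ownership contract: the embedder hands over
// (ptr, size) plus an opaque `data` pointer and a finalizer, and the
// finalizer runs exactly once when the owner is destroyed.
class PremappedImageOwner {
public:
    using Finalizer = void (*)(void*);

    PremappedImageOwner(const uint8_t* ptr, size_t size, void* data, Finalizer finalizer)
        : ptr_(ptr), size_(size), data_(data), finalizer_(finalizer) {}

    // Non-copyable so the finalizer cannot run twice.
    PremappedImageOwner(const PremappedImageOwner&) = delete;
    PremappedImageOwner& operator=(const PremappedImageOwner&) = delete;

    ~PremappedImageOwner() {
        if (finalizer_ != nullptr) {
            finalizer_(data_);  // e.g. FreeLibrary or dlclose in the examples
        }
    }

    const uint8_t* image() const { return ptr_; }
    size_t size() const { return size_; }

private:
    const uint8_t* ptr_;
    size_t size_;
    void* data_;
    Finalizer finalizer_;
};
```

Deleting the copy operations is the design choice that matters here. A copied owner would run the finalizer twice and unload the library while the other copy still points into it.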

view this post on Zulip Wasmtime GitHub notifications bot (Mar 26 2024 at 21:33):

Milek7 requested fitzgen for a review on PR #8245.

view this post on Zulip Wasmtime GitHub notifications bot (Mar 26 2024 at 21:33):

Milek7 requested wasmtime-core-reviewers for a review on PR #8245.

view this post on Zulip Wasmtime GitHub notifications bot (Mar 26 2024 at 21:33):

Milek7 edited PR #8245.

view this post on Zulip Wasmtime GitHub notifications bot (Mar 26 2024 at 21:37):

Milek7 updated PR #8245.

view this post on Zulip Wasmtime GitHub notifications bot (Mar 26 2024 at 21:43):

Milek7 edited PR #8245 so that cwasm2so takes the compiled wasmmodule.cwasm, rather than wasmmodule.wasm, as input in both examples.

view this post on Zulip Wasmtime GitHub notifications bot (Mar 26 2024 at 21:44):

github-actions[bot] commented on PR #8245:

Subscribe to Label Action

cc @peterhuene

<details>
This issue or pull request has been labeled: "wasmtime:api", "wasmtime:c-api"

Thus the following users have been cc'd because of the following labels:

To subscribe or unsubscribe from this label, edit the <code>.github/subscribe-to-label.json</code> configuration file.

</details>

view this post on Zulip Wasmtime GitHub notifications bot (Mar 26 2024 at 22:29):

Milek7 updated PR #8245.

view this post on Zulip Wasmtime GitHub notifications bot (Mar 27 2024 at 14:40):

alexcrichton requested alexcrichton for a review on PR #8245.

view this post on Zulip Wasmtime GitHub notifications bot (Mar 27 2024 at 14:57):

alexcrichton commented on PR #8245:

Thanks for the PR here! I like the look of this, and I agree it's probably best to avoid allowing arbitrary implementations of CodeMemory. To make sure I understand what's going on here: this assumes that the *.so and *.dll created by your tool map the *.cwasm into memory, and that the directives in the native object are such that the memory protections are all already configured appropriately? For example, .text is already executable and everything else is already read-only?
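One way to check the assumption that the loader has already applied the right protections (.text executable, the rest not) is to inspect the process's mappings. Below is a Linux-only sketch. It assumes /proc/self/maps is available. is_executable_address is a hypothetical helper. An equivalent Windows check would use VirtualQuery instead.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <fstream>
#include <sstream>
#include <string>

// Linux-only sketch: returns true if the mapping containing `addr` has the
// execute permission in /proc/self/maps, false if it does not or if no
// mapping contains `addr`.
bool is_executable_address(const void* addr) {
    const uintptr_t target = reinterpret_cast<uintptr_t>(addr);
    std::ifstream maps("/proc/self/maps");
    std::string line;
    while (std::getline(maps, line)) {
        // Each line starts with "start-end perms ...", addresses in hex.
        std::istringstream fields(line);
        std::string range, perms;
        if (!(fields >> range >> perms)) {
            continue;
        }
        const size_t dash = range.find('-');
        if (dash == std::string::npos || perms.size() < 3) {
            continue;
        }
        const uintptr_t start = std::stoull(range.substr(0, dash), nullptr, 16);
        const uintptr_t end = std::stoull(range.substr(dash + 1), nullptr, 16);
        if (target >= start && target < end) {
            return perms[2] == 'x';  // perms look like "r-xp"
        }
    }
    return false;
}
```

For example, an address inside the loaded image's .text section should yield true, and an address in its read-only data should yield false.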

Also, how willing are you to continue to work on this? I realize you're probably focused on what you're working on rather than on changing this according to review, but I think some of the points below are going to be important for us to be able to maintain this over time. Additionally, I think this feature could be useful to other folks as well, so it'd be good to polish it if we can. That being said, I'm happy to help out myself where I can, but I probably can't take on everything below, so your assistance would be much appreciated.

If my assumption above is correct, I like this approach! At a high level, though, some things I think may want to be changed are:

view this post on Zulip Wasmtime GitHub notifications bot (Mar 27 2024 at 15:48):

Milek7 commented on PR #8245:

Yes, its contents are mapped directly, so calling serialize() would yield the original cwasm file, with separate segments for the executable areas:
![image](https://github.com/bytecodealliance/wasmtime/assets/7935014/af86a809-acfd-4e95-8288-152888ebcd5a)
![image](https://github.com/bytecodealliance/wasmtime/assets/7935014/53ca3bd1-2fe9-4b10-8a90-a73ff90e8142)
Here the first three segments contain the cwasm file, while the second one corresponds to the .text section inside it. The remaining segments contain metadata needed by the dynamic library, headers pointing to unwind information, etc.
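Because the cwasm file is mapped byte for byte, a section inside it, such as .text, sits at the image base plus that section's file offset. Below is a minimal sketch of that arithmetic with an overflow-safe bounds check. section_in_image is a hypothetical helper, and in practice the offset and length would come from the cwasm's own section headers.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: returns the host address of a section located at
// file offset `offset` with length `len` inside a byte-for-byte mapped image,
// or nullptr if [offset, offset + len) does not lie entirely inside it.
const uint8_t* section_in_image(const uint8_t* image, size_t image_size,
                                size_t offset, size_t len) {
    // Written as two comparisons so that offset + len cannot overflow.
    if (offset > image_size || len > image_size - offset) {
        return nullptr;
    }
    return image + offset;
}
```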

I have taken the view here that generating these binaries, and how exactly they are loaded, is up to the embedder. One thing to note: while I don't need it for my use case, replacing the raw memory range with a library to be loaded would preclude linking a compiled module as a static library. Nevertheless, if that's desired, I could work on moving generation and loading into Wasmtime itself some time later.

view this post on Zulip Wasmtime GitHub notifications bot (Mar 27 2024 at 16:04):

Milek7 edited a comment on PR #8245, adding a note that the screenshots are for ELF and PE, with different input files.

view this post on Zulip Wasmtime GitHub notifications bot (Mar 27 2024 at 18:14):

Milek7 edited a comment on PR #8245.

view this post on Zulip Wasmtime GitHub notifications bot (Mar 27 2024 at 18:45):

Milek7 edited a comment on PR #8245.

view this post on Zulip Wasmtime GitHub notifications bot (Mar 27 2024 at 22:12):

alexcrichton commented on PR #8245:

Ok, makes sense, thanks for the clarification!

Personally, I think it's important to have tests for this, and to do that I think it's ok to move the bits and pieces necessary to build this image into Wasmtime itself. If the bits and pieces in Wasmtime don't work for your use case, though, then I definitely don't want to ask you to build something you're not going to use.

One of the main worries I have is that there are a lot of implicit assumptions about the output of Wasmtime that this tool relies on, so I'm a bit afraid of putting that on embedders, as it seems like we may accidentally break it in the future. For example:

More or less, I'd be more comfortable if we internalized some of these pieces in Wasmtime so that we can update them as the design of Wasmtime itself evolves over time. For example, if the goal is to create a linkable object, I think that'd be great to add here as well. If creating a dynamic object is all that's needed, I think your gist would work well living in Wasmtime too.


Last updated: Nov 22 2024 at 16:03 UTC