hungryzzz opened PR #8633 from hungryzzz:fix-8573
to bytecodealliance:main
:
<!--
Please make sure you include the following information:
If this work has been discussed elsewhere, please include a link to that
conversation. If it was discussed in an issue, just mention "issue #...".Explain why this change is needed. If the details are in an issue already,
this can be brief.Our development process is documented in the Wasmtime book:
https://docs.wasmtime.dev/contributing-development-process.htmlPlease ensure all communication follows the code of conduct:
https://github.com/bytecodealliance/wasmtime/blob/main/CODE_OF_CONDUCT.md
-->The current default 16-bytes function alignment for x86-64 would cause suboptimal execution performance under some cases which are reported in https://github.com/bytecodealliance/wasmtime/issues/8573.
Based on the discussion "the CPU frontend grabs an aligned 32B or 64B chunk at a time" in https://github.com/bytecodealliance/wasmtime/issues/8573, this PR changes the default alignment from 16-bytes to 32-bytes for better performance.
Also rerun the cases reported in https://github.com/bytecodealliance/wasmtime/issues/8573 and the execution time will back to normal.
# After changes ➜ case ✗ wasmtime compile good.wasm -o good.cwasm ➜ case ✗ wasmtime compile bad.wasm -o bad.cwasm ➜ case ✗ time wasmtime run --allow-precompiled good.cwasm ~/wasmtime/target/release/wasmtime run good.cwasm 4.67s user 0.00s system 100% cpu 4.674 total ➜ case ✗ time wasmtime run --allow-precompiled bad.cwasm ~/wasmtime/target/release/wasmtime run bad.cwasm 4.68 user 0.01s system 100% cpu 9.365 total # Before changes ➜ case ✗ wasmtime compile good.wasm -o good.cwasm ➜ case ✗ wasmtime compile bad.wasm -o bad.cwasm ➜ case ✗ time wasmtime run --allow-precompiled good.cwasm ~/wasmtime/target/release/wasmtime run good.cwasm 4.67s user 0.00s system 100% cpu 4.674 total ➜ case ✗ time wasmtime run --allow-precompiled bad.cwasm ~/wasmtime/target/release/wasmtime run bad.cwasm 9.36s user 0.01s system 100% cpu 9.365 total
hungryzzz requested elliottt for a review on PR #8633.
hungryzzz requested wasmtime-compiler-reviewers for a review on PR #8633.
hungryzzz edited PR #8633:
<!--
Please make sure you include the following information:
If this work has been discussed elsewhere, please include a link to that
conversation. If it was discussed in an issue, just mention "issue #...".Explain why this change is needed. If the details are in an issue already,
this can be brief.Our development process is documented in the Wasmtime book:
https://docs.wasmtime.dev/contributing-development-process.htmlPlease ensure all communication follows the code of conduct:
https://github.com/bytecodealliance/wasmtime/blob/main/CODE_OF_CONDUCT.md
-->The current default 16-bytes function alignment for x86-64 would cause suboptimal execution performance under some cases which are reported in https://github.com/bytecodealliance/wasmtime/issues/8573.
Based on the discussion "the CPU frontend grabs an aligned 32B or 64B chunk at a time" in https://github.com/bytecodealliance/wasmtime/issues/8573, this PR changes the default alignment from 16-bytes to 32-bytes for better performance.
Also rerun the cases reported in https://github.com/bytecodealliance/wasmtime/issues/8573 and the execution time will back to normal.
# After changes ➜ case ✗ wasmtime compile good.wasm -o good.cwasm ➜ case ✗ wasmtime compile bad.wasm -o bad.cwasm ➜ case ✗ time wasmtime run --allow-precompiled good.cwasm ~/wasmtime/target/release/wasmtime run good.cwasm 4.68s user 0.00s system 100% cpu 4.680 total ➜ case ✗ time wasmtime run --allow-precompiled bad.cwasm ~/wasmtime/target/release/wasmtime run bad.cwasm 4.67s user 0.01s system 99% cpu 4.681 total # Before changes ➜ case ✗ wasmtime compile good.wasm -o good.cwasm ➜ case ✗ wasmtime compile bad.wasm -o bad.cwasm ➜ case ✗ time wasmtime run --allow-precompiled good.cwasm ~/wasmtime/target/release/wasmtime run good.cwasm 4.67s user 0.00s system 100% cpu 4.674 total ➜ case ✗ time wasmtime run --allow-precompiled bad.cwasm ~/wasmtime/target/release/wasmtime run bad.cwasm 9.36s user 0.01s system 100% cpu 9.365 total
alexcrichton commented on PR #8633:
Thanks for this! It looks like this is causing changes in the disassembly of some existing tests. You can update the disassembly with
WASMTIME_TEST_BLESS=1 cargo test --test disas
locally and commit those changes to get pushed up here
hungryzzz requested wasmtime-core-reviewers for a review on PR #8633.
hungryzzz updated PR #8633.
alexcrichton submitted PR review:
Thanks!
alexcrichton merged PR #8633.
Last updated: Jan 24 2025 at 00:11 UTC