fitzgen commented on issue #6031:
FWIW: I looked over the wasm filetest changes pretty closely, but did not go over the ISA-specific versions of those same tests in much detail.
fitzgen commented on issue #6031:
Going to gather sightglass numbers now...
fitzgen commented on issue #6031:
Looks like this gives not just execution speed ups, but also compilation speed ups (presumably due to processing less IR):
$ sightglass benchmark -e main.so -e opt.so --engine-flags="--static-memory-maximum-size=0 --dynamic-memory-guard-size=65536" -m insts-retired -- benchmarks/default.suite execution :: instructions-retired :: benchmarks/spidermonkey/benchmark.wasm Δ = 354918771.70 ± 209612.26 (confidence = 99%) opt.so is 1.08x to 1.08x faster than main.so! [4757395008 4757613072.90 4757854325] main.so [4402584769 4402694301.20 4403032801] opt.so execution :: instructions-retired :: benchmarks/pulldown-cmark/benchmark.wasm Δ = 2246146.90 ± 322.54 (confidence = 99%) opt.so is 1.07x to 1.07x faster than main.so! [35406698 35406850.20 35407505] main.so [33160532 33160703.30 33161229] opt.so execution :: instructions-retired :: benchmarks/bz2/benchmark.wasm Δ = 20037307.30 ± 5.37 (confidence = 99%) opt.so is 1.05x to 1.05x faster than main.so! [402201103 402201106.00 402201111] main.so [382163791 382163798.70 382163803] opt.so compilation :: instructions-retired :: benchmarks/pulldown-cmark/benchmark.wasm Δ = 80907.10 ± 30178.64 (confidence = 99%) opt.so is 1.01x to 1.01x faster than main.so! [9350309 9389659.00 9444444] main.so [9276889 9308751.90 9338252] opt.so compilation :: instructions-retired :: benchmarks/spidermonkey/benchmark.wasm Δ = 1534345.20 ± 211755.93 (confidence = 99%) opt.so is 1.01x to 1.01x faster than main.so! [218580786 218834739.10 219048529] main.so [216969499 217300393.90 217501790] opt.so
alexcrichton commented on issue #6031:
Hm ok well I also see now that a test failed with something that should trap which didn't, which means now that I missed something in review, so I'm much less confident in my review now.
alexcrichton commented on issue #6031:
Ah yeah after digging in I believe that's correct. The configuration requires dynamic memories to have a 64k guard page region but the pooling allocator in this configuration specifically isn't decommitting memory past the end to avoid syscalls, which means that there's actually no guard pages at all upon reinstantiation. That's ok with today's impelmentation of bounds checks which don't try to exploit the fact that the arithmetic can be simplified (e.g. this PR).
This is a tough problem though because that was one of the assumptions about dynamic memory and the pooling allocator, that we could avoid decommitting memory and save on syscalls/contention. This is making me realize though that if we want to take advantage of guard pages on dynamic memories (which I suspect we do since it should help drastically simplify the complexity of analysis required to handle multiple bounds checks) then this optimization isn't possible.
The offending lines I think are here which those need to be executed unconditionally if the guard size for memory is more than 0 bytes, which I think is basically always.
fitzgen commented on issue #6031:
The offending lines I think are here which those need to be executed unconditionally if the guard size for memory is more than 0 bytes, which I think is basically always.
Thanks! I was digging into the test but hadn't got that far before I was interrupted. Always nice when someone debugs the issue for you :)
Last updated: Nov 22 2024 at 17:03 UTC