cfallin opened PR #4269 from `14-bit-type` to `main`:
After extending `Type` to a `u16`, `ValueData` became 12 bytes rather than 8. This packs it back down to 8 bytes (64 bits) by stealing two bits from the `Type` for the enum discriminant (leaving 14 bits for the type itself).

Performance comparison (3-way between original (`ty-u8`), 16-bit `Type` (`ty-u16`), and this PR (`ty-packed`)):

```
~/work/sightglass% target/release/sightglass-cli benchmark \
    -e ~/ty-u8.so -e ~/ty-u16.so -e ~/ty-packed.so \
    --iterations-per-process 10 --processes 2 \
    benchmarks-next/spidermonkey/benchmark.wasm

compilation
  benchmarks-next/spidermonkey/benchmark.wasm
    cycles
      [20654406874 21749213920.50 22958520306] /home/cfallin/ty-packed.so
      [22227738316 22584704883.90 22916433748] /home/cfallin/ty-u16.so
      [20659150490 21598675968.60 22588108428] /home/cfallin/ty-u8.so
    nanoseconds
      [5435333269 5723139427.25 6041072883] /home/cfallin/ty-packed.so
      [5848788229 5942729637.85 6030030341] /home/cfallin/ty-u16.so
      [5436002390 5683248226.10 5943626225] /home/cfallin/ty-u8.so
```
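The packing described at the start of this PR can be sketched roughly as follows. This is a minimal illustration, not the PR's actual code. It assumes a `ValueData` with three variants (instruction result, block parameter, alias), each carrying a 14-bit type code, with the first two also carrying a 16-bit index and every variant carrying a 32-bit entity index. The names `ValueDataPacked`, the `Inst`/`Block`/`Value`/`Type` newtypes, and the shift constants are hypothetical.

```rust
use std::mem::size_of;

// Hypothetical stand-ins for the entity types (assumed to be u32 indices).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct Inst(u32);
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct Block(u32);
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct Value(u32);

// Hypothetical type code, assumed to fit in 14 bits.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct Type(u16);

// Unpacked form: Rust has to add its own discriminant next to the fields.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum ValueData {
    Inst { ty: Type, num: u16, inst: Inst },
    Param { ty: Type, num: u16, block: Block },
    Alias { ty: Type, original: Value },
}

// Packed form, high to low bits: [tag:2][ty:14][num:16][payload:32].
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct ValueDataPacked(u64);

const TAG_SHIFT: u32 = 62;
const TYPE_SHIFT: u32 = 48;
const TYPE_BITS: u32 = 14;
const NUM_SHIFT: u32 = 32;
const TYPE_MASK: u64 = (1u64 << TYPE_BITS) - 1;

impl ValueDataPacked {
    fn pack(tag: u64, ty: Type, num: u16, payload: u32) -> Self {
        assert!(u64::from(ty.0) <= TYPE_MASK, "type code does not fit in 14 bits");
        ValueDataPacked(
            (tag << TAG_SHIFT)
                | (u64::from(ty.0) << TYPE_SHIFT)
                | (u64::from(num) << NUM_SHIFT)
                | u64::from(payload),
        )
    }
}

impl From<ValueData> for ValueDataPacked {
    fn from(d: ValueData) -> Self {
        match d {
            ValueData::Inst { ty, num, inst } => Self::pack(0, ty, num, inst.0),
            ValueData::Param { ty, num, block } => Self::pack(1, ty, num, block.0),
            // Alias has no index field, so its `num` bits are left as zero.
            ValueData::Alias { ty, original } => Self::pack(2, ty, 0, original.0),
        }
    }
}

impl From<ValueDataPacked> for ValueData {
    fn from(p: ValueDataPacked) -> Self {
        let tag = p.0 >> TAG_SHIFT;
        let ty = Type(((p.0 >> TYPE_SHIFT) & TYPE_MASK) as u16);
        let num = (p.0 >> NUM_SHIFT) as u16; // keeps bits 32..48
        let payload = p.0 as u32; // keeps bits 0..32
        match tag {
            0 => ValueData::Inst { ty, num, inst: Inst(payload) },
            1 => ValueData::Param { ty, num, block: Block(payload) },
            2 => ValueData::Alias { ty, original: Value(payload) },
            _ => unreachable!("tag 3 is never produced"),
        }
    }
}

fn main() {
    // With a 16-bit type there is no spare room, so the enum exceeds 8 bytes.
    println!("unpacked: {} bytes", size_of::<ValueData>());
    println!("packed:   {} bytes", size_of::<ValueDataPacked>());
    let d = ValueData::Param { ty: Type(0x3abc), num: 7, block: Block(42) };
    assert_eq!(ValueData::from(ValueDataPacked::from(d)), d);
}
```

The unpacked enum's fields already total 64 bits, so Rust's separate discriminant pushes it past 8 bytes. Borrowing the top two bits of the type field is what lets the whole thing fit back into a single `u64`, at the cost of a shift and mask on every access.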
So, when compiling SpiderMonkey.wasm, making `Type` 16 bits regresses performance by 4.5% (5.683s -> 5.943s), while this PR gets 14 bits for a 0.7% cost (5.683s -> 5.723s). That's still not great, and we can likely do better, but it's a start.

cc @sparker-arm -- you could try this as a starting point, and maybe look for where the remaining 0.7% degradation is coming from and try to address it? I suspect that the `Vec<Type>` in `VCode` may be playing a role, and if so we can do a sparse-out-of-bounds trick (a `Vec<u8>` with one sentinel "out of bounds" indicator value, plus an `FxHashMap<Value, Type>` for the exceptions).
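The sparse-out-of-bounds idea might look something like the sketch below. It is only an illustration under stated assumptions: values are numbered densely from 0 and registered in order, most type codes fit below 255, and `u8::MAX` serves as the sentinel. `SparseTypeMap` and the `Value`/`Type` newtypes are hypothetical, and `std::collections::HashMap` stands in for `FxHashMap` (from the `rustc-hash` crate) so the sketch has no dependencies.

```rust
use std::collections::HashMap;

// Hypothetical SSA value index.
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
struct Value(u32);

// Hypothetical 14-bit type code stored in a u16.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct Type(u16);

// Sentinel byte meaning "the real type lives in `overflow`".
const OUT_OF_BOUNDS: u8 = u8::MAX;

// One byte per value in the common case; wide codes spill to a side table.
// HashMap is used here in place of FxHashMap to avoid an external crate.
struct SparseTypeMap {
    dense: Vec<u8>,
    overflow: HashMap<Value, Type>,
}

impl SparseTypeMap {
    fn new() -> Self {
        SparseTypeMap { dense: Vec::new(), overflow: HashMap::new() }
    }

    // Records the type of `v`; assumes values arrive densely and in order.
    fn push(&mut self, v: Value, ty: Type) {
        assert_eq!(v.0 as usize, self.dense.len(), "values must be dense and in order");
        if ty.0 < u16::from(OUT_OF_BOUNDS) {
            self.dense.push(ty.0 as u8);
        } else {
            // Codes >= 255 (including 255 itself) cannot be told apart from
            // the sentinel, so they go to the exception table.
            self.dense.push(OUT_OF_BOUNDS);
            self.overflow.insert(v, ty);
        }
    }

    fn get(&self, v: Value) -> Type {
        match self.dense[v.0 as usize] {
            OUT_OF_BOUNDS => self.overflow[&v],
            small => Type(u16::from(small)),
        }
    }
}

fn main() {
    let mut types = SparseTypeMap::new();
    types.push(Value(0), Type(0x76)); // fits in a byte
    types.push(Value(1), Type(0x1234)); // needs the exception table
    assert_eq!(types.get(Value(0)), Type(0x76));
    assert_eq!(types.get(Value(1)), Type(0x1234));
    assert_eq!(types.overflow.len(), 1);
}
```

The trade-off is that the common case stays as compact as the original `Vec<u8>`-sized storage, and only values whose type code does not fit in a byte pay for a hash lookup.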
cfallin updated PR #4269 from `14-bit-type` to `main`.
cfallin has marked PR #4269 as ready for review.
cfallin requested sparker-arm for a review on PR #4269.
sparker-arm submitted PR review.
cfallin updated PR #4269 from `14-bit-type` to `main`.
cfallin merged PR #4269.
Last updated: Dec 23 2024 at 13:07 UTC