@Andrew Brown you might be interested in this: I'm updating Rust's simd support for wasm to the latest spec. One of the examples in the simd repo is a hex encoder that uses sse/avx on x86 and such, so I copied one of those and translated it to wasm intrinsics. Below, "default" is using the simd intrinsics and "fallback" is the code you would write today (i.e. no intrinsics). Also, "large" is processing 1MB and "small" is processing < 128 bytes.
test benches::large_default ... bench: 213,961 ns/iter (+/- 5,108) = 4900 MB/s
test benches::large_fallback ... bench: 3,108,434 ns/iter (+/- 75,730) = 337 MB/s
test benches::small_default ... bench: 52 ns/iter (+/- 0) = 2250 MB/s
test benches::small_fallback ... bench: 358 ns/iter (+/- 0) = 326 MB/s
basically wasmtime's implementation of SIMD, for hex encoding, is a 7x-15x speedup
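For anyone checking the arithmetic, here is a minimal sketch of how the MB/s and speedup figures can be reproduced from the ns/iter numbers. It assumes "large" means exactly 1024 * 1024 bytes and that throughput is computed as bytes * 1000 / ns with integer division; both are assumptions that happen to match the output above, not something stated in the thread. The function names are made up for illustration.

```rust
/// Throughput in MB/s, assuming it is derived as
/// bytes-per-iteration * 1000 / nanoseconds-per-iteration (integer division).
fn throughput_mb_s(bytes: u64, ns_per_iter: u64) -> u64 {
    bytes * 1000 / ns_per_iter
}

/// Speedup of the SIMD version over the fallback, as a plain time ratio.
fn speedup(fallback_ns: u64, simd_ns: u64) -> f64 {
    fallback_ns as f64 / simd_ns as f64
}

fn main() {
    // Assumption: "large" is exactly 1 MiB of input per iteration.
    let large: u64 = 1024 * 1024;

    println!("large_default:  {} MB/s", throughput_mb_s(large, 213_961));
    println!("large_fallback: {} MB/s", throughput_mb_s(large, 3_108_434));
    println!("large speedup:  {:.1}x", speedup(3_108_434, 213_961));
    println!("small speedup:  {:.1}x", speedup(358, 52));
}
```

Under those assumptions the large case works out to roughly 14.5x and the small case to roughly 6.9x, which is where the 7x-15x range comes from.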
not exactly a clever benchmark since who's bottlenecked on hex encoding, but I figured this was pretty neat :)
Nice! Yeah, thanks for showing me that. Where's the code for those benchmarks?
gimme one min, will post soon
@Andrew Brown https://github.com/rust-lang/stdarch/pull/874, notably https://github.com/rust-lang/stdarch/pull/874/files#diff-179577566f4ea187af5abf39056532cb
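For readers who don't follow the links, a scalar "fallback" hex encoder looks roughly like the sketch below. This is an illustration of the no-intrinsics approach only, not the code from the PR; the function name hex_encode_fallback is hypothetical.

```rust
/// Scalar hex encoding: each input byte becomes two lowercase ASCII hex digits.
/// Illustrative only; not the stdarch example's actual implementation.
fn hex_encode_fallback(src: &[u8]) -> String {
    const DIGITS: &[u8; 16] = b"0123456789abcdef";
    let mut out = String::with_capacity(src.len() * 2);
    for &byte in src {
        // High nibble first, then low nibble.
        out.push(DIGITS[(byte >> 4) as usize] as char);
        out.push(DIGITS[(byte & 0x0f) as usize] as char);
    }
    out
}

fn main() {
    println!("{}", hex_encode_fallback(b"wasm"));
}
```

The SIMD variant does the same nibble split and table lookup, but on a whole vector of bytes per step instead of one byte at a time, which is why the gap grows with input size.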
That's pretty cool... I guess I never thought through how I would compile Rust to Wasm SIMD--but there it is!
Hi @Alex Crichton, can you share that translation? The before and after?
@Johnnie Birch I think https://gist.github.com/alexcrichton/f9f10a1e2ce56c246fb449df45c3f113 is it
I previously used jitdump to get this stuff out but jitdump isn't working for me right now: perf report isn't showing any symbols and it's not figuring out where the JIT code lives
@Alex Crichton Got it thanks. Sorry, I'll take a look at jitdump and perf report. Need to figure out a way to have proper testing for those.
hm ok I bisected a bit and it looks like https://github.com/bytecodealliance/wasmtime/pull/1565 breaks perf when the module is loaded from the cache; going back before that commit or passing --disable-cache fixes the perf issues I was having
I'll investigate tomorrow more, no idea what that PR would be doing...
@Alex Crichton any chance you can run it on aarch64 too? :)
@Joey Gouly
thread '<unnamed>' panicked at 'Vector ops not implemented.', cranelift/codegen/src/isa/aarch64/lower_inst.rs:1624:13
:(
hex.wasm.gz -- this is the file I'm using:
$ ./target/release/wasmtime run --enable-simd -- hex.wasm --bench
running 9 tests
test tests::avx_works ... ignored
test tests::big ... ignored
test tests::empty ... ignored
test tests::encode_equals_fallback ... ignored
test tests::odd ... ignored
test benches::large_default ... bench: 214,676 ns/iter (+/- 2,050) = 4884 MB/s
test benches::large_fallback ... bench: 3,447,077 ns/iter (+/- 78,646) = 304 MB/s
test benches::small_default ... bench: 54 ns/iter (+/- 0) = 2166 MB/s
test benches::small_fallback ... bench: 397 ns/iter (+/- 7) = 294 MB/s
test result: ok. 0 passed; 0 failed; 5 ignored; 4 measured; 0 filtered out
@Alex Crichton aww, well we're working on simd, so hopefully it'll all be implemented soon!
@Alex Crichton btw, I had to run 'gunzip' twice on that file... am I doing something weird?
uh...
it looks like zulip maybe ran another layer of gz after I uploaded
or I ran gz twice by accident
it should be 16MB ish
yeah I got it working, but was very confused for a little bit :-)
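If anyone else hits the double-gzip confusion, a quick way to tell what you are holding is to look at the first bytes of the file. The sketch below checks for the gzip magic bytes (0x1f 0x8b) and the wasm binary magic ("\0asm"); the helper names and the default path are hypothetical.

```rust
/// True if `data` starts with the gzip magic bytes 0x1f 0x8b.
fn looks_like_gzip(data: &[u8]) -> bool {
    data.starts_with(&[0x1f, 0x8b])
}

/// True if `data` starts with the WebAssembly binary magic "\0asm".
fn looks_like_wasm(data: &[u8]) -> bool {
    data.starts_with(b"\0asm")
}

fn main() {
    // Hypothetical default path; pass the real file as the first argument.
    let path = std::env::args()
        .nth(1)
        .unwrap_or_else(|| "hex.wasm".to_string());

    match std::fs::read(&path) {
        Ok(data) if looks_like_gzip(&data) => {
            println!("{path}: still gzip-compressed, run gunzip again")
        }
        Ok(data) if looks_like_wasm(&data) => println!("{path}: wasm binary"),
        Ok(_) => println!("{path}: unrecognized format"),
        Err(err) => eprintln!("could not read {path}: {err}"),
    }
}
```

A file that was compressed twice will still report as gzip after the first gunzip pass.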
Last updated: Jan 24 2025 at 00:11 UTC