alexcrichton edited issue #13386:
In https://github.com/bytecodealliance/wasmtime/pull/13382 I'm applying an optimization where `array.fill` for `i8`-element arrays is optimized to a `memset` on the host. This is relatively easy to do because `memory.fill` already has the infrastructure for this on the host and `array.fill` is just reusing it. The intended benefit of this is that we get to use the host's vectorized routines for `array.fill` as opposed to a per-byte loop within CLIF. This benefit, however, is also theoretically applicable for elements of other sizes (e.g. all the way up to 128 bits). Implementing this, however, would require new libcalls on the host, for example `memory.fill{16,32,64,128}`.

This is doable without too much effort, but this was left out of #13382 because it's not clear whether this is worth it. It'd likely be useful to investigate peer compilers to see what they do in the face of `array.fill` or similar for larger-than-8-bit types.
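As a rough illustration of what a wider fill routine on the host could look like, here is a minimal Rust sketch. The function names `fill_u8` and `fill_u32` are hypothetical and are not Wasmtime libcalls; the sketch only shows the shape of the operation, assuming the destination is already available as a typed slice.

```rust
/// Byte-sized fill. This is the case that maps directly onto a host `memset`.
/// (Hypothetical helper name, for illustration only.)
pub fn fill_u8(dst: &mut [u8], value: u8) {
    dst.fill(value);
}

/// 32-bit fill. A plain `memset` cannot express this for arbitrary values,
/// which is why a separate host routine would be needed per element width.
/// (Hypothetical helper name, for illustration only.)
pub fn fill_u32(dst: &mut [u32], value: u32) {
    // `slice::fill` writes `value` to every element; the compiler is free
    // to vectorize the resulting loop.
    dst.fill(value);
}
```

Whether a dedicated routine like this beats an inline loop in CLIF is exactly the open question in this issue, so the sketch makes no performance claim.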
alexcrichton commented on issue #13386:
I ended up doing a bit more work on https://github.com/bytecodealliance/wasmtime/pull/13382 for some more optimizations here. Specifically during `array.fill`, and `array.new_default` which uses the same internals, in addition to handling `i8` arrays there's a check to see if the initialization value is a constant, and if that constant is a memset-able constant. For example `i64.const 0` is a memset-able constant, as well as `i64.const -1`, but `i64.const 1` is not. That enables more usage of `memset`, notably with `array.new_default`, which I think is going to be important.

What this does not handle, however, is a few situations:

- We should still vectorize initialization of `i64.const 1` in theory, the original premise of this issue.
- Due to how codegen works, initializing with `ref.i31 (i32.const -1)`, which has the bit-pattern `i32.const -1` in CLIF, is not recognized and is not optimized to `memset`. I think that's due to the fact that the shape at codegen-time is not a constant initialization value but probably instead the result of either an `iadd` or `bor` instruction. This is later const-prop'd to a constant, so the final IR looks like it should be `memset`, but the order of operations didn't go well.
- While unrelated to `array.fill`, the implementation of `array.copy` does not use the `memory.copy` libcall for `VMGcRef`-based types. This can be used, however, when a GC implementation doesn't have read/write barriers (e.g. the null/copying collectors). In this situation we should ideally make a dynamic deduction based on the collector at compile time and use `memory.copy`'s libcall unconditionally for all types if barriers aren't needed.
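To make the "memset-able constant" idea from this comment concrete, here is a minimal Rust sketch. It assumes that memset-able means every byte of the element's representation is the same, which is consistent with the examples above (`0` and `-1` qualify, `1` does not) but may be broader or narrower than the check in the PR. The name `memset_byte` is hypothetical.

```rust
/// Returns the byte a `memset` would need to write if `value`, viewed as
/// `size` little-endian bytes, consists of one repeated byte. Returns `None`
/// otherwise. (Hypothetical helper, not taken from Wasmtime.)
pub fn memset_byte(value: u64, size: usize) -> Option<u8> {
    assert!((1..=8).contains(&size), "element size must be 1..=8 bytes");
    let bytes = value.to_le_bytes();
    let first = bytes[0];
    // Only the low `size` bytes belong to the element.
    if bytes[..size].iter().all(|&b| b == first) {
        Some(first)
    } else {
        None
    }
}
```

Under this assumption, `memset_byte(0, 8)` and `memset_byte(-1i64 as u64, 8)` both yield a byte, while `memset_byte(1, 8)` yields `None` because its bytes differ.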
Last updated: Jun 01 2026 at 09:49 UTC