wasmtime / PR #13382 Optimize `array.fill` and `array.copy` · git-wasmtime

Stream: git-wasmtime

Topic: wasmtime / PR #13382 Optimize `array.fill` and `array.copy`

Wasmtime GitHub notifications bot (May 15 2026 at 02:32):

alexcrichton opened PR #13382 from alexcrichton:refactor-array-copy to bytecodealliance:main:

This commit is a step towards optimizing the codegen and implementation of the array.fill and array.copy GC instructions. This builds on the intrinsics refactored in #13367 and #13368 to be used in the GC implementation as well. For example array.fill and array.new_default for i8 arrays will now use memset to fill an array. The larger change here is the reimplementation of array.copy to inline the loop within generated CLIF.

Prior to this commit the array.copy function was exclusively built as a libcall. There were two versions of this libcall -- one for GC references and one for non GC references. Even with this minor optimization, however, the benchmark in #13279 is still quite slow. The main reason for this is that the GC references version of the libcall has an extremely high constant overhead per element copied. This is due to the nature of the implementation using completely safe APIs which engages a lot more dynamic type checks and GC heap tracking than necessary.

The implementation of array.copy in this commit now looks like:

First the bounds of the copy operation are all validated. Traps happen if either array is out of bounds.

Next raw heap addresses are calculated for the start/end of the src/dst.

For copyable types (e.g. i8, i32, etc), the implementation then delegates to the same intrinsic as memory.copy, a host-defined memmove.

For GC types (e.g. anyref), the implementation instead has an inline loop which copies elements from the source to the destination. This copy is structured as two loops, either a forwards copy or a backwards copy, and only one is taken.

The result of this implementation is that the benchmark in #13279 is massively faster. What previously took minutes-to-hours now completes in seconds, even with the DRC collector.

One caveat with this implementation is that when delegating to memory.copy or memory.fill there's an
unnecessary-if-wasmtime-has-no-bugs bounds check double-check the host will not fault. While wasm code will gracefully catch segfaults the host will not, so this is necessary to ensure that in the face of heap corruption the host never faults.

In the future it might make sense to split out functionality like this into separate helper CLIF functions to keep the source relatively lean. In the meantime though the massive performance benefits seem to outweigh the size costs (to me at least).

Wasmtime GitHub notifications bot (May 15 2026 at 02:32):

alexcrichton requested cfallin for a review on PR #13382.

Wasmtime GitHub notifications bot (May 15 2026 at 02:32):

alexcrichton requested wasmtime-compiler-reviewers for a review on PR #13382.

Wasmtime GitHub notifications bot (May 15 2026 at 02:32):

alexcrichton requested wasmtime-core-reviewers for a review on PR #13382.

Wasmtime GitHub notifications bot (May 15 2026 at 02:34):

alexcrichton edited PR #13382:

This commit is a step towards optimizing the codegen and implementation of the array.fill and array.copy GC instructions. This builds on the intrinsics refactored in #13367 and #13368 to be used in the GC implementation as well. For example array.fill and array.new_default for i8 arrays will now use memset to fill an array. The larger change here is the reimplementation of array.copy to inline the loop within generated CLIF.

Prior to this commit the array.copy function was exclusively built as a libcall. There were two versions of this libcall -- one for GC references and one for non GC references. Even with this minor optimization, however, the benchmark in #13279 is still quite slow. The main reason for this is that the GC references version of the libcall has an extremely high constant overhead per element copied. This is due to the nature of the implementation using completely safe APIs which engages a lot more dynamic type checks and GC heap tracking than necessary.

The implementation of array.copy in this commit now looks like:

First the bounds of the copy operation are all validated. Traps happen if either array is out of bounds.

Next raw heap addresses are calculated for the start/end of the src/dst.

For copyable types (e.g. i8, i32, etc), the implementation then delegates to the same intrinsic as memory.copy, a host-defined memmove.

For GC types (e.g. anyref), the implementation instead has an inline loop which copies elements from the source to the destination. This copy is structured as two loops, either a forwards copy or a backwards copy, and only one is taken.

The result of this implementation is that the benchmark in #13279 is massively faster. What previously took minutes-to-hours now completes in seconds, even with the DRC collector.

One caveat with this implementation is that when delegating to memory.copy or memory.fill there's an
unnecessary-if-wasmtime-has-no-bugs bounds check double-check the host will not fault. While wasm code will gracefully catch segfaults the host will not, so this is necessary to ensure that in the face of heap corruption the host never faults.

In the future it might make sense to split out functionality like this into separate helper CLIF functions to keep the source relatively lean. In the meantime though the massive performance benefits seem to outweigh the size costs (to me at least).



Closes #13279

Wasmtime GitHub notifications bot (May 15 2026 at 02:34):

alexcrichton unassigned cfallin from PR #13382 Optimize array.fill and array.copy.

Wasmtime GitHub notifications bot (May 15 2026 at 02:34):

alexcrichton requested fitzgen for a review on PR #13382.

Wasmtime GitHub notifications bot (May 15 2026 at 03:25):

alexcrichton updated PR #13382.

Wasmtime GitHub notifications bot (May 15 2026 at 09:35):

github-actions[bot] added the label wasmtime:api on PR #13382.

Wasmtime GitHub notifications bot (May 15 2026 at 09:35):

github-actions[bot] added the label wasmtime:ref-types on PR #13382.

Wasmtime GitHub notifications bot (May 15 2026 at 09:36):

github-actions[bot] commented on PR #13382:

Subscribe to Label Action

cc @fitzgen

<details>
This issue or pull request has been labeled: "wasmtime:api", "wasmtime:ref-types"

Thus the following users have been cc'd because of the following labels:

fitzgen: wasmtime:ref-types

To subscribe or unsubscribe from this label, edit the <code>.github/subscribe-to-label.json</code> configuration file.

Learn more.
</details>

Wasmtime GitHub notifications bot (May 15 2026 at 13:25):

:thumbs_up: fitzgen submitted PR review.

Wasmtime GitHub notifications bot (May 15 2026 at 13:25):

:speech_balloon: fitzgen created PR review comment:

Do we want to special case all POD types here? Like why not do this for i16 and v128 and f32 and even i31ref?

I guess memory_fill only takes a byte value to fill? Might make sense to have 8/16/32/64/128 versions. File a follow up issue?

Wasmtime GitHub notifications bot (May 15 2026 at 13:25):

:speech_balloon: fitzgen created PR review comment:

Maybe move this after array_bounds is called (and have array_bounds return its one_elem_size that emit_array_size_info creates) so that

We aren't emitting one_elem_size an extra time, and

we can do the multiply after we know it won't overflow (which doesn't really matter but is one less comment to read / thing to consider and double check for code readers)

Wasmtime GitHub notifications bot (May 15 2026 at 13:25):

:speech_balloon: fitzgen created PR review comment:

We are checking fuel before this, do we also want to check epochs at these loop headers? Or introduce an outer loop around this that does epoch checks and only process chunks of a maximum size in this inner loop?

Wasmtime GitHub notifications bot (May 15 2026 at 13:25):

:speech_balloon: fitzgen created PR review comment:

I think this should be slightly better:

        let corrupt_src = builder
            .ins()
            .icmp(IntCC::UnsignedGreaterThan, src_end_addr, heap_end);
        let corrupt_dst = builder
            .ins()
            .icmp(IntCC::UnsignedGreaterThan, dst_end_addr, heap_end);
        let corrupt = builder.ins().bor(corrupt_src, corrupt_dst);
        func_env.trapnz(builder, corrupt, TRAP_GC_HEAP_CORRUPT);

Wasmtime GitHub notifications bot (May 15 2026 at 14:50):

:memo: alexcrichton submitted PR review.

Wasmtime GitHub notifications bot (May 15 2026 at 14:50):

:speech_balloon: alexcrichton created PR review comment:

I think codegen for that will be pretty significantly worse because it requires taking the results of the comparisons, or'ing them together, and then testing that value against zero, and then branching. Code-wise I think the best we can do here is compare, conditional branch, compare, conditional branch. IR-wise it's a bit larger, but machine-code-wise it'll be tighter

Wasmtime GitHub notifications bot (May 15 2026 at 14:51):

:memo: alexcrichton submitted PR review.

Wasmtime GitHub notifications bot (May 15 2026 at 14:51):

:speech_balloon: alexcrichton created PR review comment:

Oh the translate_loop_header at the top does both fuel and epoch checks so these loops should be good.

Wasmtime GitHub notifications bot (May 15 2026 at 14:52):

:memo: alexcrichton submitted PR review.

Wasmtime GitHub notifications bot (May 15 2026 at 14:52):

:speech_balloon: alexcrichton created PR review comment:

I was wondering the same, yeah, and you're right the reason is that it'd require new libcalls and I wasn't looking to expand the set of libcalls here. I also figure that languages will probably expect byte-arrays to be the most optimal so I figured we'd at least have a fast path for that. Can file a follow-up issue yeah.

Wasmtime GitHub notifications bot (May 15 2026 at 14:56):

alexcrichton updated PR #13382.

Wasmtime GitHub notifications bot (May 15 2026 at 15:22):

:thumbs_up: fitzgen submitted PR review.

Wasmtime GitHub notifications bot (May 15 2026 at 15:22):

:speech_balloon: fitzgen created PR review comment:

Hm yeah I guess we don't even legalize trapz into a branch in the midend anymore, so my original implied concern about this pattern being harder to GVN/LICM/etc is moot anyways

Wasmtime GitHub notifications bot (May 15 2026 at 15:22):

:speech_balloon: fitzgen created PR review comment:

Ah, I missed that! thanks

Wasmtime GitHub notifications bot (May 15 2026 at 15:37):

alexcrichton updated PR #13382.

Wasmtime GitHub notifications bot (May 15 2026 at 15:39):

:memo: alexcrichton submitted PR review.

Wasmtime GitHub notifications bot (May 15 2026 at 15:39):

:speech_balloon: alexcrichton created PR review comment:

I've filed an issue at https://github.com/bytecodealliance/wasmtime/issues/13386, and upon considering this more I realized that one thing we'd want to do, even in the future world of larger bit-widths, is that we still want to specialize based on what's actually being filled in. For example array.new_default with zero-initialization should be a memset for pretty much all types.

I've gone ahead and implemented that here, too. Mind re-reviewing the latest commit?

Wasmtime GitHub notifications bot (May 15 2026 at 21:22):

:thumbs_up: fitzgen submitted PR review:

Ship it!

Wasmtime GitHub notifications bot (May 15 2026 at 21:22):

fitzgen added PR #13382 Optimize array.fill and array.copy to the merge queue

Wasmtime GitHub notifications bot (May 15 2026 at 21:47):

github-merge-queue[bot] removed PR #13382 Optimize array.fill and array.copy from the merge queue

Wasmtime GitHub notifications bot (May 15 2026 at 21:57):

alexcrichton updated PR #13382.

Wasmtime GitHub notifications bot (May 15 2026 at 21:57):

alexcrichton has enabled auto merge for PR #13382.

Wasmtime GitHub notifications bot (May 15 2026 at 22:09):

alexcrichton added PR #13382 Optimize array.fill and array.copy to the merge queue

Wasmtime GitHub notifications bot (May 15 2026 at 22:33):

:check: alexcrichton merged PR #13382.

Wasmtime GitHub notifications bot (May 15 2026 at 22:33):

alexcrichton removed PR #13382 Optimize array.fill and array.copy from the merge queue

Wasmtime GitHub notifications bot (May 16 2026 at 05:42):

:thumbs_up: Ari4ka submitted PR review.

Last updated: Jul 29 2026 at 05:03 UTC