I just finished debugging a performance regression in Spin and wanted to write it down here in case anyone else is affected. The regression appeared when we updated from Wasmtime 34 to Wasmtime 35, and after various layers of bisection I found that https://github.com/bytecodealliance/wasmtime/pull/10388 was the culprit.
The stack-switching PR merged just before the Wasmtime 34 branch, but I manually reverted it for that release to de-risk it, so it only ships in Wasmtime 35, 36, and main today. After puzzling over how an off-by-default feature could affect performance so drastically, I discovered that the PR inadvertently changed the behavior of TablePool::reset_table_pages_to_zero: previously only the table's in-use size was zeroed, but afterwards the entire table slot was zeroed. A calculation of table.size() * mem::size_of::<*mut u8>() was changed to self.data_size(table.element_type()), where the latter is the size of the whole slot. A perfectly normal bug, so no one's at fault, of course.
This meant, though, that when combined with *_keep_resident options, tables could incur a much larger memset afterwards than before (even for same-size tables). This ended up being the source of our performance regression.
The reason I'm talking about this here instead of on GitHub is that this is inadvertently already fixed. I ended up fixing this behavior in https://github.com/bytecodealliance/wasmtime/pull/11341 mistakenly assuming that the table pool allocator had always reset the entire slot instead of just the table itself. Basically I didn't realize that the behavior I was changing had itself changed recently with the merging of stack-switching. That PR did not make its way into Wasmtime 35 but it has made its way into Wasmtime 36.
So, tl;dr: if you use *_keep_resident options and see a performance regression on Wasmtime 35 but not on 34 or 36, this may be why.
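For context, these are the knobs in question. A minimal sketch assuming Wasmtime's pooling-allocator API (PoolingAllocationConfig and its keep-resident setters); the byte limits are made-up values, not a recommendation:

```rust
// Config fragment, not a full embedding: requires the `wasmtime` crate.
use wasmtime::{Config, Engine, InstanceAllocationStrategy, PoolingAllocationConfig};

fn main() {
    let mut pooling = PoolingAllocationConfig::default();
    // Keep up to 64 KiB of each linear-memory slot resident, so slot reuse
    // memsets those bytes instead of paying for madvise + page faults.
    pooling.linear_memory_keep_resident(64 * 1024);
    // Same idea for table slots; this is the path affected by the 35 bug.
    pooling.table_keep_resident(64 * 1024);

    let mut config = Config::new();
    config.allocation_strategy(InstanceAllocationStrategy::Pooling(pooling));
    let _engine = Engine::new(&config).expect("engine creation");
}
```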
Interesting, thanks for finding this and for the writeup; that's definitely an error I made while working to address feedback on the stack-switching runtime changes.
That likely explains this profile I've seen on Wasmtime 35 with keep_resident options set: https://profiler.firefox.com/public/8zfwkxghrmkz4d4bzd6zb9famgrx76kdw617bw0/flame-graph/?globalTrackOrder=102&hiddenGlobalTracks=02&symbolServer=http%3A%2F%2F127.0.0.1%3A3333%2F3nc2077n82dr7b772r2b9l8vm41mhn9anasg47j&thread=2whswxpxrwzmzowAlAnwCiCkwDhDjwFeFgwGdGfwIb&transforms=f-combined-gwkx8wxbxdxeyuyvz3z4&v=11
I've removed these options to work around it, but maybe I should revisit them in 36
Your profile, Roman, shows most of the memset coming from deallocate_memories, which shouldn't have changed between 34/35/36, so that may be something else?
In that embedding the memory size is actually static across all modules, and it's pretty big, so I just assumed that it's simply too big for the feature. madvise performed way better in my testing.
I did see the table deallocation also incur significant cost, but I've since lost that profile - removing the keep_resident options fixed both issues for me that time.
for reference, here's the complete set of config I landed upon in the end: https://github.com/near/nearcore/blob/359902578a29d4542fb0d816c9cee2a45341d4a0/runtime/near-vm-runner/src/wasmtime_runner/mod.rs#L353-L414
at one point I updated to 36 and it gave some performance benefits
I would definitely expect that for certain workloads keep_resident can make things worse, with or without the bug (though worse with the 35 bug). This is true with the pagemap optimizations as well. With madvise the biggest penalty shifts to the page faults that come later, but if very few pages are dirtied then the madvise and those page faults can still be, in aggregate, pretty expensive.
I did some comparisons of those tradeoffs in comments I added later to https://github.com/bytecodealliance/wasmtime/pull/11372. What I don't show there is any comparison of a single huge madvise compared against a pagemap scan + madvise. I felt it reached the point where just trying to bench real workloads made more sense.
Last updated: Dec 06 2025 at 06:05 UTC