jadamcrain added the bug label to Issue #10239.
jadamcrain opened issue #10239:
We're embedding wasmtime to execute plugins within our application. These plugins are defined in WIT. The application instantiates multiple instances of the plugin and drives each instance on its own Tokio task. We started seeing slow memory growth in production. This was surprising because our application is carefully designed to have very flat memory usage: it has a fixed number of Tokio tasks that communicate with each other only over bounded queues.

When only 1 plugin executes, no memory is leaked, and the application has a flat steady-state memory usage:

When more than 1 plugin executes, the application leaks memory in WASM => host callbacks. For example, 2 instances on 2 tasks were actually pretty steady for a period of time and then started leaking:

Zoomed-in view of the last chart:

It will sometimes start leaking right away, and other times take a while to start, like the trace above. Once it starts leaking, it always continues to leak. At 4 instances (4 plugins on 4 parallel tasks) the leaks blow up immediately:

The higher the level of parallelism, the faster the leak... It feels like there's something shared between instances here that isn't thread-safe.
The leaked allocations are reallocs that occur when "lifting" a list type from WASM => host in a host callback.
Test Case
Not easy to replicate outside of our application ATM.
Steps to Reproduce
I'd like to upload the .zst heaptrack traces, but GitHub is blocking the upload, even as a zip. They're large, 40-60 MB.
Versions and Environment
Wasmtime version or commit: 29.0.1
Operating system: Linux
Architecture: x86_64
alexcrichton commented on issue #10239:
Thanks for the report! I'm going to ask some questions about the shape of your embedding to help get some more information and hopefully assist in debugging as well. It's understandable if you can't share the whole application, but it may take some more back-and-forth in the absence of a reproduction.
- Are you using the pooling allocator? Or the default OnDemand allocation strategy?
- Or, more generally, are you able to share a snippet/gist of your creation of wasmtime::Config? Understanding the configuration settings may be helpful in determining what possible leak scenarios there are.
- Is there a legend/key for the colors of the stripes in the graphs above?
- Would you be able to share the signature of the WIT function that looks like it's leaking? Or are you able to pin down which host function is triggering the leak?
- Can you talk more about the lifecycle of a plugin? Is it instantiated for a long time? Or only a short period of time before it's thrown away?
- Can you speak more as to what/how statistics are being gathered here? Is it instrumentation of malloc/free with LD_PRELOAD? Or something lower level perhaps?
jadamcrain commented on issue #10239:
Hi @alexcrichton. Before I make anyone guess in the dark here on our proprietary application, I'm trying to make this leak occur in a minimal application that mimics our embedding, which I can just shove in a public repo. Fingers crossed.
Some initial responses below while I work on a full host/guest I can hand you in the background:
Are you using the pooling allocator? Or the default OnDemand allocation strategy?
We've not explicitly selected any allocator, so I assume it's the default.
Or, more generally, are you able to share a snippet/gist of your creation of wasmtime::Config? Understanding the configuration settings may be helpful in determining what possible leak scenarios there are.
We're using all of the default settings.
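For illustration, a defaults-only setup of the kind described here would look roughly like the sketch below. This is not the application's actual code; the async_support and wasm_component_model toggles are assumptions implied by the Tokio + WIT embedding described above rather than confirmed settings.

```rust
use wasmtime::{Config, Engine};

// Sketch of a defaults-only engine setup: no instance allocation strategy is
// selected, so the default on-demand allocator is used. The two toggles below
// are assumptions for an async (Tokio) + component-model (WIT) embedding.
fn build_engine() -> wasmtime::Result<Engine> {
    let mut config = Config::new();
    config.async_support(true);
    config.wasm_component_model(true);
    Engine::new(&config)
}
```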
Is there a legend/key for the colors of the stripes in the graphs above?
Yes, there is. I'll get you this in a bit if I fail to give you a reproducible example.
Would you be able to share the signature of the WIT function that looks like it's leaking? Or are you able to pin down which host function is triggering the leak?
It's a pretty simple host function:

// publish a set of samples
publish: func(samples: list<sample>);

A sample is nothing special... it doesn't contain any dynamically allocated types and should all just be laid out on the stack.
Can you talk more about the lifecycle of a plugin? Is it instantiated for a long time? Or only a short period of time before it's thrown away?
It lives forever... as long as the application. We actually use the plugin to create a single Resource type during initialization. We then periodically call a single method on the guest resource, which can call back to host functions like "publish" above.
Can you speak more as to what/how statistics are being gathered here? Is it instrumentation of malloc/free with LD_PRELOAD? Or something lower level perhaps?
My understanding is that heaptrack uses LD_PRELOAD to insert its own .so between the application and the allocator. We're not using jemalloc here, just the default Rust global allocator.
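To make the shape of the embedding more concrete, here is a rough, hypothetical sketch of the lifecycle described above: a fixed set of long-lived instances, one per Tokio task, each periodically calling into its guest. It uses the generic wasmtime::component API rather than the application's bindgen-generated bindings and resource type, the "tick" export name is invented for illustration, and it assumes it is invoked from within the Tokio runtime.

```rust
use std::sync::Arc;
use std::time::Duration;

use wasmtime::component::{Component, Linker};
use wasmtime::{Engine, Store};

// Placeholder for the per-instance host state that backs imports like `publish`.
#[derive(Default)]
struct HostState;

// Spawn a fixed number of long-lived plugin instances, one per Tokio task.
fn run_plugins(engine: Engine, component: Component, linker: Arc<Linker<HostState>>) {
    for _ in 0..4 {
        let engine = engine.clone();
        let component = component.clone();
        let linker = linker.clone();
        tokio::spawn(async move {
            // Each task owns its Store/instance for the lifetime of the application.
            let mut store = Store::new(&engine, HostState::default());
            let instance = linker
                .instantiate_async(&mut store, &component)
                .await
                .expect("instantiation failed");

            // The real embedding creates a guest resource here and periodically
            // calls a method on it; `tick` is an invented stand-in export.
            let tick = instance
                .get_typed_func::<(), ()>(&mut store, "tick")
                .expect("missing `tick` export");

            loop {
                tick.call_async(&mut store, ()).await.expect("guest call failed");
                tick.post_return_async(&mut store).await.expect("post-return failed");
                tokio::time::sleep(Duration::from_secs(1)).await;
            }
        });
    }
}
```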
alexcrichton commented on issue #10239:
Ok thanks for the info!
Everything seems pretty reasonable to me, and from what I can tell from the screenshots it looks like the Vec<Sample> that's allocated on the host is what's leaking. I've double-checked the various bits and pieces I could in Wasmtime and I can't find anything awry though. In the final screenshot you've expanded a chain of 18.1MB of leaked bytes, but just above that (highlighted in the screenshot) is a leak of 35.7MB. Does the trace there look similar?

I also assume you're using wasmtime::component::bindgen!-generated bindings for this API? If so you should get the Vec<Sample>, and that should naturally get deallocated when it falls out of scope in Rust. Basically I'm as stumped as you are :) (I'll keep digging once you've got more info though)
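To restate that expectation in plain Rust terms, here is a hand-written stand-in (not the generated bindings; the Sample type and the publish method below are assumptions mirroring the WIT quoted earlier): the lifted list<sample> arrives on the host as an owned Vec<Sample>, so the realloc'd buffer seen in the traces should be freed as soon as the callback returns.

```rust
// Hand-written stand-in (not the generated bindings): `Sample` mirrors the
// plain, fixed-size record described earlier.
#[derive(Clone, Copy, Debug)]
struct Sample {
    timestamp: u64,
    value: f64,
}

struct HostState {
    published: u64,
}

impl HostState {
    // Stand-in for the `publish` import: the lifted `list<sample>` arrives as
    // an owned Vec<Sample>, so the buffer behind it (the `realloc` seen in the
    // heaptrack traces) is freed when `samples` goes out of scope here.
    fn publish(&mut self, samples: Vec<Sample>) {
        self.published += samples.len() as u64;
        // `samples` is dropped at the end of this function; no host-side leak
        // is expected from this path.
    }
}
```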
jadamcrain commented on issue #10239:
Yes, I'm using wasmtime::component::bindgen!. I agree that this doesn't make any sense. I just tried this using Valgrind's massif and I'm getting a flat trace there even with high parallelism... this kinda makes me think that this might be a bug in heaptrack rather than a leak in the application. Massif is really slow compared to heaptrack, though, so to rule out some kind of heisenbug, I'm going to try a couple more heap profiling tools, like jemalloc, to be certain that's the case.

Heaptrack did allow me to find a leak in our code (me being stupid and growing an endless HashMap), so I kept going with it assuming it was reporting correct results, but it might just be wrong here.
jadamcrain closed issue #10239.
jadamcrain commented on issue #10239:
I've used Valgrind's massif and jemalloc. There are no heap leaks. Heaptrack appears to just have a bug under some unknown set of conditions that leads to those nonsensical profiles and leaked stack traces. It was a red herring that adding wasmtime to the mix triggered the bug. Who knows why... depth of the stack traces, anonymous stack frames, I have no idea, but the other heap profiling tools had no issue.

The reason I first thought there was a leak in production was that we were running the application as a systemd service, and the memory "usage" reported by systemctl status apparently includes file data cached by Linux! In the same redeployment, I added both the WASM plugin stuff and some historical logging directly to files... the growing memory usage was just Linux caching this written file data and accounting for it when reporting the memory usage. If you look at the RSS memory usage using ps/top/etc., you actually see that that part of the usage is stable.

So, a bug in heaptrack combined w/ me being a systemd noob resulted in a wild goose chase =).
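For anyone hitting the same confusion: RSS can also be spot-checked from inside the process on Linux, independent of what systemctl status reports. A minimal sketch reading the standard VmRSS field:

```rust
use std::fs;

// Read this process's resident set size (in KiB) from /proc/self/status on
// Linux. Per the observation above, this figure does not include file data
// that the kernel has cached on the service's behalf, unlike the memory
// number shown by `systemctl status`.
fn rss_kib() -> Option<u64> {
    let status = fs::read_to_string("/proc/self/status").ok()?;
    status
        .lines()
        .find(|line| line.starts_with("VmRSS:"))
        .and_then(|line| line.split_whitespace().nth(1))
        .and_then(|kib| kib.parse().ok())
}
```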

alexcrichton commented on issue #10239:
Oh wow, that's wild! Regardless thanks for investigating and tracking that down!