On 12/14/21 07:08, Dominique Martinet wrote:
> I've double-checked with traces in load_spa_handle/unref_handle and it
> is all free()d as soon as the client disconnects, so there's no reason
> the memory would still be used... And I think we're just looking at some
> malloc optimisation not releasing the memory.
> To confirm, I've tried starting pipewire-pulse with jemalloc loaded,
> LD_PRELOAD=/usr/lib64/libjemalloc.so , and interestingly after the 100
> clients exit the memory stays at ~300-400MB, but as soon as a single new
> client connects it jumps back down to 20MB, so that seems to confirm it.
> (with tcmalloc it stays all the way up at 700+MB...)
> So I guess we're just chasing after artifacts from the allocator, and
> it'll be hard to tell which it is when I happen to see pipewire-pulse
> with high memory later on...
It can be difficult to tell the difference between:
(a) allocator caching
(b) application usage
To help with this we developed some additional tracing utilities:
https://pagure.io/glibc-malloc-trace-utils
The idea was to get a full API trace of malloc family calls and then play them back
in a simulator to evaluate the heap/arena usage when threads were involved.
Knowing the exact API calls lets you determine whether you have (a), where the API
calls show small usage but in reality RSS is higher, or (b), where the API calls
show allocations without matching free()s and real usage is growing.
It seems like you used jemalloc and then found that memory usage stays low?
If that is the case it may be userspace caching from the allocator.
jemalloc is particularly lean with a time-decay thread that frees back to the OS
in order to reduce memory usage down to a fixed percentage. The consequence of
this is that you get latency on the allocation side, and the application has to
take this into account.
> From what I can see the big allocations are (didn't look at lifetime of
> each alloc):
> - load_spa_handle for audioconvert/libspa-audioconvert allocs 3.7MB
> - pw_proxy_new allocates 590k
> - reply_create_playback_stream allocates 4MB
> - spa_buffer_alloc_array allocates 1MB from negotiate_buffers
> - spa_buffer_alloc_array allocates 256K x2 + 128K from negotiate_link_buffers
On a 64-bit system the maximum dynamic mmap threshold is 32MiB.
As you malloc and free ever larger blocks the dynamic scaling raises the
threshold up to at most 32MiB (half of a 64MiB heap). So it is possible that
all of these allocations are placed on the mmap'd/sbrk'd heaps and stay there
for future use until freed back to the OS.
Could you try running with this env var:
GLIBC_TUNABLES=glibc.malloc.mmap_threshold=131072
Note: See `info libc tunables`.
> maybe some of these buffers sticking around for the duration of the
> connection could be pooled and shared?
They are pooled and shared if they are cached by the system memory allocator.
tcmalloc, jemalloc, and glibc malloc all attempt to cache userspace requests,
with different algorithms tuned to different workloads.
--
Cheers,
Carlos.