On 12/14/21 07:08, Dominique Martinet wrote:
> I've double-checked with traces in load_spa_handle/unref_handle and it
> is all free()d as soon as the client disconnects, so there's no reason
> the memory would still be used... And I think we're just looking at some
> malloc optimisation not releasing the memory.
> To confirm, I've tried starting pipewire-pulse with jemalloc loaded,
> LD_PRELOAD=/usr/lib64/libjemalloc.so , and interestingly after the 100
> clients exit the memory stays at ~300-400MB, but as soon as a single new
> client connects it jumps back down to 20MB, so that seems to confirm it.
> (with tcmalloc it stays all the way up at 700+MB...)
> So I guess we're just chasing after artifacts from the allocator, and
> it'll be hard to tell which it is when I happen to see pipewire-pulse
> with high memory later on...
It can be difficult to tell the difference between:
(a) allocator caching
(b) application usage
To help with this we developed some additional tracing utilities:
https://pagure.io/glibc-malloc-trace-utils
The idea was to get a full API trace of malloc family calls and then play them back
in a simulator to evaluate the heap/arena usage when threads were involved.
Knowing the exact API calls lets you determine whether you have (a), where the API
calls show small usage but in reality RSS is higher, or (b), where the API calls
show allocations without matching free()s and real usage is growing.
It seems like you used jemalloc and then found that memory usage stays low?
If that is the case it may be userspace caching from the allocator.
jemalloc is particularly lean with a time-decay thread that frees back to the OS
in order to reduce memory usage down to a fixed percentage. The consequence of
this is that you get latency on the allocation side, and the application has to
take this into account.
> From what I can see the big allocations are (didn't look at lifetime of
> each alloc):
> - load_spa_handle for audioconvert/libspa-audioconvert allocs 3.7MB
> - pw_proxy_new allocates 590k
> - reply_create_playback_stream allocates 4MB
> - spa_buffer_alloc_array allocates 1MB from negotiate_buffers
> - spa_buffer_alloc_array allocates 256K x2 + 128K from negotiate_link_buffers
On a 64-bit system the maximum dynamic mmap threshold is 32MiB.
As you malloc and free ever larger blocks the dynamic scaling raises the
threshold up to at most 32MiB (half of a 64MiB heap). So it is possible that
all of these allocations are placed on the mmap'd/sbrk'd heaps and stay there
for future use until freed back to the OS.
Could you try running with this env var:
GLIBC_TUNABLES=glibc.malloc.mmap_threshold=131072
Note: See `info libc tunables`.
> maybe some of these buffers sticking around for the duration of the
> connection could be pooled and shared?
They are pooled and shared if they are cached by the system memory allocator.
tcmalloc, jemalloc, and glibc malloc all attempt to cache userspace requests,
with different algorithms tuned to different workloads.
--
Cheers,
Carlos.