Sysprof has modular data collection backends, and not everything requires linking against
libunwind.
For those not familiar with Sysprof, or with profiling the desktop at large: a single
program is generally not the problem. Performance problems often span a number of
processes. That can be anything from a library used by multiple applications that
cumulatively wastes resources, to IPC across programs, thundering herds when files on disk
change, GPU usage, CPU frequency scaling, memory bandwidth, RAPL, etc.
So Sysprof has a binary logging format that is straightforward, efficient, and allows us
to record many different types of information within a single file. That file format is
used by a number of tools in the stack, from GLib, Pango, Gtk, Mutter, GNOME Shell, and GJS
to various libraries and the applications on top of them. It can capture counters, stack
traces, file contents, marks, logs, and a multitude of other data frames.
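To make the idea of one file carrying many kinds of data frames concrete, here is a minimal sketch of a framed binary log. The layout (a fixed header with length, type, pid, and timestamp, followed by a type-specific payload) and every name in it are hypothetical illustrations, not Sysprof's actual capture format.

```python
import struct

# Hypothetical frame header, little-endian with no padding:
# u16 total length, u16 frame type, i32 pid, i64 timestamp in nanoseconds.
HEADER = struct.Struct("<HHiq")

# Hypothetical frame type codes.
FRAME_SAMPLE = 1   # stack trace: payload is a run of u64 addresses
FRAME_MARK = 2     # named mark: payload is UTF-8 text
FRAME_COUNTER = 3  # counter value: payload is one i64


def pack_frame(frame_type, pid, time_ns, payload):
    """Prefix a payload with the fixed header so readers can skip unknown types."""
    length = HEADER.size + len(payload)
    return HEADER.pack(length, frame_type, pid, time_ns) + payload


def iter_frames(buf):
    """Yield (type, pid, time_ns, payload) for each complete frame in buf."""
    offset = 0
    while offset + HEADER.size <= len(buf):
        length, frame_type, pid, time_ns = HEADER.unpack_from(buf, offset)
        if length < HEADER.size or offset + length > len(buf):
            break  # truncated or corrupt frame: stop rather than guess
        payload = buf[offset + HEADER.size:offset + length]
        yield frame_type, pid, time_ns, payload
        offset += length


capture = b"".join([
    pack_frame(FRAME_SAMPLE, 1234, 1000,
               struct.pack("<3Q", 0x401000, 0x401200, 0x7F0000001000)),
    pack_frame(FRAME_MARK, 1234, 1500, "frame-begin".encode("utf-8")),
    pack_frame(FRAME_COUNTER, -1, 2000, struct.pack("<q", 42)),
])

frames = list(iter_frames(capture))
```

Because every frame carries its own length, a reader that does not understand a given frame type can still step over it, which is what lets unrelated kinds of data share one file in a design like this.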
These capture files can also be muxed together at any point.
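Muxing amounts to interleaving frames from several captures into one stream. A minimal sketch, assuming each capture is already a time-ordered list of (time_ns, source, description) tuples. The tuple shape and the choice to order by timestamp are stand-ins for illustration, not a description of how Sysprof joins files.

```python
import heapq


def mux(*captures):
    """Merge time-ordered captures into one time-ordered list of frames."""
    return list(heapq.merge(*captures, key=lambda frame: frame[0]))


# Two hypothetical captures recorded by different processes.
shell = [(100, "gnome-shell", "mark: paint"), (300, "gnome-shell", "mark: swap")]
app = [(150, "app", "sample"), (250, "app", "sample")]

combined = mux(shell, app)
```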
Some of the modular data collectors require libunwind; many do not. For example, the
memprof collector uses it to record backtraces from malloc/free/etc. The GJS data-collector,
on the other hand, can use SpiderMonkey's internal APIs to get backtraces from a SIGPROF
sigaction. The most used collector, however, is the perf collector, which just reads from a
perf fd mmap'd as a ring buffer.
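Reading from a ring buffer like this comes down to tracking a producer position and a consumer position, and handling records that wrap around the end of the buffer. The sketch below simulates that in plain Python with a bytearray. It does not call perf_event_open, and the record layout (a 2-byte little-endian size followed by the payload) is a simplified, hypothetical stand-in for the kernel's record headers.

```python
import struct


class RingReader:
    """Consumer side of a simulated single-producer ring buffer."""

    def __init__(self, data, size):
        self.data = data  # backing storage of `size` bytes
        self.size = size
        self.tail = 0     # consumer position; grows without wrapping

    def _copy(self, pos, n):
        """Copy n bytes starting at logical position pos, handling wrap-around."""
        start = pos % self.size
        first = min(n, self.size - start)
        return bytes(self.data[start:start + first]) + bytes(self.data[:n - first])

    def read_records(self, head):
        """Return every complete record between self.tail and the producer's head."""
        records = []
        while head - self.tail >= 2:
            (length,) = struct.unpack("<H", self._copy(self.tail, 2))
            if head - self.tail < 2 + length:
                break  # producer has not finished this record yet
            records.append(self._copy(self.tail + 2, length))
            self.tail += 2 + length  # this space may now be reused
        return records


def write_record(data, size, head, payload):
    """Producer helper for the simulation: append one record, return the new head."""
    blob = struct.pack("<H", len(payload)) + payload
    for i, byte in enumerate(blob):
        data[(head + i) % size] = byte
    return head + len(blob)


SIZE = 16
buf = bytearray(SIZE)
reader = RingReader(buf, SIZE)

head = write_record(buf, SIZE, 0, b"sample-1")     # occupies bytes 0..9
first = reader.read_records(head)
head = write_record(buf, SIZE, head, b"sample-2")  # wraps past the end of buf
second = reader.read_records(head)
```

Keeping head and tail as ever-growing positions and reducing them modulo the buffer size only when touching memory is the same general idea as the data_head and data_tail fields of the perf mmap page, though the real interface also requires memory barriers that this simulation leaves out.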
The perf collector doesn't record the whole stack, because decoding a 30 second
system-wide capture with DWARF/etc is so slow that practically nobody
would use it.
The best profiler is the one people will use.
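For contrast with DWARF-based unwinding, walking frame pointers is a short loop: each frame stores the caller's frame pointer next to the return address, so the unwinder only follows a linked list. The sketch below models that over a simulated stack held in a dict. The addresses, the two-slot frame layout (saved frame pointer at fp, return address at fp + 8, as on x86-64), and the depth limit are illustrative assumptions, not Sysprof or kernel code.

```python
WORD = 8  # bytes per stack slot on a 64-bit target


def walk_frame_pointers(memory, fp, max_depth=64):
    """Follow saved frame pointers and collect return addresses.

    `memory` maps an address to the 64-bit word stored there.
    """
    callchain = []
    while fp != 0 and len(callchain) < max_depth:
        saved_fp = memory.get(fp)
        return_addr = memory.get(fp + WORD)
        if saved_fp is None or return_addr is None:
            break  # unreadable memory: stop instead of guessing
        callchain.append(return_addr)
        if saved_fp != 0 and saved_fp <= fp:
            break  # stacks grow down, so a caller must sit at a higher address
        fp = saved_fp
    return callchain


# Simulated stack for main -> render -> draw, sampled inside draw.
stack = {
    0x7000: 0x7020, 0x7008: 0x401234,  # draw's frame: returns into render
    0x7020: 0x7040, 0x7028: 0x401100,  # render's frame: returns into main
    0x7040: 0x0,    0x7048: 0x400F00,  # main's frame: end of the chain
}

callchain = walk_frame_pointers(stack, 0x7000)
```

No unwind tables are consulted and no stack contents are copied out, which is why this style of unwinding is cheap enough to run on every sample, provided the code was built with frame pointers.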
We have an in-tree parser for ELF that allows us to avoid a lot of extraneous code when
extracting symbols. That is partly because libunwind is incredibly slow (by profiler
requirements), and partly because historically we never had to stash stack frames for
contextual unwinding.
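Once symbols have been extracted from an ELF file, turning a sampled address into a name can be a binary search over a sorted table. A minimal sketch, assuming the symbols are already available as (start, size, name) tuples; the table contents and function names are made up for illustration and this is not Sysprof's parser.

```python
import bisect


def build_index(symbols):
    """Sort (start, size, name) symbols by start address for binary search."""
    table = sorted(symbols)
    starts = [start for start, _size, _name in table]
    return starts, table


def resolve(index, address):
    """Return the name of the symbol covering address, or None."""
    starts, table = index
    i = bisect.bisect_right(starts, address) - 1
    if i < 0:
        return None  # address is below the first symbol
    start, size, name = table[i]
    return name if address < start + size else None


# Hypothetical symbol table, deliberately given out of order.
index = build_index([
    (0x401100, 0x80, "render"),
    (0x401000, 0x40, "main"),
    (0x401200, 0x100, "draw"),
])

name = resolve(index, 0x401234)
```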
Could we write a new data collection module that does DWARF unwinding and stashes some 8 KB
of stack? Sure. Would people use it? Probably not, because again, it's so slow that
people will start profiling by intuition again, which is probably the worst of all
options.
Can we write an eBPF kernel module to decode symbols there? Maybe? Can I? Probably not.
Personally, I think some libraries should not be compiled with -fno-omit-frame-pointer.
However, I think that number is much smaller than the number that should be. Encryption,
graphics drivers, etc. all seem like good candidates for being explicit about their
performance requirements.
Sysprof's modular *and* system-wide profiling played a significant role in how GNOME
Shell got faster over the past years. All of a sudden its developers had a tool which
could coalesce stack traces, counters, marks, logs, display timing information and GL
command state both from apps and compositors, and track event propagation across
processes.
To my knowledge, we don't have this tooling anywhere else on Fedora. The sad part is that
those who want to casually drive by and fix performance have to start with "recompile the
stack with jhbuild", or I guess RPMs/koji if you're into that sort of thing.