* Ben Cotton:
* Meta builds all its libraries and executables with
-fno-omit-frame-pointer by default. Internal benchmarks did not show
significant impact on performance when omitting the frame pointer for
two of our most performance intensive applications.
They probably saw *significant* (in the statistical sense) performance
regressions, but deemed them acceptable.
* Firefox recently landed a change to preserve the frame pointer in
all jitted code
(https://bugzilla.mozilla.org/show_bug.cgi?id=1426134). No significant
decrease in performance was observed.
That could be because they have to do stack walking as part of regular
operation. So I'm not sure if this is an appropriate comparison. It's
also possible that the JIT compiler had issues that prevented it from
taking full advantage of a larger register file.
What you see on the Mozilla ticket is stuff that broke with the slightly
smaller register set. That is going to bite some packages on i686.
(It's not going to impact x86-64, I think.)
* [https://lwn.net/Articles/680985 LBR] - New Intel CPUs have a
feature that gives you source and target addresses for the last 16 (or
32, in newer CPUs) branches with no overhead. It can be configured to
record only function calls and to be used as a stack, which means it
can be used to get the stack trace. Sadly, you only get the last X
calls, and not the full stack trace, so the data can be very
incomplete. On top of that, many Fedora users might still be using
CPUs without LBR support which means we wouldn't be able to assume
working profilers on a Fedora system by default.
Do you really need more than five or so *physical* stack frames during
profiling, to figure out what is going on? Graphs generated from DWARF
unwinding typically will show logical stack frames from inlining, too,
and appear much deeper.
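
To make the truncation concrete, here is a toy Python model of a call
stack that keeps only the newest N entries. It is only a sketch of the
"last X calls" behaviour described in the proposal, not a description
of the real hardware or of any profiler; the class, the function names
and the call chain are all made up for illustration.

```python
from collections import deque


class BoundedCallStack:
    """Toy model: a call stack that retains only the newest `depth` frames."""

    def __init__(self, depth):
        # deque(maxlen=...) silently drops the oldest entry when full.
        self.frames = deque(maxlen=depth)

    def call(self, name):
        self.frames.append(name)

    def ret(self):
        if self.frames:
            self.frames.pop()

    def snapshot(self):
        # Outermost retained frame first, innermost last.
        return list(self.frames)


def sample_at_leaf(chain, depth):
    """Enter every function in `chain`, then sample the retained frames."""
    stack = BoundedCallStack(depth)
    for fn in chain:
        stack.call(fn)
    return stack.snapshot()


# Hypothetical call chain, outermost caller first.
chain = ["main", "run", "parse", "lex", "read", "memcpy"]

truncated = sample_at_leaf(chain, depth=4)   # shallower than the chain
full = sample_at_leaf(chain, depth=16)       # deep enough to hold it all

print(truncated)
print(full)
```

With a depth of 4 the sample keeps only the four innermost frames and
loses "main" and "run"; with a depth of 16 the whole chain survives.
Whether the lost outer frames matter for a given profile is exactly the
question above.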
Thanks,
Florian