On 6/17/22 15:10, Daan De Meyer via devel wrote:
> they hit everyone, not just those who are highly dependent on the
> profiling tools the proposal is concerned about.
The kernel benchmarks were added as an example of openly available data we could find on
the potential impact of frame pointers. Note that the email from Mel Gorman is all we have
to go on. Unfortunately the original data from the benchmarks is gone so I can't try
to reproduce them. I've emailed Mel to see if he still has the benchmarks stored
somewhere so we can perhaps try to reproduce the results.
I've added a clarification to the change proposal that we don't intend to
actually compile the kernel with frame pointers, since the kernel is already built with
ORC support and this works well so there's nothing to really be gained by building the
kernel with frame pointers. That means we won't see the kernel regressions that were
reported by the SUSE benchmarks.
Unfortunately, there are no readily available benchmarks that I've been able to
find that would show the exact impact of frame pointers on common Fedora workflows. The
Phoronix benchmark suite could be used but that would imply doing a mass rebuild with
frame pointers before we could actually run it and measure the impact.
I don’t think it is a good idea to do something that would regress
performance until Fedora is competitive when it comes to real-world
performance (boot time, latency in GUI applications, etc.). Synthetic
benchmarks are less important.
Also, as mentioned in the proposal, all our internal services at Meta
are built with frame pointers enabled. We did canaries a few years ago on some of our most
CPU intensive services to see if it would make sense to build them without frame pointers,
and found that there were no significant enough wins to be had to justify the loss in
continuous profiling data caused by building without frame pointers.
Can you provide actual numbers here?
> (Are you referring to a novel kernel-resident tool?)
Unfortunately, no, there's no in-kernel DWARF unwinder due to the complexity
involved. Instead, the kernel uses ORC and has an unwinder for that. Adding ORC support to
all of Linux userspace so that we can unwind it in the kernel isn't likely to happen,
since all tooling would have to be changed to support ORC.
How difficult would it be to do exactly this?
> The proposal doesn't characterize the "reasonably low overhead" that
> this operation targets. That makes it hard to judge the tradeoffs.
Characterizing the impact would mean rebuilding most of the distro with frame pointers
and running a comprehensive benchmark suite on it. Doing this will be a rather involved
process. If you know of any other representative benchmark suites that we could run that
wouldn't require rebuilding most of the distro, we could look into running these with
and without frame pointers to measure the impact.
> If typing that option were a hardship, it could be made default on
> Fedora. With broad debuginfod auto-downloading capability, maybe it's
> worth considering.
The issue with DWARF isn't that we have to add an extra option to perf, it's that
without an in-kernel DWARF unwinder (which is very unlikely to ever happen as discussed
above), it's expensive to use DWARF for stacktrace unwinding, as we have to copy the
entire stack and unwind it in user space, which adds substantial overhead. This means we
can't use it for continuous profiling.
Would it be possible for the kernel to somehow drop into ring 3, use
whatever library userspace would use to unwind the stack, and then use
some magic syscall to return to kernel space? Or is the problem that
perf runs in NMI context and so very much cannot do anything fancy?
--
Sincerely,
Demi Marie Obenour (she/her/hers)