Kevin Kofler via devel wrote:
Demi Marie Obenour wrote:
> Valgrind is not helpful for profiling production workloads. It is
> too slow and will not provide an accurate indication of where the
> time is being spent. That requires a sampling profiler.
IMHO, Valgrind (with the Callgrind or Cachegrind tools) has a pretty
good cost model. Is it slow? Yes, definitely. (Expect up to a 50x
slowdown for CPU-bound code.) Does it tell you where the bottlenecks are?
In my experience, it does. I have even run entire JVMs under Valgrind's
Callgrind in order to find bottlenecks in native C/C++ JNI code. (It will
not help with the Java code, of course. You need a Java profiler for
that.) It has always found the problem spots, and fixing them made the
program faster. So, while I can understand the "too slow" part, I cannot
agree with your "will not provide an accurate indication of where the time
is being spent" assertion. It is quite the opposite: sampling will
necessarily be less accurate because it can only take snapshots at certain
intervals whereas Valgrind monitors the entire program execution at all
times.
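
To illustrate the sampling point, here is a toy model. It is only a sketch
under made-up assumptions: the function names, the tick durations and the
sampling periods are all hypothetical, and it does not reflect how Valgrind
or Perf work internally. It compares counting every tick of a program with
looking at it only at fixed intervals.

```python
from collections import Counter
from itertools import cycle, islice

# Hypothetical workload: each loop iteration spends 3 ticks in parse()
# and then 1 tick in checksum(). Names and durations are made up.
ITERATION = ["parse"] * 3 + ["checksum"] * 1


def build_timeline(iterations):
    """Return the function that is active at every tick of the toy program."""
    return list(islice(cycle(ITERATION), iterations * len(ITERATION)))


def exact_profile(timeline):
    """Count every single tick, i.e. observe the whole execution."""
    return Counter(timeline)


def sampled_profile(timeline, period):
    """Observe the timeline only once every `period` ticks."""
    return Counter(timeline[::period])


def shares(profile):
    """Convert raw counts into fractions of the total."""
    total = sum(profile.values())
    return {name: count / total for name, count in profile.items()}


timeline = build_timeline(1000)
exact = shares(exact_profile(timeline))
aligned = shares(sampled_profile(timeline, period=4))  # period matches the loop
coprime = shares(sampled_profile(timeline, period=7))  # period does not match

print("exact:   ", exact)
print("period 4:", aligned)
print("period 7:", coprime)
```

In this toy model, the exact count attributes 75% of the ticks to parse()
and 25% to checksum(). Sampling every 4 ticks happens to line up with the
loop and never sees checksum() at all, whereas sampling every 7 ticks
recovers the 75/25 split. Real samplers and real programs are far less
regular than this, so take it as an illustration of the failure mode only.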
PS: Now, what I have NOT done with Valgrind, and what I agree Valgrind is
not designed for, is what Meta is apparently doing with Perf, i.e., running
their production server services with profiling always enabled. I normally
run CLI tools or unit tests in Valgrind, which is what it works best with.
If I have to profile or debug a server in Valgrind, I need to run a
dedicated local instance for the test. But usually, it is better to just
write a non-server test program that simulates the workload without a need
for a client.
What Meta is doing can indeed only reasonably be done with a sampling
profiler, with additional restrictions (in particular, they state that
Perf's fallback to userspace DWARF unwinding is too slow for them). But is
that a common enough use case to warrant globally degrading Fedora
performance for all users, most of whom do not use Perf this way (if they
even use it at all)?
I see -fno-omit-frame-pointer as a crude workaround for the lack of proper
unwinding support, not something you ever want to use in production.
Kevin Kofler