Re: Schedule for Tuesday's FESCo Meeting (2023-01-03)

Saturday, 7 January 2023

Richard W.M. Jones wrote:
...
 The problem is you're confusing general gains and gains in
 specific scenarios. 
But the thing is that a gain in some specific scenario is a lot less useful
than a general gain. And the latter is usually obtained not through profiling,
but through improvements in toolchain optimizations. -fomit-frame-pointer
was one such improvement that you have now successfully destroyed for all
Fedora users.

...
 Perf + flamegraphs are such a useful tool that we managed to double
 performance (ie. ~ 100% gain) in one particular network server case
 that we investigated a few years ago.  This was by spotting that the
 kernel was writing to an MSR (hardware register) which was really
 slow, and as it wasn't necessary we just got rid of it.

 For that one use case - an incredible performance gain!  Does this
 mean everyone sees their machines double in speed?  Of course not. 
And that is why that improvement is much less impressive than it sounds at
first. Chances are it helps only a handful of users, in a handful of
situations, and even for those users, the overall improvement is not going to
be 100% because they will also be using software other than the one you
profiled and optimized.
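The "not going to be 100%" point is Amdahl's law: if only a fraction p of a
user's total run time is spent in the code that got s times faster, the overall
speedup is 1 / ((1 - p) + p / s). A minimal Python sketch follows; the
fractions in the loop are hypothetical values chosen for illustration, not
measurements of any real workload.

```python
def overall_speedup(fraction_affected: float, local_speedup: float) -> float:
    """Amdahl's law: overall speedup when only `fraction_affected` of the
    total run time is made `local_speedup` times faster."""
    if not 0.0 <= fraction_affected <= 1.0:
        raise ValueError("fraction_affected must be in [0, 1]")
    if local_speedup <= 0:
        raise ValueError("local_speedup must be positive")
    return 1.0 / ((1.0 - fraction_affected) + fraction_affected / local_speedup)


# Hypothetical shares of total CPU time spent in the optimized component;
# local_speedup=2.0 models the "doubled performance" case.
for fraction in (1.0, 0.5, 0.2, 0.05):
    gain = overall_speedup(fraction, 2.0) - 1.0
    print(f"{fraction:>4.0%} of time in optimized code -> {gain:.1%} overall gain")
```

With these assumptions, doubling the speed of code that accounts for 20% of
the run time yields only about an 11% overall gain.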

...
 Will we be able to say that "Fedora got N% faster" in two years?
 Not at all - it depends entirely what you use Fedora for. 
Hence the claims made by the change proponents are entirely unrealistic and
impossible ever to verify. We are hitting the end users with an overall
performance penalty in exchange for potential performance improvements that
are impossible not only to predict, but even to quantify after the fact,
i.e., the claim that the latter will more than compensate for the former is
completely unsubstantiated.

...
 The overhead is also a real thing.  There's a few percent overhead
 everywhere for enabling frame pointers because every stack frame entry
 and exit involves a couple of extra instructions. 
Exactly.
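For readers who have not looked at the generated code: on x86-64 the "couple
of extra instructions" are typically push rbp; mov rbp, rsp on entry and
pop rbp (or leave) on exit. They keep a linked list of saved frame pointers on
the stack, which is what lets a sampling profiler unwind cheaply. The other
cost, not modeled here, is that rbp is no longer available as a general-purpose
register. The toy Python model below is only an illustration: the stack grows
upward for simplicity, and the return-address labels are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Machine:
    """Toy call stack with frame pointers, loosely modeled on x86-64's rbp
    chain. The stack grows upward here purely for simplicity."""
    stack: list = field(default_factory=list)
    fp: int = -1  # -1 marks the end of the frame-pointer chain

    def call(self, return_address: str) -> None:
        # The `call` instruction pushes the return address...
        self.stack.append(return_address)
        # ...and the extra prologue work (push rbp; mov rbp, rsp) saves the
        # caller's frame pointer and points fp at that saved slot.
        self.stack.append(self.fp)
        self.fp = len(self.stack) - 1

    def ret(self) -> str:
        # Epilogue: drop locals (mov rsp, rbp), restore the caller's frame
        # pointer (pop rbp), then `ret` pops the return address.
        del self.stack[self.fp + 1:]
        self.fp = self.stack.pop()
        return self.stack.pop()

    def unwind(self) -> list:
        """Follow the saved frame pointers, as a profiler can at each sample,
        collecting the return address stored just below each saved fp."""
        addresses = []
        fp = self.fp
        while fp != -1:
            addresses.append(self.stack[fp - 1])
            fp = self.stack[fp]
        return addresses


m = Machine()
m.call("main+0x10")      # hypothetical return addresses
m.call("parse+0x2a")
m.stack.append(42)       # a local variable in the innermost frame
print(m.unwind())
```

Without the saved frame pointers, the unwinder has to fall back on other
information (such as DWARF unwind tables) to find each caller's frame.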

...
 Anyway I'd really urge you to play with these tools before judging
 this proposal: https://www.brendangregg.com/flamegraphs.html 
KCachegrind, using Valgrind with the Callgrind or Cachegrind tool, gives me
more information than that even without frame pointers, and it is actually
reliable because it dynamically instruments the code and traces every single
instruction instead of just taking random samples and hoping it did not miss
anything important. It is also much more reproducible because it uses a
mathematical model for the CPU cycles instead of a wallclock time sample
that depends not only on your particular CPU, but also on things such as
background tasks, thermal throttling, etc. Yes, it is slower (by up to a
factor of ~50), but only for the developer doing the profiling, and as
explained above, the reported cycle counts do not depend on the wallclock
time anyway.
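The difference between counting every instruction and taking samples can be
shown with a toy Python model. This is not how Callgrind or perf work
internally: each trace entry simply stands in for one executed instruction, the
function names hot_loop and rare_helper are hypothetical, and the jittered
sampling interval only mimics the irregular timing of real samples.

```python
import random
from collections import Counter


def instrumented_profile(trace):
    """Count every trace entry exactly, as an instrumenting profiler does
    (one entry = one unit of cost in this toy model)."""
    return Counter(trace)


def sampled_profile(trace, period, seed=None):
    """Record the function at jittered sample points roughly every `period`
    entries, then scale hit counts by `period` to estimate costs."""
    rng = random.Random(seed)
    hits = Counter()
    position = rng.randrange(period)
    while position < len(trace):
        hits[trace[position]] += 1
        position += rng.randint(1, 2 * period - 1)  # mean step equals period
    return Counter({name: count * period for name, count in hits.items()})


# Hypothetical trace: a hot loop plus a short helper that runs only briefly.
trace = ["hot_loop"] * 9_990 + ["rare_helper"] * 10

print(instrumented_profile(trace))
for seed in range(3):
    print(sampled_profile(trace, period=1_000, seed=seed))
```

The instrumented counts are exact and identical on every run; the sampled
estimates are coarse multiples of the period, differ from run to run, and
will usually miss rare_helper entirely.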

        Kevin Kofler
