On Wed, 2004-05-05 at 23:11, Will Cohen wrote:
I work on performance tools at Red Hat. I have been told there is
interest in tuning the desktop to improve performance. I have a number
of questions to help identify the work needed in this area. I would be
interested in any answers that people have for the following questions.
What is the set of software in the "Desktop" (executable names and/or
For the last few years I've been working on and off on the performance
of Nautilus. At this time Nautilus is a lot better performance-wise, but
we still don't fully grasp the performance properties of it. I'll try to
give you some idea of what I've been doing.
What specific performance problems have people observed so far in
desktop? For example heavy CPU or memory usage by particular
applications. Another example long latency between event and resulting
Nautilus has had various problems. One important one is the time it
takes to open a new window, other ones are startup time, time to read a
large directory, and total memory use.
What metrics were used to gauge the effect of software changes on
In general, the slowness has been on a scale where you could time it with
a wristwatch (e.g. for directory loads), and at other times I've
put in printf()s to print time() at some specific points in the app.
Often you can see the performance increase by just using the app.
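
A minimal sketch of that kind of printf timing, for illustration only. The
helper names now_seconds() and checkpoint() are hypothetical, and it uses
clock_gettime(CLOCK_MONOTONIC) instead of time(), since time() only has
one-second resolution:

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <time.h>

/* Return the current monotonic clock reading in seconds. */
static double now_seconds(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec + (double)ts.tv_nsec / 1e9;
}

/* Print the time elapsed since the previous checkpoint() call and
 * return it. The first call reports 0. `label` is free-form text
 * naming the point in the code (hypothetical helper, not a GLib or
 * Nautilus API). */
static double checkpoint(const char *label)
{
    static double last = -1.0;
    double now = now_seconds();
    double delta = (last < 0.0) ? 0.0 : now - last;

    last = now;
    fprintf(stderr, "[timing] %-24s +%.6f s\n", label, delta);
    return delta;
}
```

Calling checkpoint("before directory load") and then
checkpoint("after directory load") around the suspect code prints the
elapsed time between the two points to stderr.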
What performance tools have people used so far to identify
problems with desktop applications?
I use a variety of tools:
* printfs in strategic places to try to figure out what gets called,
when it gets called, and how long it takes.
* Sprinkle access ("doing <foo>", 0) calls in the code, then run the app
under strace -tt, which will show you what sort of i/o is done. You can
look at the access lines in the log to see what is happening at the code
level, including timestamps.
* I've used the sampling profiler in eazel-tools in gnome cvs. This is a
sampling profiler that you LD_PRELOAD into your app. It's not perfect,
but it gives you at least some data when used with shared libs (as
opposed to gprof). It gives gprof output.
* KCachegrind. I've only used this a bit; Nautilus runs so slowly under
it that it's hard to use.
* memprof. This is an excellent app for tracking down leaks and large
users of memory.
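
The access() marker trick from the list above could be wrapped roughly like
this. The wrapper name trace_mark() is hypothetical. The marker string is
not a real path, so the call simply fails, and the sketch restores errno so
the marker does not disturb surrounding error handling:

```c
#include <errno.h>
#include <unistd.h>

/* Emit a marker that shows up in the strace log as a failed access()
 * call on a nonexistent "path". The only purpose is to get a
 * timestamped line into the trace next to the real I/O.
 * errno is saved and restored so callers are unaffected. */
static int trace_mark(const char *what)
{
    int saved_errno = errno;
    int rc = access(what, F_OK);   /* F_OK is 0, as in the text above */

    errno = saved_errno;
    return rc;
}
```

With calls such as trace_mark("doing <directory load>") in place, running
the app under strace -tt interleaves the markers with the real file
accesses. On some platforms the call may appear in the log under a
different syscall name (for example faccessat).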
How well or poorly did the performance tools work in identifying the
While they did help, they are not as useful as I would like. They
require a lot of work to set up, and the presentation/data-mining
features are pretty limited.
In general, debugging desktop apps is quite different from low-level,
non-interactive apps. First of all they are generally much more
structurally complex, relying on many shared libraries, several
processes with various sorts of IPC and lots of file I/O and user input.
Secondly, the typical call traces are very deep (60 or even 80 frames
are not uncommon), and often highly recursive. A typical backtrace
involves several signal emissions, where each emission is on the order
of 5 function calls deep (just for the signal emission code, not the
called function). These functions are also typically the same, so they
intermingle the stack traces. Take these simplified backtraces for
example:

  signal_a_handler () - foo(); return TRUE;
  g_signal_emit () - locate callback, call
  caller_a() - g_signal_emit (object, "signal_a", data)

  signal_b_handler () - bar(); return TRUE;
  g_signal_emit () - locate callback, call
  caller_b() - g_signal_emit (object, "signal_b", data)

When looking at a profile for code such as this, what you see is that
caller_a() uses a lot of time, but when you go into it to see what it
does, you end up looking at g_signal_emit, which gets called by lots of
other places like B, so it's very hard to figure out how much of that is
actually from the A call.
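
One way this blurring can happen, assuming a gprof-style profiler that
splits the time spent under a shared function among its callers in
proportion to call counts. The function name apportioned_time() and the
numbers in the comment are hypothetical:

```c
/* gprof-style estimate of how much of a shared function's total time
 * (itself plus everything it calls) gets charged to one caller.
 * The split uses call counts only, so it assumes every call costs the
 * same, which is exactly what does not hold for g_signal_emit(). */
static double apportioned_time(double shared_total,
                               unsigned calls_from_caller,
                               unsigned calls_total)
{
    if (calls_total == 0)
        return 0.0;
    return shared_total * calls_from_caller / calls_total;
}

/* Hypothetical example: g_signal_emit() accounts for 100 ms in total,
 * 90 ms of it under signal_a_handler and 10 ms under signal_b_handler,
 * with one call each from caller_a() and caller_b(). A call-count
 * split charges apportioned_time(100.0, 1, 2) to each caller, hiding
 * the fact that almost all of the time belongs to the A path. */
```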
It gets even worse in the (very common) situation of the signal handler
itself emitting a signal. This creates a mutual recursion into the
g_signal_emit() function similar to the a+b case above:

  g_signal_emit ("signal b")
  g_signal_emit ("signal a")

When stepping into the g_signal_emit from signal_a_handler it seems like
that calls signal_a_handler again, since that's another child of
g_signal_emit. Profilers just don't handle this very well.
Here are a few issues I have with current profiling tools:
* They have no way of profiling I/O and seeks. A lot of our problems are
due to reading too many files, reading files too often, or paging in
data/code. Current profilers just don't show this at all.
* Little support for tracking issues wrt IPC calls between different
processes. Whether this be X inter-client calls for e.g. DnD, or Corba
calls to some object.
* Poor visualization support of the data, especially with mutually
recursive calls as described above.
Generally, all of the fixes I've done have been of the type "Don't do
this incredibly stupid thing", whether that was an O(n^2) algorithm
due to treating a list as an array in loops, reading the same file over
and over again, or something else. I've never *once* had to count cycles
in some hot function or anything like that. It's always about adding a
cache, doing something in a different way, or just avoiding the
expensive stupid thing. However, the stupidities are buried in lots and
lots of code, and finding them in all the data a profiler spews out is
the really hard part.
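
To illustrate the list-as-array case, here is a minimal sketch using a
hypothetical hand-rolled linked list (struct node, list_nth(),
sum_indexed() and sum_iterated() are made-up names, not code from
Nautilus). The same pattern shows up with GLib lists when g_list_nth()
is called inside an index loop:

```c
#include <stddef.h>

/* Hypothetical minimal singly linked list. */
struct node {
    int value;
    struct node *next;
};

/* Walk to the n-th element (0-based): O(n) per call.
 * Returns NULL if the list is shorter than n + 1 elements. */
static struct node *list_nth(struct node *head, size_t n)
{
    while (head != NULL && n-- > 0)
        head = head->next;
    return head;
}

/* Anti-pattern: index the list like an array. Every list_nth() call
 * rescans from the head, so the loop is O(n^2) overall.
 * `len` must not exceed the real list length. */
static long sum_indexed(struct node *head, size_t len)
{
    long sum = 0;
    for (size_t i = 0; i < len; i++)
        sum += list_nth(head, i)->value;
    return sum;
}

/* Fix: follow the next pointers once, O(n) overall. */
static long sum_iterated(const struct node *head)
{
    long sum = 0;
    for (const struct node *n = head; n != NULL; n = n->next)
        sum += n->value;
    return sum;
}
```

Both functions return the same result. Only the second stays linear as
the list grows, which is the kind of difference that matters when
reading a large directory.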
Were benchmarks used to test performance of desktop applications? If
so, what type of benchmarks were used (e.g. micro benchmarks or
measuring the amount of time required to do something in an
Typically not. They were all ad-hoc testing by the developer as part of
trying to track down some specific slowness.
Were the benchmarks runnable in batch mode without human assistance?
Alexander Larsson Red Hat, Inc