Performance tuning the Fedora Desktop


William Cohen

Wednesday, 5 May 2004

4:11 p.m.

I work on performance tools at Red Hat. I have been told there is interest in tuning the desktop to improve performance. I have a number of questions to help identify the work needed in this area, and I would be interested in any answers people have to them.

What is the set of software in the "Desktop" (executable names and/or RPM packages)?

What specific performance problems have people observed so far in the desktop? For example, heavy CPU or memory usage by particular applications, or long latency between an event and the resulting action.

What metrics were used to gauge the effect of software changes on performance?

What performance tools have people used so far to identify performance problems with desktop applications?

How well or poorly did the performance tools work in identifying the performance problem?

Were benchmarks used to test performance of desktop applications? If so, what type of benchmarks were used (e.g. micro benchmarks or measuring the amount of time required to do something in an application program)? Were the benchmarks runnable in batch mode without human assistance?

-Will


Ed Mack

Wednesday, 5 May

4:21 p.m.

As a small note, I've found that gaim takes an excessive amount of time to bring up its first connection dialogue. Or does this fall more into Gaim's domain?

On Wed, 2004-05-05 at 22:11, Will Cohen wrote:

...

kardarisk＠upnet.gr

4:30 p.m.

On Thursday, 6 May 2004, Will Cohen wrote:

...

I think that up2date needs optimization. More info later...

David Holden

Friday, 7 May

9:54 a.m.

On Wednesday 05 May 2004 22:11, Will Cohen wrote:

...

From my point of view rpm -qa | grep kde would give a good starting list

+ emacs + java + mozilla (now seems faster than firefox strangely enough)

...

What specific performance problems have people observed so far in the desktop? For example, heavy CPU or memory usage by particular applications, or long latency between an event and the resulting action.

Heavy disk usage kills my machine's performance. I work on a laptop with a fast CPU and plenty of RAM.

...

What metrics were used to gauge the effect of software changes on performance?

One point I will note is that KDE seems a lot speedier now that I've upgraded to the 3.2 line.

Dave.

...

What performance tools have people used so far to identify performance problems with desktop applications? How well or poorly did the performance tools work in identifying the performance problem? Were benchmarks used to test performance of desktop applications? If so, what type of benchmarks were used (e.g. micro benchmarks or measuring the amount of time required to do something in an application program)? Were the benchmarks runnable in batch mode without human assistance? -Will

--
Dr. David Holden. (Systems Developer)
Crystallography Journals Online: <http://journals.iucr.org>
Thanks in advance:- Please avoid sending me Word or PowerPoint attachments. See: <http://www.fsf.org/philosophy/no-word-attachments.html>
UK Privacy (R.I.P) : http://www.stand.org.uk/commentary.php3
Public GPG key available on request.

Julien Olivier

10:08 a.m.

On Wed, 2004-05-05 at 22:11, Will Cohen wrote:

...

Here is a quick list of apps that seem to take too long to start for me:

openoffice.org
gedit
gnome-terminal
gnome-background-properties

And, from time to time, clicking on the main menu takes ages to display the menu (where ages really mean up to 10 seconds)

Hope this helps a little...

--
Julien Olivier <julo(a)altern.org>

Alexander Larsson

Tuesday, 11 May

7:30 a.m.

On Fri, 2004-05-07 at 17:08, Julien Olivier wrote:

...

On Wed, 2004-05-05 at 22:11, Will Cohen wrote: And, from time to time, clicking on the main menu takes ages to display the menu (where ages really mean up to 10 seconds)

I think the panel loads the menu "on-demand" to save memory, and once it's loaded it's forgotten if not used for a while. Or something like that.

=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Alexander Larsson, Red Hat, Inc
alexl(a)redhat.com alla(a)lysator.liu.se
He's a benighted soccer-playing jungle king with no name. She's a time-travelling renegade widow with an evil twin sister. They fight crime!

Mark McLoughlin

7:30 a.m.

Hi, On Tue, 2004-05-11 at 13:30, Alexander Larsson wrote:

...

On Fri, 2004-05-07 at 17:08, Julien Olivier wrote:
> On Wed, 2004-05-05 at 22:11, Will Cohen wrote:
> And, from time to time, clicking on the main menu takes ages to display
> the menu (where ages really mean up to 10 seconds)

I think the panel loads the menu "on-demand" to save memory, and once its loaded its forgotten if not used for a while. Or something like that.

I haven't quite figured out what's going on here yet. There are a number of options:

1) We check and re-read the menu on click
2) The menu has been swapped out by that stage
3) Destroying the GTK+ menu takes a long time (this was a problem at one point with GTK+ 2.3.x, but that got fixed)

It needs profiling.

Cheers, Mark.

Dan Williams

9:26 a.m.

Hi,

The Menu VFS code has to do a number of things when the panel first reads it as well:

1) Locate and read all .menu files in /etc/xdg/menus
2) Parse these menus and locate all .desktop/.directory directories
3) Parse all .desktop/.directory files from (2) to find out their categories, names, and OnlyShowIn values
4) Create the actual menu structure from the data in (2) and (3)

There is definitely room for improvement and optimization here in the menu code, but 1 - 4 have to happen in any case, which includes a lot of hitting the disk. So if you're doing something disk intensive at the time you first click, it might take a couple seconds.

Also, if the directories or files in (1) and (2) get touched or changed, the whole menu structure is currently rebuilt to make it aware of changes, which means invalidating a lot of cache and reloading many bits of info.

Dan

On Tue, 2004-05-11 at 13:30 +0100, Mark McLoughlin wrote:

...

Hi,

On Tue, 2004-05-11 at 13:30, Alexander Larsson wrote:
> On Fri, 2004-05-07 at 17:08, Julien Olivier wrote:
> > On Wed, 2004-05-05 at 22:11, Will Cohen wrote:
> > And, from time to time, clicking on the main menu takes ages to display
> > the menu (where ages really mean up to 10 seconds)
>
> I think the panel loads the menu "on-demand" to save memory, and once
> its loaded its forgotten if not used for a while. Or something like
> that.

I haven't quite figured out whats going on here yet. There are a number of options:

1) We check and re-read the menu on click
2) The menu has been swapped out by that stage
3) Destroying the GTK+ menu takes a long time (this was a problem at one point with GTK+ 2.3.x, but that got fixed)

It needs profiling.

Cheers, Mark.
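As an illustration only, steps 3 and 4 of Dan's list can be sketched in Python. This is not the panel's actual Menu VFS code: `parse_desktop_entry` and `build_menu` are hypothetical names, and the sketch skips the .menu parsing of steps 1 and 2 by starting from the text of .desktop files that have already been located.

```python
import configparser
from collections import defaultdict


def parse_desktop_entry(text):
    """Step 3: pull Name, Categories and OnlyShowIn out of one .desktop file."""
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    parser.optionxform = str  # keep key case: .desktop keys are case-sensitive
    parser.read_string(text)
    entry = parser["Desktop Entry"]

    def split_list(value):
        # List values in .desktop files are semicolon-separated.
        return [item for item in value.split(";") if item]

    return {
        "name": entry.get("Name", ""),
        "categories": split_list(entry.get("Categories", "")),
        "only_show_in": split_list(entry.get("OnlyShowIn", "")),
    }


def build_menu(entries, desktop):
    """Step 4: group the entries visible on this desktop by category."""
    menu = defaultdict(list)
    for entry in entries:
        # An entry with OnlyShowIn is hidden on every other desktop.
        if entry["only_show_in"] and desktop not in entry["only_show_in"]:
            continue
        for category in entry["categories"]:
            menu[category].append(entry["name"])
    return {category: sorted(names) for category, names in menu.items()}
```

Every entry has to be read and parsed before the first menu can be shown, which is why the cost Dan describes scales with the number of installed .desktop files and with how busy the disk is.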

rada and gus

Saturday, 8 May

4:10 a.m.

OpenOffice takes an inordinately long time to open. The opening speed has not improved significantly, as far as I can tell, from RH 8 through RH 9 and FC-1. Mozilla mail and Evolution take quite a bit longer to open than kmail, which is fast, but Mozilla and Evolution have more functionality. I use kdm as the display manager and KDE as the desktop. I have not tried with gdm or GNOME. I did not use any tools to measure the performance; this is just my gestalt impression.

Gus

On Thursday 06 May 2004 02:41, Will Cohen wrote:

...

What metrics were used to gauge the effect of software changes on performance?

.....

Havoc Pennington

11:11 a.m.

On Wed, 2004-05-05 at 17:11, Will Cohen wrote:

...

What is the set of software in the "Desktop" (executable names and/or RPM packages)?

The default desktop is GNOME + Mozilla + OpenOffice.org + Evolution. gnome-*, evolution, mozilla, ooffice. Fedora Core also includes KDE (most of /usr/bin/k*), Epiphany, XFCE, and other choices.

...

Startup/login times are the most visible latencies, but also repaint (during opaque window resize for example, or opening a menu).

...

What performance tools have people used so far to identify performance problems with desktop applications?

speedprof, strace -t, memprof, printf-with-clock()/gettimeofday(), Soeren's kernel module. Havoc

Daniel Veillard

Tuesday, 11 May

9:40 a.m.

On Sat, May 08, 2004 at 12:11:47PM -0400, Havoc Pennington wrote:

...

> What performance tools have people used so far to identify performance > problems with desktop applications? speedprof, strace -t, memprof, printf-with-clock()/gettimeofday(), Soeren's kernel module.

What I'm still somewhat missing is code coverage analysis. This would be useful for analysing regression tests, but also for trying to isolate code that is never run in "normal" scenarios.

Daniel

--
Daniel Veillard | Red Hat Desktop team http://redhat.com/
veillard(a)redhat.com | libxml GNOME XML XSLT toolkit http://xmlsoft.org/
http://veillard.com/ | Rpmfind RPM search engine http://rpmfind.net/

Warren Togami

Sunday, 9 May

7:39 a.m.

Will Cohen wrote:

...

On a somewhat related topic of desktop performance, fedora.us Extras has recently begun experimenting with -Os rather than the standard -O2 optimization for our firefox & thunderbird packages. So far it seems to be working very well, with noticeably smaller binary RPMs and a smaller runtime memory footprint for these two very large applications. I asked gcc developers if they had a guess about which of -O2 and -Os would be "faster" for large applications like firefox & thunderbird. They generally replied that they have no idea, because compiler optimization is an inexact science. All kinds of other factors come into play, like smaller memory footprint (less swapping) and smaller code size (maybe better use of CPU cache).

Have there been any past discussions about changing the standard compiler optimization, perhaps for FC3?

...

https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=100423

One long-standing desktop bug that has annoyed me personally is Evolution's extreme performance problems when dealing with a large amount of mail in dozens of IMAP folders. It literally takes MINUTES for evolution-1.4.x or evolution-1.5.x to start and read the first message in any IMAP folder due to this problem, while Mozilla, Thunderbird and KMail show new mail almost instantly. (This report also describes an unrelated 100% CPU usage when resizing panes horizontally.)

Aside from Evolution, I personally see rpm's huge memory footprint as being a huge problem. Recently I did a full upgrade of FC1 "Everything" to FC2 in a single rpm transaction on a box with 256MB RAM. It took almost 7 hours due to a massive amount of swapping as rpm's memory footprint climbed past 400MB. We really need to improve this situation. I would ask for optimization of rpm's memory footprint to be a high priority for the FC3 timeframe, but it may be too scary of a problem. =(

On a somewhat related note, the rhn-applet uses more than 30MB of virtual memory. That is just WAY too big. Also look at its CPU time after a few days running. The combined time of it doing *something* seems a bit too much IMHO.

Aside from individual applications that need fixing for severe performance problems like the above examples, I see our current desktop software as having poor or lacking behavior in the area of application usage feedback, which is a severe usability weakness.

Currently we have somewhat acceptable Application Startup Feedback in both GNOME and KDE when programs are launched via panel or menu launchers from .desktop files. The cursor changing to an hourglass or otherwise showing motion and activity when you launch "mozilla" gives the user the feeling that "something is happening" and they must wait. Without application startup feedback, users click on the launcher several more times, and bad things happen.

Application Startup Feedback is not perfect today in either GNOME or KDE. In all cases that I am aware of, launching an application from another (e.g. the URL handler in gaim) does not trigger the mouse cursor to show activity.

This is somewhat related to what I feel is another huge weakness in our current desktop software: Application Busy Feedback. Within applications, users expect feedback from various operations to indicate that apps, or parts of apps, are busy doing something. Windows seems to have two levels of "busy" feedback: one with the tiny hourglass next to the pointer, and another with the entire cursor turning into an hourglass. I personally find this quite effective when applications embrace this type of functionality in a fine-grained way.

I am not a desktop developer, so I don't know much about the technical guts under the things I described here. Any explanation, links to specifications, and mention of future development related to Application Feedback would be appreciated.

Warren Togami
wtogami(a)redhat.com

Owen Taylor

10:56 a.m.

On Sun, 2004-05-09 at 08:39, Warren Togami wrote:

...

Will Cohen wrote: > I work on performance tools at Red Hat. I have been told there is > interest in tuning the desktop to improve performance. I have a number > of questions to help identify the work needed in this area. I would be > interested in any answers that people have for the following questions. > On a somewhat related topic of desktop performance, recently fedora.us Extras has begun experimenting with -Os rather than the standard -O2 optimization for our firefox & thunderbird packages. So far it seems to be working very well, with noticably smaller binary RPMS and runtime memory footprint of these two very large applications. I asked gcc developers if they had a guess about which -O2 and -Os would be "faster" for large applications like firefox & thunderbird. They generally replied that they have no idea, because compiler optimization is an inexact science. All kinds of other factors come into play like smaller memory footprint (less swapping), smaller code size (maybe better use of CPU cache). Have there been any past discussions about changing the standard compiler optimization for perhaps FC3?

Well, I think you've described a wonderful project that someone could do ... recompile the desktop packages with -Os and do some timing. That's the only way we'd know whether we should change the optimization flags or not. Regards, Owen
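The timing comparison Owen suggests could be scripted so that it runs in batch mode. The following is a minimal sketch, not an established benchmark: `time_command` and `compare` are hypothetical names, and the two commands in `builds` are placeholders standing in for the -O2 and -Os builds of the same program running a fixed, non-interactive workload.

```python
import statistics
import subprocess
import sys
import time


def time_command(argv, runs=5):
    """Run argv several times and return the wall-clock seconds of each run."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        subprocess.run(argv, check=True, stdout=subprocess.DEVNULL,
                       stderr=subprocess.DEVNULL)
        samples.append(time.perf_counter() - start)
    return samples


def compare(builds, runs=5):
    """Map each build label to the median run time of its command."""
    return {label: statistics.median(time_command(argv, runs))
            for label, argv in builds.items()}


if __name__ == "__main__":
    # Placeholder commands: substitute the -O2 and -Os builds of a real program.
    builds = {
        "O2": [sys.executable, "-c", "pass"],
        "Os": [sys.executable, "-c", "pass"],
    }
    for label, seconds in compare(builds).items():
        print(f"{label}: median {seconds:.3f} s")
```

The median is used rather than the mean so that one run disturbed by swapping or disk activity does not dominate the result. Wall-clock time also says nothing about memory footprint, which would need to be measured separately.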

Warren Togami

Monday, 10 May

1:32 a.m.

Owen Taylor wrote:

...

>On a somewhat related topic of desktop performance, recently fedora.us >Extras has begun experimenting with -Os rather than the standard -O2 >optimization for our firefox & thunderbird packages. So far it seems to >be working very well, with noticably smaller binary RPMS and runtime >memory footprint of these two very large applications. I asked gcc >developers if they had a guess about which -O2 and -Os would be "faster" >for large applications like firefox & thunderbird. They generally >replied that they have no idea, because compiler optimization is an >inexact science. All kinds of other factors come into play like smaller >memory footprint (less swapping), smaller code size (maybe better use of >CPU cache). > >Have there been any past discussions about changing the standard >compiler optimization for perhaps FC3? Well, I think you've described a wonderful project that someone could do ... recompile the desktop packages with -Os and do some timing. That's the only way we'd know whether we should change the optimization flags or not. Regards, Owen

I just noticed today that the recent FC2 kernels are built with -Os rather than -O2. Just another data point for now. Warren

William Cohen

Tuesday, 11 May

3:08 p.m.

Warren Togami wrote:

...

It is quite possible that the saving in space could give significant performance benefits. Code that fits better in the instruction cache, fewer page misses, and less paging could be a larger win than getting some inner loop to run faster.

...

Have there been any past discussions about changing the standard compiler optimization for perhaps FC3?

I have seen some discussions on -Os vs -O2, but I haven't seen actual performance comparisons.

...

> What specific performance problems have people observed so far in the > desktop? For example heavy CPU or memory usage by particular > applications. Another example long latency between event and resulting > action. https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=100423 One long standing desktop bug that has annoyed me personally is Evolution's extreme performance problems when dealing with a large amount of mail in dozens of IMAP folders. It literally takes MINUTES for evolution-1.4.x or evolution-1.5.x to start and read the first message in any IMAP folder due to this problem, while Mozilla, Thunderbird and KMail show new mail almost instantly. (This report also describes an unrelated 100% CPU usage in resizing panes horizontally.) Aside from Evolution, I personally see rpm's huge memory footprint as being a huge problem. Recently I did a full upgrade of FC1 "Everything" to FC2 in a single rpm transaction on a box with 256MB RAM. It took almost 7 hours due to a massive amount of swapping as rpm's memory footprint climbed past 400MB. We really need to improve this situation. I would ask for optimization of rpm's memory footprint to be a high priority for FC3 timeframe, but it may be too scary of a problem. =(

Jeff Johnson has mentioned the performance problems with the rpm internals to me before.

...

On a somewhat related note, the rhn-applet uses more than 30MB of virtual memory. That is just WAY too big. Also look at its CPU time after a few days running. The combined time of it doing *something* seems a bit too much IMHO.

30 MB? Do you have any more details on that, e.g. space for code and data? Were there just a huge number of libraries being pulled in?

...

Aside from individual applications that need fixing for severe performance problems like the above examples, I see our current desktop software has having poor or lacking behavior in the area of application usage feedback as being a severe usability weakness. Currently we have somewhat acceptable Application Startup Feedback in both GNOME and KDE when programs are launched via panel or menu launchers from .desktop files. The cursor changing to an hour glass or otherwise showing motion and activity when you launch "mozilla" gives the user the feeling that "something is happening" and they must wait. Without application startup feedback, users click on the launcher several more times, and bad things happen. Application Startup Feedback is today not perfect in both GNOME and KDE. It all cases that I am aware of, launching an application from another (i.e. URL handler in gaim) does not trigger the mouse cursor to show activity. This is somewhat related to what I feel is another huge related weakness in our current desktop software: Application Busy Feedback. Within applications, users expect feedback from various operations to indicate that various apps, or parts of apps are busy doing something. Windows seems to have two levels of "busy" feedback. One with the tiny hourglass next to the pointer, and another with the entire cursor turning into an hourglass. I personally see that is quite effective when applications embrace this type of functionality in a fine-grained way. I am not a desktop developer, so I don't know much about the technical guts under the things I described here. Any explanation, links to specifications, and mention of future development related to Application Feedback would be appreciated.

Response time for actions has come up in a number of the responses to my mail. If the response were very fast, then the feedback wouldn't be an issue. However, given the response times, it appears to the user that the system isn't reacting to the input.

Some developers have used strategically placed printf and access calls to get data out, such as how long it took to get from here to there. Another thing we might consider is the ability to start and stop oprofile sampling at particular places in the code. For example, instrument the code to start oprofile sampling on a particular action and then stop oprofile sampling on another event. This would avoid interesting data getting buried in the long-term data. Using oprofile in this manner would give a better picture of what sections of code are getting exercised for certain actions.

-Will
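The "how long it took to get from here to there" measurement can be sketched as a small timing wrapper. This is an illustration under assumptions: `timed_region` and `timings` are hypothetical names, and the `on_start`/`on_stop` hooks only mark where a profiler could be switched on and off. No oprofile API is assumed or called here.

```python
import time
from contextlib import contextmanager

timings = {}  # label -> list of elapsed seconds, one per measured region


@contextmanager
def timed_region(label, on_start=None, on_stop=None):
    """Time the enclosed block; optionally call hooks at entry and exit."""
    if on_start is not None:
        on_start()  # e.g. a hook that starts sampling (hypothetical)
    start = time.perf_counter()
    try:
        yield
    finally:
        elapsed = time.perf_counter() - start
        if on_stop is not None:
            on_stop()  # e.g. a hook that stops sampling (hypothetical)
        timings.setdefault(label, []).append(elapsed)
```

Wrapping an event handler in `with timed_region("open-menu"):` would then collect one elapsed time per invocation, restricted to the action of interest rather than the whole session.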

Daniel Veillard

3:44 p.m.

On Tue, May 11, 2004 at 04:08:21PM -0400, Will Cohen wrote:

...

>On a somewhat related note, the rhn-applet uses more than 30MB of >virtual memory. That is just WAY too big. Also look at its CPU time >after a few days running. The combined time of it doing *something* >seems a bit too much IMHO. 30 MB? Do you have anymore details on that, e.g. space for code and data? Where there just a huge number of libraries being pulled in?

Well, first there is Python plus all the bindings for GTK/Glade/Gnome/panel, http/https/yum/apt/xml-rpc modules. There is also an internal set of current header information from the RPM database, to avoid seeking in the rpmdb constantly. The CPU time can grow very fast if the applet is in blinking/fade mode (my request for dropping this feature was refused). There is also a non-negligible startup CPU consumption from setting up the environment and scanning the installed rpm database for the current state.

Anyway, from a purely desktop POV, I think the applet must go: the machine should be up2date, and if not, the report should go to the sysadmin, not to the user.

Daniel

--
Daniel Veillard | Red Hat Desktop team http://redhat.com/
veillard(a)redhat.com | libxml GNOME XML XSLT toolkit http://xmlsoft.org/
http://veillard.com/ | Rpmfind RPM search engine http://rpmfind.net/

William Cohen

Thursday, 13 May

10:44 a.m.

Warren Togami wrote:

...

Ick, 30MB and 16MB resident for the rhn-applet? I took a quick look at the rhn-applet-gui on RHEL3 and FC2-test3. The applet has definitely gotten bloated. The RHN people sit just around the corner from me, so I might point this out to them.

-Will

On RHEL3

  PID USER     PRI  NI  SIZE  RSS SHARE STAT %CPU %MEM   TIME CPU COMMAND
 2227 wcohen    25  10 12896 6872  2512 S N   0.0  2.7  21:13   1 rhn-applet-gu

On FC2-test3

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 9664 wcohen    25  10 29592  16m  22m S  0.0  3.2   0:00.45 rhn-applet-gui

This applet is pulling in lots of libraries. Just counting the number of memory maps, with no consideration of the amount of space for each map:

RHEL 3
236 map entries
106 shared library code (grep "r-xp" /tmp/rhn-app.map |grep ".so"|wc)
106 shared library data (grep "rw-p" /tmp/rhn-app.map |grep ".so"|wc)
107 executable (grep "r-xp" /tmp/rhn-app.map |wc)
124 rw data (grep "rw-p" /tmp/rhn-app.map |wc)
2 ro data (grep "r--p" /tmp/rhn-app.map |wc)

FC2-test3
270 map entries
115 shared library code (grep "r-xp" /tmp/rhn-app.map |grep ".so"|wc)
115 shared library data (grep "rw-p" /tmp/rhn-app.map |grep ".so"|wc)
117 executables (grep "r-xp" /tmp/rhn-app.map.fc2 |wc)
139 rw data (grep "rw-p" /tmp/rhn-app.map |wc)
10 ro data (grep "r--p" /tmp/rhn-app.map |wc)

Files mapped into FC2-test3 that don't appear to be listed in RHEL3:

+/lib/libcom_err.so.2.1
+/lib/libnss_files-2.3.3.so
+/lib/libselinux.so.1
+/lib/tls/librt-2.3.3.so
+/usr/lib/libasound.so.2.0.0
+/usr/lib/libgnome-keyring.so.0.0.0
+/usr/lib/libgssapi_krb5.so.2.2
+/usr/lib/libIDL-2.so.0.0.0
+/usr/lib/libk5crypto.so.3.0
+/usr/lib/libkrb5.so.3.2
+/usr/lib/libORBit-imodule-2.so.0.0.0
+/usr/lib/python2.3/lib-dynload/binascii.so
+/usr/lib/python2.3/lib-dynload/md5module.so
+/usr/lib/python2.3/lib-dynload/_random.so
+/usr/lib/python2.3/lib-dynload/_ssl.so
+/usr/lib/python2.3/site-packages/_xmlplus/parsers/pyexpat.so
+/usr/X11R6/lib/libXcursor.so.1.0.2
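The map counting above can also be done in one pass over a saved copy of /proc/<pid>/maps. This is a rough sketch, not the commands Will ran: `parse_maps`, `summarize` and `new_files` are hypothetical names, and the `.so` test is a plain substring match on the pathname, whereas the grep pipelines match anywhere on the line and treat the dot as a wildcard.

```python
def parse_maps(text):
    """Return (permissions, pathname) pairs from /proc/<pid>/maps-style text."""
    entries = []
    for line in text.splitlines():
        # Fields: address, perms, offset, device, inode, optional pathname.
        fields = line.split(None, 5)
        if len(fields) < 5:
            continue  # skip blank or malformed lines
        perms = fields[1]
        path = fields[5].strip() if len(fields) == 6 else ""
        entries.append((perms, path))
    return entries


def summarize(entries):
    """Count map entries by permission, roughly as the grep pipelines do."""
    def count(perms, needle=""):
        return sum(1 for p, path in entries if p == perms and needle in path)

    return {
        "map entries": len(entries),
        "shared library code": count("r-xp", ".so"),
        "shared library data": count("rw-p", ".so"),
        "executable": count("r-xp"),
        "rw data": count("rw-p"),
        "ro data": count("r--p"),
    }


def new_files(old_entries, new_entries):
    """Files mapped in new_entries that are absent from old_entries."""
    old_paths = {path for _, path in old_entries if path}
    return sorted({path for _, path in new_entries if path} - old_paths)
```

As in the original counts, this says nothing about how much memory each mapping occupies. Summing the address ranges per file would be the natural next step.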

Soren Pedersen

Monday, 10 May

4:51 a.m.

Hi Will

...

How well or poorly did the performance tools work in identifying the performance problem?

I think profiling CPU usage at the desktop level has two important properties:

    1 A call graph is essential
    2 The data don't have to be very accurate

Ad 1: The desktop CPU problems are generally algorithmic in nature. The big improvements come from fixing O(n^2) algorithms and from adding caching and other high-level optimizations. To do this it is essential to know *why* something time-consuming is being done, so that you can in the best case change the algorithm to not actually do it anymore.

Ad 2: Since you are working on high-level optimizations, you need to know stuff like "30% in metacity" and get a rough break-down of those 30%. The profiler must not be so intrusive that the applications become unusable, but slightly skewed data is not a disaster.

This high-level optimization is in contrast to tuning of inner loops, where the properties are reversed:

    1 In which function do we spend the time
    2 What, exactly, is the CPU doing. You want to know about cache misses and divisions and branch predictions and such things. You want to know in what lines of source code the time is spent.

In this case you generally don't try to stop doing it, you try to do it faster.

The sysprof profiler, which can be checked out of GNOME cvs, is clearly aiming at the first kind of profiling. Sysprof works with a kernel module that 50 times per second generates a stacktrace of the process in the "current" variable, unless the pid of that process is 0. A userspace application then reads those stacktraces and presents the information graphically in lists and trees. So it is a statistical, sampling profiler.

The kernel code probably reveals that I am not an experienced kernel hacker. Generally I worked from various driver writing guides I found on the net, and I consider it quite likely to break on more exotic kernels, where "exotic" means different from mine.

Its killer feature I think is the presentation of the data. For each function you can get a complete break-down of the children in which that function spends its time. This even works with recursion, including mutual recursion. Generally it never reports a function as calling itself, instead it combines the numbers correctly. The not completely trivial details would make this mail much longer. That you can change the view of the data quickly makes it possible to get a good high-level overview of the performance characteristics of the system.

A different property sysprof has is that it is fairly easy to get running. Just install a kernel module and start the application and you are set. I found oprofile a bit more difficult to get started with.

It seems to me that since oprofile probably reports more and better data than my kernel module, we should try and get the graphical presentation from sysprof to present oprofile data. It shouldn't be too difficult to do this; the presentation code was lifted from the memprof/speedprof profiler and is quite independent of the rest of the profiler. (Actually you could argue that the presentation code pretty much _is_ the entire profiler).

Another thing that might be nice is a library that would allow symbol lookup in binaries. I spent quite a bit of time whacking the memprof code to deal with prelinked binaries, and I am not too confident I got it completely right.

Soeren
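Soeren does not spell out how sysprof combines the numbers under recursion, so the following is only one plausible way to get the property he describes, not sysprof's actual algorithm. The sketch assumes each sample is a list of function names with the outermost caller first; `inclusive_counts` and `children_of` are hypothetical names.

```python
from collections import Counter


def inclusive_counts(samples):
    """Count, per function, the samples in which it appears on the stack.

    A function is counted once per sample even if recursion puts it on the
    stack several times, so no function can exceed 100% of the samples.
    """
    totals = Counter()
    for stack in samples:
        totals.update(set(stack))
    return totals


def children_of(samples, parent):
    """Count samples per direct callee of parent, never parent itself."""
    callees = Counter()
    for stack in samples:
        seen = set()  # callees already credited for this sample
        for caller, callee in zip(stack, stack[1:]):
            if caller == parent and callee != parent and callee not in seen:
                seen.add(callee)
                callees[callee] += 1
    return callees
```

With samples `[["main", "draw", "draw", "paint"], ["main", "draw", "layout"]]`, the recursive `draw` is credited with two samples rather than three, and its children are reported as `paint` and `layout` only.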

William Cohen

11:12 a.m.

Soeren Sandmann Pedersen wrote:

...

Hi Will >How well or poorly did the performance tools work in identifying the >performance problem? I think profiling CPU usage at the desktop level has two important properties: 1 A call graph is essential 2 The data don't have to be very accurate Ad 1: The desktop CPU problems are generally algorithmic in nature. The big improvements come from fixing O(n^2) algorithms and from adding caching and other high-level optimizations. To do this it is essential to know *why* something time-consuming is being done, so that you can in the best case change the algorithm to not actually do it anymore.

The algorithms selected have a huge impact on performance. However, it is not always clear that the algorithm selected is wrong until the code is used. Data structures have different strengths: for example, it is cheap to index into and fetch from an array, but expensive to insert elements at the beginning of one.
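The array trade-off can be made concrete with a small Python sketch (the function names are hypothetical). A Python list is array-backed, so inserting at the front shifts every existing element, while a `collections.deque` supports front insertion in constant time.

```python
from collections import deque


def prepend_all_list(items):
    """Insert each item at the front of a list: each insert shifts everything."""
    out = []
    for item in items:
        out.insert(0, item)  # O(n) per insert, O(n^2) overall
    return out


def prepend_all_deque(items):
    """Insert each item at the front of a deque: constant time per insert."""
    out = deque()
    for item in items:
        out.appendleft(item)  # O(1) per insert
    return out
```

Both produce the same ordering, so the difference only shows up once the input is large. The deque pays for its cheap front insertion with slower indexing into the middle, which is the same kind of trade-off described above.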

...

Ad 2: Since you are working on high-level optimizations, you need to know stuff like "30% in metacity" and get a rough break-down of those 30%. The profiler must not be so intrusive that the applications become unusable, but slightly skewed data is not a disaster.

Yes, low overhead is more important than absolute accuracy. I think for right now the tuning is looking for the "low hanging fruit". Whether the profiler says that something takes 30% or 33% is not going to make a big difference. For the most part we just want to point out the major resource hogs. It would be painful for users of the GUI on the desktop to be slowed by emulation, and users might do things differently if the speed is too different.

...

This high-level optimization is in contrast to tuning of inner loops, where the properties are reversed: 1 In which function do we spend the time 2 What, exactly, is the CPU doing. You want to know about cache misses and divisions and branch predictions and such things. You want to know in what lines of source code the time is spent. In this case you generally don't try to stop doing it, you try to do it faster.

OProfile can certainly provide information on cache misses, branch predictions, and other performance monitoring events.

...

The sysprof profiler, which can be checked out of GNOME cvs, is clearly aiming at the first kind of profiling. Sysprof works with a kernel module that 50 times per second generates a stacktrace of the process in the "current" variable, unless the pid of that process is 0. A userspace application then reads those stacktraces and presents the information graphically in lists and trees.

The oprofile support in Fedora Core 2 test3 has a similar mechanism to walk the stack, but it typically uses the performance monitoring hardware to trigger the sampling. It only works for x86 (other processors do not include frame pointers). You might want to take a look at it. It won't work for hugemem kernels because there are separate address spaces for user and kernel mode, but I imagine for most desktop work people are not using hugemem kernels.

On Pentium4 and Pentium M there are performance monitoring events that count calls, so the sampling can be done based on the number of calls. This may be more desirable than time-based sampling.

However, one drawback of this statistical call graph information is that one ends up with a call graph forest rather than a call graph tree. The sampling will miss the lone call that causes a lot of work unless the code happens to walk far enough up the stack. Does the sysprof stack tracer you use walk the entire user stack each time it takes a sample?

...

So it is a statistical, sampling profiler. The kernel code probably reveals that I am not an experienced kernel hacker. Generally I worked from various driver writing guides I found on the net, and I consider it quite likely to break on more exotic kernels, where "exotic" means different from mine. Its killer feature I think is the presentation of the data. For each function you can get a complete break-down of the children in which that function spends its time. This even works with recursion, including mutual recursion. Generally it never reports a function as calling itself, instead it combines the numbers correctly. The not completely trivial details would make this mail much longer. That you can change the view of the data quickly makes it possible to get a good high-level overview of the performance characteristics of the system. A different property sysprof has is that it is fairly easy to get running. Just install a kernel module and start the application and you are set. I found oprofile a bit more difficult to get started with.

oprofile has been more difficult to set up in the past. However, pretty much one can just install an RH smp kernel, boot the RH smp kernel, "opcontrol --setup --no-vmlinux; opcontrol --start", and one has profiling for user code. There is still room for improvement.

...

It seems to me that since oprofile probably reports more and better data than my kernel module, we should try and get the graphical presentation from sysprof to present oprofile data. It shouldn't be too difficult to do this; the presentation code was lifted from the memprof/speedprof profiler and is quite independent of the rest of the profiler. (Actually you could argue that the presentation code pretty much _is_ the entire profiler).

I will take a look at the sysprof to see how it presents data.

...

Another thing that might be nice is a library that would allow symbol lookup in binaries. I spent quite a bit of time whacking the memprof code to deal with prelinked binaries, and I am not too confident I got it completely right. Soeren

Thanks for the comments. -Will

Soren Pedersen

Wednesday, 12 May

4:59 a.m.

On Mon, 2004-05-10 at 18:12, Will Cohen wrote:

...

However, one drawback of this statistical call graph information is that one ends up with a call graph forest rather than a call graph tree. The sampling will miss the lone call that causes a lot of work unless the code happens to walk far enough up the stack. Does the sysprof stack tracer you use walk the entire user stack each time it takes a sample?

It traces up to 256 addresses, which is normally enough room for the entire stack. Usually sysprof reports something like 96% time spent in _libc_start_main(). I assume the remaining 4% is accounting for applications that have either a very deep stack (> 256) causing another function to appear as the first call, or are just weird in other ways. Søren

Alexander Larsson

Tuesday, 11 May

8:31 a.m.

On Wed, 2004-05-05 at 23:11, Will Cohen wrote:

...

For the last few years I've been working on and off on the performance of Nautilus. At this time Nautilus is a lot better performance-wise, but we still don't fully grasp its performance properties. I'll try to give you some idea of what I've been doing.

...

Nautilus has had various problems. One important one is the time it takes to open a new window, other ones are startup time, time to read a large directory, and total memory use.

...

What metrics were used to gauge the effect of software changes on performance?

In general, the slowness has been on a scale where you could use a wristwatch to time it (e.g. for directory load), and at other times I've put in printf()s to print time() at some specific points in the app. Often you can see the performance increase just by using the app.
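The printf-and-time() approach described above can be sketched as a small checkpoint recorder. This is only an illustration in Python, not code from Nautilus; the class name `Checkpoints` and its methods are made up for the example, and a monotonic clock is assumed in place of time().

```python
import time


class Checkpoints:
    """Record wall-clock timestamps at named points in a program."""

    def __init__(self, clock=time.monotonic):
        # The clock is injectable so the recorder can be tested
        # deterministically; by default it reads a monotonic clock.
        self._clock = clock
        self._marks = []  # (label, timestamp) pairs in call order

    def mark(self, label):
        """Remember the current time under the given label."""
        self._marks.append((label, self._clock()))

    def intervals(self):
        """Return (from_label, to_label, seconds) for consecutive marks."""
        return [
            (a_label, b_label, b_time - a_time)
            for (a_label, a_time), (b_label, b_time)
            in zip(self._marks, self._marks[1:])
        ]

    def report(self):
        """Print one line per interval, in milliseconds."""
        for start, end, seconds in self.intervals():
            print(f"{start} -> {end}: {seconds * 1000:.1f} ms")
```

Calling mark() before and after a suspect operation (for example around a directory load) and then report() gives the same information as hand-placed printfs, but in a form that a script can collect.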

...

What performance tools have people used so far to identify performance problems with desktop applications?

I use a variety of tools:

* printfs in strategic places to try to figure out what gets called, when it gets called, and how long it takes.
* Sprinkle access ("doing <foo>", 0) calls in the code, then run the app under strace -tt, which will show you what sort of I/O is done. You can look at the access lines in the log to see what is happening at the code level, including timestamps.
* I've used the sampling profiler in eazel-tools in gnome cvs. This is a sampling profiler that you LD_PRELOAD into your app. It's not perfect, but it gives you at least some data when used with shared libs (as opposed to gprof). It gives gprof output.
* KCachegrind. I've only used this a bit; the performance of Nautilus while running it is pretty poor, so it's hard to use.
* memprof. This is an excellent app for tracking down leaks and large users of memory.
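The access() marker trick from the list above can be sketched as follows. This is a Python illustration of the idea rather than the C calls used in Nautilus; the helper name `trace_marker` is hypothetical. The path is deliberately one that should not exist, so the call fails harmlessly and only serves to leave a timestamped line in the strace log.

```python
import os


def trace_marker(message):
    """Issue a harmless access() call whose path argument is a message.

    The path is not expected to exist, so the call just fails. Its only
    purpose is to appear, with a timestamp, in the output of strace -tt,
    interleaved with the real file I/O of the program.
    """
    return os.access(f"doing {message}", os.F_OK)
```

Running the program under strace -tt then shows lines naming "doing <message>" between the open and read calls, which makes it possible to tell which part of the code triggered which I/O. Depending on the platform and C library, the marker may show up as an access or a faccessat style system call.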

...

How well or poorly did the performance tools work in identifying the performance problem?

While they did help, they are not as useful as I would like. They require a lot of work to set up, and the presentation/data-mining features are pretty limited.

In general, debugging desktop apps is quite different from low-level, non-interactive apps. First of all, they are generally much more structurally complex, relying on many shared libraries, several processes with various sorts of IPC, and lots of file I/O and user input. Secondly, the typical call traces are very deep (60 or even 80 frames are not uncommon), and often highly recursive.

A typical backtrace involves several signal emissions, where each emission is on the order of 5 function calls deep (just for the signal emission code, not the called function). These functions are also typically the same, so they intermingle the stack traces. Take this simplified backtrace for instance:

A:
    signal_a_handler () - foo(); return TRUE;
    g_signal_emit () - locate callback, call
    caller_a() - g_signal_emit (object, "signal_a", data)

B:
    signal_b_handler () - bar(); return TRUE;
    g_signal_emit () - locate callback, call
    caller_b() - g_signal_emit (object, "signal_b", data)

When looking at a profile for code such as this, what you see is that caller_a() uses a lot of time, but when you go into it to see what it does, you end up looking at g_signal_emit, which gets called from lots of other places like B, so it's very hard to figure out how much of that is actually from the A call.

It gets even worse in the (very common) situation of the signal handler itself emitting a signal. This creates a mutual recursion into the g_signal_emit() function similar to the a+b case above:

    signal_b_handler()
    g_signal_emit ("signal b")
    signal_a_handler ()
    g_signal_emit ("signal a")
    caller_a()

When stepping into the g_signal_emit from signal_a_handler, it seems like that calls signal_a_handler again, since that's another child of g_signal_emit. Profilers just don't handle this very well.

Here are a couple of issues I have with current profiling tools:

* They have no way of profiling I/O and seeks. A lot of our problems are due to reading too many files, reading files too often, or paging in data/code. Current profilers just don't show this at all.
* Little support for tracking issues wrt IPC calls between different processes, whether this be X inter-client calls for e.g. DnD, or Corba calls to some object.
* Poor visualization support for the data, especially with mutually recursive calls as described above.

Generally, all of the fixes I've done have been of the type "Don't do this incredibly stupid thing", whether that has been an O(n^2) algorithm due to treating a list as an array in loops, reading the same file over and over again, or something else. I've never *once* had to count cycles in some hot function or anything like that. It's always about adding a cache, doing something in a different way, or just avoiding the expensive stupid thing. However, the stupidities are buried in lots and lots of code, and finding them in all the data a profiler spews out is the really hard part.

...

Were benchmarks used to test performance of desktop applications? If so, what type of benchmarks were used (e.g. micro benchmarks or measuring the amount of time required to do something in an application program)?

Typically not. They were all ad-hoc testing by the developer as part of trying to track down some specific slowness.

...

Were the benchmarks runnable in batch mode without human assistance?

Never.

=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Alexander Larsson Red Hat, Inc
alexl(a)redhat.com alla(a)lysator.liu.se
He's a superhumanly strong flyboy senator on the wrong side of the law. She's a time-travelling motormouth detective with only herself to blame. They fight crime!

Soren Pedersen

Wednesday, 12 May

4:50 a.m.

On Tue, 2004-05-11 at 15:31, Alexander Larsson wrote:

...

In general, debugging desktop apps is quite different from low-level, non-interactive apps. First of all, they are generally much more structurally complex, relying on many shared libraries, several processes with various sorts of IPC, and lots of file I/O and user input. Secondly, the typical call traces are very deep (60 or even 80 frames are not uncommon), and often highly recursive.

A typical backtrace involves several signal emissions, where each emission is on the order of 5 function calls deep (just for the signal emission code, not the called function). These functions are also typically the same, so they intermingle the stack traces. Take this simplified backtrace for instance:

A:
    signal_a_handler () - foo(); return TRUE;
    g_signal_emit () - locate callback, call
    caller_a() - g_signal_emit (object, "signal_a", data)

B:
    signal_b_handler () - bar(); return TRUE;
    g_signal_emit () - locate callback, call
    caller_b() - g_signal_emit (object, "signal_b", data)

When looking at a profile for code such as this, what you see is that caller_a() uses a lot of time, but when you go into it to see what it does, you end up looking at g_signal_emit, which gets called from lots of other places like B, so it's very hard to figure out how much of that is actually from the A call.

It gets even worse in the (very common) situation of the signal handler itself emitting a signal. This creates a mutual recursion into the g_signal_emit() function similar to the a+b case above:

    signal_b_handler()
    g_signal_emit ("signal b")
    signal_a_handler ()
    g_signal_emit ("signal a")
    caller_a()

When stepping into the g_signal_emit from signal_a_handler, it seems like that calls signal_a_handler again, since that's another child of g_signal_emit. Profilers just don't handle this very well.

If you haven't already, I'd suggest looking at either the speedprof profiler that comes with memprof (you'll have to use cvs HEAD) or the sysprof profiler. They both have a visualization that helps with this problem a lot. For example the stack trace above:

    signal_b_handler()
    g_signal_emit ("signal b")
    signal_a_handler ()
    g_signal_emit ("signal a")
    caller_a()

will be visualized as this tree:

                          Self    Total
    caller_a()              0%     100%
    g_signal_emit ()        0%     100%
    signal_a_handler()      0%     100%
    signal_b_handler()    100%     100%

So all the recursions through g_signal_emit() are combined and shown in one list. In other words you get a break-down even for recursive data. If either of the signal handlers were to emit another signal, the visualization would be the same, except that the numbers would be different.
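One way to arrive at Self/Total numbers like the ones above is to credit each function at most once per stack sample, so that recursion through g_signal_emit() can never push a total past 100%. The sketch below is not taken from sysprof or speedprof; the function name `self_and_total` and the input format (a list of stacks, innermost frame first) are assumptions made for the illustration.

```python
from collections import Counter


def self_and_total(samples):
    """Compute (self %, total %) per function from stack samples.

    Each sample is a list of function names, innermost frame first.
    "Self" counts samples where the function is the innermost frame.
    "Total" counts samples where the function appears anywhere on the
    stack, at most once per sample, so recursive calls are combined.
    """
    self_hits = Counter()
    total_hits = Counter()
    sample_count = 0
    for stack in samples:
        if not stack:
            continue  # ignore empty samples
        sample_count += 1
        self_hits[stack[0]] += 1
        for function in set(stack):  # set() collapses recursion
            total_hits[function] += 1
    return {
        function: (
            100.0 * self_hits[function] / sample_count,
            100.0 * total_hits[function] / sample_count,
        )
        for function in total_hits
    }


if __name__ == "__main__":
    trace = [
        "signal_b_handler",
        "g_signal_emit",
        "signal_a_handler",
        "g_signal_emit",
        "caller_a",
    ]
    for name, (self_pct, total_pct) in self_and_total([trace]).items():
        print(f"{name:20s} {self_pct:5.0f}% {total_pct:5.0f}%")
```

With the single stack trace from the example, g_signal_emit appears twice on the stack but is still reported with a total of 100%, which is the property that makes the recursive case readable.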

...

Here are a couple of issues I have with current profiling tools:

* They have no way of profiling I/O and seeks. A lot of our problems are due to reading too many files, reading files too often, or paging in data/code. Current profilers just don't show this at all.

I agree. What would be really nice is if we could get the kernel to provide this information for each disk access:

- what process caused it to happen
- what is the stack trace of the process when it happened
- what kind of disk access
    - page fault: what address faulted
    - read: what file was read
    - other kinds of disk access

With that information it would be possible to see what parts of an application are responsible for disk access. Also minor page faults would be interesting to know about for startup time, because they represent the worst case page fault-wise (they *could* have been major faults if a different set of pages were in memory).

Søren
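To make the wish list above concrete, here is a sketch of what one such record and a simple per-function summary could look like. Everything in it is hypothetical: no kernel interface delivering these records is described in this thread, and the names `DiskAccess` and `accesses_by_function` and the field layout are invented for the illustration.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class DiskAccess:
    """One hypothetical disk-access event, as wished for above."""
    pid: int                 # process that caused the access
    stack: Tuple[str, ...]   # stack trace at that moment, innermost first
    kind: str                # e.g. "page_fault", "read", or another kind
    detail: str              # faulting address or file name


def accesses_by_function(events):
    """Count, per function, the disk accesses that happened beneath it.

    A function is counted once per event even if it appears several
    times on the stack, so recursive code is not over-counted.
    """
    counts = Counter()
    for event in events:
        for function in set(event.stack):
            counts[function] += 1
    return counts
```

Sorting the resulting counter with most_common() would point at the parts of an application responsible for the most disk accesses, which is the question raised in the message above.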

Marius Andreiana

2 a.m.

On Thu, 2004-05-06 at 00:11, Will Cohen wrote:

...

What specific performance problems have people observed so far in the desktop?

When the HDD is used by an application (copying a large file, updatedb...), all the applications respond slowly. The preemptible kernel patch should solve this, but it's not enabled in FC2.

--
Marius Andreiana
Galuna - Solutii Linux in Romania
http://www.galuna.ro

William Cohen

4:26 p.m.

Marius Andreiana wrote:

...

On Thu, 2004-05-06 at 00:11, Will Cohen wrote:
> What specific performance problems have people observed so far in the
> desktop?

when HDD is used by an application (copying a large file, updatedb...), all the applications respond slow. The preemptible kernel patch should solve this, but it's not enabled in FC2.

Are you sure that the preemptible kernel fixes this problem? I thought this was related to application code getting swapped out to make more room for disk cache.

http://www.spinics.net/lists/kernel/msg262920.html

Anyway, the slow response does make the desktop less pleasant to use.

-Will

Féliciano Matias

7:38 p.m.

Le mer 12/05/2004 à 09:00, Marius Andreiana a écrit :

...

On Thu, 2004-05-06 at 00:11, Will Cohen wrote:
> What specific performance problems have people observed so far in the
> desktop?

when HDD is used by an application (copying a large file, updatedb...), all the applications respond slow. The preemptible kernel patch should solve this, but it's not enabled in FC2.

Thoughts about the preemptible kernel patch: http://kerneltrap.org/node/view/2702

...

-- Marius Andreiana Galuna - Solutii Linux in Romania http://www.galuna.ro

Warren Togami

Thursday, 13 May

3:28 a.m.

Marius Andreiana wrote:

...

Paraphrased from what Arjan van de Ven told me regarding kernel preempt.

Arjan said, "preempt doesn't help for the desktop (1ms to 0.8ms latency you don't notice *at all*) but that 4K stacks does make a big dent"

Warren said, "and overall runtime would be slower with preempt because the on-CPU cache is blown away more often"

Arjan said, "and it generates worse IO patterns -> slower disk io"

both aren't good for performance; worse disk IO means actually longer felt latency
but really, humans don't notice latency < 10ms or so, so anything below that is just fudging with noise
however disk IO and things like the VM before 4K stacks can cause delays (not latencies) far longer than that

Warren Togami
wtogami(a)redhat.com

William Cohen

Wednesday, 12 May

4:57 p.m.

I would like to thank people for the responses to the thread on performance tuning the Fedora Desktop. The responses have been very helpful. Below is what I have written up as a result of the comments I got.

APPLICATIONS

There are a large number of packages that could be considered part of the desktop. However, the most critical ones appear to be the mail clients, web browsers, word processors, and window manager. There were also several comments about up2date/rpm-related software.

A short list of important application RPMs:

evolution
mozilla
ooffice
gnome-terminal
gnome-*

*FIXME* would like to narrow this down to a list of executables *FIXME*

PERFORMANCE PROBLEMS

The performance issues with desktop applications differ significantly from those of the traditional single-threaded, batch-oriented benchmarks, like SPEC CPU2000. Desktop applications often have multiple threads. Their footprints are larger due to the number of shared libraries being used for GUI and network-related operations. Also, users typically have several desktop applications open concurrently. As a result, cache and paging are more likely to be issues with desktop applications. Optimizing for code size may boost performance more than using aggressive compiler optimizations that increase program size.

Latency is more of an issue than throughput for desktop applications. How long it takes for the result of a mouse click to be observed is more important than how many such actions the machine can do in a given period of time. Interactivity is the issue; ideally, the time an action takes should be below a person's threshold of detection. For example, a new window appears to pop up instantly, but in reality it may take 10 milliseconds.

Some latency issues may be outside the application's control. For example, performance may be limited by the DNS lookups that convert the host name in a URL into an IP address. The actual rate at which the network provides data to the web browser also affects the perceived performance. With the exception of eliminating the dependency on the outside application, nothing on the local machine is going to improve performance.

METRICS

Unfortunately, many people's metrics for desktop applications were literally eyeballed: click a menu item and see how long it takes for the result to occur. This is difficult to automate and script. We really want benchmarks where, at the very least, a person does not have to measure by hand and then transcribe the resulting measurement to a machine-readable format. Better still would be an automated test like the performance tracking for GCC, http://people.redhat.com/dnovillo/spec2000/. This would make it easier to see when a code change fixed a performance problem (or introduced one).

Wall clock time, obtained from strategically placed printfs of gettimeofday and from strace, was used to get timing information.

Memory footprint of desktop applications is another metric of interest. The larger the memory footprint, the more time it takes to start up. Also, applications with a large memory footprint are likely to have more cache misses and page faults, slowing the execution of the program.

File accesses can also affect performance. When a desktop application is initially started, shared library files need to be opened. Additionally, other files with preference information may be opened. In the case of Nautilus, it may need to examine all the files in the directory it is browsing. Restructuring the code to reduce the number of file accesses could improve the performance of applications.

Round trips for the X protocol affect the latency of operations. For example, a client that sends the X server a message to do an operation and then waits for an acknowledgement that the operation is complete can hurt performance over channels with long latency. This type of problem affected the graphical installer for a distribution of Linux for s390. Every screen update required a round trip, which tripled the install time because a round trip latency was encountered for every update of the progress bar.

PERFORMANCE TUNING TOOLS

The data collected by the performance tools does not need to be exact; in many cases the tools are being used to identify areas of code that will make a difference in performance. Whether 30% or 33% of the time is spent in a single routine is not that big a difference. Currently, developers are looking for the things that have a large impact on performance.

Tools that significantly slow the execution of the code change the interactions between the user and the code. The instrumentation can also change the interactions between instrumented and uninstrumented code. The slowness of the code and the perturbations of the system make detailed instrumentation less attractive to developers.

Some of the optimization work is finding inappropriate algorithms or data structures used for particular tasks. Knowing the context in which a function is called is important, so call graph information is essential. Call graphs for desktop applications are complicated due to recursive code, signal handling, and co-routines.

Some GUI developers have developed profiling tools, for example sysprof by Soeren Sandmann for Gnome (http://www.daimi.au.dk/~sandmann/sysprof-0.02.tar.gz) and the Mozilla performance tools (http://www.mozilla.org/performance/).

COURSE OF ACTION

1) Develop Desktop benchmarks

Need to have some benchmarks to determine performance. Very incomplete list of suggested benchmarks:

- gdm login to setup desktop
- menu select desktop program _____ to time desktop program ready to use
- cat text file to xterm

Should have clear procedures for each so the results can be generated by anyone and compared. It would be a bonus if the benchmarks can be run from the command line, so the data collection can be automated.

2) Get baseline metrics on benchmarks

Have a baseline to determine whether code changes are increasing or decreasing performance. This allows us to avoid the nebulous "feels faster" and "feels slower". Also use this data to find out where the most significant problems are, for example an important application taking ten minutes to start.

3) Improve performance monitoring/tuning tools and scripts

*FIXME* make the tuning tools information more concrete *FIXME*

a) Need a trigger mechanism to start and stop profiling data collection on certain conditions or events. For example, start profiling when a menu item is selected and stop profiling when the action is complete. This would avoid interesting samples getting lost in the sea of boring long-term sampling.

b) Better tools to map out memory footprint. Reducing memory use is likely to help performance by reducing the time to load the application and related shared libraries. Related to this, consider tools to reorder functions in code to get better locality (like grope) and produce hot and cold code.

c) Easier means of navigating performance data, for example a breakdown of time spent in parent and children. Maybe pull some of the sysprof data analysis into oprofile data analysis. Also maybe use the OProfile plug-in for Eclipse to visualize data.

d) Take advantage of the use of shared libraries in code to insert instrumentation between the application and the function in the shared library when the library is loaded/linked in.

-Will
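Steps 1) and 2) of the course of action call for benchmarks that run from the command line and a baseline to compare against. A minimal sketch of such a harness follows, written in Python as an illustration only. The function names (`time_command`, `summarize`, `compare`) and the 10% tolerance are assumptions for the example, not part of any existing tool.

```python
import statistics
import subprocess
import time


def time_command(argv, runs=5):
    """Run a command several times; return wall-clock seconds per run."""
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        # Output is discarded here. A terminal-rendering benchmark such
        # as "cat text file to xterm" would have to leave it attached.
        subprocess.run(argv, check=True, stdout=subprocess.DEVNULL)
        timings.append(time.perf_counter() - start)
    return timings


def summarize(timings):
    """Reduce a list of timings to min, median and max."""
    return {
        "min": min(timings),
        "median": statistics.median(timings),
        "max": max(timings),
    }


def compare(baseline, current, tolerance=0.10):
    """Classify current timings against a baseline by their medians.

    Differences within the relative tolerance are reported as "same",
    in the spirit of the 30% vs 33% remark above.
    """
    base = statistics.median(baseline)
    now = statistics.median(current)
    if now > base * (1 + tolerance):
        return "slower"
    if now < base * (1 - tolerance):
        return "faster"
    return "same"
```

Storing the list returned by time_command() for a known-good build and passing it to compare() after each change gives a scripted replacement for "feels faster" and "feels slower". The median is used rather than the mean so that one slow run, for example a first run with cold caches, does not dominate the result.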

John Williams

5:22 p.m.

On Wed, 2004-05-12 at 17:57 -0400, Will Cohen wrote:

Lots of very encouraging stuff. Thanks Will. :-)

...

METRICS Unfortunately, many people's metrics for desktop applications were literally eyeballed, click a menu item see how long it takes for the result to occur. This is difficult to automate and script. We really

A small comment: the only really meaningful metric from the user's point of view is "does it take too long?". This is a binary variable. It is of course totally subjective, but _that is the point_ --- "objective" considerations are not meaningful to users.

What I am trying to say is that (I humbly suggest) you face two problems: first finding the most pressing problems; and then fixing them. Objective metrics help with the second, but not the first.

Examples:

1) Clicking on the Nautilus desktop menu to open a terminal results in a noticeable delay before the window has appeared on the screen. For such a simple application it should be instantaneous (my rig is a 2.4GHz Athlon with 767MB RAM).

2) Opening a PDF file with gv results in an instantaneous appearance of the window; using ggv takes about five seconds.

3) A noticeable delay starting gedit (same reasoning as per terminal).

I know that these "simple" apps hide complexity due to their gnomeyness, but again, that is irrelevant to users.

I hope the above is useful. Please don't read it as bitching. I _love_ Fedora and GNOME, and I want to express my gratitude to those of you who have provided me with such an enjoyable computing environment.

cheers,
John

--
ICQ: 261810463
AIM: johnfrombluff
AOL: johnfrombluff
MSN: johnwilliamsFromBluff(a)hotmail.com
Yahoo: JohnFromBluff
Jabber: jwilliamsFromBluff(a)jabber.org

Julien Olivier

Thursday, 13 May

3:35 a.m.

...

1) Clicking on the Nautilus desktop menu to open a terminal results in a noticeable delay before the window has appeared on the screen. For such a simple application it should be instantaneous (my rig is a 2.4GHz Athlon with 767MB RAM).

Maybe a bit off-topic, but the lack of launch feedback when opening a terminal via the desktop pop-up menu (as opposed to using System Tools -> Terminal) can make it *appear* even slower, or even broken. And the same goes for folders opened by nautilus: no launch feedback makes it feel broken sometimes...

--
Julien Olivier <julo(a)altern.org>

Warren Togami

5:17 a.m.

Julien Olivier wrote:

...

> 1) Clicking on the Nautilus desktop menu to open a terminal results in a
> noticeable delay before the window has appeared on the screen. For such
> a simple application it should be instantaneous (my rig is a 2.4GHz
> Athlon with 767MB RAM).

Maybe a bit off-topic, but the lack of launch feedback when opening a terminal via the desktop pop-up menu (as opposed to using System Tools -> Terminal) can make it *appear* even slower, or even broken. And the same goes for folders opened by nautilus: no launch feedback makes it feel broken sometimes...

I addressed this in my initial rant in this thread too. We need finer granularity in application feedback control across the board.

Warren

William Cohen

8:22 a.m.

John Williams wrote:

...

On Wed, 2004-05-12 at 17:57 -0400, Will Cohen wrote:

Lots of very encouraging stuff. Thanks Will. :-)

> METRICS
>
> Unfortunately, many people's metrics for desktop applications were
> literally eyeballed, click a menu item see how long it takes for the
> result to occur. This is difficult to automate and script. We really

A small comment: the only really meaningful metric from the user's point of view is "does it take too long?". This is a binary variable. It is of course totally subjective, but _that is the point_ --- "objective" considerations are not meaningful to users.

What I am trying to say is that (I humbly suggest) you face two problems: first finding the most pressing problems; and then fixing them. Objective metrics help with the second, but not the first.

True, users only care about "does it take too long?" However, developers need to know why it takes too long. There are a lot of possible causes. Just telling the developers that something is too slow points out the problem but does not suggest any fixes to correct it. The metrics were for the benefit of the developers.

Reading through my write-up again, I realize that I don't really have timing listed clearly as a metric. I will make sure that is explicit in the revision. Wall clock time is what we are trying to reduce.

Thanks for the comments.

-Will

...

Examples:

1) Clicking on the Nautilus desktop menu to open a terminal results in a noticeable delay before the window has appeared on the screen. For such a simple application it should be instantaneous (my rig is a 2.4GHz Athlon with 767MB RAM).

2) Opening a PDF file with gv results in an instantaneous appearance of the window; using ggv takes about five seconds.

3) A noticeable delay starting gedit (same reasoning as per terminal).

I know that these "simple" apps hide complexity due to their gnomeyness, but again, that is irrelevant to users.

I hope the above is useful. Please don't read it as bitching. I _love_ Fedora and GNOME, and I want to express my gratitude to those of you who have provided me with such an enjoyable computing environment.

cheers,
John

Panu Matilainen

7:37 a.m.

On Wed, 12 May 2004, Will Cohen wrote:
> -cat text file to xterm

I wouldn't worry so much about xterm; gnome-terminal and konsole are far worse in terms of performance. I recently ran a couple of tests (which is why I noticed the xterm mark), just a relatively noisy rpmbuild of an application. On my IBM T40, the build times on an otherwise idle box, completely reproducible give or take a couple of seconds:

xterm:           ~1m 20s
gnome-terminal:  ~1m 40s
konsole:         ~1m 40s

Building on a virtual console, or redirecting output to /dev/null or a file, were all basically ~1:20. That's an awful lot of time wasted waiting for software to build, which a lot of us do all the time :-/

For cat it's much more dramatic (obviously):

time cat /usr/share/mime-info/gnome-vfs.keys

virtual console:  ~0.5s
xterm:            ~3.5s
gnome-terminal:   ~6.5s
konsole:          ~10.5s (not only is it slow, it also corrupts the terminal, leaving garbage on screen)

Tests done on RHL 9'ish box, FWIW.

- Panu -

Panu Matilainen

7:51 a.m.

On Thu, 13 May 2004, Panu Matilainen wrote:

...

On Wed, 12 May 2004, Will Cohen wrote:
> -cat text file to xterm

I wouldn't worry so much about xterm, gnome-terminal and konsole are far worse in terms of performance.

Erm .. take that back - when they are all made to use the exact same font, xterm actually becomes the slowest of them all. Still, it's sickening to think how much time gets wasted by printing pretty AA-fonts when nobody's actually looking (e.g. when compiling software).

- Panu -

Warren Togami

8:24 a.m.

Panu Matilainen wrote:

...

I recently ran a couple of tests (which is why I noticed the xterm mark), just a relatively noisy rpmbuild of an application. On my IBM T40, the build times on an otherwise idle box, completely reproducible give or take a couple of seconds:

xterm:           ~1m 20s
gnome-terminal:  ~1m 40s
konsole:         ~1m 40s

Building on a virtual console, or redirecting output to /dev/null or a file, were all basically ~1:20. That's an awful lot of time wasted waiting for software to build, which a lot of us do all the time :-/

For cat it's much more dramatic (obviously):

time cat /usr/share/mime-info/gnome-vfs.keys

virtual console:  ~0.5s
xterm:            ~3.5s
gnome-terminal:   ~6.5s
konsole:          ~10.5s (not only is it slow, it also corrupts the terminal, leaving garbage on screen)

Tests done on RHL 9'ish box, FWIW.

- Panu -

In FC2 things have changed for the worse for gnome-terminal performance, where it has become far worse than both konsole and xterm. A GNOME developer explained to me that was the trade-off necessary in making gnome-terminal able to display unicode characters with pango. I do admit it is nice to have that ability, and it is awesome to see CJK characters working in gnome-terminal, but at the same time I wish it were faster. Very often I am forced to minimize my gnome-terminal sessions in order to prevent 100% CPU usage while using remote ssh sessions or building something locally. The bottleneck is always my terminal CPU usage. =( Warren Togami wtogami(a)redhat.com

Owen Taylor

9:47 a.m.

On Thu, 2004-05-13 at 09:24, Warren Togami wrote:

...

Except for a rebuild FC1 and FC2 have *exactly* the same version of VTE... Regards, Owen


desktop@lists.fedoraproject.org
