Problem description ===================
Currently, the time to boot the Linux desktop from the point where the power switch is turned on, to the point where the user can start doing work is roughly two minutes.
During that time, there are basically three resources being used: the hard disk, the CPU, and the natural latency of external systems - the time it takes a monitor to respond to a DDC probe, the time it takes for the system to get an IP via DHCP, and so forth.
Ideally, system boot would involve a 3-4 second sequential read of around 100 megabytes of data from the hard disk, CPU utilization would be parallelized with that, and all queries on external systems would be asynchronous ... startup continues and once the external system responds, the system state is updated. Plausibly the user could start work in under 10 seconds on this ideal system.
The challenge is to create a single poster showing graphically what is going on during the boot, what is the utilization of resources, how the current boot differs from the ideal world of 100% disk and CPU utilization, and thus, where are the opportunities for optimization.
Graphical Ideas ===============
Presumably, the main display would be a timeline with wall clock time on the horizontal (or vertical) axis. Then, you'd have a tree with lines representing the processes running at a particular time.
The processes lines would have attributes indicating state - perhaps red when waiting for disk, green when running, dotted when sleeping or blocking on IO. Extra lines might be added to the graph to indicate dependencies between processes. If a process calls waitpid() on another process, a dotted line could be added connecting the end of the other process back to the first process. Similar lines could be added when a write from one process causes another process that was waiting in a read() or select() to wake up.
While many thousands of processes are run during system boot, this doesn't mean the graph has to have vertical space for all of them ... vertical space is basically determined by the number of processes that are running at once.
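As a sketch of that layout rule (nothing in the thread specifies an algorithm; the function and its greedy strategy are my own illustration), rows can be assigned with a simple interval-packing pass, and the row count comes out equal to the peak number of concurrent processes:

```python
import heapq

def assign_rows(lifetimes):
    """lifetimes: list of (start, end) times, one per process.
    Returns (row_per_process, total_rows); a row is reused as soon
    as the process occupying it has exited."""
    order = sorted(range(len(lifetimes)), key=lambda i: lifetimes[i])
    active = []        # min-heap of (end_time, row) for live processes
    free = []          # rows whose process has already exited
    rows = [0] * len(lifetimes)
    next_row = 0
    for i in order:
        start, end = lifetimes[i]
        while active and active[0][0] <= start:   # reclaim finished rows
            free.append(heapq.heappop(active)[1])
        if free:
            row = free.pop()
        else:
            row, next_row = next_row, next_row + 1
        rows[i] = row
        heapq.heappush(active, (end, row))
    return rows, next_row
```

With four processes where at most two overlap, only two rows are needed regardless of the total process count.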
Parallel to the display of processes would be a display of overall CPU and disk utilization. CPU utilization on a single processor system is pretty straightforward... either the CPU is running at a point in time or it isn't. Considerations like memory bandwidth, processor stalls, and so forth matter when optimizing particular algorithms but an initial guess (that the poster would confirm or deny) is that CPU is not a significant bottleneck for system start.
Disk utilization is more complex, because of the huge cost of seeks; while modern drives can easily read 30-40 megabytes/second, a seek still takes 5-10ms. Whether or not the drive is active tells little about how well we are using it. In addition, there is a significantly long pipeline of requests to the disk, and seeks aren't even completely predictable because the drive may reorder read requests.
But a simple display that might be sufficient is a graph of instantaneous bandwidth (averaged over a small period of time) being achieved from the disk drive. If processes are red (waiting on the drive) and the bandwidth is low, then there is a problem with too much seeking that needs to be addressed.
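A sampler for that graph could be as small as the following sketch (assuming a Linux 2.6-style /proc/diskstats and a drive named sda; both are assumptions, not details from the post):

```python
import time

SECTOR_BYTES = 512   # /proc/diskstats counts 512-byte sectors

def parse_sectors_read(diskstats_text, device):
    """The third field after the device name in /proc/diskstats is
    'sectors read since boot'."""
    for line in diskstats_text.splitlines():
        fields = line.split()
        if len(fields) > 5 and fields[2] == device:
            return int(fields[5])
    raise KeyError(device)

def read_bandwidth_mb(device="sda", window=0.25):
    """Average read bandwidth (MB/s) over a short sampling window."""
    with open("/proc/diskstats") as f:
        before = parse_sectors_read(f.read(), device)
    time.sleep(window)
    with open("/proc/diskstats") as f:
        after = parse_sectors_read(f.read(), device)
    return (after - before) * SECTOR_BYTES / window / 1e6
```

Plotting that value alongside the red process lines is what would make "low bandwidth plus lots of waiting equals seeking" jump out on the poster.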
You'd also want text in the poster; process names are one obvious textual annotation that should be easy to obtain. It might also be interesting for processes to be able to provide extra annotations; for the X server to advertise that it is waiting for a DDC probe, and so forth.
Implementation thoughts =======================
It should be possible to start with a limited set of easily collected data and already get a useful picture. Useful data collection could be as simple as taking a snapshot of the data that the "top" program displays a few times a second during boot. That already gives you a list of the running processes, their states, and some statistics about global system load.
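A minimal polling collector along those lines might look like this (a sketch, not an actual tool from the thread; it reads /proc/[pid]/stat, whose R/S/D state codes map directly onto the running/sleeping/waiting-on-disk states above):

```python
import os, time

def parse_stat(stat_text):
    """Pull (comm, state) out of a /proc/[pid]/stat line; the command
    name is parenthesized and may itself contain spaces or parens,
    so split around the *last* closing paren."""
    head, _, tail = stat_text.rpartition(")")
    comm = head.partition("(")[2]
    state = tail.split()[0]      # R, S, D (uninterruptible/disk), Z, ...
    return comm, state

def sample_processes():
    """One polling snapshot: (timestamp, pid, comm, state) per process."""
    now = time.time()
    snapshot = []
    for pid in filter(str.isdigit, os.listdir("/proc")):
        try:
            with open(f"/proc/{pid}/stat") as f:
                comm, state = parse_stat(f.read())
        except OSError:
            continue             # process exited while we were sampling
        snapshot.append((now, int(pid), comm, state))
    return snapshot
```

Calling sample_processes() a few times a second and appending the results to a log is essentially the "snapshot of top" collector described above.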
Moving beyond that would probably involve instrumenting the kernel to give notification of process start and termination (possibly providing times(2) style information on termination) to provide visibility for processes that run for too short a time to be picked up by polling. Better kernel reporting of disk utilization might also be needed.
It might be possible to employ existing tools like oprofile; however, the level of detail oprofile provides is really overkill... compressing 2 minutes of runtime involving 1000 processes onto a single poster doesn't really allow worrying about what code is getting run by a process at a particular point.
Obviously, one challenge of any profiling tool is to avoid affecting the collected data. Since CPU and memory don't seem to be bottlenecks, while disk definitely is a bottleneck, a low impact implementation might be a profiling daemon that started early in the boot process and accumulated information to be queried and analyzed after the boot finishes.
While producing a single poster would already be enormously useful, the ability to recreate the poster on any system at any point would be many times more so. So, changes to system components that can be gotten into the upstream projects and that can be activated at runtime rather than needing to be conditionally compiled in are best.
Motivation ==========
I think this project would be a lot of fun to work on; you'd learn a lot about how system boot up works and about performance measurement. And beyond that there is a significant design and visualization element in figuring out how to display the collected data. It would also make a good small-scale academic project.
But to provide a little extra motivation beyond that, if people pick this up and come up with interesting results, I'll (personally) pay for up to 3 posters of up to 4' x 6' to be professionally printed and laminated. I'll be flexible about how that works ... if multiple people collaborate on one design, they can get a copy each of that single design.
- Owen Taylor
On Sat, 2004-11-13 at 18:18, Owen Taylor wrote:
-- fedora-devel-list mailing list fedora-devel-list@redhat.com http://www.redhat.com/mailman/listinfo/fedora-devel-list
Great to see that things are finally moving in reducing boot time! Good initiative :)
Kyrre
On Sat, 2004-11-13 at 12:18 -0500, Owen Taylor wrote:
Ideally, system boot would involve a 3-4 second sequential read of around 100 megabytes of data from the hard disk,
make that 7 seconds. Note: I did this experiment about a year ago; during boot, first read everything into cache and then do the rest of boot basically without disk IO (there are some writes, but those are async). The total time to boot did not decrease...
CPU utilization would be parallelized with that, and all queries on external systems would be asynchronous ... startup continues and once the external system responds, the system state is updated. Plausibly the user could start work under 10 seconds on this ideal system.
given the 7 second disk read time... 10 seconds is a bit unrealistic. One of the critical paths will be getting an IP address and mounting the /home dir over nfs... ethernet negotiation can easily be 10 seconds already with gige, and DHCP is depending on that to complete before it can get a lease.
One of the things we should investigate is just reducing the sheer number of different files that get opened... it's about 11000 IIRC right now.
Arjan van de Ven writes:
On Sat, 2004-11-13 at 12:18 -0500, Owen Taylor wrote:
CPU utilization would be parallelized with that, and all queries on external systems would be asynchronous ... startup continues and once the external system responds, the system state is updated. Plausibly the user could start work under 10 seconds on this ideal system.
given the 7 second disk read time... 10 seconds is a bit unrealistic. One of the critical paths will be getting an IP address and mounting the /home dir over nfs... ethernet negotiation can easily be 10 seconds already with gige, and DHCP is depending on that to complete before it can get a lease.
Add 30 seconds if you're booting off aic79xx.o
Why does loading that gawd-awful microcode take so long?
I don't notice ANY delay when booting XP on the same box.
It shouldn't take more than one or two seconds to initialize the SCSI card.
Sam Varshavchik wrote:
Arjan van de Ven writes:
Add 30 seconds if you're booting off aic79xx.o
Why does loading that gawd-awful microcode take so long?
I don't notice ANY delay when booting XP on the same box.
It shouldn't take more than one or two seconds to initialize the SCSI card.
Same with mptscsi / mptbase here. At least 30 seconds (x2: once in the BIOS POST and again when the kernel driver loads)
-Mark
Mark Heslep writes:
Sam Varshavchik wrote:
Arjan van de Ven writes:
Add 30 seconds if you're booting off aic79xx.o
Why does loading that gawd-awful microcode take so long?
I don't notice ANY delay when booting XP on the same box.
It shouldn't take more than one or two seconds to initialize the SCSI card.
Same with mptscsi / mptbase here. At least 30 (x2 - once in the bios post and again when the kernel driver loads)
Well, BIOS POST you can't do anything about.
But XP initializes an Adaptec SCSI controller in, at most, 1-2 seconds.
No way did Microsoft write the SCSI driver code themselves. It had to have come from Adaptec, at some point down the line.
So, if Adaptec is, supposedly, gung-ho about Linux these days, why can't they produce a version of aic7xxx.ko that doesn't give you plenty of time to brew a pot of coffee, before it's done whatever the hell it's doing?
Sam Varshavchik wrote:
Mark Heslep writes:
Sam Varshavchik wrote:
Arjan van de Ven writes:
Add 30 seconds if you're booting off aic79xx.o
Why does loading that gawd-awful microcode take so long?
I don't notice ANY delay when booting XP on the same box.
It shouldn't take more than one or two seconds to initialize the SCSI card.
Same with mptscsi / mptbase here. At least 30 (x2 - once in the bios post and again when the kernel driver loads)
Well, BIOS POST you can't do anything about.
Unless we can use Linux BIOS. I've got a couple scsi boxes that are candidates.
But XP initializes an Adaptec SCSI controller in, at most, 1-2 seconds.
No way did Microsoft write the SCSI driver code themselves. It had to have come from Adaptec, at some point down the line.
So, if Adaptec is, supposedly, gung-ho about Linux these days, why can't they produce a version of aic7xxx.ko that doesn't give you plenty of time to brew a pot of coffee, before it's done whatever the hell it's doing?
Don't know if LSI Logic / Symbios is as eager to please.
On Sat, 2004-11-13 at 18:35 +0100, Arjan van de Ven wrote:
On Sat, 2004-11-13 at 12:18 -0500, Owen Taylor wrote:
Ideally, system boot would involve a 3-4 second sequential read of around 100 megabytes of data from the hard disk,
make that 7 seconds. Note: I did this experiment about a year ago; during boot, first read everything into cache and then do the rest of boot basically without disk IO (there are some writes, but those are async). The total time to boot did not decrease...
That experiment was one of the things that convinced me that getting a good visualization of the critical path is crucial to actually speeding up the boot process.
CPU utilization would be parallelized with that, and all queries on external systems would be asynchronous ... startup continues and once the external system responds, the system state is updated. Plausibly the user could start work under 10 seconds on this ideal system.
given the 7 second disk read time... 10 seconds is a bit unrealistic. One of the critical paths will be getting an IP address and mounting the /home dir over nfs... ethernet negotiation can easily be 10 seconds already with gige, and DHCP is depending on that to complete before it can get a lease.
I'd agree 10 seconds isn't a realistic target; my point was more that if we have only 7 seconds of disk access, and, say, 10 seconds of computation to do, and maybe 15 seconds to negotiate gige, get a dhcp lease, and mount your homedir (if relevant), then the time between 15 seconds and 2 minutes needs to be investigated in terms of dependencies.
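To make the dependency argument concrete, here is a toy critical-path calculation; the task names and durations are invented, loosely echoing the numbers in this thread. The floor on boot time is the longest dependency chain, not the sum of the work:

```python
def critical_path(tasks):
    """tasks: {name: (duration, [prerequisites])}.
    Returns (finish_time_of_longest_chain, that_chain)."""
    memo = {}
    def finish(name):
        if name not in memo:
            duration, deps = tasks[name]
            # latest-finishing prerequisite determines our start time
            t, via = max(((finish(d)[0], d) for d in deps),
                         default=(0, None))
            memo[name] = (t + duration, via)
        return memo[name]
    last = max(tasks, key=lambda n: finish(n)[0])
    chain, node = [], last
    while node is not None:           # walk the `via` links backwards
        chain.append(node)
        node = memo[node][1]
    return memo[last][0], list(reversed(chain))

# Hypothetical boot tasks: 7s of disk reads in parallel with the
# network chain (ethernet -> dhcp -> nfs mount) gating login.
boot = {
    "disk":  (7,  []),
    "eth":   (10, []),
    "dhcp":  (5,  ["eth"]),
    "nfs":   (3,  ["dhcp"]),
    "login": (1,  ["disk", "nfs"]),
}
```

Here the 7 seconds of disk work is irrelevant to the total; the eth/dhcp/nfs chain is what a poster would highlight in red.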
One of the things we should investigate is just reducing the sheer number of different files that get opened... it's about 11000 IIRC right now.
If we're actually spending all our time waiting on a DHCP lease, or for probing serial mice to timeout, then 11000 opens don't matter a whole lot. Not that eliminating the opens isn't a good idea...
Regards, Owen
On Sunday 14 November 2004 01:35, Arjan van de Ven wrote:
One of the things we should investigate is just reducing the sheer number of different files that get opened... it's about 11000 IIRC right now.
Random thoughts ...
sysconfig/ may need revamping to bring configuration together. Maybe write a sysconfig compiler script that compiles it all into one file. (I know half of it has nothing to do with SysV stuff, but that's another discussion.)
For experiment's sake, it would be nice to see other init alternatives come in and make comparisons using the Boot Poster. And it would be nice to have compatibility wrappers around chkconfig and friends to provide interfaces folks are used to.
I'm still leaning towards Felix's minit as a potential alternative: http://www.fefe.de/minit/minit-linux-kongress2004.pdf
But, it'll take some work getting it up to where system-config, editing sysconfig/ files, and using familiar script utils are compatible.
Although most people don't care nowadays, the mem footprint is a lot smaller on many of these alternatives. Might be good to hack a config preload hook that brought in a binary config file, making subsequent boots extremely fast...
Hmm, I think I'll poke at this with a stick for awhile and see what bubbles up. No promises... ;)
On Sunday 14 November 2004 20:38, Jeff Pitman wrote:
Hmm, I think I'll poke at this with a stick for awhile and see what bubbles up. No promises... ;)
And, yes, I've read 99540 and know about LSB implications. It's an alternative. (Too bad alternatives was stuck with chkconfig, because now it's a little difficult to provide an alternative to chkconfig.)
One thing to remember ... XFree86 became X.org; mp3 was dropped; and many more visible changes have been made over the years. Something that changes under the covers making things run faster wouldn't be a bad idea. :D (Course, if suspend worked reliably we wouldn't have this digression--I mean, discussion.)
First step, though, is to experiment as an alternative.
On Sun, 2004-11-14 at 20:38 +0800, Jeff Pitman wrote:
I'm still leaning towards Felix's minit as a potential alternative: http://www.fefe.de/minit/minit-linux-kongress2004.pdf
Keep in mind the requirements of GUI: dynamic monitoring and control of what's happening, including reliable propagation of errors in machine-readable form (i.e. not stderr/syslog)
Slide 19 ("How do I know which services are running?") in the minit deck gave me some "nobody has thought about how this works with a UI" sense ;-) so there might be some work left there.
UI could be everything from graphical boot progress meter, to an admin start/stop services tool, to some part of the desktop with no visible initscripts relationship but underneath it happens to need to manipulate a service.
See also David's post on fedora-desktop yesterday about getting to the login prompt more quickly.
Havoc
On Sunday 14 November 2004 22:37, Havoc Pennington wrote:
UI could be everything from graphical boot progress meter, to an admin start/stop services tool, to some part of the desktop with no visible initscripts relationship but underneath it happens to need to manipulate a service.
Nothing like firing it up and seeing what breaks! I'll be using Enrico's fedora.us packaging as a base but will update to CVS. It was put together last year to help Vserver boot up faster.
It will definitely require a lot of polishing to be a compelling alternative. I don't use the wording "replacement" as there is a lot of intertwined political mumbo jumbo about SysV.
Thanks for the tips.
Jeff Pitman (symbiont@berlios.de) said:
I'm still leaning towards Felix's minit as a potential alternative: http://www.fefe.de/minit/minit-linux-kongress2004.pdf
But, it'll take some work getting it up to where system-config, editing sysconfig/ files, and using familiar script utils are compatible.
Using minit as a replacement for init saves zero time (and adds additional complexity due to its (IMO, broken) dependency model.)
It's fixing the underlying actions under init that is the big win.
Bill
On Monday 15 November 2004 13:32, Bill Nottingham wrote:
Using minit as a replacement for init saves zero time (and adds additional complexity due to its (IMO, broken) dependency model.)
It's fixing the underlying actions under init that is the big win.
I agree. Though it's nice to look at it to see what it can bring to the table, because with Minit you can get a call tree of:
Minit -> service (single exec called from C)
In most cases. Whereas with the current init process you get a call tree similar to this:
Init -> Shell (rc) -> Exec'd Shells (rc?.d/S*) -> Config Files -> service
Several levels of indirection brings flexibility in what you can do and how you can configure it (Just look at the difference between SuSE and Redhat; yeah, LSB, whatever). But, this comes at a cost.
So, playing with Minit does not necessarily mean an immediate call for a replacement of init. Playing with minit highlights the changes needed in the system configuration. Though, trying to keep compliant with LSB and maintaining use of /bin/sh is going to be a tough job with the current init. With minit, the tough job would be integration with current infra. Course, this is tougher than the former, but nonetheless, as I stated earlier, a good exercise.
Maybe in the end, a C version of rc, a config compiler, and a parallelization technique with deps will get the job done using the current init.
take care,
Hi.
Arjan van de Ven arjanv@redhat.com wrote:
given the 7 second disk read time... 10 seconds is a bit unrealistic. One of the critical paths will be getting an IP address and mounting the /home dir over nfs... ethernet negotiation can easily be 10 seconds already with gige, and DHCP is depending on that to complete before it can get a lease.
WRT DHCP: why does FC wait in the foreground to get an IP? Is there a magic switch I haven't found yet to make that a background process?
It's quite boring to watch my notebook wait for a nonexistent DHCP server because its location changed since the last boot.
On Mon, 2004-11-15 at 12:56 +0100, Ralf Ertzinger wrote:
Hi.
Arjan van de Ven arjanv@redhat.com wrote:
given the 7 second disk read time... 10 seconds is a bit unrealistic. One of the critical paths will be getting an IP address and mounting the /home dir over nfs... ethernet negotiation can easily be 10 seconds already with gige, and DHCP is depending on that to complete before it can get a lease.
WRT DHCP: why does FC wait in the foreground to get an IP? Is there a magic switch I haven't found yet to make that a background process?
nfs mounted dirs ...
Arjan van de Ven wrote:
On Mon, 2004-11-15 at 12:56 +0100, Ralf Ertzinger wrote:
Hi.
Arjan van de Ven arjanv@redhat.com wrote:
given the 7 second disk read time... 10 seconds is a bit unrealistic. One of the critical paths will be getting an IP address and mounting the /home dir over nfs... ethernet negotiation can easily be 10 seconds already with gige, and DHCP is depending on that to complete before it can get a lease.
WRT DHCP: why does FC wait in the foreground to get an IP? Is there a magic switch I haven't found yet to make that a background process?
nfs mounted dirs ...
but there should be an option to let it be done in the background ....
On Mon, Nov 15, 2004 at 01:24:55PM +0100, dragoran wrote:
nfs mounted dirs ...
but there should be an option to let it be done in the background ....
NFS mounts have the "bg" option.
bg: If the first NFS mount attempt times out, retry the mount in the background. After a mount operation is backgrounded, all subsequent mounts on the same NFS server will be backgrounded immediately, without first attempting the mount. A missing mount point is treated as a timeout, to allow for nested NFS mounts.
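A hypothetical /etc/fstab entry using the option (server name and export path invented for illustration) would look like:

```
fileserver:/export/home  /home  nfs  bg,intr  0 0
```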
On Mon, 2004-11-15 at 17:13 +0000, Richard Allen wrote:
On Mon, Nov 15, 2004 at 01:24:55PM +0100, dragoran wrote:
nfs mounted dirs ...
but there should be an option to let it be done in the background ....
NFS mounts have the "bg" option.
bg: If the first NFS mount attempt times out, retry the mount in the background. After a mount operation is backgrounded, all subsequent mounts on the same NFS server will be backgrounded immediately, without first attempting the mount. A missing mount point is treated as a timeout, to allow for nested NFS mounts.
This would work in the case where the gdm prompt just sits there for 5 minutes until all the timeouts fire and the dirs mount. But it doesn't really handle the case of a rapid login.
What you really want is backgrounded:
- When ethernet negotiation completes, immediately start getting a dhcp lease
- When the dhcp lease completes, immediately try to mount all mount points
And when the user goes to log in, and we actually *need* the NFS mount, block until it completes.
A lot of that is already there for the case of dynamic network connections later.
Regards, Owen
On Mon, 2004-11-15 at 13:05 +0100, Arjan van de Ven wrote:
On Mon, 2004-11-15 at 12:56 +0100, Ralf Ertzinger wrote:
Hi.
Arjan van de Ven arjanv@redhat.com wrote:
given the 7 second disk read time... 10 seconds is a bit unrealistic. One of the critical paths will be getting an IP address and mounting the /home dir over nfs... ethernet negotiation can easily be 10 seconds already with gige, and DHCP is depending on that to complete before it can get a lease.
WRT DHCP: why does FC wait in the foreground to get an IP? Is there a magic switch I haven't found yet to make that a background process?
nfs mounted dirs ...
That's a pretty good argument for initscripts having dependencies, rather than *just* a position in the list.
Arjan van de Ven wrote:
On Mon, 2004-11-15 at 12:56 +0100, Ralf Ertzinger wrote:
WRT DHCP: why does FC wait in the foreground to get an IP? Is there a magic switch I haven't found yet to make that a background process?
nfs mounted dirs ...
A dependency-based init system that registers the necessary devices, services, and paths (all known as "entities" from here forward) for the proper startup of a service seems an elegant solution that provides enough information to drive the init process's decision making. Dependency-based inits that don't specify what each service actually needs and provides are always going to yield no improvement, because they don't attack the real granularity of the problem (e.g. the example above really needs a directory provided by an NFS mount, not the NFS service itself, so checking for the availability of that path would test the real necessity of adding network infrastructure to the dep list). Sometimes you require actual services (hald), certain devices (/dev/video0), or maybe just particular paths (/home/myUsername); regardless, the point is the same.
Required entities are registered when a service begins its initialization, and provided entities are registered upon its completion. If a service is queued up without its required entities, it waits until they are provided, or errors out once the init process completes. Once they are located, it adds the service responsible for providing each required entity to its registered list of entity deps, so the next boot is more streamlined. Because the registered entity dep list knows which services provide particular entities (services register the entities they provide), it can easily recognize when a registered dep is no longer necessary and remove it from the dep list for the next reboot. This dynamic relationship might increase the time of any single boot early on if poorly configured by default, but it will decrease the average boot time with each restart. It also evolves when required entities change how they are provided.
I acknowledge that I am grossly unfamiliar with a large portion of the startup process, but I ask that you attempt to find the benefits of such a system and try to mitigate the detrimental effects before you emphasize its shortcomings. I'm probably missing entity types and oversimplifying the tasks, but it just makes sense that if we want these dynamic environments to be as efficient as possible, we have to provide them with the information to make that decision in the first place. No "default" will work for everyone, and I think that, done properly, the overhead of a dynamic system can be a net gain for all if it isn't "reconfigured" every time but instead cached until the next boot, which ensures proper function and the ability to undo any negative changes if necessary.
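As one way to picture the registration scheme described above (a rough sketch; all service and entity names are invented), a resolver over need/provide sets falls out in a few lines:

```python
def boot_order(services):
    """services: {name: (needs, provides)} where both are sets of
    entity names (paths, devices, abstract services, ...).
    Returns successive waves of services that could start in
    parallel once their required entities exist."""
    available = set()
    pending = dict(services)
    waves = []
    while pending:
        # everything whose needed entities have all been provided
        ready = sorted(s for s, (needs, _) in pending.items()
                       if needs <= available)
        if not ready:
            raise RuntimeError(f"unsatisfiable deps: {sorted(pending)}")
        for s in ready:
            available |= pending.pop(s)[1]   # its entities now exist
        waves.append(ready)
    return waves

# Hypothetical services declaring what they need and provide:
example = {
    "network":  (set(),            {"eth0"}),
    "dhcp":     ({"eth0"},         {"ip-address"}),
    "nfs-home": ({"ip-address"},   {"/home"}),
    "syslog":   (set(),            {"logging"}),
}
```

In a real init the "waves" would be event-driven rather than batched, and the computed edges would be cached between boots as described, but the core bookkeeping is just this.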
Best Wishes, -mf
On Sat, 2004-11-13 at 18:35 +0100, Arjan van de Ven wrote:
given the 7 second disk read time... 10 seconds is a bit unrealistic. One of the critical paths will be getting an IP address and mounting the /home dir over nfs... ethernet negotiation can easily be 10 seconds already with gige, and DHCP is depending on that to complete before it can get a lease.
But if you're stalling on network I/O, we could run the things that don't need the network to be there at the same time. There's enough of it that I don't think dhcp should actually delay booting, if it's async.
Of course, if you're using NetworkManager, it already is... but there are some kinks in it that still need smoothing over, like the part of ntpd's initscript that does the initial clock sync never getting run while the network is up.
On Mon, 2004-11-15 at 12:24 -0500, Peter Jones wrote:
On Sat, 2004-11-13 at 18:35 +0100, Arjan van de Ven wrote:
given the 7 second disk read time... 10 seconds is a bit unrealistic. One of the critical paths will be getting an IP address and mounting the /home dir over nfs... ethernet negotiation can easily be 10 seconds already with gige, and DHCP is depending on that to complete before it can get a lease.
But if you're stalling on network I/O, we could run the things that don't need the network to be there at the same time. There's enough of it that I don't think dhcp should actually delay booting, if it's async.
Of course, if you're using NetworkManager, it already is... but there are some kinks in it that still need smoothing over, like the part of ntpd's initscript that does the initial clock sync never getting run while the network is up.
I think that in a NetworkManager world, you want NetworkManager to control ntpd entirely. And probably a number of other services too, at least by default.
About the dhcp delay: why not just use the lease from the previous boot, if it hasn't expired? This should be configurable for notebooks versus desktops, of course.
Regards,
Hans
Peter Jones wrote:
On Sat, 2004-11-13 at 18:35 +0100, Arjan van de Ven wrote:
given the 7 second disk read time... 10 seconds is a bit unrealistic. One of the critical paths will be getting an IP address and mounting the /home dir over nfs... ethernet negotiation can easily take 10 seconds already with gige, and DHCP depends on that completing before it can get a lease.
But if you're stalling on network I/O, we could run the things that don't need the network to be there at the same time. There's enough of it that I don't think dhcp should actually delay booting, if it's async.
Of course, if you're using NetworkManager, it already is... but there are some kinks in it that still need smoothing over, like the part of ntpd's initscript that does the initial clock sync never getting run while the network is up.
On Sat, 2004-11-13 at 12:18 -0500, Owen Taylor wrote:
Moving beyond that would probably involve instrumenting the kernel to give notification of process start and termination (possibly providing times(2) style information on termination) to provide visibility for processes that run for too short a time to be picked up by polling. Better kernel reporting of disk utilization might also be needed.
fwiw the kernel rpms have a patch that prints all the files that get opened and the programs that get execed; the patch isn't applied by default but it's not hard to change that for running experiments...
IBM did some work regarding this recently. http://www-106.ibm.com/developerworks/linux/library/l-boot.html?ca=dgr-lnxw1... The biggest set of problems is figuring out what can be parallelized and what needs to be sequential. Personally I think that starting up an X server is not the best way to ensure a speedy boot process. We do have bootsplash available.
You can also look at the work that gentoo has done in this area http://www.gentoo.org/doc/en/handbook/handbook-x86.xml?part=2&chap=4 .
Their initscripts allow for dependencies, which makes for an easily maintainable parallelized boot process. Unfortunately their runlevels do not conform to either LSB or POSIX (I forget which one), but they do work quickly.
On Sat, 2004-11-13 at 12:49 -0500, Christopher Hotchkiss wrote:
IBM did some work regarding this recently. http://www-106.ibm.com/developerworks/linux/library/l-boot.html?ca=dgr-lnxw1... The biggest set of problems is figuring out what can be parallelized and what needs to be sequential. Personally I think that starting up an X server is not the best way to ensure a speedy boot process. We do have bootsplash available.
You can also look at the work that gentoo has done in this area http://www.gentoo.org/doc/en/handbook/handbook-x86.xml?part=2&chap=4 .
Their initscripts allow for dependencies, which makes for an easily maintainable parallelized boot process. Unfortunately their runlevels do not conform to either LSB or POSIX (I forget which one), but they do work quickly.
My point is basically to move away from "is rhgb a bottleneck", "would parallelization help?" speculation, and have a single graphical display that can immediately provide those answers.
To me, the BootFast article really is a bit of a disappointment. It ends with the cop-out paragraph: (*)
The effectiveness of this technique depends on the number of services that need to be run as well as the time it takes for each service to run. The degree of parallelization possible is controlled largely by the dependencies between services. It may be that using this technique makes little improvement for some systems, while for others, it could have a dramatic impact on boot speed. This can be explained by the fact that each system has a different set of services enabled, and each of these services takes differing amounts of time to run. Once again, to use this technique, you need to establish the dependencies between the services you use for your particular system.
And doesn't provide any numbers about what speedups the author achieved. Did the author not measure anything? Did the author not get any speedup on his system? Who knows. Having before and after graphs would let you not only see whether parallelization helped, but see why it helped, or why it didn't help, and what would need to be fixed to make it help.
Thanks for the pointers, Owen
(*) Admittedly the article is really more of a tutorial than a scientific paper.
On Sat, 2004-11-13 at 14:41 -0500, Owen Taylor wrote:
My point is basically to move away from "is rhgb a bottleneck", "would parallelization help?" speculation, and have a single graphical display that can immediately provide those answers.
if you go parallel, you *have* to do the readahead-early thing, otherwise you just seek your disk to death and most likely end up being slower....
Not that I'd oppose that; resurrecting that shouldn't be hard if people want to play with it.
Arjan van de Ven (arjanv@redhat.com) said:
On Sat, 2004-11-13 at 14:41 -0500, Owen Taylor wrote:
My point is basically to move away from "is rhgb a bottleneck", "would parallelization help?" speculation, and have a single graphical display that can immediately provide those answers.
if you go parallel, you *have* to do the readahead-early thing, otherwise you just seek your disk to death and most likely end up being slower....
Bah.
Just do the readahead on the first boot, and then reorder your disk blocks in the desktop background thereafter. :)
Bill
On Mon, Nov 15, 2004 at 12:35:49AM -0500, Bill Nottingham wrote:
Bah.
Just do the readahead on the first boot, and then reorder your disk blocks in the desktop background thereafter. :)
but in a parallel startup situation, the order in which you read stuff is no longer deterministic... so reordering no longer has the effect you expect it to have...
On Mon, 2004-11-15 at 08:48 +0100, Arjan van de Ven wrote:
On Mon, Nov 15, 2004 at 12:35:49AM -0500, Bill Nottingham wrote:
Bah.
Just do the readahead on the first boot, and then reorder your disk blocks in the desktop background thereafter. :)
but in a parallel startup situation, the order in which you read stuff is no longer deterministic...
Why not? Just because you're running more than one script at once, that doesn't mean they can't still be ordered.
That is, we still don't want to start nfs until the network is up, but is there any compelling reason not to start up irqbalance, pcmcia, bluetooth, hidd, lm_sensors, smartd, acpid, messagebus, or haldaemon before then?
so reordering no longer has the effect you expect it to have..
That doesn't have to be the case; we don't need to run _everything_ in parallel, we just need to delay whatever depends on our most latent chunks, and run other things while we're stalling.
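[Editorial sketch] The scheme Peter describes, where each service blocks only on its own prerequisites while everything else proceeds in parallel, can be sketched as a tiny dependency scheduler. The service names and the dependency map below are illustrative only, not the actual initscript graph:

```python
import threading

# Hypothetical dependency map: each service lists what must finish
# before it may start. Names are illustrative, not real initscripts.
DEPS = {
    "network":   [],
    "syslog":    [],
    "nfs":       ["network"],   # nfs genuinely needs the network up...
    "bluetooth": [],            # ...but these can start meanwhile
    "haldaemon": ["syslog"],
    "cups":      ["syslog"],
}

def start_services(deps, run):
    """Start every service as soon as its dependencies have finished."""
    done = {name: threading.Event() for name in deps}

    def worker(name):
        for dep in deps[name]:
            done[dep].wait()    # block only on our own prerequisites
        run(name)               # e.g. exec the initscript here
        done[name].set()

    threads = [threading.Thread(target=worker, args=(n,)) for n in deps]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

started = []
lock = threading.Lock()

def record(name):
    with lock:
        started.append(name)

start_services(DEPS, record)
```

Here the `done` events play the role of the "network is up" condition: nfs waits on it, while bluetooth and friends start immediately instead of sitting behind DHCP in a serial run order.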
Arjan van de Ven (arjanv@redhat.com) said:
Just do the readahead on the first boot, and then reorder your disk blocks in the desktop background thereafter. :)
but in a parallel startup situation, the order in which you read stuff is no longer deterministic... so reordering no longer has the effect you expect it to have...
If you're going to do readahead, having it all contiguous is much better than having it scattered.
Bill
Owen Taylor (otaylor@redhat.com) said:
My point is basically to move away from "is rhgb a bottleneck", "would parallelization help?" speculation, and have a single graphical display that can immediately provide those answers.
Well, the answers are 'yes', and 'not immediately'. At least in testing. :)
Bill
On Mon, Nov 15, 2004 at 09:05:25PM +0100, Enrico Scholz wrote:
christopher.hotchkiss@gmail.com (Christopher Hotchkiss) writes:
You can also look at the work that gentoo has done in this area Unfortunately their runlevels do not conform to either LSB
Current RH/FC initscripts do not do this either...
They should do. Please detail the error.
alan@redhat.com (Alan Cox) writes:
You can also look at the work that gentoo has done in this area Unfortunately their runlevels do not conform to either LSB
Current RH/FC initscripts do not do this either...
They should do. Please detail the error.
* http://refspecs.freestandards.org/LSB_2.0.1/LSB-Core/LSB-Core/iniscrptact.ht...
RH/FC initscripts:
- do not have the try-restart and force-reload actions
- do not return the LSB error codes
* http://refspecs.freestandards.org/LSB_2.0.1/LSB-Core/LSB-Core/initscrcomconv...
These comments are neither in RH/FC initscripts, nor supported by the initsystem.
* http://refspecs.freestandards.org/LSB_2.0.1/LSB-Core/LSB-Core/iniscrptfunc.h...
These functions are not used by the RH/FC initscripts; instead, 'daemon' and 'killproc' are there.
I do not say that these are errors; just that the RH/FC initscripts are not LSB compliant.
Enrico
On Tue, 2004-11-16 at 13:17 +0100, Enrico Scholz wrote:
I do not say that these are errors; just that the RH/FC initscripts are not LSB compliant.
One nitpick: the LSB doesn't require the "distribution native" scripts to be compliant, only that the distribution accepts compliant scripts...
http://refspecs.freestandards.org/LSB_2.0.1/LSB-Core/LSB-Core/iniscrptact.ht...
RH/FC initscripts do not have:
- try-restart, force-reload actions
- do not return the LSB error-codes
It has been a higher priority to allow LSB software to run on top of our distribution than to make our own software packages LSB-conformant.
greetings,
Florian La Roche
On Sat, Nov 13, 2004 at 12:18:39PM -0500, Owen Taylor wrote:
You'd also want text in the poster; process names are one obvious textual annotation that should be easy to obtain. It might also be interesting for processes to be able to provide extra annotations; for
Why not reuse the X11 visualiser for bandwidth, the one that shows all the toolkit-caused stalls? It'll do the same job for run versus disk wait.
On Sat, 2004-11-13 at 12:53 -0500, Alan Cox wrote:
On Sat, Nov 13, 2004 at 12:18:39PM -0500, Owen Taylor wrote:
You'd also want text in the poster; process names are one obvious textual annotation that should be easy to obtain. It might also be interesting for processes to be able to provide extra annotations; for
Why not reuse the X11 visualiser for bandwidth, the one that shows all the toolkit-caused stalls? It'll do the same job for run versus disk wait.
I'm guessing you mean the xplot/netplot displays from Jim Gettys and Keith Packard's talk about X Window System performance - (http://keithp.com/~keithp/talks/usenix2003/).
It's certainly related to the kind of thing I'm talking about here, though I'm not sure it's quite directly applicable... it applies more to displaying how well disk read-ahead is working for a particular application than to viewing the dependency graph for a set of processes.
I'd agree it's a relevant reference point, Owen
On Sat, 2004-11-13 at 12:18 -0500, Owen Taylor wrote:
While producing a single poster would already be enormously useful, the ability to recreate the poster on any system at any point would be many times more so. So, changes to system components that can be gotten into the upstream projects, and that can be activated at runtime rather than needing to be conditionally compiled in, are best.
How about going even further and solving the optimization task automatically? Create a system which would (given certain constraints) make small rearrangements in the order of services, relocate files on the disk, etc. It would then keep rebooting, timing and making more changes. Eventually converging to... <evil>Windows XP boot times</evil> :)
Seriously though, I think this project is a great idea. The problem of boot times is especially pesky for laptop users. We're the ones stuck with crappy hard disks and even worse, have to reboot several times a day. The solution here might be a stable suspend-to-disk implementation. But this is not happening (yet), so thumbs up for a 10 second boot!
On Sun, 2004-11-14 at 01:05 +0100, Ziga Mahkovec wrote:
How about going even further and solving the optimization task automatically? Create a system which would (given certain constraints) make small rearrangements in the order of services, relocate files on the disk, etc. It would then keep rebooting, timing and making more changes. Eventually converging to... <evil>Windows XP boot times</evil> :)
to be honest, I don't expect reordering on disk to be useful, on account of the fact that we're not really seek bound...
On Sat, 2004-11-13 at 12:18 -0500, Owen Taylor wrote:
It should be possible to start with a limited set of easily collected data and already get a useful picture. Useful data collection could be as simple as taking a snapshot of the data that the "top" program displays a few times a second during boot. That already gives you a list of the running processes, their states, and some statistics about global system load.
So I gave this a try:
1. I modified the boot procedure so that early in rc.sysinit, a tmpfs is mounted and top is run in batch mode (to output every 0.2 seconds). The logged output is later parsed only up to the point where gdmgreeter is running and the system is relatively idle (i.e. boot complete and ready for login).
2. A Java program parses the log file, builds the process tree and finally renders a PNG chart. Processes are sorted by PID and traversed depth first.
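[Editorial sketch] The parsing half of step 2, splitting a batch-mode top log into samples and extracting (PID, state, %CPU, command) per process, might look like the following; this is Python rather than the Java used above, and the toy log only approximates real top column layout:

```python
import re

# A toy two-sample excerpt in (roughly) `top -b` layout; real output
# has a longer header block and more columns per row.
LOG = """\
top - 00:00:01 up 0 min
  PID USER      PR  NI S %CPU COMMAND
    1 root      16   0 S  0.0 init
  200 root      16   0 R 48.0 modprobe
top - 00:00:02 up 0 min
  PID USER      PR  NI S %CPU COMMAND
    1 root      16   0 S  0.0 init
  210 root      16   0 D  2.0 readahead
"""

def parse_samples(text):
    """Split a batch-mode top log into per-sample process lists."""
    samples = []
    for line in text.splitlines():
        if line.startswith("top - "):
            samples.append([])  # each "top - ..." header starts a sample
        else:
            m = re.match(
                r"\s*(\d+)\s+\S+\s+\S+\s+\S+\s+([RSDZT])\s+([\d.]+)\s+(\S+)",
                line)
            if m and samples:
                pid, state, cpu, cmd = m.groups()
                samples[-1].append((int(pid), state, float(cpu), cmd))
    return samples

samples = parse_samples(LOG)
```

From here, grouping rows by PID across samples gives each process its horizontal bar on the chart.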
This still needs more work but here's a sneak preview: http://www.klika.si/ziga/bootchart/bootchart.png
(as a result of http://www.klika.si/ziga/bootchart/bootop.log.gz )
Some processes were filtered out for clarity -- mostly sleepy kernel processes and the ones that only live for the duration of a single top sample. This skews the chart a bit but is definitely more comprehensible (compare with http://www.klika.si/ziga/bootchart/bootchart-complete.png ).
Some things I plan on adding:
- start logging earlier in the boot process (possibly in initrd),
- add additional layers (e.g. make use of the kernel patch Arjan suggested for showing the number of open files),
- improve process tree representation and add dependency lines,
- render SVG instead, for scalability and interactivity.
This definitely helped me with my boot times -- the 4-second load gap at the start I found to be "modprobe floppy", apparently timing out on my floppyless laptop :)
Any ideas or comments are welcome,
Once upon a time, Ziga Mahkovec ziga.mahkovec@klika.si said:
- I modified the boot procedure so that early in rc.sysinit, a tmpfs is
mounted and top is run in batch mode (to output every 0.2 seconds). The logged output is later parsed only up to the point where gdmgreeter is running and the system is relatively idle (i.e. boot complete and ready for login).
It would probably be easier (and more accurate) to use process accounting to gather the data. You could add it at the beginning of rc.sysinit and turn it off in rc.local.
On Mon, 2004-11-15 at 23:24 +0100, Ziga Mahkovec wrote:
- I modified the boot procedure so that early in rc.sysinit, a tmpfs is
mounted and top is run in batch mode (to output every 0.2 seconds). The logged output is later parsed only up to the point where gdmgreeter is running and the system is relatively idle (i.e. boot complete and ready for login).
- A Java program parses the log file, builds the process tree and
finally renders a PNG chart. Processes are sorted by PID and traversed depth first.
This still needs more work but here's a sneak preview: http://www.klika.si/ziga/bootchart/bootchart.png
Dude. You are awesome.
On Mon, 2004-11-15 at 23:24 +0100, Ziga Mahkovec wrote:
On Sat, 2004-11-13 at 12:18 -0500, Owen Taylor wrote:
It should be possible to start with a limited set of easily collected data and already get a useful picture. Useful data collection could be as simple as taking a snapshot of the data that the "top" program displays a few times a second during boot. That already gives you a list of the running processes, their states, and some statistics about global system load.
So I gave this a try:
- I modified the boot procedure so that early in rc.sysinit, a tmpfs is
mounted and top is run in batch mode (to output every 0.2 seconds). The logged output is later parsed only up to the point where gdmgreeter is running and the system is relatively idle (i.e. boot complete and ready for login).
- A Java program parses the log file, builds the process tree and
finally renders a PNG chart. Processes are sorted by PID and traversed depth first.
This still needs more work but here's a sneak preview: http://www.klika.si/ziga/bootchart/bootchart.png
Wow, this is fabulous work, and fast too!
What sort of libraries are you using in the Java program? Do you have any idea whether getting it to run on top of open source Java would be feasible?
How are you computing the different shades of yellow and gray? Are you looking at differences in the TIME column?
(as a result of http://www.klika.si/ziga/bootchart/bootop.log.gz )
Some processes were filtered out for clarity -- mostly sleepy kernel processes and the ones that only live for the duration of a single top sample. This skews the chart a bit but is definitely more comprehensible (compare with http://www.klika.si/ziga/bootchart/bootchart-complete.png ).
Some things I plan on adding:
- start logging earlier in the boot process (possibly in initrd),
- add additional layers (e.g. make use of the kernel patch Arjan
suggested for showing the number of open files),
- improve process tree representation and add dependency lines,
- render SVG instead, for scalability and interactivity.
All sound good. Grouping processes by the process tree would clearly make things a little clearer, though if you know what's going on, it's not hard to figure out that xkbcomp is being run by X, not by wait_for_sysfs.
Disk throughput would probably be the extra layer I'd be most interested in. We seem to be IO bound during almost the entire process, but how are we doing for efficiency? Are we doing significantly better during readahead than during start-random-services?
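[Editorial sketch] One way to get such a disk-throughput layer would be to sample /proc/diskstats alongside top and difference the sectors-read counter between samples. A sketch assuming the standard diskstats field layout (the sample text in the test is fabricated):

```python
def sectors_read(diskstats_text, device):
    """Pull the sectors-read counter for one device out of the text of
    /proc/diskstats. Field layout: major minor name reads reads_merged
    sectors_read ..."""
    for line in diskstats_text.splitlines():
        fields = line.split()
        if len(fields) > 5 and fields[2] == device:
            return int(fields[5])
    raise ValueError("device %s not found" % device)

def throughput_kbps(sectors_a, sectors_b, dt, sector_bytes=512):
    """Read throughput in KiB/s between two samples taken dt seconds
    apart (512-byte sectors is the usual kernel unit)."""
    return (sectors_b - sectors_a) * sector_bytes / 1024.0 / dt
```

Plotting this per sample next to the process bars would show whether readahead really moves data faster than the start-random-services phase does.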
This definitely helped me with my boot times -- the 4-second load gap at the start I found to be "modprobe floppy", apparently timing out on my floppyless laptop :)
Just glancing at the initial image certainly brings all sorts of questions to mind:
- Why is rhgb eating so much CPU? If you run 'rhgb -i' it uses basically 0 CPU to display the animation. That looks like a pretty obvious bug we completely missed.
- Is it just a coincidence that dhclient gets the lease almost exactly simultaneously with readahead finishing? Is readahead blocking the rest of the system?
- Is readahead doing any good at all? Would it still be doing good if we fixed blocking boot for 20 seconds on dhclient?
- What does GNOME login look like?
You can also see that most of the time is eaten in just a few things ... initial module probing, starting X twice, dhclient. There are only 7 seconds from when dhclient finishes until the point prefdm starts, and that's the portion people have mostly worked on parallelizing.
Anyways, I'm very impressed, looks like I'll have to start figuring out shipping to Slovenia :-) (*) Owen
(*) Let me know when you think you are at a point where you have something you'd like to have as a poster, and we can work out how best to implement the details of my offer.
On Mon, 2004-11-15 at 20:05 -0500, Owen Taylor wrote:
- Why is rhgb eating so much CPU? If you run 'rhgb -i' it uses basically 0 CPU to display the animation. That looks like a pretty obvious bug we completely missed.
Tracked this one down:
http://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=139463
Hopefully that will shave a few seconds off the boot when we get it fixed. Owen
On Mon, Nov 15, 2004 at 11:10:51PM -0500, Owen Taylor wrote:
On Mon, 2004-11-15 at 20:05 -0500, Owen Taylor wrote:
- Why is rhgb eating so much CPU? If you run 'rhgb -i' it uses basically 0 CPU to display the animation. That looks like a pretty obvious bug we completely missed.
Tracked this one down:
http://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=139463
Hopefully that will shave a few seconds off the boot when we get it fixed.
Oops, I totally missed that ! Thanks a lot !
Daniel
On Mon, 2004-11-15 at 20:05 -0500, Owen Taylor wrote:
Wow, this is fabulous work, and fast too!
Thanks :)
What sort of libraries are you using in the Java program? Do you have any idea whether getting it to run on top of open source Java would be feasible?
I'm using the java2d and imageio packages with IBM's JDK. It doesn't work out of the box with libgcj though, so I'll have to come up to speed with the java2d/cairo development. Alternatively, I can always drop the alpha/antialias prettiness. Or switch to SVG instead and let librsvg do the work.
Anyway, I'll upload the script and source code once I clean things up.
How are you computing the different shades of yellow and gray? Are you looking at differences in the TIME column?
Only running (yellow) processes are shaded. It goes like this:
- check the S (status) column:
  - D (unint. sleep) -> gray
  - S (sleeping) -> light gray
  - Z (zombie) -> dark gray
  - T (traced) -> reddish (but I haven't seen any)
  - R (running) -> check the %CPU column, use #ffcb00 with alpha ranging from 50% to 100% (128 + CPU*128)
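[Editorial sketch] That mapping as a small function; the colour values follow the description above, and treating %CPU as a 0..1 fraction (so that 128 + CPU*128 fits in an 8-bit alpha channel) is my assumption about the formula:

```python
def state_color(state, cpu_fraction=0.0):
    """Map a top(1) state letter to an (r, g, b, a) colour.
    cpu_fraction is %CPU scaled to 0..1 (an assumption about the
    '128 + CPU*128' formula described above)."""
    fixed = {
        "D": (128, 128, 128, 255),  # uninterruptible sleep -> gray
        "S": (192, 192, 192, 255),  # sleeping -> light gray
        "Z": (64, 64, 64, 255),     # zombie -> dark gray
        "T": (204, 64, 64, 255),    # traced -> reddish
    }
    if state == "R":
        # running -> #ffcb00 with alpha from 50% to 100% opacity
        alpha = min(255, int(128 + cpu_fraction * 128))
        return (0xFF, 0xCB, 0x00, alpha)
    return fixed.get(state, (0, 0, 0, 255))
```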
There were also some white gaps which needed squashing (fixed and updated the chart).
Just glancing at the initial image certainly brings all sorts of questions to mind:
- Why is rhgb eating so much CPU? If you run 'rhgb -i' it uses basically 0 CPU to display the animation. That looks like a pretty obvious bug we completely missed.
You seem to have tracked this one down, but here's the output without rhgb for comparison: http://www.klika.si/ziga/bootchart/bootchart-norhgb.png (boot time went from 1:27 to 0:51)
Is it just a coincidence that dhclient gets the lease almost exactly simultaneously with readahead finishing? Is readahead blocking the rest of the system?
Is readahead doing any good at all? Would it still be doing good if we fixed blocking boot for 20 seconds on dhclient?
http://www.klika.si/ziga/bootchart/bootchart-noreadahead.png (boot time: 0:49 -- note that this is *with* rhgb)
Without rhgb and readahead: http://www.klika.si/ziga/bootchart/bootchart-norhgbreadahead.png (boot time: 0:51 -- so these guys obviously don't play well together)
- What does GNOME login look like?
If I parse up to the point where gnome-panel is running and the system is 90% idle: http://www.klika.si/ziga/bootchart/bootchart-login.png
All corresponding bootop.log.{norhgb,login,...}.gz log files are also available.
Anyways, I'm very impressed, looks like I'll have to start figuring out shipping to Slovenia :-) (*)
I'd think this has gotten easier since we joined the EU :)
(*) Let me know when you think you are at a point where you have something you'd like to have as a poster, and we can work out how best to implement the details of my offer.
Will do, thanks!
Hi.
Ziga Mahkovec ziga.mahkovec@klika.si wrote:
Without rhgb and readahead: http://www.klika.si/ziga/bootchart/bootchart-norhgbreadahead.png (boot time: 0:51 -- so these guys obviously don't play well together)
This shows what I have "felt" while watching my machine crawl through the boot process: as soon as the syslog daemon comes up, everything gets a lot slower.
On Tue, Nov 16, 2004 at 08:22:27PM +0100, Ralf Ertzinger wrote:
Hi.
Ziga Mahkovec ziga.mahkovec@klika.si wrote:
Without rhgb and readahead: http://www.klika.si/ziga/bootchart/bootchart-norhgbreadahead.png (boot time: 0:51 -- so these guys obviously don't play well together)
This shows what I have "felt" while watching my machine crawl through the boot process: as soon as the syslog daemon comes up, everything gets a lot slower.
Well, syslogd syncs on every message written, and the boot process spews a lot of information.
The first thing I did when installing on my laptop was to disable the sync (slow hard disk).
Perhaps the default could be changed to do syncs only a few minutes after starting up?
Regards, Luciano Rocha
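[Editorial note] The sysklogd syslog.conf syntax already supports this per file: a "-" in front of the file name tells syslogd to skip the sync after each message. A sketch (the selector line is just an example):

```
# /etc/syslog.conf -- the leading "-" tells syslogd not to sync the
# file after every message (faster, at the risk of losing the last
# messages on a crash)
*.info;mail.none;authpriv.none;cron.none    -/var/log/messages
```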
Ziga Mahkovec (ziga.mahkovec@klika.si) said:
Without rhgb and readahead: http://www.klika.si/ziga/bootchart/bootchart-norhgbreadahead.png (boot time: 0:51 -- so these guys obviously don't play well together)
I'd be curious to see what happens if you turn off synchronous logging.
Bill
On Tue, 2004-11-16 at 14:45 -0500, Bill Nottingham wrote:
Without rhgb and readahead: http://www.klika.si/ziga/bootchart/bootchart-norhgbreadahead.png (boot time: 0:51 -- so these guys obviously don't play well together)
I'd be curious to see what happens if you turn off synchronous logging.
http://www.klika.si/ziga/bootchart/bootchart-asyncsyslog.png
syslogd definitely behaves better. It also decreases boot time, though this is not immediately evident since kmodule took longer this time. I've observed this with kudzu probes before.
Ziga Mahkovec (ziga.mahkovec@klika.si) said:
I'd be curious to see what happens if you turn off synchronous logging.
http://www.klika.si/ziga/bootchart/bootchart-asyncsyslog.png
syslogd definitely behaves better. It also decreases boot time, though this is not immediately evident since kmodule took longer this time. I've observed this with kudzu probes before.
You running FC3 stock or updated? (There's a 3-4 second+ delay in kmodule fixed in the update...)
Bill
On Wed, 2004-11-17 at 01:28 -0500, Bill Nottingham wrote:
http://www.klika.si/ziga/bootchart/bootchart-asyncsyslog.png
syslogd definitely behaves better. It also decreases boot time, though this is not immediately evident since kmodule took longer this time. I've observed this with kudzu probes before.
You running FC3 stock or updated? (There's a 3-4 second+ delay in kmodule fixed in the update...)
I was running initscripts-7.93.5-1 from fedora-updates. I upgraded to the one in rawhide now (initscripts-7.96-1), but this only contains rc.sysinit changes, correct? Anyway, kmodule does seem more stable now. I also did an "alias floppy off" and upgraded rhgb (see Daniel's post):
http://www.klika.si/ziga/bootchart/bootchart-rhgbfix.png (boot time: 0:46, yay)
On Wed, Nov 17, 2004 at 01:06:07PM +0100, Ziga Mahkovec wrote:
http://www.klika.si/ziga/bootchart/bootchart-rhgbfix.png (boot time: 0:46, yay)
Looks to me like readahead_early and readahead are spending their time doing a whole lot of nothing (uninterruptible sleep, which I guess means "waiting for IO"). But if these two are waiting for IO, what process is blocking them? Nothing else seems to be running at the same time; everyone waits.
On Wed, Nov 17, 2004 at 01:06:07PM +0100, Ziga Mahkovec wrote:
On Wed, 2004-11-17 at 01:28 -0500, Bill Nottingham wrote:
http://www.klika.si/ziga/bootchart/bootchart-asyncsyslog.png
syslogd definitely behaves better. It also decreases boot time, though this is not immediately evident since kmodule took longer this time. I've observed this with kudzu probes before.
You running FC3 stock or updated? (There's a 3-4 second+ delay in kmodule fixed in the update...)
I was running initscripts-7.93.5-1 from fedora-updates. I upgraded to the one in rawhide now (initscripts-7.96-1), but this only contains rc.sysinit changes, correct? Anyway, kmodule does seem more stable now. I also did an "alias floppy off" and upgraded rhgb (see Daniel's post):
http://www.klika.si/ziga/bootchart/bootchart-rhgbfix.png (boot time: 0:46, yay)
Looks way better ! Thanks,
Daniel
Ziga Mahkovec ziga.mahkovec@klika.si wrote:
http://www.klika.si/ziga/bootchart/bootchart-rhgbfix.png (boot time: 0:46, yay)
Hmmm.
What happens to overall boot time if rhgb and dhclient is disabled, with and without readahead?
Both rhgb and dhclient seem like they can be optimized individually — so their contribution to overall boot time can be disregarded while studying the rest of the boot process — and given the relative runtime and order of readahead vs. the following processes, readahead looks like it may be wasted here.
Also interesting would be a graph of a minimal boot — disable all nonessential services; portmap, rpc, gpm, cups, etc. — into text console. It'd reduce the “noise” in the graph and might reveal something interesting about the remaining processes.
Oh, BTW, what is that apparent zombie of S04readahead_early? Just a timing issue with the sampling?
-- We've gotten to the point where a human-readable, human-editable text format for structured data has become a complex nightmare where somebody can safely say "As many threads on xml-dev have shown, text-based processing of XML is hazardous at best" and be perfectly valid in saying it. -- Tom Bradford
On Wed, 2004-11-17 at 13:45 +0100, Terje Bless wrote:
What happens to overall boot time if rhgb and dhclient is disabled, with and without readahead?
I already posted some results with different rhgb/readahead combinations. Before I try all other options, I'd much rather complete the work, put it online and let people try it out. Some of the issues seen in these charts (e.g. 'modprobe floppy') are specific to my laptop.
Also interesting would be a graph of a minimal boot — disable all nonessential services; portmap, rpc, gpm, cups, etc. — into text console. It'd reduce the “noise” in the graph and might reveal something interesting about the remaining processes.
Ha, no horizontal bar for this one: http://www.klika.si/ziga/bootchart/bootchart-minimal.png
That's with runlevel 3 and only the following services: xfs, iptables, messagebus, irqbalance, syslog, haldaemon, crond, atd, anacron, cpuspeed, xinetd.
Exit criteria are mingetty running and system idle.
Oh, BTW, what is that apparent zombie of S04readahead_early? Just a timing issue with the sampling?
If you take a look at the log file that produced the image (http://www.klika.si/ziga/bootchart/bootop.log.rhgbfix.gz):
@@ 3704
 2085  2079  4356  3372 S  0.0  0:00.00 initlog -q -c /etc/rc5.d/S04readahead_ea
 2086  2085     0     0 Z  0.0  0:00.00 [S04readahead_ea] <defunct>
It seems related to: http://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=64603
Why does starting xfs do a find? Methinks we can save a few cycles there.
Regards,
Hans
Ziga Mahkovec wrote:
On Wed, 2004-11-17 at 13:45 +0100, Terje Bless wrote:
What happens to overall boot time if rhgb and dhclient is disabled, with and without readahead?
I already posted some results with different rhgb/readahead combinations. Before I try all other options, I'd much rather complete the work, put it online and let people try it out. Some of the issues seen in these charts (e.g. 'modprobe floppy') are specific to my laptop.
Also interesting would be a graph of a minimal boot — disable all nonessential services; portmap, rpc, gpm, cups, etc. — into text console. It'd reduce the “noise” in the graph and might reveal something interesting about the remaining processes.
Ha, no horizontal bar for this one: http://www.klika.si/ziga/bootchart/bootchart-minimal.png
That's with runlevel 3 and only the following services: xfs, iptables, messagebus, irqbalance, syslog, haldaemon, crond, atd, anacron, cpuspeed, xinetd.
Exit criteria are mingetty running and system idle.
Oh, BTW, what is that apparent zombie of S04readahead_early? Just a timing issue with the sampling?
If you take a look at the log file that produced the image (http://www.klika.si/ziga/bootchart/bootop.log.rhgbfix.gz):
@@ 3704
 2085  2079  4356  3372 S  0.0  0:00.00 initlog -q -c /etc/rc5.d/S04readahead_ea
 2086  2085     0     0 Z  0.0  0:00.00 [S04readahead_ea] <defunct>
It seems related to: http://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=64603
On Wed, 2004-11-17 at 13:06 +0100, Ziga Mahkovec wrote:
On Wed, 2004-11-17 at 01:28 -0500, Bill Nottingham wrote:
http://www.klika.si/ziga/bootchart/bootchart-asyncsyslog.png
syslogd definitely behaves better. It also decreases boot time, though this is not immediately evident since kmodule took longer this time. I've observed this with kudzu probes before.
You running FC3 stock or updated? (There's a 3-4 second+ delay in kmodule fixed in the update...)
I was running initscripts-7.93.5-1 from fedora-updates. I upgraded it with the one in rawhide now (initscripts-7.96-1), but this only contains rc.sysinit changes, correct? Anyway, kmodule does seem more stable now. I also did an "alias floppy off" and upgraded rhgb (see Daniel's post):
http://www.klika.si/ziga/bootchart/bootchart-rhgbfix.png (boot time: 0:46, yay)
another interesting test might be the following: disable readahead (both services), store the attached file as /tmp/files and put the following near the top of /etc/rc.sysinit: /usr/sbin/readahead `/bin/cat /tmp/files`
like this:
# Rerun ourselves through initlog
if [ -z "$IN_INITLOG" -a -x /sbin/initlog ]; then
    exec /sbin/initlog -r /etc/rc.d/rc.sysinit
fi
/usr/sbin/readahead `/bin/cat /tmp/files`
HOSTNAME=`/bin/hostname`
HOSTTYPE=`uname -m`
unamer=`uname -r`
this should make the boot almost diskless except for the first few seconds.
On Wed, 2004-11-17 at 21:28 +0100, Arjan van de Ven wrote:
another interesting test might be the following: disable readahead (both services), store the attached file as /tmp/files and put the following near the top of /etc/rc.sysinit: /usr/sbin/readahead `/bin/cat /tmp/files`
I had to move it a bit further down so it gets picked up by top (which in turn requires /proc):
--- /etc/rc.sysinit.orig
+++ /etc/rc.sysinit
@@ -26,8 +26,13 @@
 mount -n -t proc /proc /proc
 [ -d /proc/bus/usb ] && mount -n -t usbfs /proc/bus/usb /proc/bus/usb
 mount -n -t sysfs /sys /sys >/dev/null 2>&1
+# Log top output for boot analysis
+/usr/local/sbin/bootop start
+
+/usr/sbin/readahead `/bin/cat /tmp/files`
+
 . /etc/init.d/functions

 # Check SELinux status
 selinuxfs=`awk '/ selinuxfs / { print $2 }' /proc/mounts`
The result: http://www.klika.si/ziga/bootchart/bootchart-diskless.png
There's a 20 MB load of files in your readahead list and they are being read for 15 seconds. I guess a 26.60 MB/sec 'hdparm -t' suggests room for improvement? Note that this is on a 4200 RPM drive.
Could you upload the java program that is generating the graphs? I'm curious to see how my system performs and I'm sure others are too.
David
-- fedora-devel-list mailing list fedora-devel-list@redhat.com http://www.redhat.com/mailman/listinfo/fedora-devel-list
On Wed, 2004-11-17 at 17:59 -0800, David Corrigan wrote:
Could you upload the java program that is generating the graphs? I'm curious to see how my system performs and I'm sure others are too.
That is certainly my intention but it needs another weekend of coding. I'll keep you posted.
On Thu, Nov 18, 2004 at 02:33:53AM +0100, Ziga Mahkovec wrote:
There's a 20 MB load of files in your readahead list and they are being read for 15 seconds. I guess a 26.60 MB/sec 'hdparm -t' suggests room for improvement? Note that this is on a 4200 RPM drive.
hmm yeah there ought to be room; I'll need to think about how to use that though.
One question: was the drive (light) mostly quiet when readahead was finished ?
On Thu, 2004-11-18 at 08:36 +0100, Arjan van de Ven wrote:
hmm yeah there ought to be room; I'll need to think about how to use that though.
Yeah, I guess the fact that disk caches are loaded on a per-file basis doesn't help either. In theory, stat-ing your list takes about 3 sec, while readahead on a tarball is instantaneous (both of course without boot-time readahead).
(A shot in the dark, probably, but I'm thinking of a contiguous readahead cache file kept updated with only the changed files.)
One question: was the drive (light) mostly quiet when readahead was finished ?
Well there are occasional flashes, even if I turn off synchronous logging (which wasn't the case when I posted the chart). I guess I'd have to tailor the list first.
BTW, some time ago (http://kerneltrap.org/node/view/2157) you mentioned 11,000 files being read during boot. This list only contains 923. Are all the rest gnome-related?
On Thu, Nov 18, 2004 at 12:06:44PM +0100, Ziga Mahkovec wrote:
logging (which wasn't the case when I posted the chart). I guess I'd have to tailor the list first.
BTW, some time ago (http://kerneltrap.org/node/view/2157) you mentioned 11,000 files being read during boot. This list only contains 923. Are all the rest gnome-related?
The 11,000 was total file opens; 923 are the unique, non-proc, non-sysfs ones. The total now is 35,000 (i.e. up more than 3 times).
On Thu, 2004-11-18 at 12:06 +0100, Ziga Mahkovec wrote:
Yeah I guess the fact that disk caches are loaded on a per-file basis doesn't help either. Because in theory: stat-ing your list takes about 3 sec and readahead on a tarball is instantaneous (both of course without boot-time readahead).
ok here is another try; I hacked up a tool to sort the list in disk order.
use it like this:
make fileblock
fileblock `cat readfiles` | sort -n | cut -f2 > sortedfiles
and use sortedfiles as filelist for readahead as before
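For illustration, here is the same pipeline run over fabricated fileblock output. A "block<TAB>path" line format is assumed here, since the tool itself isn't attached in this archive:

```shell
# Fabricated stand-in for `fileblock` output (assumed format:
# starting disk block, a tab, then the file path).
printf '912640\t/lib/libm-2.3.3.so\n20480\t/sbin/init\n633608\t/usr/bin/top\n' > /tmp/fileblock.out

# Sort numerically by starting block and keep only the paths,
# yielding a readahead list in disk order.
sort -n /tmp/fileblock.out | cut -f2 > /tmp/sortedfiles
cat /tmp/sortedfiles
```

The resulting list starts with /sbin/init (lowest block) and ends with the highest-block file, which is exactly what readahead wants to minimize seeking.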
Arjan van de Ven arjanv@redhat.com writes:
ok here is another try; I hacked up a tool to sort the list in disk order.
use it like this:
make fileblock
fileblock `cat readfiles` | sort -n | cut -f2 > sortedfiles
and use sortedfiles as filelist for readahead as before
I'm wondering..
What would be the most efficient: readahead of files or readahead of diskblocks?!
ie.. I was thinking of the following:
Have the kernel dump the I/O blocks to the kernel log (klogd shouldn't be running for obvious reasons):
# echo 1 >/proc/sys/vm/block_dump
Then using that list to preload the blocks at startup (using POSIX_FADV_RANDOM on the block device to prevent it from doing a readahead).
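To make the idea concrete, here's a small sketch of turning such a log into a block list. The "name(pid): READ block N on dev" line shape below is the format block_dump is generally documented to emit; the sample lines are fabricated stand-ins for real dmesg output:

```shell
# Fabricated sample standing in for `dmesg` output captured after
# `echo 1 >/proc/sys/vm/block_dump` (assumed line format:
# "name(pid): READ block N on dev").
cat > /tmp/blockdump.sample <<'EOF'
init(1): READ block 633608 on hda1
modprobe(2099): READ block 20480 on hda1
bash(2085): READ block 112264 on hda1
bash(2085): READ block 112264 on hda1
EOF

# Field 4 is the block number; sort numerically and drop duplicates
# to get a preload list in disk order.
awk '/READ block/ { print $4 }' /tmp/blockdump.sample | sort -n | uniq > /tmp/boot-blocks
cat /tmp/boot-blocks
```

A real run would of course parse the actual kernel log (with klogd disabled as noted), and the preload step would still need a tool that reads those raw blocks from the device.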
On Sat, 2004-11-20 at 18:47 +0100, Arjan van de Ven wrote:
ok here is another try; I hacked up a tool to sort the list in disk order.
Arjan,
not quite the improvement I expected, but it does shave off an additional 2 seconds: http://www.klika.si/ziga/bootchart/bootchart-sortreadahead.png
This chart includes data from iostat (sysstat package). Notice how the disk is fully utilized (%util) during readahead, but the throughput (rkB/s) is *really* low. This could very well be a problem with my hard disk. hdparm seems fine though (and I checked the parameters before running readahead).
Let me know if you'd find any other iostat columns useful in the chart. The log file is here: http://www.klika.si/ziga/bootchart/boot.io.log.sortreadahead.gz
Anyway, the bootchart scripts and source code will be on SourceForge tomorrow so people will be able to post their results.
Thanks,
On Sat, Nov 20, 2004 at 11:50:49PM +0100, Ziga Mahkovec wrote:
This chart includes data from iostat (sysstat package). Notice how the disk is fully utilized (%util) during readahead, but the throughput (rkB/s) is *really* low. This could very well be a problem with my hard disk. hdparm seems fine though (and I checked the parameters before running readahead).
Disks are very very seek constrained. You get wonderful performance reading linear data. The moment you read a lot of scattered files or a file with a lot of segments you will get low performance - even more so on laptops than desktops
On Sun, Nov 21, 2004 at 10:42:45AM -0500, Alan Cox wrote:
Disks are very very seek constrained. You get wonderful performance reading linear data. The moment you read a lot of scattered files or a file with a lot of segments you will get low performance - even more so on laptops than desktops
yeah we saw that; sorting the list on disk sector shaved 2 seconds off... if we want to save more we'll have to fix the on disk layout to be less spread out. That's not going to be fun...
On Sun, 2004-11-21 at 16:55 +0100, Arjan van de Ven wrote:
yeah we saw that; sorting the list on disk sector shaved 2 seconds off... if we want to save more we'll have to fix the on disk layout to be less spread out. That's not going to be fun...
To elaborate, Arjan suggested analyzing the distribution of the blocks and [1] does seem to explain why the throughput was good at start but declined later.
[1] http://www.klika.si/ziga/bootchart/blocks.png (the image shows the distribution of the readahead file blocks, with a log scale filesize radius and random vertical distribution; only the first 1/4th of the 34GB device is shown).
On Sun, Nov 21, 2004 at 04:55:47PM +0100, Arjan van de Ven wrote:
yeah we saw that; sorting the list on disk sector shaved 2 seconds off... if we want to save more we'll have to fix the on disk layout to be less spread out. That's not going to be fun...
Let's see ... Suppose we isolate all the resources we need to load quickly, so that we have a list of files, hopefully all from the same / partition. Then, while in single user mode and without concurrent activity:
for foo in $list: cp $foo $foo.new
for foo in $list: rm $foo
for foo in $list: mv $foo.new $foo
We could expect filesystems to allocate the new blocks (data and possibly metadata) more or less sequentially on disk. What would lead the filesystem code to not be sequential (most of the time, assuming a single block device underneath)?
Just wondering...
Daniel
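A runnable sketch of the idea (the demo files below are placeholders for the real readahead list; cp followed by mv replaces each original atomically, so the separate rm pass isn't strictly needed):

```shell
# Demo of the rewrite-in-place idea: copying each file makes the filesystem
# allocate fresh data blocks, which may end up closer together on disk.
# The two demo files are placeholders for the real boot file list.
mkdir -p /tmp/defragdemo
echo hello > /tmp/defragdemo/a
echo world > /tmp/defragdemo/b
printf '%s\n' /tmp/defragdemo/a /tmp/defragdemo/b > /tmp/defragdemo/list

# cp + mv replaces each original atomically (no separate rm needed).
while IFS= read -r foo; do
    cp -p "$foo" "$foo.new" && mv "$foo.new" "$foo"
done < /tmp/defragdemo/list
```

Whether the fresh copies actually land sequentially relative to each other is up to the allocator, which is exactly the question raised above.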
On Sun, Nov 21, 2004 at 11:28:43AM -0500, Daniel Veillard wrote: [cut defrag method]
We could expect filesystems to allocate the new blocks (data and possibly metadata) more or less sequentially on disk. What would lead the filesystem code to not be sequential (most of the time, assuming a single block device underneath)?
Here is an article on the topic. Do we have ext3 block reservation in FC kernels?
On Sun, 2004-11-21 at 11:49 -0500, Charles R. Anderson wrote:
Here is an article on the topic. Do we have ext3 block reservation in FC kernels?
yes.
The files themselves aren't internally fragmented. It's the seeks between the various files that cause the slowness.
On Sun, 2004-11-21 at 11:28 -0500, Daniel Veillard wrote:
Let's see ... Suppose we isolate all the resources we need to load quickly, we have a list of files, hopefully all from the same / partition,
we have that ;-)
while in single user mode and without concurrent activity:
for foo in $list: cp $foo $foo.new
for foo in $list: rm $foo
for foo in $list: mv $foo.new $foo
We could expect filesystems to allocate the new blocks (data and possibly metadata) more or less sequentially on disk. What would lead the filesystem code to not be sequential (most of the time, assuming a single block device underneath)?
Nope, this doesn't work; while each file individually will be sequential, the files are not sequential on disk relative to each other. Note: the files already aren't fragmented, at least on my test system.
On Sun, Nov 21, 2004 at 05:51:49PM +0100, Arjan van de Ven wrote:
Nope, this doesn't work; while each file individually will be sequential, the files are not sequential on disk relative to each other. Note: the files already aren't fragmented, at least on my test system.
Yeah, but why doesn't the ext3 allocator allocate consecutive blocks for such a pattern? Directory locality? Still wondering :-) It must be possible one way or another to do this without going through very complex reservation interfaces. The problem is not to 100% guarantee that we will not seek at all while going through this bunch of files, but to have only a reasonable number of seeks. Suppose there are only 10 seeks instead of a single contiguous read; at roughly 10 ms per seek, that would amount to only about a tenth of a second of delay on "normal" hardware.
Daniel
If each individual file is unfragmented, then why not create a loopback device image with all the necessary files for booting, copy it into memory, and mount it? As long as that one file remains unfragmented, there will be a minimal amount of drive seeking involved.
David
On 11/21/2004 10:55:47 AM, Arjan van de Ven wrote:
yeah we saw that; sorting the list on disk sector shaved 2 seconds off... if we want to save more we'll have to fix the on disk layout to be less spread out. That's not going to be fun...
Is there any way to have a "pre-cooked" swap image of all the files you need so that when you boot you can swap it all in in one big contiguous read instead of having to read file by file?
Regards, Willem Riede.
yeah we saw that; sorting the list on disk sector shaved 2 seconds off... if we want to save more we'll have to fix the on disk layout to be less spread out. That's not going to be fun...
Is there any way to have a "pre-cooked" swap image of all the files you need so that when you boot you can swap it all in in one big contiguous read instead of having to read file by file?
not currently; file contents also never hits swap, it would require like a full vm subsystem rewrite to achieve this. It's probably a lot easier to either write some defrag tool that can move stuff, or to make a hidden automatic buffer in the fs
Arjan van de Ven wrote:
not currently; file contents also never hits swap, it would require like a full vm subsystem rewrite to achieve this. It's probably a lot easier to either write some defrag tool that can move stuff, or to make a hidden automatic buffer in the fs
I was thinking along the lines of the hidden automatic buffer in the fs: just set aside a partition and dump everything there in the order needed.
Problem 1: how do we tell the kernel that the block just read equals a block on one of the mounted FS and that the kernel can use the read block instead of reading it from the mounted FS?
Problem 2: how do we keep the partition up to date?
Which makes me wonder if this is the path to follow.
Regards,
Hans
On Sun, Nov 21, 2004 at 09:41:21PM +0100, Hans de Goede wrote:
Problem 1: how do we tell the kernel that the block just read equals a block on one of the mounted FS and that the kernel can use the read block instead of reading it from the mounted FS?
if you do it inside the filesystem then the fs can do it internally
Problem 2: how do we keep the partition uptodate?
Again, the fs can just do that transparently.
Arjan van de Ven arjanv@redhat.com writes:
if you do it inside the filesystem then the fs can do it internally
Wild idea...
Couldn't the cachefs be used for something like this?! Then it would work with all the filesystems.
On Sun, 21.11.2004 at 21:31, Arjan van de Ven wrote:
not currently; file contents also never hits swap, it would require like a full vm subsystem rewrite to achieve this. It's probably a lot easier to either write some defrag tool that can move stuff, or to make a hidden automatic buffer in the fs
Hmm... I thought there was no "defrag.ext3" in Linux because it wasn't necessary. Was I wrong?
On Sun, Nov 21, 2004 at 10:01:54PM +0100, Kyrre Ness Sjobak wrote:
Hmm... I thought there was no "defrag.ext3" in Linux because it wasn't necessary. Was I wrong?
The files themselves don't get fragmented in this case... it's just that we want to play with moving certain files to certain locations.
Kyrre Ness Sjobak kyrre@solution-forge.net writes:
Hmm... I thougth there was no "defrag.ext3" in Linux because it wasn't neccesary. Was i wrong?
There is one, albeit not open source:
O&O Defrag - Defragmentation for Linux ext2/ext3.
(c) 2004 O&O Software GmbH. All rights reserved.
O&O Defrag CommandLine Utility Beta Version 1.0 Build 4758
On Sun, Nov 21, 2004 at 04:55:47PM +0100, Arjan van de Ven wrote:
yeah we saw that; sorting the list on disk sector shaved 2 seconds off... if we want to save more we'll have to fix the on disk layout to be less spread out. That's not going to be fun...
Is it possible to get a list of the sectors read for each file, instead of only the file name?
Or do you expect that, for each file read, it will be read almost entirely?
Regards, Luciano Rocha
On Tue, 2004-11-23 at 12:12 +0000, Luciano Miguel Ferreira Rocha wrote:
Is it possible to get a list of the sectors read for each file, instead of only the file name?
I posted a program the other day that does this for the first sector; the bmap program from http://www.zip.com.au/~akpm/linux/patches/stuff/ext3-tools.tar.gz lists all sectors in use (but in a more complex format)
Alan Cox wrote:
Disks are very very seek constrained. You get wonderful performance reading linear data. The moment you read a lot of scattered files or a file with a lot of segments you will get low performance - even more so on laptops than desktops
How much of a factor is rotational latency? Or does reading the entire track into the drive's buffer make rotational latency a non-factor these days?
73 de Jeff
OOo widget fonts look terrible on my desktop. I followed this guide to change them, without success:
http://www.openoffice.org/FAQs/fontguide.html#9
I suspect the OOo Fedora packages have the widget font changed to something other than Andale Sans UI. Does anybody know which font?
Does anybody know how to change OOo widget fonts?
Thank you, Avi
I use the gtk-qt engine from http://freedesktop.org/wiki/Software_2fgtk_2dqt. OOo uses the same fonts my gtk/qt apps do, once configured inside the kcontrol center.
-Nick Bargnesi
On Sun, 21 Nov 2004, Nick Bargnesi wrote:
I use the gtk-qt engine from http://freedesktop.org/wiki/Software_2fgtk_2dqt OOo uses the same fonts my gtk/qt apps do, once configured inside the kcontrol center.
FYI, gtk-qt-engine is in Extras, and has a update awaiting QA to the latest version: http://bugzilla.fedora.us/show_bug.cgi?id=2035
-- Rex
Hi,
What desktop environment are you using? OOo pulls the UI font from the desktop environment: in the GNOME case from GTK, and in the KDE case from Qt. You should try changing the system-wide application font to see if that helps. But could you check a couple of things:
1) What desktop environment?
2) Do you have the openoffice.org-kde package installed, if you're using KDE?
3) What's your normal application font from the desktop env?
Dan
On Sun, 21 Nov 2004 19:34:35 -0500 (EST), Dan Williams dcbw@redhat.com wrote:
- What desktop environment?
KDE
- Do you have openoffice.org-kde package installed if you're using KDE?
Yep
- What's your normal application font from the desktop env?
Tahoma, with a byte-code-interpreter-enabled FreeType lib. My desktop looks simply great. But OOo fonts still look like s***t.
Any idea ?
On Mon, 22 Nov 2004, Avi Alkalay wrote:
- What's your normal application font from the desktop env?
Tahoma, with a byte code interpreter enabled FreeType lib. My desktop looks simply great. But OOo fonts still looks like s***t.
Ironically, could be the fact you've enabled the byte-code interpreter. Try backing out that change.
-- Rex
On Mon, 22 Nov 2004 02:20:27 -0600 (CST), Rex Dieter rdieter@math.unl.edu wrote:
Ironically, could be the fact you've enabled the byte-code interpreter. Try backing out that change.
Impossible. Then my whole desktop will look like s*****t.
Any better idea ?
On Mon, 22 Nov 2004, Avi Alkalay wrote:
Impossible. Then all my desktop will look like s*****it.
Not impossible. I've seen cases where AA fonts look *worse* after turning on the byte-code interpreter.
However, my bet is that you've possibly hit http://bugzilla.redhat.com/bugzilla/133741
-- Rex
On Mon, 22 Nov 2004, Rex Dieter wrote:
However, my bet is that you've possibly hit http://bugzilla.redhat.com/bugzilla/133741
That's probably the case here, yes.
Dan
On Sun, Nov 21, 2004 at 07:34:35PM -0500, Dan Williams wrote:
What desktop environment are you using? OOo pulls the UI font from the desktop environment: in the GNOME case from GTK, and in the KDE case from Qt. You should try changing the system-wide application font to see if that helps. But could you check a couple of things:
I get the same problem btw - Gnome uses a sensible font but OpenOffice chooses some ghastly fixed font for the widgets. In my case this is locale dependent, and the bug was put down to font issues and limits in OOo's font capabilities compared with Gnome/KDE.
On Mon, 22 Nov 2004, Alan Cox wrote:
I get the same problem btw - Gnome uses a sensible font but OpenOffice chooses some ghastly fixed font for the widgets. In my case this is locale dependant and the bug was put down to font issues and limits in Oo's font capabilities compared with Gnome/KDE.
Ok, this is a known bug then, the OOo KDE code needs to do fontconfig substitution on the font that KDE returns using LANG as the constraint.
Dan
Hello, Everyone :) This weekend I had to use Gnome for a day or two, and I was struck by how much better OpenOffice 1.9m62 looked in Gnome than it does in KDE. Before I get into this, I need to say that this might not have anything to do with the problem you are having. I found the configuration setting that fixed my problem at the following location: "Tools | Options | OpenOffice.org | View" On that window was this item: "Use system font for user interface" I unchecked that box, clicked OK and OpenOffice.org 1.9m62 now looks just as good in KDE as it does in GNOME.
I do apologize if this had nothing to do with your problem.
Steven P. Ulrick
On Mon, 22 Nov 2004 09:54:04 -0600, Steven P. Ulrick ulrick2@faith4miracle.org wrote:
Hello, Everyone :) This weekend I had to use Gnome for a day or two, and I was struck by how much better OpenOffice 1.9m62 looked in Gnome than it does in KDE. Before I get into this, I need to say that this might not have anything to do with the problem you are having. I found the configuration setting that fixed my problem at the following location: "Tools | Options | OpenOffice.org | View" On that window was this item: "Use system font for user interface" I unchecked that box, clicked OK and OpenOffice.org 1.9m62 now looks just as good in KDE as it does in GNOME.
I do apologize if this had nothing to do with your problem.
Steven P. Ulrick
It seems this is a valid workaround. I'll try later.
Thank you, Avi
I found the solution.
Menu Tools -> Options -> Tree OOo -> View -> Screen Font antialiasing
I set it to antialias beginning from 13 pixels (about 10pt). Then I found it is using the same wonderful MS Tahoma 8pt I used to configure KDE. So it looks like the KDE font configuration was propagated, except for my anti-aliasing settings.
I'm attaching a screenshot with the dialog to let you see where the solution is, and how wonderful OOo looks now.
Regards, Avi
On Sun, Nov 21, 2004 at 12:11:56PM -0500, Jeff Johnson wrote:
How much of a factor is rotational latency? Or does sucking the entire track into the buffer make rotational latency a non-factor these days?
It still seems to matter from timing. I would assume the drives are smart enough to hand back sectors as they read them even if they are caching ahead and doing other clever stuff.
Ziga Mahkovec ziga.mahkovec@klika.si writes:
On Wed, 2004-11-17 at 21:28 +0100, Arjan van de Ven wrote:
another interesting test might be the following: disable readahead (both services), store the attached file as /tmp/files and put the following near the top of /etc/rc.sysinit: /usr/sbin/readahead `/bin/cat /tmp/files`
The following output might be nice to see:
# xargs </tmp/files filefrag|grep -v "1 extent"
it will show you which files are fragmented. (assuming you are using ext3)
On Thu, 2004-11-18 at 15:26 +0100, Karl Vogel wrote:
On Wed, 2004-11-17 at 21:28 +0100, Arjan van de Ven wrote:
another interesting test might be the following: disable readahead (both services), store the attached file as /tmp/files and put the following near the top of /etc/rc.sysinit: /usr/sbin/readahead `/bin/cat /tmp/files`
The following output might be nice to see: # xargs </tmp/files filefrag|grep -v "1 extent" it will show you which files are fragmented. (assuming you are using ext3)
I am using ext3 and none of the files are fragmented (it's a freshly installed system).
On Wed, Nov 17, 2004 at 09:28:49PM +0100, Arjan van de Ven wrote:
another interesting test might be the following: disable readahead (both services), store the attached file as /tmp/files and put the following near the top of /etc/rc.sysinit: /usr/sbin/readahead `/bin/cat /tmp/files`
Hey! Don't waste cycles like that! If you are trying to squeeze things as much as possible, and since rc.sysinit already requires bash anyway, you want this instead:
/usr/sbin/readahead $(< /tmp/files)
(assuming that the file list will always exist and be readable)
Now let me go back to rewriting my init scripts into native code...
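For the curious, a tiny bash-only sketch (file names illustrative, not the actual readahead list) showing that `$(< file)` expands to the same words as a `cat` command substitution while saving the fork/exec of cat:

```shell
# Bash-only demo: $(< file) reads the file in the shell itself, no cat fork.
# /tmp/files and its contents are illustrative.
printf '%s\n' /etc/hosts /etc/passwd > /tmp/files
a=$(/bin/cat /tmp/files)
b=$(< /tmp/files)
[ "$a" = "$b" ] && echo identical
```

Note this is a bashism (also in ksh/zsh); under plain POSIX sh, `$(< file)` is not special.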
Ziga Mahkovec (ziga.mahkovec@klika.si) said:
On Wed, 2004-11-17 at 01:28 -0500, Bill Nottingham wrote:
http://www.klika.si/ziga/bootchart/bootchart-asyncsyslog.png
syslogd definitely behaves better. It also decreases boot time, though this is not immediately evident since kmodule took longer this time. I've observed this with kudzu probes before.
You running FC3 stock or updated? (There's a 3-4 second+ delay in kmodule fixed in the update...)
I was running initscripts-7.93.5-1 from fedora-updates.
Ah, ok, that has the kmodule fix.
I upgraded it with the one in rawhide now (initscripts-7.96-1), but this only contains rc.sysinit changes, correct?
Correct. Guaranteed to be faster, and almost certainly guaranteed to break something.
Bill
FWIW.. my bootlog (no flashy graphs though):
http://users.telenet.be/kvogel/boot.html
Created by the following ugly, hackish scripting:
--- rc.sysinit.orig	2004-11-18 16:49:52.624083048 +0100
+++ /etc/rc.sysinit	2004-11-18 03:24:43.000000000 +0100
@@ -29,6 +29,11 @@
 . /etc/init.d/functions
+cmdline=$(cat /proc/cmdline)
+if strstr "$cmdline" trace; then
+	nohup </dev/null >/dev/null 2>&1 /bin/bash /etc/rc.tracer &
+fi
+
 # Check SELinux status
 selinuxfs=`awk '/ selinuxfs / { print $2 }' /proc/mounts`
 SELINUX=
----------- /etc/rc.tracer
#!/bin/bash --login
mount -t tmpfs none /mnt/f
(
	/sbin/hwclock --hctosys --localtime
	while true
	do
		echo ============================== $(date)
		ps -eHwwo ppid,pid,fname:20,stat,wchan:16,ni,cp,start,cputime,maj_flt,min_flt,rss,sz,vsz,cmd
		vmstat -d | sed -e '/hda/p' -e '3,$d'
		sleep 0.5
	done
) >/mnt/f/boot.stats 2>&1
------------
And I put the following in /etc/rc.local :
--------------
. /etc/init.d/functions
cmdline=$(cat /proc/cmdline)
if strstr "$cmdline" trace; then
	echo Stopping tracer
	pkill rc.tracer
	egrep -v 'ps -eHwwo|bash --login' /mnt/f/boot.stats >/var/log/boot.stats
	date >>/var/log/boot.stats
	echo -n Unmounting tmpfs:
	umount /mnt/f
	if [ $? -eq 0 ]; then
		success
	else
		failure
		sleep 8
	fi
fi
---------------
Some caveats:
- at the end of rc.sysinit (just after turning the swap on), init waits for a key. Pressing CTRL+C makes it continue. (I thought the nohup would help, but it doesn't)
- my /usr and /var are in the same partition as my root, so the script needs some work when using separate partitions.
- the tmpfs is mounted on /mnt/f
- umount fails, haven't looked into why yet
Usage:
- boot with 'trace' on the kernel command line. After bootup there will be a /var/log/boot.stats file.
Use the following perl script to generate the html output:
--- boot2html ---
#!/usr/bin/perl
open(IN, "</var/log/boot.stats") or die;
open(OUT, ">boot.html");
print OUT "<HTML><BODY><PRE>";
while (<IN>) {
    s/</&lt;/g;
    s/>/&gt;/g;
    @arg = split;
    if (/=======/) {
        print OUT "<HR><FONT SIZE=+2>$_</FONT><HR>";
    } elsif ($arg[3] =~ /R/) {
        print OUT "<FONT COLOR=red>$_</FONT>";
    } elsif ($arg[3] =~ /D/) {
        print OUT "<FONT COLOR=blue>$_</FONT>";
    } else {
        print OUT $_;
    }
}
print OUT "</PRE></BODY></HTML>";
close(OUT);
close(IN);
--------------
Feel free to add some GD.pm lovin' to the script :-)
Karl Vogel karl.vogel@telenet.be writes:
FWIW.. my bootlog (no flashy graphs though):
Feel free to add some GD.pm lovin' to the script :-)
Grabbed GD.pm and fooled around with it a bit:
http://users.telenet.be/kvogel/boot.png
Still looks kinda lame compared to some other graphs, but then design has never been one of my strong points :)
Updated this log with a new run using a 0.10 s sleep interval, and also added vmstat to the output, which shows that most of the time my machine is idle (even with the extra CPU time needed for the logger). The boot is to runlevel 3 without X11 and no DHCP, which takes 29 seconds.
NOTE: this laptop boots in 10 seconds from GRUB to GNOME (with firefox, XEmacs and a couple of terminals open) using swsusp2, so there is certainly some room for improvement :)
Hi.
Bill Nottingham notting@redhat.com wrote:
Ah, ok, that has the kmodule fix.
What exactly does kmodule do, anyway? I know it spits out a list of kernel modules to be loaded, together with the type, but how does it come up with the list?
Hi,
I just ran it and the output seems to be a list of the devices installed on a computer. Here is the output and what each device is:
NETWORK   sis900        Network card
AUDIO     snd-intel8x0  Sound card
HD        usb-storage   5-in-1 reader?
CAPTURE   snd-bt87x     TV tuner card - audio
CAPTURE   bttv          TV tuner card - video
USB       ehci-hcd      USB port
USB       uhci-hcd      USB port
USB       uhci-hcd      USB port
USB       ohci-hcd      USB port
USB       ohci-hcd      USB port
FIREWIRE  ohci1394      Firewire card
It doesn't seem to list stuff like the parallel and serial ports.
David
-- fedora-devel-list mailing list fedora-devel-list@redhat.com http://www.redhat.com/mailman/listinfo/fedora-devel-list
Ralf Ertzinger (fedora-devel@camperquake.de) said:
Ah, ok, that has the kmodule fix.
What exactly does kmodule do, anyway? I know it spits out a list of kernel modules to be loaded, together with the type, but how does it come up with the list?
It uses the same code that kudzu uses.
Bill
Sorry to jump into this part of the discussion; I've just subscribed to fedora-devel because I knew there was a discussion about boot time on Fedora.
I was trying to improve the boot time of FC2 some weeks ago, and I got very interesting results. This is not directly related to the boot poster; it is just trial-and-error work to improve FC2 boot time.
My machine:
- P4-M 2.8GHz HT
- 512MB RAM
- 4200RPM hard disk drive (Dell notebook, very slow compared to a desktop machine)
- Fedora Core 2 + updates
- Nvidia card with nvidia drivers
Critical times for me:
- GDM: time from GRUB until GDM is ready to log in a user
- GNOME: time to load the whole gnome-session for the first time
- EPIPHANY: time to load Epiphany for the first time
- OPENOFFICE: time to load any OpenOffice application for the first time
Original timings:
- GDM: 75 seconds (more than 1 minute!)
- GNOME: 41 seconds
- EPIPHANY: 5 seconds
- OPENOFFICE: 22 seconds
Results:
- GDM: 38 seconds (!!)
- GNOME: 19 seconds
- EPIPHANY: 2 seconds
- OPENOFFICE: 5 seconds (!!)
Bottlenecks I couldn't resolve:
- IDE detection in the kernel
- init start time
- USB detection
- nvidia driver probing for monitor/lcd/tv
- gnome trying to load 1500 files (it seems to be resolved now with icon-cache, Owen?)
FC3 seems to be slower. I've just installed FC3 this week, and I really miss my old boot timings.
Method:
Since I am the only one who uses this notebook, and its hardware doesn't change, there are several things that don't need to be running, and several other things that don't need to be checked (quota, for example)
I've also changed the order in which things are started; for example, there is no need to load httpd or sshd before bringing gdm up.
Gory details:
- I disabled:
  - kudzu
  - rhgb
  - netfs
  - sendmail
- I added "fastboot" option to the kernel. I saw that rc.sysinit use this flag to skip some things, I don't remember what exactly did at this time
- I commented out every unneeded check by rc.sysinit on my machine (like quotas)
- I prepared a readahead.early.files file by running strace -e trace=open on X and gdm
- the first thing I put in rc.sysinit was an hdparm command to get the best out of my slow disk
- readahead.early.files is processed asynchronously with low priority as soon as possible (at the top of rc.sysinit). It preloads X+gdm
- I commented out the line that loads prefdm in inittab, replacing it with a chkconfig script
- I disabled "unix:" fonts in X, so xfs was disabled
- the first script to run in runlevel 5 is the prefdm chkconfig script; I added "ifup lo" before loading the real prefdm script
- services like pcmcia, sshd, httpd, and others are loaded with low priority behind the scenes, while X+gdm are starting. They always load fine, so I don't need to watch whether they load or not.
- I prepared a readahead.files file by running strace -e trace=open over gnome-session + GNOME components (gnome-panel, gconfd and others). There I found all the icons loaded by gnome-panel (from the theme, I suppose)
- readahead is executed against readahead.files just after gdm is ready to accept a login. I did it by measuring the time and adding a sleep command before running readahead (cough, cough)
- while the user (me) is entering the username and password, the gnome session is preloaded with readahead.
- I prepared another readahead file running strace -e trace=open over openoffice and epiphany
- I set up gnome-session-properties to run this readahead just after loading the gnome-session (priority 90 did the trick)
This approach doesn't make the system faster at all, but it seems faster to the user. The main trick is to change the order of loading the system components and preload the slowest applications.
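The "preload while the user logs in" trick can be sketched roughly as follows. This is a hedged sketch: the file-list path and the delay are made up, and plain cat stands in for /usr/sbin/readahead, since for this purpose both just pull the listed files into the page cache.

```shell
# Read every file named in a list into the page cache, after a delay,
# in the background -- e.g. while the user is typing a password at gdm.
# Paths and delay below are illustrative, not the actual setup.
preload_later() {
    sleep "$1"                                  # rough time until the greeter is up
    nice -n 19 cat $(< "$2") > /dev/null 2>&1   # stand-in for /usr/sbin/readahead
}
printf '%s\n' /etc/hosts > /tmp/preload.files   # hypothetical file list
preload_later 0 /tmp/preload.files &
wait && echo preloaded
```

The nice -n 19 keeps the preload from competing with whatever is starting in the foreground; the design relies on the disk otherwise being idle while the user types.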
I don't think this is applicable to the distribution as a whole, but some things can help with this type of customization and may be applied to Fedora:
1. rc.sysinit could be modularized, so the user can disable/enable some bits of it without touching the script file. An easy way could be to move some parts of it to chkconfig scripts
2. prefdm in inittab could be moved to a chkconfig script, and the order of services could be changed so prefdm could be started before starting services like httpd, sshd, and others
3. figure out a way to automate the generation of readahead files to match the most used files
I know that Seth Nickell was working on 1 and 2, but I haven't heard about it since (SystemServices)
I would be glad to help on improving Fedora boot time
Franco Catrin (fcatrin@tuxpan.com) said:
- prefdm in inittab could be moved to a chkconfig script, and the order
of services could be changed so prefdm could be started before starting services like httpd, sshd, and others
This has interesting effects on the available VTs for use.
Bill
On Fri, 19 Nov 2004 at 18:56, Bill Nottingham wrote:
Franco Catrin (fcatrin@tuxpan.com) said:
- prefdm in inittab could be moved to a chkconfig script, and the order
of services could be changed so prefdm could be started before starting services like httpd, sshd, and others
This has interesting effects on the available VTs for use.
what exactly do you mean?
mingetty lines are still in inittab, they run fine and X still loads on VT7
-- Franco
Franco Catrin (fcatrin@tuxpan.com) said:
This has interesting effects on the available VTs for use.
what exactly do you mean?
mingetty lines are still in inittab, they run fine and X still loads on VT7
*dm takes the first available vt. If you run it before other VTs are initialized, it will take, say, vt2.
However, now that I think about it, since we open all the configured vts (for unicode initialization) in rc.sysinit, this should work.
Bill
On Sat, 20 Nov 2004 at 00:14, Bill Nottingham wrote:
Franco Catrin (fcatrin@tuxpan.com) said:
This has interesting effects on the available VTs for use.
what exactly do you mean?
mingetty lines are still in inittab, they run fine and X still loads on VT7
*dm takes the first available vt. If you run it before other VTs are initialized, it will take, say, vt2.
you can force it in gdm.conf if you want
However, now that I think about it, since we open all the configured vts (for unicode initialization) in rc.sysinit, this should work.
that's true, that's true man :-)
Those times are really good. I think I'll start tinkering with my box and see what it produces. But first: How do I turn on boot logging and where is the log saved?
David
On Fri, 19 Nov 2004 18:22:50 -0300, Franco Catrin fcatrin@tuxpan.com wrote:
- I prepared a readahead.early.files file by running strace -e trace=open on X and gdm
Could you provide some more details about how the readahead files are created and loaded? I ran strace in the terminal and it said the X server was already running so I assume I need to have it start X. Where does the command need to be executed?
David
On Sat, 20 Nov 2004 at 09:25 -0800, David Corrigan wrote:
- I prepared a readahead.early.files file by running strace -e trace=open on X and gdm
Could you provide some more details about how the readahead files are created and loaded?
basically:
strace -e trace=open executabletobetraced 2> executable.log
then I apply a mix of grep/cut/sort to get a list of files being opened, something like:
grep RDONLY executable.log | grep -v DIRECTORY | grep "^open" | grep -v "such file" | \
    cut -c7- | cut -d'"' -f1 | sort -u | \
    grep -v "^/home/" | grep -v "^/proc/" | grep -v "^/tmp/" | grep -v "^/dev"
(I think it could be simpler than that)
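One possibly simpler equivalent, as a sketch only: it assumes strace quotes paths in double quotes, and that executable.log is the capture from the strace command above; it has not been tested against real-world strace logs.

```shell
# Keep successful read-only open() calls and print just the quoted path,
# dropping files under /home, /proc, /tmp and /dev.
awk -F'"' '/^open\(/ && /RDONLY/ && !/DIRECTORY/ && !/such file/ { print $2 }' executable.log |
    sort -u | egrep -v '^/(home|proc|tmp|dev)'
```

Splitting on the double quote lets awk hand back the path as field 2 in one pass, replacing the grep/cut chain.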
I ran strace in the terminal and it said the X server was already running so I assume I need to have it start X. Where does the command need to be executed?
if you plan to strace X, I recommend going to runlevel 3 and doing it from there
On Sun, 21 Nov 2004 at 17:14 +0100, Kyrre Ness Sjobak wrote:
- EPIPHANY: 2 seconds
- OPENOFFICE: 5 seconds (!!)
Those are very important for the UI experience. I wonder if it might be possible to add ooquickstart and some preloading of the web browser...
I think that readahead of OOo and epiphany is enough. ooquickstart does almost the same, but it is not transparent to the user
On Tue, 2004-11-16 at 14:09, Ziga Mahkovec wrote:
On Mon, 2004-11-15 at 20:05 -0500, Owen Taylor wrote:
Wow, this is fabulous work, and fast too!
Thanks :)
What sort of libraries are you using in the Java program? Do you have any idea whether getting it to run on top of open source Java would be feasible?
I'm using the java2d and imageio packages with IBM's JDK. It doesn't work out of the box with libgcj though, so I'll have to come up to speed with the java2d/cairo development. Alternatively, I can always drop the alpha/antialias prettiness. Or switch to SVG instead and let librsvg do the work.
You tried this on the FC3 libgcj? The most current java2d/cairo work is happening on the java-gui-branch in gcc CVS and lots of improvements have gone in since we branched the FC3 libgcj. Once the code that generates these graphs is available we can make it work on open source Java.
Tom
On Tue, 2004-11-16 at 16:18 -0500, Thomas Fitzsimmons wrote:
I'm using the java2d and imageio packages with IBM's JDK. It doesn't work out of the box with libgcj though, so I'll have to come up to speed with the java2d/cairo development. Alternatively, I can always drop the alpha/antialias prettiness. Or switch to SVG instead and let librsvg do the work.
You tried this on the FC3 libgcj?
Yes.
The most current java2d/cairo work is happening on the java-gui-branch in gcc CVS and lots of improvements have gone in since we branched the FC3 libgcj. Once the code that generates these graphs is available we can make it work on open source Java.
That's what I thought. I tried using JHBuild to get the branch but unfortunately freedesktop.org is down.
Thanks,
On Tue, Nov 16, 2004 at 08:09:01PM +0100, Ziga Mahkovec wrote:
- What does GNOME login look like?
If I parse up to the point where gnome-panel is running and the system is 90% idle: http://www.klika.si/ziga/bootchart/bootchart-login.png
One thing that really sticks out in this one for me is rhn-applet-gui. That thing is just huge. I booted my machine an hour ago, and here's what top has to say about it..
  PID USER   PR  NI  VIRT  RES  SHR S %CPU %MEM   TIME+  COMMAND
 3370 davej  26  10  198m  26m 182m S  0.0  1.3  0:01.50 rhn-applet-gui
                    ^^^^^^^^^^^^^^
                    WOW.
It's not even _doing_ anything right now. (digs out strace). Ugh.. It's stuck in a loop polling a socket.
ioctl(14, FIONREAD, [0]) = 0
poll([{fd=9, events=POLLIN}, {fd=11, events=POLLIN|POLLPRI}, {fd=14, events=POLLIN},
      {fd=3, events=POLLIN}, {fd=5, events=POLLIN|POLLPRI}, {fd=6, events=POLLIN|POLLPRI},
      {fd=13, events=POLLIN|POLLPRI}, {fd=15, events=POLLIN|POLLPRI}], 8, 99) = 0
This thing really needs to go to sleep more often.
Ugh, it got worse..
3370 davej 26 10 200m 28m 185m S 0.0 1.4 0:02.10 rhn-applet-gui
Dave
Hi.
Dave Jones davej@redhat.com wrote:
One thing that really sticks out in this one for me is rhn-applet-gui. That thing is just huge.
That's the flashy "update me" thingy in the notification area, right? First thing I kill after reinstalling :)
On Tue, Nov 16, 2004 at 11:28:16PM +0100, Ralf Ertzinger wrote:
One thing that really sticks out in this one for me is rhn-applet-gui. That thing is just huge.
That's the flashy "update me" thingy in the notification area, right? First thing I kill after reinstalling :)
Yes. Also known as "evil force of forceful evil" in some circles.
Dave
On Tue, Nov 16, 2004 at 05:34:18PM -0500, Dave Jones wrote:
One thing that really sticks out in this one for me is rhn-applet-gui. That thing is just huge.
That's the flashy "update me" thingy in the notification area, right? First thing I kill after reinstalling :)
Yes. Also known as "evil force of forceful evil" in some circles.
Utterly. On multihead boxes I've seen it take 30% of the total CPU time and 20% of the network bandwidth. It's eeeeevil because it should be a service daemon so it runs *ONCE*, and it should chat over dbus or something to the display, which -should-not-flash- - it's very bad UI design (movement out of the user's focus area is distracting) and it sucks resources.
If someone could have that fixed and in testing tomorrow that would be fantastic ;)
Alan
On Tue, Nov 16, 2004 at 06:29:40PM -0500, Alan Cox wrote:
Utterly. On multihead boxes I've seen it take 30% of the total CPU time and 20% of the network bandwidth. Its eeeeevil because it should be a service daemon so it runs *ONCE* and it should chat over dbus or something to the display which -should-not-flash- - it's very bad UI design (movement out of the user focus area is distracting) and sucks resources.
It also gets 'stuck' sometimes, making the user believe that everything is up to date, while running up2date -l or yum will find packages that need updating. I've also seen it claim updates are available that running up2date on the command line can't find. *boggle*
The whole thing needs a bullet in its head imo.
I never thought I'd say it, but after having recently bought a mac for my wife, Apple did something right. They have something (possibly a cron job) that looks for updates at a user specified interval, and if nothing is found, it does nothing. You don't even know it checked. If it does find something, it pops up a dialog. None of this flashing red bubble nonsense. The whole time you're blissfully unaware of this going on, which is a big win memory footprint wise.
I've heard from other quarters that even Microsoft's update notifier is becoming more sensible than ours. They even have a 'download the updates in the background when things are idle' option apparently, which sounds cute. (I think I'd rather be around when it applies them, though.)
If someone could have that fixed and in testing tomorrow that would be fantastic ;)
Wouldn't it be great ? They'd be my fedora hero-of-the-day.
Dave
On Tue, Nov 16, 2004 at 06:43:50PM -0500, Dave Jones wrote:
mac for my wife, Apple did something right. They have something (possibly a cron job) that looks for updates at a user specified interval, and if nothing is found, it does nothing. You don't even know it checked. If it does find something, it pops up a dialog. None of this flashing red bubble nonsense. The whole time you're blissfully unaware of this going on, which is a big win memory footprint wise.
That didn't go down well in some places. One of the problems with automatic updates and any network tool that is impolite is when your box does a major update over your GPRS phone at £3.50 per megabyte, or clogs a customer wireless network when you are in a sales call.
notifier is becoming more sensible than ours. They even have a 'download the updates in the background when things are idle' option aparently, which sounds cute. (think I'd rather be around when it applies them though).
We do too. It's just not well documented: chkconfig the yum service on for the runlevels you want.
On Tue, Nov 16, 2004 at 06:48:42PM -0500, Alan Cox wrote:
mac for my wife, Apple did something right. They have something (possibly a cron job) that looks for updates at a user specified interval, and if nothing is found, it does nothing. You don't even know it checked. If it does find something, it pops up a dialog.
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
None of this flashing red bubble nonsense. The whole time you're blissfully unaware of this going on, which is a big win memory footprint wise.
That didn't go down well in some places. One of the problems with automatic updates and any network tool that is impolite is when your box does a major update over your GPRS phone at £3.50 per megabyte, or clogs a customer wireless network when you are in a sales call.
The update frequency can be configured. And as underlined above, you get prompted 'found updates, do you want to install these now or later?' when it does find something to do.
Dave
On Tue, Nov 16, 2004 at 06:43:50PM -0500, Dave Jones wrote:
On Tue, Nov 16, 2004 at 06:29:40PM -0500, Alan Cox wrote:
Utterly. On multihead boxes I've seen it take 30% of the total CPU time and 20% of the network bandwidth. Its eeeeevil because it should be a service daemon so it runs *ONCE* and it should chat over dbus or something to the display which -should-not-flash- - it's very bad UI design (movement out of the user focus area is distracting) and sucks resources.
It also gets 'stuck' sometimes, making the user believe that everything is up to date, whilst running up2date -l, or yum will find packages that need updating. I've also seen it claim updates are available that running up2date on the command line can't find. *boggle*
file a bug on any cases like that you see, I haven't seen one in ages.
The whole thing needs a bullet in its head imo.
I'll buy the gun, the bullets, and the beer if I get to shoot it.
If someone could have that fixed and in testing tomorrow that would be fantastic ;)
Wouldn't it be great ? They'd be my fedora hero-of-the-day.
Not likely to happen. If it were my call, the rhn-applet wouldn't be in Fedora at all (or at least, not on by default). But I don't see anyone wanting to fix it and not just replace it. Maybe for fc4...
Adrian The unfortunate sap who got handed the rhn-applet to maintain...
On Tue, 2004-11-16 at 18:43 -0500, Dave Jones wrote:
On Tue, Nov 16, 2004 at 06:29:40PM -0500, Alan Cox wrote:
Utterly. On multihead boxes I've seen it take 30% of the total CPU time and 20% of the network bandwidth. It's eeeeevil because it should be a service daemon so it runs *ONCE* and it should chat over dbus or something to the display which -should-not-flash- - it's very bad UI design (movement out of the user focus area is distracting) and sucks resources.
It also gets 'stuck' sometimes, making the user believe that everything is up to date, whilst running up2date -l, or yum will find packages that need updating. I've also seen it claim updates are available that running up2date on the command line can't find. *boggle*
The whole thing needs a bullet in its head imo.
I never thought I'd say it, but after having recently bought a mac for my wife, Apple did something right. They have something (possibly a cron job) that looks for updates at a user specified interval, and if nothing is found, it does nothing. You don't even know it checked. If it does find something, it pops up a dialog. None of this flashing red bubble nonsense. The whole time you're blissfully unaware of this going on, which is a big win memory footprint wise.
I've had some ideas for that, but keeping the memory footprint down might be a bit taxing.
The thought is easy enough, though: have the nightly cron job generate an RSS feed of the available updates (yum generate-rss updates), then have the applet just look for and read the RSS file.
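A minimal sketch of the applet-side check under that scheme -- the feed path and the idea of grepping for <item> entries are assumptions for illustration, not an existing interface:

```shell
#!/bin/sh
# Hypothetical applet-side check: the nightly cron job would have run
# "yum generate-rss updates" to write a local feed; we only read that
# file -- no network traffic and next to no memory footprint.
FEED=${FEED:-/var/cache/yum/updates.rss}   # assumed location
# Count <item> entries; a missing or empty feed counts as zero updates.
count=$(grep -c '<item>' "$FEED" 2>/dev/null)
echo "${count:-0} update(s) pending"
```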
-sv
Once upon a time, Dave Jones davej@redhat.com said:
I've heard from other quarters that even Microsoft's update notifier is becoming more sensible than ours. They even have a 'download the updates in the background when things are idle' option apparently, which sounds cute. (think I'd rather be around when it applies them though).
On a couple of workstations I use:
45 6 * * 1-5 t=`mktemp /tmp/yum.XXXXXXXX`; yum check-update >& /dev/null; yum -C check-update >& $t || (h=`hostname | cut -d. -f1`; cat $t | mail -s "Updates available for $h" root; yum -y --download-only update >& /dev/null); rm -f $t
When there are updates, I get an email and they are automatically downloaded so they are ready to go when I am ready to load them.
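For readability, here is the one-liner unrolled into a plain script (same commands and logic; an untested sketch). The trick it relies on is that 'yum check-update' exits non-zero exactly when updates are available:

```shell
#!/bin/sh
# The crontab one-liner above, unrolled (same commands; sketch only).
t=$(mktemp /tmp/yum.XXXXXXXX)
# First pass refreshes the metadata cache; output is discarded.
yum check-update >/dev/null 2>&1
# Second pass runs from cache (-C); check-update exits non-zero when
# updates are available, so the body below runs only in that case.
if ! yum -C check-update > "$t" 2>&1; then
    h=$(hostname | cut -d. -f1)
    mail -s "Updates available for $h" root < "$t"
    yum -y --download-only update >/dev/null 2>&1
fi
rm -f "$t"
```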
On Wed, 2004-11-17 at 06:13 +0100, Ralf Corsepius wrote:
On Tue, 2004-11-16 at 22:44 -0600, Chris Adams wrote:
yum -y --download-only update >& /dev/null);
Which version of yum is this?
From what I experienced with the version of yum shipped with FC3,
yum --download-only is neither documented nor functional.
--download-only didn't get put back into 2.1.11. It may show up before long, though.
-sv
On Wed, 2004-11-17 at 00:14 -0500, seth vidal wrote:
On Wed, 2004-11-17 at 06:13 +0100, Ralf Corsepius wrote:
On Tue, 2004-11-16 at 22:44 -0600, Chris Adams wrote:
yum -y --download-only update >& /dev/null);
Which version of yum is this?
From what I experienced with the version of yum shipped with FC3,
yum --download-only is neither documented nor functional.
--download-only didn't get put back into 2.1.11. It may show up before long, though.
OK.
Nevertheless yum still accepts --download-only without complaint.
Ralf
On Tue, 2004-11-16 at 23:43, Dave Jones wrote:
On Tue, Nov 16, 2004 at 06:29:40PM -0500, Alan Cox wrote:
It also gets 'stuck' sometimes, making the user believe that everything is up to date, whilst running up2date -l, or yum will find packages that need updating. I've also seen it claim updates are available that running up2date on the command line can't find. *boggle*
I can confirm the same behaviour.
The whole thing needs a bullet in its head imo.
:/
It happily confirms that the 2.6.9-1.3 kernel installs correctly into my FC2 installation.
But no 2.6.9-1.3 option is offered in the boot menu.
After some puzzlement I tried installing the version from freshRPM's using Synaptic. This fails as well, but has the good grace to error, reporting that it is unable to generate an initrd, and reporting an error relating to a MegaRAID controller
seems I have one of these
megaraid: found 0x101e:0x1960:bus 2:slot 4:func 0 scsi0:Found MegaRAID controller at 0x2285f000, IRQ:185
and that kernel does not like it.
Indeed I have no 2.6.9-1.3 initrd, which is clearly an issue, but the silent loss of the error, and worse still the confirmation of a correct install **when the update process clearly knew there was an issue** since it did not add the 2.6.9 kernel to the boot menu, is seriously broken.
HarryM
Harry Moyes wrote :
It happily confirms that the 2.6.9-1.3 kernel installs correctly into my FC2 installation.
But no 2.6.9-1.3 option is offered in the boot menu.
After some puzzlement I tried installing the version from freshRPM's using Synaptic. This fails as well, but has the good grace to error, reporting that it is unable to generate an initrd, and reporting an error relating to a MegaRAID controller
seems I have one of these
megaraid: found 0x101e:0x1960:bus 2:slot 4:func 0 scsi0:Found MegaRAID controller at 0x2285f000, IRQ:185
and that kernel does not like it.
Indeed I have no 2.6.9-1.3 initrd, which is clearly an issue, but the silent loss of the error, and worse still the confirmation of a correct install **when the update process clearly knew there was an issue** since it did not add the 2.6.9 kernel to the boot menu, is seriously broken.
I have quite a few machines running FC2 that run yum nightly to keep up to date, and they did install the latest kernel, and the output reported that the initrd wouldn't be created because no megaraid module was to be found. I haven't looked into this yet but either :
- The module/driver isn't included and this is a serious bug.
- The module has been renamed or support for those MegaRaid cards has been moved into another module... this is a bug also, but the remedy is to add more "glue" or some other non trivial mechanism to kernel updates.
...or I'm simply missing something ;-)
Matthias
On Sat, 2004-11-20 at 22:21, Matthias Saou wrote:
Harry Moyes wrote :
I have quite a few machines running FC2 that run yum nightly to keep up to date, and they did install the latest kernel, and the output reported that the initrd wouldn't be created because no megaraid module was to be found. I haven't looked into this yet but either :
- The module/driver isn't included and this is a serious bug.
- The module has been renamed or support for those MegaRaid cards has been moved into another module... this is a bug also, but the remedy is to add more "glue" or some other non trivial mechanism to kernel updates.
...or I'm simply missing something ;-)
I'm betting on a bug, but my bitch was not the bug in the new kernel, it's the silent hiding of that error by up2date, and the misleading report of success. I removed the new kernel, and re-ran up2date, and it still reported success despite the failure >:|
So consider this a vote for a serious review of up2date.
HarryM
On Sat, 2004-11-20 at 23:12 +0000, Harry Moyes wrote:
I'm betting on a bug, but my bitch was not the bug in the new kernel, it's the silent hiding of that error by up2date, and the misleading report of success. I removed the new kernel, and re-ran up2date, and it still reported success despite the failure >:|
So consider this a vote for a serious review of up2date.
The problem is that the initrd and the boot menu item are created when the kernel package is installed, in the post-install scripts. What happened was surely that the kernel installed fine but the post-install scripts failed. As far as I can tell there's no way to tell that the post-install scripts failed using RPM, so this really isn't up2date's fault; it's a limitation of rpm.
/Per
On Sunday 21 November 2004 09:46, Per Bjornsson wrote:
As far as I can tell there's no way to tell that the post-install scripts failed using RPM, so this really isn't up2date's fault, it's a limitation of rpm.
Any non-zero return code in scriptlets causes RPM to issue a statement about it. When uninstalling, RPMs simply don't uninstall if the postun scriptlets fail. I've not experimented with $?, but I would assume up2date could fully detect these issues as long as the scriptlets produce appropriate error returns.
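The point about error returns can be seen with plain sh, independent of rpm (an illustration only, not an actual kernel %post scriptlet):

```shell
#!/bin/sh
# Without -e, a script's exit status is that of its LAST command,
# so an earlier failure (imagine: mkinitrd) is silently swallowed.
sh -c 'false; echo "boot entry added"' \
    && echo "rpm sees: scriptlet OK"

# With -e, the first failing command aborts the script with a
# non-zero status, giving rpm (and up2date) something to report.
sh -ec 'false; echo "never reached"' \
    || echo "rpm sees: scriptlet FAILED"
```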
On Sun, 21 Nov 2004, Jeff Pitman wrote:
On Sunday 21 November 2004 09:46, Per Bjornsson wrote:
As far as I can tell there's no way to tell that the post-install scripts failed using RPM, so this really isn't up2date's fault, it's a limitation of rpm.
Any non-zero return code in scriptlets causes RPM to issue a statement about it. When uninstalling, RPMs simply don't uninstall if the postun scriptlets fail. I've not experimented with $?, but I would assume up2date could fully detect these issues as long as the scriptlets produce appropriate error returns.
Few people know this is functionality of the shell. See the -e option.
What is a problem though, is that rpm seldom allows you to see what failed. Just like rpmbuild does not (from the return code) give you a clue what happened and your only resort is to process the output :/
-- dag wieers, dag@wieers.com, http://dag.wieers.com/ -- [Any errors in spelling, tact or fact are transmission errors]
On Sunday 21 November 2004 17:50, Dag Wieers wrote:
What is a problem though, is that rpm seldom allows you to see what failed. Just like rpmbuild does not (from the return code) give you a clue what happened and your only resort is to process the output :/
Hard to build a good framework when you can do just about anything in a scriptlet. I mean, one could put "awktris" in the scriptlet for kernels while the user waits for the hardlink step. ;) In other words, any old scriptlet can return any old return code and enforcing a rule in up2date based on a subset would be next to impossible.
Jumping off of this because it was never resolved: is the megaraid module still included/supported? FC3 doesn't use it during install, but FC2 did. Also, all the newer kernels no longer have it.
On Sat, Nov 20, 2004 at 09:47:33PM +0000, Harry Moyes wrote:
After some puzzlement I tried installing the version from freshRPM's using Synaptic. This fails as well, but has the good grace to error, reporting that it is unable to generate an initrd, and reporting an error relating to a MegaRAID controller
seems I have one of these
megaraid: found 0x101e:0x1960:bus 2:slot 4:func 0 scsi0:Found MegaRAID controller at 0x2285f000, IRQ:185
and that kernel does not like it.
Indeed I have no 2.6.9-1.3 initrd, which is clearly an issue, but the silent loss of the error, and worse still the confirmation of a correct install **when the update process clearly knew there was an issue** since it did not add the 2.6.9 kernel to the boot menu, is seriously broken.
This is fixed in the -1.6 RPM in updates-testing. (Which is due to move to updates-proper)
Dave
Dave Jones wrote:
On Tue, Nov 16, 2004 at 11:28:16PM +0100, Ralf Ertzinger wrote:
One thing that really sticks out in this one for me is rhn-applet-gui. That thing is just huge.
That's the flashy "update me" thingy in the notification area, right? First thing I kill after reinstalling :)
Yes. Also known as "evil force of forceful evil" in some circles.
tee hee.
Yes it has interesting performance characteristics for a box serving multiple vnc sessions.
Pádraig.
On Tue, Nov 16, 2004 at 08:09:01PM +0100, Ziga Mahkovec wrote:
- Why is rhgb eating so much CPU? If you run 'rhgb -i' it uses basically 0 CPU to display the animation. That looks like a pretty obvious bug we completely missed.
You seem to have tracked this one down, but here's the output without rhgb for comparison: http://www.klika.si/ziga/bootchart/bootchart-norhgb.png (boot time went from 1:27 to 0:51)
If you have time to regenerate a graph with the fixed rhgb-0.15.1 I would really appreciate the comparison with the broken version; this may give more interesting data for others to work on. I have put FC3 rebuilt rpms at ftp://rpmfind.net/pub/veillard/
thanks a lot !
Daniel
On Wed, 2004-11-17 at 04:39 -0500, Daniel Veillard wrote:
If you have time to regenerate a graph with the fixed rhgb-0.15.1 I would really appreciate the comparison with the broken version; this may give more interesting data for others to work on. I have put FC3 rebuilt rpms at ftp://rpmfind.net/pub/veillard/
Looks much better: http://www.klika.si/ziga/bootchart/bootchart-rhgbfix.png (see my previous post for other changes)
Ziga Mahkovec wrote :
On Wed, 2004-11-17 at 04:39 -0500, Daniel Veillard wrote:
If you have time to regenerate a graph with the fixed rhgb-0.15.1 I would really appreciate the comparison with the broken version; this may give more interesting data for others to work on. I have put FC3 rebuilt rpms at ftp://rpmfind.net/pub/veillard/
Looks much better: http://www.klika.si/ziga/bootchart/bootchart-rhgbfix.png (see my previous post for other changes)
Wow, nearly halved! This is only with a fixed rhgb, Rawhide initscripts and the floppy module "off", right? Seems like an rhgb errata might be worth it too.
Matthias
On Wed, Nov 17, 2004 at 01:15:51PM +0100, Matthias Saou wrote:
This is only with a fixed rhgb, a Rawhide initscripts and the floppy module "off", right? Seems like an rhgb errata might be worth it too.
yes, I will do this today...
Daniel
On Wed, 2004-11-17 at 13:15 +0100, Matthias Saou wrote:
Looks much better: http://www.klika.si/ziga/bootchart/bootchart-rhgbfix.png (see my previous post for other changes)
Wow, nearly divided by two! This is only with a fixed rhgb, a Rawhide initscripts and the floppy module "off", right?
That and asynchronous logging (see my reply to Bill about the changes in syslogd behavior). So add about 4 seconds if you want to play it safe.
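For anyone wanting to try the same change: a leading "-" on a file name in syslog.conf tells syslogd to skip the sync after every message, e.g. a fragment along these lines (check syslog.conf(5) for your version):

```
# /etc/syslog.conf -- the "-" prefix makes writes to this log file
# asynchronous (no sync after each line).
*.info;mail.none;authpriv.none;cron.none    -/var/log/messages
```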
On Mon, 15.11.2004 at 23:24, Ziga Mahkovec wrote:
On Sat, 2004-11-13 at 12:18 -0500, Owen Taylor wrote:
It should be possible to start with a limited set of easily collected data and already get a useful picture. Useful data collection could be as simple as taking a snapshot of the data that the "top" program displays a few times a second during boot. That already gives you a list of the running processes, their states, and some statistics about global system load.
So I gave this a try:
- I modified the boot procedure so that early in rc.sysinit, a tmpfs is mounted and top is run in batch mode (to output every 0.2 seconds). The logged output is later parsed only up to the point where gdmgreeter is running and the system is relatively idle (i.e. boot complete and ready for login).
- A Java program parses the log file, builds the process tree and finally renders a PNG chart. Processes are sorted by PID and traversed depth first.
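A rough shell sketch of that rc.sysinit hook -- the mount point and sample count here are illustrative stand-ins; the real hook would keep running until gdmgreeter is up and then be killed:

```shell
#!/bin/sh
# During a real boot the log would go to a freshly mounted tmpfs so the
# logging itself doesn't perturb the disk activity being measured:
#   mkdir /mnt/bootlog && mount -t tmpfs none /mnt/bootlog
LOGDIR=${LOGDIR:-$(mktemp -d)}        # stand-in for the tmpfs mount
# top in batch mode, one snapshot every 0.2 s; capped at 5 samples here
# so the sketch terminates on its own.
top -b -d 0.2 -n 5 > "$LOGDIR/bootop.log" 2>&1
wc -l < "$LOGDIR/bootop.log"          # sanity check: log is non-empty
```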
This still needs more work but here's a sneak preview: http://www.klika.si/ziga/bootchart/bootchart.png
(as a result of http://www.klika.si/ziga/bootchart/bootop.log.gz )
Some processes were filtered out for clarity -- mostly sleepy kernel processes and the ones that only live for the duration of a single top sample. This skews the chart a bit but is definitely more comprehensible (compare with http://www.klika.si/ziga/bootchart/bootchart-complete.png ).
Some things I plan on adding:
- start logging earlier in the boot process (possibly in initrd),
- add additional layers (e.g. make use of the kernel patch Arjan suggested for showing the number of open files),
- improve process tree representation and add dependency lines,
- render SVG instead, for scalability and interactivity.
This definitely helped me with my boot times -- the 4-second load gap at the start I found to be "modprobe floppy", apparently timing out on my floppyless laptop :)
Ah! That's why the floppy light flashes during "kudzu" (at the end of it) :)
Any ideas or comments are welcome,
Ziga
On Tuesday 16 November 2004 21:29, Kyrre Ness Sjobak wrote:
This definitely helped me with my boot times -- the 4-second load gap at the start I found to be "modprobe floppy", apparently timing out on my floppyless laptop :)
Ah! That's why the floppy light flashes during "kudzu" (at the end of it) :)
The question is, who needs a floppy before the system is fully up?
(The second question would be, who needs a floppy at all ;)
On Tuesday, 16 November 2004 at 22:56 +0100, Ronny Buchmann wrote:
On Tuesday 16 November 2004 21:29, Kyrre Ness Sjobak wrote:
This definitely helped me with my boot times -- the 4-second load gap at the start I found to be "modprobe floppy", apparently timing out on my floppyless laptop :)
Ah! That's why the floppy light flashes during "kudzu" (at the end of it) :)
The question is, who needs a floppy before the system is fully up?
(The second question would be, who needs a floppy at all ;)
Handy for kickstart & driver files. Or is anaconda able to read USB keys nowadays?
Ronny Buchmann (ronny-vlug@vlugnet.org) said:
On Tuesday 16 November 2004 21:29, Kyrre Ness Sjobak wrote:
This definitely helped me with my boot times -- the 4-second load gap at the start I found to be "modprobe floppy", apparently timing out on my floppyless laptop :)
Ah! That's why the floppy light flashes during "kudzu" (at the end of it) :)
The question is, who needs a floppy before the system is fully up?
It's for the benefit of udev... for the device nodes to be available you have to load it, and it can't be sanely hotplugged. (Heck, you can't even tell whether or not it's there without loading the module.)
Bill