Hey,
So, I've looked a bit more into the booting process and how to optimize it. Mostly based on the discussion triggered by Owen's boot poster challenge, here
http://www.redhat.com/archives/fedora-devel-list/2004-November/msg00447.html
and also some experiments that I did - basically replacing rhgb with gdm as described here
http://www.redhat.com/archives/fedora-desktop-list/2004-November/msg00066.ht...
What I've done is a bit crude - I've replaced init(1) with a shell script based on /etc/rc.d/rc.sysinit and tried to optimize specifically for my system: an IBM Thinkpad T41 laptop with a 1600MHz Pentium M processor and 512MB of RAM.
The results are pretty good, I think; here is the general timeline, measured with a wall clock:
00: exit grub; start booting the kernel
04: kernel prints audit()
11: initrd is mounted; Red Hat nash visible
    mount / ro (normal initrd procedure)
13: start bootchart logging
    start readahead of approx. 193MB of files
    sleep until readahead is complete
24: readahead done; now create /dev and modprobe (in background)
    mount / rw, enable swap
    start xfs
    startx as user davidz
    in background: start messagebus, start hald, start acpid, start NetworkManager
32: X claims the display
34: GNOME desktop banner
40: GNOME desktop is usable (Nautilus desktop, panel fully populated)
Here is a bootchart made with the bootchart software from Ziga Mahkovec:
http://people.redhat.com/davidz/bootchart.png
You may notice that I also start Firefox after login and it starts very, very fast - that's because readahead loads all the files used by Firefox. In earlier experiments I also added files from OpenOffice.org to readahead, which meant I could start OpenOffice.org Writer in about three seconds. More below.
I've made the following observations:
1. The kernel patch, linux-2.6.3-printopen.patch, wasn't really working well for me - it reported far too few files - so instead I added a printk() to fs/namei.c:link_path_walk() (disclaimer: I don't know much about the kernel, so there may be a better solution than this).
2. The data captured from link_path_walk() was massaged into a list of unique files to preload, sorted on sectors (a FIBMAP-based sketch of this step follows after this list).
3. While capturing the data from link_path_walk(), and before processing it, I went through all the menus in the GNOME desktop (to make sure their icon and desktop files would be added to the list) as well as loading Firefox. The list contains 5189 unique files: 231 of these from my home directory, 103 of these from gconf in my home directory, and 302 from gconf in /etc. 2267 were .png files, 814 were .desktop files, and 488 files had ".so" in their name. There was a total of 193MB of files (which says something about the footprint of the GNOME desktop on Fedora :-/)
4. Doing the readahead really helped the time from startx till a usable desktop - less than 10 seconds!
5. Doing readahead on the 5189 files took about 45 seconds on my system, mostly because the files were scattered around the disk. Since I had a spare 17GB partition, I did this:
   a. format the spare partition as ext3
   b. copy all the readahead files to the spare partition (193MB)
   c. copy the rest of the files from the main partition to the spare partition (about 9GB)
Now the readahead is down to 11 seconds, which averages out to about 18MB/s. On the other hand, I can still see (using fileblock) that the files in the readahead set are still scattered, and hdparm says I should be able to get 33.87 MB/sec with no seeks. (A sketch of the readahead step itself follows after this list.)
6. I made a hack to cache /dev (a dev.tar file) and the list of modules to load. This could be used in production if the kernel could give us basically a hash value for the kobject hierarchy representing the hardware (perhaps even a 'tree /sys |md5sum' would suffice; a sketch of that idea also follows after this list). This shaved some seconds off as well.
7. A number of things were started in parallel - I found that doing readahead while running modprobe wasn't helping anything; in fact, it contributed negatively to performance (a bit to my surprise; I guess because the kernel was busy).
8. Readahead on the right files is A Good Thing(tm). Booting my system without readahead, on the partition with the readahead files scattered, took 58 seconds (compared to 39 with readahead on the optimized partition):
http://people.redhat.com/davidz/bootchart-without-readahead-scattered.png
and without readahead on the optimized partition it took 43 seconds:
http://people.redhat.com/davidz/bootchart-without-readahead-nonscattered.png
again compared to 39 seconds. As an added bonus, the readahead makes sure that e.g. Firefox loads fast; all the .png and .desktop files are in place for when the menus are used. As mentioned, one could put very big apps, e.g. OO.o, in the readahead set.
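To make observation 2 concrete, here is a rough sketch of the sort-on-sectors step (my reconstruction for illustration, not the actual tool I used): feed it the de-duplicated path list captured from the kernel log on stdin, and it orders the paths by each file's first physical block, as reported by the FIBMAP ioctl (which requires root):

/*
 * Sketch only: sort a captured file list by on-disk position.
 * Input: unique paths, one per line, on stdin (e.g. from `sort -u`).
 * Output: the same paths, ordered by the file's first physical block.
 */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <linux/fs.h>                       /* FIBMAP */

#define MAX_FILES 8192

struct entry { int block; char path[512]; };
static struct entry ent[MAX_FILES];

static int cmp(const void *a, const void *b)
{
    const struct entry *x = a, *y = b;
    return (x->block > y->block) - (x->block < y->block);
}

int main(void)
{
    size_t n = 0;
    char line[512];

    while (n < MAX_FILES && fgets(line, sizeof line, stdin)) {
        line[strcspn(line, "\n")] = '\0';
        int fd = open(line, O_RDONLY | O_NOCTTY);
        if (fd < 0)
            continue;
        int block = 0;                          /* logical block 0 in... */
        if (ioctl(fd, FIBMAP, &block) == 0) {   /* ...physical block out */
            ent[n].block = block;
            snprintf(ent[n].path, sizeof ent[n].path, "%s", line);
            n++;
        }
        close(fd);
    }
    qsort(ent, n, sizeof *ent, cmp);
    for (size_t i = 0; i < n; i++)
        puts(ent[i].path);
    return 0;
}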
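And here is a minimal sketch of the readahead step from observation 5 (not the actual Fedora readahead binary, just the shape of the idea), assuming a readahead.files list with one path per line, already in disk order. posix_fadvise(POSIX_FADV_WILLNEED) pulls each file into the page cache without copying it to userspace; note the advice is asynchronous, so the "sleep until readahead is complete" step needs a separate idle check (or a plain read() loop instead):

/* Sketch: preload every file named in a list into the page cache. */
#include <stdio.h>
#include <string.h>
#include <fcntl.h>
#include <unistd.h>

int main(int argc, char **argv)
{
    FILE *list = fopen(argc > 1 ? argv[1] : "readahead.files", "r");
    char path[4096];

    if (!list)
        return 1;
    while (fgets(path, sizeof path, list)) {
        path[strcspn(path, "\n")] = '\0';
        int fd = open(path, O_RDONLY | O_NOCTTY | O_NONBLOCK);
        if (fd < 0)
            continue;
        /* offset 0, length 0 = advise for the whole file */
        posix_fadvise(fd, 0, 0, POSIX_FADV_WILLNEED);
        close(fd);
    }
    fclose(list);
    return 0;
}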
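Finally, the 'tree /sys |md5sum' idea from observation 6 could look something like this throwaway sketch: walk /sys and fold every path name into a cheap FNV-1a hash. If the value matches the one saved on the previous boot, assume the hardware is unchanged and reuse the cached dev.tar and module list (a real version would hash attribute values too):

/* Sketch: fingerprint the kobject hierarchy by hashing path names. */
#define _XOPEN_SOURCE 500
#include <stdio.h>
#include <stdint.h>
#include <ftw.h>

static uint64_t hash = 1469598103934665603ULL;   /* FNV-1a offset basis */

static int visit(const char *path, const struct stat *sb,
                 int type, struct FTW *ftw)
{
    (void)sb; (void)type; (void)ftw;
    for (const char *p = path; *p; p++) {
        hash ^= (unsigned char)*p;
        hash *= 1099511628211ULL;                /* FNV-1a prime */
    }
    return 0;                                    /* keep walking */
}

int main(void)
{
    /* FTW_PHYS: don't follow the symlink maze inside /sys */
    if (nftw("/sys", visit, 16, FTW_PHYS) != 0)
        return 1;
    printf("%016llx\n", (unsigned long long)hash);
    return 0;
}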
So, I think these numbers are good, and there's still some room for improvement; e.g. it takes ten seconds from grub to when the initrd is mounted - surely the kernel can boot faster? That is, after all, 25% of the time spent from grub until I have a usable desktop.
The bad thing is that this approach is highly specific to my system (which is why I'm not posting an RPM with it :-). However, I think it clearly shows where improvements should be made; here are some random thoughts:
a. We should keep track of files being loaded and maintain the readahead fileset as appropriate. printk() doesn't seem like the right solution; perhaps a system daemon using inotify or the kernel events layer is the road ahead (a toy inotify sketch follows after this list)? This would enable us to readahead the KDE stuff if the user is, say, using KDE a lot.
b. ext3 should support operations for moving blocks around, e.g. to optimize around the readahead fileset - when idle, the system should rearrange the files to facilitate faster booting
c. the start_udev and kmodule processes could be cached as I did above
d. The whole init(1) procedure seems dated; perhaps something more modern built on top of D-BUS is the right choice - SystemServices by Seth Nickell comes to mind [1]. Ideally, services to be started would have dependencies, such as: 1) don't start the gdm service before /usr/bin/gdm is available; 2) the SSH service would only be active when NetworkManager says there is a network connection; 3) /usr from LABEL=/usr would only be mounted when there is a volume with that label; and so forth. Also, such a system would of course have support for LSB init scripts. (This is probably a whole project on its own, so I'm omitting detailed thinking on it for now; a toy sketch of the condition-driven idea follows below.)
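A toy sketch of the tracker in thought (a), assuming the inotify userspace API in the form it is expected to land in mainline (inotify_init()/inotify_add_watch()); a real daemon would watch recursively and persist the log rather than print it:

/* Sketch: print every file opened under the directory given in argv[1]. */
#include <stdio.h>
#include <unistd.h>
#include <sys/inotify.h>

int main(int argc, char **argv)
{
    /* buffer aligned for the variable-length inotify_event records */
    char buf[4096] __attribute__((aligned(__alignof__(struct inotify_event))));
    int fd = inotify_init();

    if (fd < 0 || argc < 2)
        return 1;
    if (inotify_add_watch(fd, argv[1], IN_OPEN) < 0)
        return 1;
    for (;;) {
        ssize_t len = read(fd, buf, sizeof buf);
        if (len <= 0)
            return 1;
        for (char *p = buf; p < buf + len; ) {
            struct inotify_event *ev = (struct inotify_event *)p;
            if (ev->len)             /* name is set for directory watches */
                printf("%s/%s\n", argv[1], ev->name);
            p += sizeof *ev + ev->len;
        }
    }
}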
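And a toy sketch of the condition-driven startup in thought (d); all names here are invented and have nothing to do with Seth's actual SystemServices code. The point is just that each service declares the conditions it needs, and every incoming event ("this file appeared", "the network is up") starts whatever has become startable:

/* Sketch: start services as the conditions they depend on come true. */
#include <stdio.h>
#include <string.h>

#define MAX_DEPS 4

struct service {
    const char *name;
    const char *deps[MAX_DEPS];      /* conditions that must hold first */
    int started;
};

static struct service services[] = {
    { "gdm",  { "file:/usr/bin/gdm" }, 0 },
    { "sshd", { "network-up" },        0 },
    { "cups", { "filesystem-rw" },     0 },
};

static const char *held[16];         /* conditions satisfied so far */
static int nheld;

static int satisfied(const char *cond)
{
    for (int i = 0; i < nheld; i++)
        if (strcmp(held[i], cond) == 0)
            return 1;
    return 0;
}

/* called whenever the event layer reports a new condition */
static void condition_arrived(const char *cond)
{
    held[nheld++] = cond;
    for (size_t s = 0; s < sizeof services / sizeof *services; s++) {
        struct service *sv = &services[s];
        int ok = !sv->started;
        for (int d = 0; ok && d < MAX_DEPS && sv->deps[d]; d++)
            ok = satisfied(sv->deps[d]);
        if (ok) {
            sv->started = 1;
            printf("starting %s\n", sv->name);   /* would fork/exec here */
        }
    }
}

int main(void)
{
    condition_arrived("filesystem-rw");
    condition_arrived("file:/usr/bin/gdm");
    condition_arrived("network-up");
    return 0;
}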
Thanks a lot to Ziga Mahkovec for the bootchart software - it's been very useful.
Have fun, David
[1] : http://www.osnews.com/story.php?news_id=4711 http://www.gnome.org/~seth/blog/2003/Sep/27
On Sun, 2004-11-28 at 20:49 -0500, David Zeuthen wrote:
Awesome! So... is no-one else going to reply to this email?
Various comments inline below...
Hey,
So, I've looked a bit more into the booting process and how to optimize it. Mostly based on the discussion triggered by Owen's boot poster challenge, here
http://www.redhat.com/archives/fedora-devel-list/2004-November/msg00447.html
and also some experiments that I did - basically replacing rhgb with gdm as described here
http://www.redhat.com/archives/fedora-desktop-list/2004-November/msg00066.ht...
What I've done is a bit crude - I've replaced init(1) with a shell script based on /etc/rc.d/rc.sysinit and tried to optimize specifically for my system: an IBM Thinkpad T41 laptop with a 1600MHz Pentium M processor and 512MB of RAM.
If I understand, you're also optimising for a specific user of that system. Is it worth splitting the readahead into a system-wide list of files (enough to get to the login screen), followed by a per-user list for logging in as a user? To what extent will the files needed to get to a usable desktop vary between Alice and Bob?
The results are pretty good, I think; here is the general timeline, measured with a wall clock:
00: exit grub; start booting the kernel
04: kernel prints audit()
11: initrd is mounted; Red Hat nash visible
    mount / ro (normal initrd procedure)
13: start bootchart logging
    start readahead of approx. 193MB of files
    sleep until readahead is complete
24: readahead done; now create /dev and modprobe (in background)
    mount / rw, enable swap
    start xfs
    startx as user davidz
    in background: start messagebus, start hald, start acpid, start NetworkManager
32: X claims the display
34: GNOME desktop banner
40: GNOME desktop is usable (Nautilus desktop, panel fully populated)
Here is a bootchart made with the bootchart software from Ziga Mahkovec:
http://people.redhat.com/davidz/bootchart.png

You may notice that I also start Firefox after login and it starts very, very fast - that's because readahead loads all the files used by Firefox. In earlier experiments I also added files from OpenOffice.org to readahead, which meant I could start OpenOffice.org Writer in about three seconds. More below.
I've made the following observations:
The kernel patch, linux-2.6.3-printopen.patch, wasn't really working well for me - it reported far too few files - so instead I added a printk() to fs/namei.c:link_path_walk() (disclaimer: I don't know much about the kernel, so there may be a better solution than this).
The data captured from link_path_walk() was massaged into a list of unique files to preload and sorted on sectors.
While capturing the data from link_path_walk(), and before processing it, I went through all the menus in the GNOME desktop (to make sure their icon and desktop files would be added to the list) as well as loading Firefox. The list contains 5189 unique files: 231 of these from my home directory, 103 of these from gconf in my home directory, and 302 from gconf in /etc. 2267 were .png files, 814 were .desktop files, and 488 files had ".so" in their name. There was a total of 193MB of files (which says something about the footprint of the GNOME desktop on Fedora :-/)
Doing the readahead really helped the time from startx till a usable desktop - less than 10 seconds!
Doing readahead on the 5189 files took about 45 seconds on my system, mostly because the files were scattered around the disk. Since I had a spare 17GB partition, I did this:
   a. format the spare partition as ext3
   b. copy all the readahead files to the spare partition (193MB)
   c. copy the rest of the files from the main partition to the spare partition (about 9GB)
Now the readahead is down to 11 seconds, which averages out to about 18MB/s. On the other hand, I can still see (using fileblock) that the files in the readahead set are still scattered, and hdparm says I should be able to get 33.87 MB/sec with no seeks.
I made a hack to cache /dev (a dev.tar file) and the list of modules to load. This could be used in production if the kernel could give us basically a hash value for the kobject hierarchy representing the hardware (perhaps even a 'tree /sys |md5sum' would suffice). This shaved some seconds off as well.
A number of things were started in parallel - I found that doing readahead while running modprobe wasn't helping anything; in fact, it contributed negatively to performance (a bit to my surprise; I guess because the kernel was busy).
Readahead on the right files is A Good Thing(tm). Booting my system without readahead, on the partition with the readahead files scattered, took 58 seconds (compared to 39 with readahead on the optimized partition):
http://people.redhat.com/davidz/bootchart-without-readahead-scattered.png
and without readahead on the optimized partition it took 43 seconds:
http://people.redhat.com/davidz/bootchart-without-readahead-nonscattered.png
again compared to 39 seconds. As an added bonus, the readahead makes sure that e.g. Firefox loads fast; all the .png and .desktop files are in place for when the menus are used. As mentioned, one could put very big apps, e.g. OO.o, in the readahead set.
So, I think these numbers are good, and there's still some room for improvement; e.g. it takes ten seconds from grub to when the initrd is mounted - surely the kernel can boot faster? That is, after all, 25% of the time spent from grub until I have a usable desktop.
The bad thing is that this approach is highly specific to my system (which is why I'm not posting an RPM with it :-). However, I think it clearly shows where improvements should be made; here are some random thoughts:
a. We should keep track of files being loaded and maintain the readahead fileset as appropriate. printk() doesn't seem like the right solution; perhaps a system daemon using inotify or the kernel events layer is the road ahead? This would enable us to readahead the KDE stuff if the user is, say, using KDE a lot.
b. ext3 should support operations for moving blocks around, e.g. to optimize around the readahead fileset - when idle, the system should rearrange the files to facilitate faster booting
c. the start_udev and kmodule processes could be cached as I did above
d. The whole init(1) procedure seems dated; perhaps something more modern built on top of D-BUS is the right choice - SystemServices by Seth Nickell comes to mind [1]. Ideally, services to be started would have dependencies, such as: 1) don't start the gdm service before /usr/bin/gdm is available; 2) the SSH service would only be active when NetworkManager says there is a network connection; 3) /usr from LABEL=/usr would only be mounted when there is a volume with that label; and so forth. Also, such a system would of course have support for LSB init scripts. (This is probably a whole project on its own, so I'm omitting detailed thinking on it for now)
Hopefully this could also allow us to make system-config-services look a lot slicker. I've never liked the way it has random text spewage for each service's status - some kind of widgetry would satisfy my eye-candy cravings.
Thanks a lot to Ziga Mahkovec for the bootchart software - it's been very useful.
Have fun, David
[1] : http://www.osnews.com/story.php?news_id=4711 http://www.gnome.org/~seth/blog/2003/Sep/27
On Tue, 2004-11-30 at 15:20 -0500, David Malcolm wrote:
On Sun, 2004-11-28 at 20:49 -0500, David Zeuthen wrote:
Awesome! So... is no-one else going to reply to this email?
Various comments inline below...
Thanks!
Hey,
So, I've looked a bit more into the booting process and how to optimize it. Mostly based on the discussion triggered by Owen's boot poster challenge, here
http://www.redhat.com/archives/fedora-devel-list/2004-November/msg00447.html
and also some experiments that I did - basically replacing rhgb with gdm as described here
http://www.redhat.com/archives/fedora-desktop-list/2004-November/msg00066.ht...
What I've done is a bit crude - I've replaced init(1) with a shell script based on /etc/rc.d/rc.sysinit and tried to optimize specifically for my system: an IBM Thinkpad T41 laptop with a 1600MHz Pentium M processor and 512MB of RAM.
If I understand, you're also optimising for a specific user of that system.
Yeah, out of 5189 files (193MB), 231 of them (31MB) are from my home directory. Due to the way I've instrumented the kernel (a printk() in the fs/namei.c function link_path_walk()), this also includes files that are only stat(2)'ed, which means that e.g.
/home/davidz/Desktop/Stuff/kernel-2.6.9-1.751_EL.i686.rpm
which is approx. 10MB in size, gets loaded. So 193MB is a bit high; however, since files are being stat(2)'ed anyway, there will be a disk seek to the inode, so to play it safe we need that block as well. Thus, readahead would be better done at the block layer level; also because shared objects like e.g. libgtk-x11-2.0.so contain a lot of deprecated APIs nobody uses anyway (i.e. don't load those pages).
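To illustrate the point (purely hypothetical code): for the stat-only files it would be enough to warm the metadata up front, e.g. by stat'ing the whole list, instead of reading the full 193MB:

/*
 * Sketch: warm only the metadata for a list of paths on stdin.
 * stat(2) forces the directory entries and inodes into cache without
 * touching the file contents - all that's needed for stat-only files
 * like that 10MB kernel RPM on the desktop.
 */
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>

int main(void)
{
    char path[4096];
    struct stat st;

    while (fgets(path, sizeof path, stdin)) {
        path[strcspn(path, "\n")] = '\0';
        stat(path, &st);    /* result discarded; the warmed cache is the point */
    }
    return 0;
}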
Is it worth splitting the readahead into a system-wide list of files (enough to get to the login screen), followed by a per-user list for logging in as a user?
Could be done, yeah.
To what extent will the files needed to get to a usable desktop vary between Alice and Bob?
I dunno; if Bob uses, say, 50MB worth of .pdf documents every time he logs in we should preload those (that isn't at all out of the ordinary if Bob uses a lot of reference manuals).
The list of files (blocks) should be generated from several criteria, including how often they are read. Ideally, we'd always start a daemon early in boot to monitor this, and rearrange blocks on the disk when idle, or perhaps every week.
d. The whole init(1) procedure seems dated; perhaps something more modern built on top of D-BUS is the right choice - SystemServices by Seth Nickell comes to mind [1]. Ideally, services to be started would have dependencies, such as: 1) don't start the gdm service before /usr/bin/gdm is available; 2) the SSH service would only be active when NetworkManager says there is a network connection; 3) /usr from LABEL=/usr would only be mounted when there is a volume with that label; and so forth. Also, such a system would of course have support for LSB init scripts. (This is probably a whole project on its own, so I'm omitting detailed thinking on it for now)
Hopefully this could also allow us to make system-config-services look a lot slicker. I've never liked the way it has random text spewage for each service's status - some kind of widgetry would satisfy my eye-candy cravings.
Heh, and sweet icons too :-)
Cheers, David
On Sun, 2004-11-28 at 20:49 -0500, David Zeuthen wrote:
So, I've looked a bit more into the booting process and how to optimize it.
Great work!
The results are pretty good, I think; here is the general timeline, measured with a wall clock:
00: exit grub; start booting the kernel
04: kernel prints audit()
11: initrd is mounted; Red Hat nash visible
    mount / ro (normal initrd procedure)
13: start bootchart logging
    start readahead of approx. 193MB of files
    sleep until readahead is complete
24: readahead done; now create /dev and modprobe (in background)
    mount / rw, enable swap
    start xfs
    startx as user davidz
    in background: start messagebus, start hald, start acpid, start NetworkManager
Do you have an idea of how much kudzu, cups and syslogd would add to these times? rhgb too probably, or would it make sense to dump it completely?
- A number of things were started in parallel - I found that doing readahead while running modprobe wasn't helping anything; in fact, it contributed negatively to performance (a bit to my surprise; I guess because the kernel was busy).
You think it might make sense to try running readahead in the background, but after the modules are loaded? Especially if the readahead list could somehow coincide with the order of services started, to further reduce seeking. Or is readahead best left running alone?
So, I think these numbers are good, and there's still some room for improvement; e.g. it takes ten seconds from grub to when the initrd is mounted - surely the kernel can boot faster? That is, after all, 25% of the time spent from grub until I have a usable desktop.
I did some experiments with bootchart logging in the initrd phase. Packed the initrd image with bash, ps and a bunch of libraries and started logging early in the nash script... only to find out that the whole phase flies by in less than a second :)
I would like to visualize the kernel boot though. But I'd need pointers on what kind of data to collect, and how.
Thanks a lot to Ziga Mahkovec for the bootchart software - it's been very useful.
BTW, I've had loads of fun with SVG lately, so you might want to try regenerating these charts. Makes them scalable and about 15x smaller in file size. Have a look at the SVG samples (rsvg does a pretty good job): http://www.klika.si/ziga/bootchart/#Samples
Hey,
sorry for the lag,
On Wed, 2004-12-01 at 03:59 +0100, Ziga Mahkovec wrote:
On Sun, 2004-11-28 at 20:49 -0500, David Zeuthen wrote:
So, I've looked a bit more into the booting process and how to optimize it.
Great work!
Thanks.
The results are pretty good, I think; here is the general timeline, measured with a wall clock:
00: exit grub; start booting the kernel
04: kernel prints audit()
11: initrd is mounted; Red Hat nash visible
    mount / ro (normal initrd procedure)
13: start bootchart logging
    start readahead of approx. 193MB of files
    sleep until readahead is complete
24: readahead done; now create /dev and modprobe (in background)
    mount / rw, enable swap
    start xfs
    startx as user davidz
    in background: start messagebus, start hald, start acpid, start NetworkManager
Do you have an idea of how much kudzu, cups and syslogd would add to these times? rhgb too probably, or would it make sense to dump it completely?
I think that cups and syslogd are mostly harmless - for capturing the readahead files from my modified kernel, I had syslogd dump its log to /tmp, which I mounted as tmpfs. syslogd should probably start before gdm, but cupsd can certainly be started later (ideally it should be started on demand).
kudzu is a bit more difficult as it brings up dialogs - I think Bill agrees (see the thread on fedora-desktop-list that I linked to in my original mail) that hardware configuration should be handled in the desktop GUI anyway.
- A number of things were started in parallel - I found that doing readahead while running modprobe wasn't helping anything; in fact, it contributed negatively to performance (a bit to my surprise; I guess because the kernel was busy).
You think it might make sense to try running readahead in the background, but after the modules are loaded? Especially if the readahead list could somehow coincide with the order of services started, to further reduce seeking. Or is readahead best left running alone?
Not sure; the big boost really comes from reordering the files on the filesystem - running readahead (which takes 11 seconds) only gives me a usable desktop four seconds earlier. I'm pretty sure no other process should do any disk IO while readahead is running, as that will almost certainly incur seek penalties.
So, I think these numbers are good, and there's still some room for improvement; e.g. it takes ten seconds from grub to when the initrd is mounted - surely the kernel can boot faster? That is, after all, 25% of the time spent from grub until I have a usable desktop.
I did some experiments with bootchart logging in the initrd phase. Packed the initrd image with bash, ps and a bunch of libraries and started logging early in the nash script... only to find out that the whole phase flies by in less than a second :)
Yeah, I found that too.
I would like to visualize the kernel boot though. But I'd need pointers on what kind of data to collect, and how.
I think some embedded systems (think DVD players) use patches to boot the kernel faster - I wonder what the status of adding them to mainline is?
Thanks a lot to Ziga Mahkovec for the bootchart software - it's been very useful.
BTW, I've had loads of fun with SVG lately, so you might want to try regenerating these charts. Makes them scalable and about 15x smaller in file size. Have a look at the SVG samples (rsvg does a pretty good job): http://www.klika.si/ziga/bootchart/#Samples
Awesome.
Cheers, David
On Thu, Dec 02, 2004 at 11:48:56AM -0500, David Zeuthen wrote:
Not sure; the big boost really comes from reordering the files on the filesystem - running readahead (which takes 11 seconds) only gives me a
I'm looking into this. A mix of creating a new directory, copying the file there, and then renaming (linking and unlinking) the new copies to the old names is more likely to help. The problem is that it's also bad for the current session where that change is done, since you will end up with multiple in-memory copies of all system libraries... It is gonna be a bit tricky to set up, either as a post-install step or very early during boot. It's very easy to break, too...
Daniel
On Thu, 2004-12-02 at 11:58 -0500, Daniel Veillard wrote:
On Thu, Dec 02, 2004 at 11:48:56AM -0500, David Zeuthen wrote:
Not sure; the big boost really comes from reordering the files on the filesystem - running readahead (which takes 11 seconds) only gives me a
I'm looking into this. A mix of creating a new directory, copying the file there, and then renaming (linking and unlinking) the new copies to the old names is more likely to help. The problem is that it's also bad for the current session where that change is done, since you will end up with multiple in-memory copies of all system libraries... It is gonna be a bit tricky to set up, either as a post-install step or very early during boot. It's very easy to break, too...
To me it makes more sense to look at this on the block level than on the file level; remember, a lot of the files in my readahead set stem from stat'ing files - for example, there are more than 2200 distinct .png files in my readahead set. We probably only need one sector from most of those (the inode) rather than the entire file.
I'm almost positive it requires kernel changes to do this the right way; one naive idea is to have a userspace daemon capturing what blocks are read and when (the kernel tells this daemon via the kernel events layer). This would run for the first three minutes of each and every boot. When the system is idle (and only when running on AC power!), another daemon rearranges blocks on the disk. Which blocks to rearrange could be the result of a computation involving several of these three-minute result sets.
Cheers, David
On Thu, 2004-12-02 at 12:40 -0500, David Zeuthen wrote:
I'm almost positive it requires kernel changes to do this the right way; one naive idea is to have a userspace daemon capturing what blocks are read and when (the kernel tells this daemon via the kernel events layer). This would run for the first three minutes of each and every boot. When the system is idle (and only when running on AC power!), another daemon rearranges blocks on the disk. Which blocks to rearrange could be the result of a computation involving several of these three-minute result sets.
Rearranging sounds complex and dangerous, since it requires deep integration with the filesystem. The online resizing took quite a long time to appear and that is conceptually much simpler. Why not do it on the block device layer (without knowledge of the filesystem) and just copy those blocks to a reserved area of the block device? Disks are big, duplicating say 100MB for this purpose wouldn't be bad. We would need to ensure that mkfs.ext3 would leave enough space for this though (and probably whatever does the copying would have to make sure that the filesystem wasn't in the way; perhaps an ext flag).
On Thu, 2004-12-02 at 12:59 -0500, Colin Walters wrote:
On Thu, 2004-12-02 at 12:40 -0500, David Zeuthen wrote:
I'm almost positive it requires kernel changes to do this the right way; one naive idea is to have a userspace daemon capturing what blocks are read and when (the kernel tells this daemon via the kernel events layer). This would run for the first three minutes of each and every boot. When the system is idle (and only when running on AC power!), another daemon rearranges blocks on the disk. Which blocks to rearrange could be the result of a computation involving several of these three-minute result sets.
Rearranging sounds complex and dangerous, since it requires deep integration with the filesystem. The online resizing took quite a long time to appear and that is conceptually much simpler. Why not do it on the block device layer (without knowledge of the filesystem) and just copy those blocks to a reserved area of the block device? Disks are big, duplicating say 100MB for this purpose wouldn't be bad.
To flesh this out a bit more, you would also write out the mapping from cache block -> original block to the cache, and make this a device-mapper target.
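A userspace toy of the lookup such a target would do (the table contents below are made up): reads of hot original blocks get redirected into the contiguous reserved area, and everything else reads in place:

/* Sketch: remap hot blocks into a contiguous reserved area. */
#include <stdio.h>

struct remap { unsigned long orig; unsigned long cached; };

/* sorted by orig; built whenever the reserved area is (re)written */
static const struct remap table[] = {
    { 10240, 2048 }, { 10304, 2112 }, { 98816, 2176 },
};
static const unsigned nmap = sizeof table / sizeof *table;

static unsigned long map_block(unsigned long blk)
{
    unsigned lo = 0, hi = nmap;
    while (lo < hi) {                 /* binary search on the orig block */
        unsigned mid = (lo + hi) / 2;
        if (table[mid].orig == blk)
            return table[mid].cached; /* hot: redirect to reserved area */
        if (table[mid].orig < blk)
            lo = mid + 1;
        else
            hi = mid;
    }
    return blk;                       /* cold: read in place */
}

int main(void)
{
    printf("block 10304 -> %lu\n", map_block(10304));
    printf("block 12345 -> %lu\n", map_block(12345));
    return 0;
}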
Colin Walters wrote:
I'm almost positive it requires kernel changes to do this the right way; one naive idea is to have a userspace daemon capturing what blocks are read and when (the kernel tells this daemon via the kernel events layer). This would run for the first three minutes of each and every boot. When the system is idle (and only when running on AC power!), another daemon rearranges blocks on the disk. Which blocks to rearrange could be the result of a computation involving several of these three-minute result sets.
Rearranging sounds complex and dangerous, since it requires deep integration with the filesystem. The online resizing took quite a long time to appear and that is conceptually much simpler. Why not do it on the block device layer (without knowledge of the filesystem) and just copy those blocks to a reserved area of the block device? Disks are big, duplicating say 100MB for this purpose wouldn't be bad.
Doing the rearranging at the block level has several advantages:
- We don't need to have this thread again (and don't need to apply another hack) when people realize that OpenOffice *also* needs disk rearranging to start faster.
- It is a general speedup across the system
- If the disk blocks are generally arranged in such a way that blocks accessed together are close together, then readahead in the kernel becomes a matter of just reading further ahead *on the disk* instead of, as now, reading further ahead in the file.
- When a set of blocks is read in, the VM system knows that those blocks are likely to be needed soon, so it can consider it a bad idea to throw those pages away.
Søren
On Thu, 2004-12-02 at 12:59 -0500, Colin Walters wrote:
On Thu, 2004-12-02 at 12:40 -0500, David Zeuthen wrote:
I'm almost positive it requires kernel changes to do this the right way; one naive idea is to have a userspace daemon capturing what blocks are read and when (the kernel tells this daemon via the kernel events layer). This would run for the first three minutes of each and every boot. When the system is idle (and only when running on AC power!), another daemon rearranges blocks on the disk. Which blocks to rearrange could be the result of a computation involving several of these three-minute result sets.
Rearranging sounds complex and dangerous, since it requires deep integration with the filesystem. The online resizing took quite a long time to appear and that is conceptually much simpler. Why not do it on the block device layer (without knowledge of the filesystem) and just copy those blocks to a reserved area of the block device? Disks are big, duplicating say 100MB for this purpose wouldn't be bad. We would need to ensure that mkfs.ext3 would leave enough space for this though (and probably whatever does the copying would have to make sure that the filesystem wasn't in the way; perhaps an ext flag).
This sounds much simpler and better; for good measure, ensure that the reserved area is at the beginning of the partition, as reads are generally faster on lower sectors. This article seems to suggest
http://www.kernelthread.com/mac/apme/optimizations/
that Mac OS X is doing something like that, although it appears to be at the filesystem level rather than the block level. The concept of "hot files" is interesting.
Cheers, David
On Thu, 2004-12-02 at 14:06 -0500, David Zeuthen wrote:
[...]
This sounds much simpler and better; for good measure, ensure that the reserved area is at the beginning of the partition, as reads are generally faster on lower sectors. This article seems to suggest
http://www.kernelthread.com/mac/apme/optimizations/
that Mac OS X is doing something like that, although it appears to be at the filesystem level rather than the block level. The concept of "hot files" is interesting.
I was looking for this document for a long time, thanks.
BootCache sounds to me like a dynamic readahead, as opposed to the current static readahead in Fedora. Done this way, it would rock for those users who don't use the predefined readahead.files fileset.
On Thu, 2004-12-02 at 16:33 -0300, Franco Catrin wrote:
I was looking for this document for a long time, thanks.
BootCache sounds to me like a dynamic readahead, as opposed to the current static readahead in Fedora. Done this way, it would rock for those users who don't use the predefined readahead.files fileset.
Other interesting reads on the subject:
http://kerneltrap.org/node/view/2157 which also points to: http://msdn.microsoft.com/msdnmag/issues/01/12/XPKernel/default.aspx (search prefetch)
On Thu, 2004-12-02 at 18:59, Colin Walters wrote:
On Thu, 2004-12-02 at 12:40 -0500, David Zeuthen wrote:
I'm almost positive it requires kernel changes to do this the right way; one naive idea is to have a userspace daemon capturing what blocks are read and when (the kernel tells this daemon via the kernel events layer). This would run for the first three minutes of each and every boot. When the system is idle (and only when running on AC power!), another daemon rearranges blocks on the disk. Which blocks to rearrange could be the result of a computation involving several of these three-minute result sets.
Rearranging sounds complex and dangerous, since it requires deep integration with the filesystem. The online resizing took quite a long time to appear and that is conceptually much simpler. Why not do it on the block device layer (without knowledge of the filesystem) and just copy those blocks to a reserved area of the block device? Disks are big, duplicating say 100MB for this purpose wouldn't be bad. We would need to ensure that mkfs.ext3 would leave enough space for this though (and probably whatever does the copying would have to make sure that the filesystem wasn't in the way; perhaps an ext flag).
Hmm... /boot is usually quite "overbig", and must be situated within a bootable (i.e. not at the end) part of the disk...
On Sat, 2004-12-04 at 17:51, Bill Nottingham wrote:
Kyrre Ness Sjobak (kyrre@solution-forge.net) said:
Hmm... /boot is usually quite "overbig", and must be situated within a bootable (i.e. not at the end) part of the disk...
Sure, just have mkinitrd put the *entire* filesystem image on the initrd... :)
Great idea! Now you only need one rpm installed - the kernel one :) Just imagine how easy that would make upgrading the system...
On Thu, Dec 02, 2004 at 12:40:06PM -0500, David Zeuthen wrote:
On Thu, 2004-12-02 at 11:58 -0500, Daniel Veillard wrote:
On Thu, Dec 02, 2004 at 11:48:56AM -0500, David Zeuthen wrote:
Not sure; the big boost really comes from reordering the files on the filesystem - running readahead (which takes 11 seconds) only gives me a
I'm looking into this. A mix of creating a new directory, copying the file there, and then renaming (linking and unlinking) the new copies to the old names is more likely to help. The problem is that it's also bad for the current session where that change is done, since you will end up with multiple in-memory copies of all system libraries... It is gonna be a bit tricky to set up, either as a post-install step or very early during boot. It's very easy to break, too...
To me it makes more sense to look at this on the block level than on the file level; remember, a lot of the files in my readahead set stem from
The block level is quite a bit harder and fs-specific; I'm afraid it's impossible to do in a generic way, as you would need specific kernel APIs for all supported filesystems.
stat'ing files - for example, there are more than 2200 distinct .png files in my readahead set. We probably only need one sector from most of those (the inode) rather than the entire file.
I'm almost positive it requires kernel changes to do this the right way; one naive idea is to have a userspace daemon capturing what blocks are read and when (the kernel tells this daemon via the kernel events layer). This would run for the first three minutes of each and every boot. When the system is idle (and only when running on AC power!), another daemon rearranges blocks on the disk. Which blocks to rearrange could be the result of a computation involving several of these three-minute result sets.
I suggest a more generic but possibly slightly less efficient way of working at the file level. Maybe it will not be optimal, inducing a bit too much data or too many seeks, but it would be doable with far less risk than trying to push a block-reallocation API through the full I/O stack down to every fs driver. You suppose you can rearrange blocks on the disk, and I'm really not sure that's realistically doable within a reasonable time frame. Ideally someone well versed in kernel FS internals can talk about this. Something tells me it's not gonna fly...
Daniel
On Thu, 2004-12-02 at 12:40 -0500, David Zeuthen wrote:
To me it makes more sense to look at this on the block level than on the file level; remember, a lot of the files in my readahead set stem from stat'ing files - for example, there are more than 2200 distinct .png files in my readahead set. We probably only need one sector from most of those (the inode) rather than the entire file.
Where do those files come from?
If those .png files come from the GTK theme, that was fixed:
http://bugzilla.gnome.org/show_bug.cgi?id=154034
On Thu, 2004-12-02 at 12:40 -0500, David Zeuthen wrote:
To me it makes more sense to look at this on the block level than on the file level; remember, a lot of the files in my readahead set stem from stat'ing files - for example, there are more than 2200 distinct .png files in my readahead set. We probably only need one sector from most of those (the inode) rather than the entire file.
I'm almost positive it requires kernel changes to do this the right way; one naive idea is to have a userspace daemon capturing what blocks are read and when (the kernel tells this daemon via the kernel events layer). This would run for the first three minutes of each and every boot. When the system is idle (and only when running on AC power!), another daemon rearranges blocks on the disk. Which blocks to rearrange could be the result of a computation involving several of these three-minute result sets.
Stephen Tweedie mentioned some ideas for what could be done in the kernel to optimize this stuff, I don't know specifics.
Havoc
On Thu, 2004-12-02 at 11:58 -0500, Daniel Veillard wrote:
On Thu, Dec 02, 2004 at 11:48:56AM -0500, David Zeuthen wrote:
Not sure; the big boost really comes from reordering the files on the filesystem - running readahead (which takes 11 seconds) only gives me a
I'm looking into this. A mix of creating a new directory, copying the file there, and then renaming (linking and unlinking) the new copies to the old names is more likely to help.
Eek. So what happens if I install an RPM while this is happening, or prelink runs, or...? I think the caching should clearly go in the kernel; anything else is going to have serious race conditions in some form.
On Thu, Dec 02, 2004 at 01:25:39PM -0500, Colin Walters wrote:
On Thu, 2004-12-02 at 11:58 -0500, Daniel Veillard wrote:
On Thu, Dec 02, 2004 at 11:48:56AM -0500, David Zeuthen wrote:
Not sure; the big boost really comes from reordering the files on the filesystem - running readahead (which takes 11 seconds) only gives me a
I'm looking into this. A mix of creating a new directory, copying the file there, and then renaming (linking and unlinking) the new copies to the old names is more likely to help.
Eek. So what happens if I install an RPM while this is happening, or
I don't expect this to happen in multiuser.
prelink runs, or...? I think the caching should clearly go in the kernel; anything else is going to have serious race conditions in some form.
If the kernel has an atomic rename system call, then it can be done without races; if not, that will be a very serious issue.
Daniel
On Thu, 2004-12-02 at 16:38 -0500, Daniel Veillard wrote:
If the kernel has an atomic rename system call, then it can be done without races; if not, that will be a very serious issue.
It has an atomic rename for files, but you can't atomically change a directory that has content. Or am I missing something? Not to mention this would break things like RPM verification.
On Thu, Dec 02, 2004 at 04:59:49PM -0500, Colin Walters wrote:
On Thu, 2004-12-02 at 16:38 -0500, Daniel Veillard wrote:
If the kernel has an atomic rename system call, then it can be done without races; if not, that will be a very serious issue.
It has an atomic rename for files, but you can't atomically change a directory that has content. Or am I missing something? Not to mention this would break things like RPM verification.
I think you're missing something. It is possible to copy the file somewhere else, rename it to the original name, and then update the mtime; rpm -Va should not say anything, as the checksum and filesystem attributes will look alike. The copy to a single new directory is a trick to force allocation in the same filesystem zone (if the FS has zones).
Daniel
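For what it's worth, here is a sketch of the copy-then-rename step being discussed (paths hypothetical; error handling and copying of mode/ownership omitted). rename(2) is atomic within one filesystem, so readers see either the old inode or the complete new copy, and restoring the timestamps keeps rpm -Va quiet - modulo the races Colin raises below:

/* Sketch: relocate a file by copying it and renaming over the original. */
#include <stdio.h>
#include <sys/stat.h>
#include <utime.h>

static int relocate(const char *path, const char *scratch)
{
    struct stat st;
    char buf[65536];
    size_t n;
    FILE *in, *out;

    if (stat(path, &st) != 0)
        return -1;
    in = fopen(path, "rb");
    out = fopen(scratch, "wb");      /* the copy lands in a new block group */
    if (!in || !out)
        return -1;
    while ((n = fread(buf, 1, sizeof buf, in)) > 0)
        fwrite(buf, 1, n, out);
    fclose(in);
    fclose(out);

    if (rename(scratch, path) != 0)  /* atomic swap within one filesystem */
        return -1;

    /* restore the original timestamps so rpm -Va doesn't complain */
    struct utimbuf times = { st.st_atime, st.st_mtime };
    return utime(path, &times);
}

int main(void)
{
    return relocate("/usr/lib/libexample.so",            /* hypothetical */
                    "/usr/lib/.reorder/libexample.so");  /* same fs! */
}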
On Thu, 2004-12-02 at 18:40 -0500, Daniel Veillard wrote:
On Thu, Dec 02, 2004 at 04:59:49PM -0500, Colin Walters wrote:
On Thu, 2004-12-02 at 16:38 -0500, Daniel Veillard wrote:
If the kernel has an atomic rename system call, then it can be done without races; if not, that will be a very serious issue.
It has an atomic rename for files, but you can't atomically change a directory that has content. Or am I missing something? Not to mention this would break things like RPM verification.
I think you're missing something. It is possible to copy the file somewhere else, rename it to the original name,
First, something could change the file while it is being copied. Second, something could change the file (e.g. delete it) after you've copied it, but before the rename. Not to mention that when you are walking a directory tree, it could have been changed in any way.
I am worried about a solution like this, because it basically requires that nothing else touches the entire directory tree while it's happening. If you do this late on shutdown, maybe we'd get away with it. Was that your plan?