Attached is a revised version of my turboLiveInst patch to livecd-tools and anaconda.
This version is more polished. I.e. bugs have been fixed, complexity removed, and therefore should be easier to review.
I performed some anecdotal performance tests, on a sony vaio vgn-n250e. I used a 30G destination volume for all tests, and when using usbstick media, it was media that reported 8.5MB/s from hdparm -t. I did have selinux disabled, and did not use the prelink option. I'd love to hear performance numbers from differing test rigs.
The performance results I got were-
install from cdrom without turboLiveInst:
copy: 250s postinst: 86s
install from cdrom with turboLiveInst:
copy: 299s postinst: 84s
install from usb without turboLiveInst:
copy: 226s postinst: 72s
install from usb with turboLiveInst:
copy: 175s postinst: 72s
Conclusions:
On this testrig, installing from cdrom, turboLiveInst yielded a 20% speedup in copy, which resulted in an end to end install speedup of 15%. Installing from usb, turboLiveInst yielded a 29% speedup in copy, which resulted in an end to end install speedup of 20%.
I did test copy-to-ram mode, and the resulting benefits were laughably huge. But this is only because this laptop has 1G of ram, and very strange behaviour occurs this near the threshold of having too little ram to use this feature. Though this is still an argument in favor of turboLiveInst, in that somehow it found itself on the better side of the threshold. I would expect that with 2G of ram, the benefit would be on the order of 35-50% speedup, as the main thing masking the benefit in the cdrom case, is the slow access to install media, which hides the benefit of cutting the needed disk writes nearly in half.
The secondary benefit of turboLiveInst is that it removes the artificial limitation that the target rootfs must be greater than 4.0G, instead of the 2.1G actual uncompressed size of the contents of the LiveCD.
Jeremy has pushed back against this patch because of complexity. Hopefully this round of polishing will make the patch much easier to read and understand. In addition, since the cleanupDeleted patch which this one depends on has already been merged, that should also make this a bit more palatable.
Jeremy also brought up the idea of doing a file level copy installation, rather than the current block level mechanism. This _is_ a good idea, in my opinion, in that it will more intelligently support situations such as a separate /usr filesystem. In addition there is no way that turboLiveInst or the existing block level mechanism can be made to support xfs or other non-ext3 destination filesystems chosen by the user (should those options return).
But- I would argue that file level installations may suffer badly due to cdrom seeking. I would also argue that there is no reason why turboLiveInst, could not be the first choice for installation technique, with fallbacks to file level copies for the seperate /usr and xfs type scenarios.
Ultimately I just hope that turboLiveInst gets serious consideration for F8, via performance comparison with whatever other options may exist.
Finally, here are some notes on the architecture, which may help you to understand the code-
As I mentioned in the original patch, the basic idea is to simply, at livecd-creator build time, use a device mapper snapshot to generate a delta file between the 4.0G filesystem housing 2.1G of data, and a 2.1G filesystem holding the same data. Then at anaconda/liveinst time, that delta file is used to recreate a virtual image of the 2.1G filesystem, so that it can be copied to the installation target, rather than the 4.0G filesystem which includes 1.9G of zeros that needn't be written to disk.
I ended up using /dev/loop118 in the initramfs init (mayflower generated) to expose the delta file. /dev/loop118 was mknod'd already by mayflower, but not actually used for anything. I used it to expose the delta file (osmin.tgz), because /dev/loop121 was being used to expose the 4.0G os.img, and it seemed simplest to use an identical mechanism to expose that data. Because by the time anaconda runs, the original cdrom and squashfs filesystems have been lazy unmounted, a simple cp at that time was not an option(?).
I chose to extract the delta file (osmin, a 16MB sparse file containing 1.2M of data, compressed to 25kb on cdrom) into /dev/shm (i.e. ram) so that reads from it would not try to go to the cdrom. This may not have been necessary.
I chose to calculate the size of the filesystem at livecd-creator time, and include it with osmin as osmin.size(together forming osmin.tgz). This is less complex than what I did in the first pass, which was to use dumpe2fs to calculate it at ananconda time.
As always, questions, comments, criticisms, and especially testers are more than welcome.
peace...
-dmc
Hi, This doesn't actually seem so bad to me ...
The main suggestion I'd have is that if the osmin.size was dropped[1], and if we could reduce the size of the COW area of the snapshot to its minimal (i.e. look at how much is used and truncate it at that point), then we can just include osmin.img directly and cut out a lot of the tricks.
Also, the snapshot device created by liveinst.sh should be readonly (dmsetup -r, I think).
I'd also probably move the snapshot creation code out of liveinst.sh into anaconda itself.
I'd tend towards not having --cleanup-deleted and --turbo-liveinst options and just make it the default.
Finally, perhaps explain this feature as:
"Reduce installation time by not copying unused data to disk"
And if you want to explain how it actually works:
"We avoid copying unused data by copying a filesystem image that has been reduced to the minimal possible size. This minimal image is not sufficient for a running LiveCD as applications need room to write more data, but the minimal image is efficiently created just before copying by applying a pre-calculated set of deltas to the original large filesystem image."
Cheers, Mark.
[1] - Just change getLiveSizeMb() to always "dumpe2fs -h" to get the filesystem size
Mark McLoughlin wrote:
Hi, This doesn't actually seem so bad to me ...
Thanks for taking a look.
The main suggestion I'd have is that if the osmin.size was dropped[1],
Actually in my first iteration of the patch, I was using dumpe2fs, but my parsing of it with awk and grep looked pretty ugly. I'm not opposed to going back to that if you tell me that is what will get accepted. Another minor justification is to minimize the work done at install time. (though I admit the bash+dumpe2fs+grep+sed is not going to add terribly significant time to the install).
historically-
" echo "$(( $(/sbin/dumpe2fs -h /dev/mapper/live-osimg-min 2>/dev/null | grep "^Block size:" | sed -e 's/.*:\s*//') * $(/sbin/dumpe2fs -h /dev/mapper/live-osimg-min 2>/dev/null | grep "^Block count:" | sed -e 's/.*:\s*//') / 512 ))" > /dev/shm/osmin.size "
and if we could reduce the size of the COW area of the snapshot to its minimal (i.e. look at how much is used and truncate it at that point),
This is pretty easy. In the intervening time, I am now confidently aware that I can use dmsetup status on the snapshotted device to determine how much of the file can be truncated.
then we can just include osmin.img directly and cut out a lot of the tricks.
It needs to be compressed, which is why the truncation didn't really matter. The reason it needs to be compressed, is because it can't live in the squashfs, it has to live on the iso9660. And note, that it compresses from 1.2M of actual data, to 7kb of actual data.
I'm all for cutting out the tricks, if it is possible.
Can you see an alternate way, with fewer tricks, to get osmin.img to be visible to anaconda?
You could try to force the mounting of /dev/live onto /mnt/live, but that breaks the copy-to-ram case, because in that case, the cdrom doesn't exist. If breaking copy-to-ram is ok, and you prefer grabbing osmin.img from a mounted /dev/live, I can do that.
Honestly I thought that using the exact same mechanism to propogate osmin.img as was used to propogate os.img, was not a trick, but elegant.
Also, the snapshot device created by liveinst.sh should be readonly (dmsetup -r, I think).
Sounds good, and the loop devices for the components while we're at it.
I'd also probably move the snapshot creation code out of liveinst.sh into anaconda itself.
I'm not so sure about this. liveinst.sh seemed appropriate, in that it was where the installable image was defined to anaconda. This also depends on any suggestions you might have about alternate ways of exposing osmin.img, so I'll wait to hear what you think about that.
Thanks again for taking a look at this. I'll be happy to do whatever you suggest to get it merge-worthy, but lets at least have one back and forth to make sure you understand why I made the design choices I did.
Regards,
-dmc
I'd tend towards not having --cleanup-deleted and --turbo-liveinst options and just make it the default.
Finally, perhaps explain this feature as:
"Reduce installation time by not copying unused data to disk"
And if you want to explain how it actually works:
"We avoid copying unused data by copying a filesystem image that has been reduced to the minimal possible size. This minimal image is not sufficient for a running LiveCD as applications need room to write more data, but the minimal image is efficiently created just before copying by applying a pre-calculated set of deltas to the original large filesystem image."
Cheers, Mark.
[1] - Just change getLiveSizeMb() to always "dumpe2fs -h" to get the filesystem size
-- Fedora-livecd-list mailing list Fedora-livecd-list@redhat.com https://www.redhat.com/mailman/listinfo/fedora-livecd-list
On Mon, 2007-09-03 at 15:33 -0500, Douglas McClendon wrote:
Mark McLoughlin wrote:
The main suggestion I'd have is that if the osmin.size was dropped[1],
Actually in my first iteration of the patch, I was using dumpe2fs, but my parsing of it with awk and grep looked pretty ugly.
Something like this is nice and simple:
import commands
def parseField(output, field): for line in output.split("\n"): if line.startswith(field + ":"): return line[len(field) + 1:].strip()
raise KeyError("Failed to find field '%s' in output" % field)
(status, output) = commands.getstatusoutput("/sbin/dumpe2fs -h /dev/root")
print parseField(output, "Block count")
and if we could reduce the size of the COW area of the snapshot to its minimal (i.e. look at how much is used and truncate it at that point),
This is pretty easy. In the intervening time, I am now confidently aware that I can use dmsetup status on the snapshotted device to determine how much of the file can be truncated.
Yep.
then we can just include osmin.img directly and cut out a lot of the tricks.
It needs to be compressed, which is why the truncation didn't really matter. The reason it needs to be compressed, is because it can't live in the squashfs, it has to live on the iso9660. And note, that it compresses from 1.2M of actual data, to 7kb of actual data.
It can be truncated down to 1.2M, but it can still be compressed to 7kb? If so, the compression does sounds worthwhile.
Honestly I thought that using the exact same mechanism to propogate osmin.img as was used to propogate os.img, was not a trick, but elegant.
What you have is mostly fine IMHO, it was the tarball I had a problem with.
Cheers, Mark.
On 9/4/07, Mark McLoughlin markmc@redhat.com wrote:
(status, output) = commands.getstatusoutput("/sbin/dumpe2fs -h
/dev/root")
Not directly related to your discussion, but personally I am fairly fanatical about never involving /bin/sh in - well, anything if I can avoid it =) Here it will be fine, but then maybe someone comes along later and wants to add a user-passed parameter to the command, and then you need to quote (if it's even remembered, and if it's not then you have weird failures and in other software security problems).
The "subprocess" module is the fix:
output = subprocess.Popen(['/sbin/dumpe2fs', '-h', '/dev/root'], stdout= subprocess.PIPE).communicate()[0]
On Tue, 2007-09-04 at 09:49 -0400, Colin Walters wrote:
On 9/4/07, Mark McLoughlin markmc@redhat.com wrote:
(status, output) = commands.getstatusoutput("/sbin/dumpe2fs -h /dev/root")
Not directly related to your discussion, but personally I am fairly fanatical about never involving /bin/sh in - well, anything if I can avoid it =) Here it will be fine, but then maybe someone comes along later and wants to add a user-passed parameter to the command, and then you need to quote (if it's even remembered, and if it's not then you have weird failures and in other software security problems).
Fair point but ...
The "subprocess" module is the fix:
output = subprocess.Popen(['/sbin/dumpe2fs', '-h', '/dev/root'], stdout=subprocess.PIPE).communicate()[0]
... that sucks :-)
(But yeah, agreed)
Cheers, Mark.
Mark McLoughlin wrote:
On Mon, 2007-09-03 at 15:33 -0500, Douglas McClendon wrote:
Mark McLoughlin wrote:
[snip]
thanks for providing the dumpe2fs python code Mark and Collin, should save you a cleanup patch later on :)
[snip]
It can be truncated down to 1.2M, but it can still be compressed to 7kb? If so, the compression does sounds worthwhile.
Yup. I can't actually wrap my head around the 1.2M->7kb compression, but I can give a reasonable explanation about why the inverse is possible. By the inverse, I mean using a snapshot to create a delta describing the differences between the minimized-fs and the maximized-fs (not max->min as is the actual case in use). I.e. in the min->max, all that is happening is some repetitive formatting of the filesystem out in the new unused area. As a result all that repetitive stuff can compress down to next to nothing. I assume something similar is happening in the max->min case.
What you have is mostly fine IMHO, it was the tarball I had a problem with.
Ok, in that case, it should be no sweat (relatively) to put together a new version of the patch, updated against the 30+ commits seen since the last version, which uses dumpe2fs in anaconda, no tarball, osmin.img truncation, and setting loops and dms to readonly where possible.
But, before I do that, can I get feedback from one other person, preferably who has commit authority, to tell me that this will be committed? Jeremy, you mentioned objections in the past? (all my goading of said objections, was because I was so sure that there is no remotely more correct way to accomplish the task, and I would be genuinely entertained to be corrected on that fact)
I would also like to grumblingly say, that I think it would have been much better if this review could have been done sooner, so that it could make it into F8T2. This seems like a significant improvement (20% install speedup), which could have easily been in F8T2, and thus out of the way as a cloud of potential problem hanging over F8T3.
-dmc
Cheers, Mark.
-- Fedora-livecd-list mailing list Fedora-livecd-list@redhat.com https://www.redhat.com/mailman/listinfo/fedora-livecd-list
On Tue, 2007-09-04 at 16:47 -0500, Douglas McClendon wrote:
Mark McLoughlin wrote:
On Mon, 2007-09-03 at 15:33 -0500, Douglas McClendon wrote:
Mark McLoughlin wrote:
What you have is mostly fine IMHO, it was the tarball I had a problem with.
Ok, in that case, it should be no sweat (relatively) to put together a new version of the patch, updated against the 30+ commits seen since the last version, which uses dumpe2fs in anaconda, no tarball, osmin.img truncation, and setting loops and dms to readonly where possible.
But, before I do that, can I get feedback from one other person, preferably who has commit authority, to tell me that this will be committed? Jeremy, you mentioned objections in the past? (all my goading of said objections, was because I was so sure that there is no remotely more correct way to accomplish the task, and I would be genuinely entertained to be corrected on that fact)
I think that some of the above will go a long way towards making things look nicer which is going to make me more amenable to it. I still don't necessarily _like_ it because I still think that it makes anaconda depend a bit too much on a lot of the details; but I guess as long as it can fall back cleanly in the absence of the bits, that just means a larger testing matrix.
Jeremy
Jeremy Katz wrote:
On Tue, 2007-09-04 at 16:47 -0500, Douglas McClendon wrote:
Mark McLoughlin wrote:
On Mon, 2007-09-03 at 15:33 -0500, Douglas McClendon wrote:
Mark McLoughlin wrote:
What you have is mostly fine IMHO, it was the tarball I had a problem with.
Ok, in that case, it should be no sweat (relatively) to put together a new version of the patch, updated against the 30+ commits seen since the last version, which uses dumpe2fs in anaconda, no tarball, osmin.img truncation, and setting loops and dms to readonly where possible.
But, before I do that, can I get feedback from one other person, preferably who has commit authority, to tell me that this will be committed? Jeremy, you mentioned objections in the past? (all my goading of said objections, was because I was so sure that there is no remotely more correct way to accomplish the task, and I would be genuinely entertained to be corrected on that fact)
I think that some of the above will go a long way towards making things look nicer which is going to make me more amenable to it. I still don't necessarily _like_ it because I still think that it makes anaconda depend a bit too much on a lot of the details; but I guess as long as it can fall back cleanly in the absence of the bits, that just means a larger testing matrix.
I'm going to assume that that was a "yes I like what the code does, yes it will be applied"
As far as the larger testing matrix- If you go by my suggestion, which markmc echoed, and make it the default path (along with removing the --ignore-deleted option in favor of it being default), then the testing matrix will be the same.
The only reason I offered --turbo-liveinst and --cleanup/ignore-deleted as options was to ease their acceptance. I really believe that they are the 'one true way' and that there is no benefit gained by having the original code path as an option. As you said, it just means a larger testing matrix.
-dmc
On Wed, 2007-09-05 at 15:54 -0500, Douglas McClendon wrote:
Jeremy Katz wrote:
On Tue, 2007-09-04 at 16:47 -0500, Douglas McClendon wrote:
Mark McLoughlin wrote:
On Mon, 2007-09-03 at 15:33 -0500, Douglas McClendon wrote:
Mark McLoughlin wrote:
What you have is mostly fine IMHO, it was the tarball I had a problem with.
Ok, in that case, it should be no sweat (relatively) to put together a new version of the patch, updated against the 30+ commits seen since the last version, which uses dumpe2fs in anaconda, no tarball, osmin.img truncation, and setting loops and dms to readonly where possible.
But, before I do that, can I get feedback from one other person, preferably who has commit authority, to tell me that this will be committed? Jeremy, you mentioned objections in the past? (all my goading of said objections, was because I was so sure that there is no remotely more correct way to accomplish the task, and I would be genuinely entertained to be corrected on that fact)
I think that some of the above will go a long way towards making things look nicer which is going to make me more amenable to it. I still don't necessarily _like_ it because I still think that it makes anaconda depend a bit too much on a lot of the details; but I guess as long as it can fall back cleanly in the absence of the bits, that just means a larger testing matrix.
I'm going to assume that that was a "yes I like what the code does, yes it will be applied"
I've been largely persuaded at least between your mails and markmc twisting my arm a little yesterday ;)
As far as the larger testing matrix- If you go by my suggestion, which markmc echoed, and make it the default path (along with removing the --ignore-deleted option in favor of it being default), then the testing matrix will be the same.
Well, the anaconda path still has to handle both cases as people could be using an older livecd-creator. That's really where the code-path explosion is.
It definitely makes sense to make the new path the default in livecd-creator and probably not to really make it a separate option to enable/disable. As has probably been noticed, there are a lot fewer options now, and a good chunk are marked as "for debugging/development use only" with comments. I'm going to poke at optparse and hide them from --help too.
Jeremy
Jeremy Katz wrote:
On Wed, 2007-09-05 at 15:54 -0500, Douglas McClendon wrote:
Jeremy Katz wrote:
I think that some of the above will go a long way towards making things look nicer which is going to make me more amenable to it. I still don't necessarily _like_ it because I still think that it makes anaconda depend a bit too much on a lot of the details; but I guess as long as it can fall back cleanly in the absence of the bits, that just means a larger testing matrix.
I'm going to assume that that was a "yes I like what the code does, yes it will be applied"
I've been largely persuaded at least between your mails and markmc twisting my arm a little yesterday ;)
Ok.
Now, for me to try to give a little on your points-
I was thinking about what you said about "it makes anaconda depend a bit too much on the details". Can you clarify that a bit? Specifically, I can imagine one or two possible (but not entirely pretty) ways to change the patch, so that it makes 0 change to anaconda, at the expense of adding a fair amount of complexity to liveinst.sh. But I'm not sure what your long-term vision of liveinst.sh is. I.e. do you foresee it getting folded into anaconda at some point?
-dmc
On Wed, 2007-09-05 at 17:49 -0500, Douglas McClendon wrote:
I was thinking about what you said about "it makes anaconda depend a bit too much on the details". Can you clarify that a bit? Specifically, I can imagine one or two possible (but not entirely pretty) ways to change the patch, so that it makes 0 change to anaconda, at the expense of adding a fair amount of complexity to liveinst.sh. But I'm not sure what your long-term vision of liveinst.sh is. I.e. do you foresee it getting folded into anaconda at some point?
For all intents and purposes, liveinst is part of anaconda. It just does a tiny bit of setup and gives something that's useful to run. It will probably continue to exist, mostly because it gives people something nicer to tell people to run than saying "run anaconda with this set of options". But putting complexity there isn't better or worse really
Jeremy
Hi Jeremy,
On Wed, 2007-09-05 at 17:49 -0400, Jeremy Katz wrote:
As has probably been noticed, there are a lot fewer options now, and a good chunk are marked as "for debugging/development use only" with comments.
One option that I'm missing at the moment is the "--repo" option.
I know the syntax sucked, and you couldn't specify a mirror list, but I found it useful where you have default URLs of the repos in the kickstart file but you expect most users to have a more local mirror of those repos. Allowing them to just override on the command line is much nicer than requiring them to edit the kickstart file.
e.g. the instructions to a developer might be:
$> git-clone <my-kickstarts-repo> kickstarts $> cd kickstarts $> livecd-creator --repo fedora,file:///my/mirror -c foo.ks $> <hack on the kickstarts> $> git-commit
but now it's:
$> git-clone <my-kickstarts-repo> kickstarts $> cd kickstarts $> <edit foo.ks to change the repo URL> $> livecd-creator -c foo.ks $> <hack on the kickstarts> $> <edit foo.ks to change the repo URL back to the default> $> git-commit
Cheers, Mark.
On Fri, 2007-09-07 at 10:16 +0100, Mark McLoughlin wrote:
On Wed, 2007-09-05 at 17:49 -0400, Jeremy Katz wrote:
As has probably been noticed, there are a lot fewer options now, and a good chunk are marked as "for debugging/development use only" with comments.
One option that I'm missing at the moment is the "--repo" option.
I know the syntax sucked, and you couldn't specify a mirror list, but I found it useful where you have default URLs of the repos in the kickstart file but you expect most users to have a more local mirror of those repos. Allowing them to just override on the command line is much nicer than requiring them to edit the kickstart file.
So my biggest problem with the repo option wasn't really the syntax, but more the impact on having reproducible configs. If your image is created using an arbitrary new set of repos specified on the command line, that makes it harder to pass out the configs for other people to build on top of your images.
But I can see where having an override to point at a local location for a repo can be helpful. I can see two ways of doing it
1) Make another repo command with the same repo id just override the repo rather than spit an error. This would then mean you could have my-local-ks.cfg with a %include of the base config and then a repo line pointing to your local repo
2) A --repo-override that overrides a given repo id using similar syntax to the old --repo option
Jeremy
On Fri, 07 Sep 2007 10:18:06 -0400 Jeremy Katz katzj@redhat.com wrote:
So my biggest problem with the repo option wasn't really the syntax, but more the impact on having reproducible configs. If your image is created using an arbitrary new set of repos specified on the command line, that makes it harder to pass out the configs for other people to build on top of your images.
Another hard thing about --repo lines is how do you express things like excludes or includes from that repo, or eventually the cost of that repo for the sake of download choice?
On Fri, 2007-09-07 at 10:30 -0400, Jesse Keating wrote:
On Fri, 07 Sep 2007 10:18:06 -0400 Jeremy Katz katzj@redhat.com wrote:
So my biggest problem with the repo option wasn't really the syntax, but more the impact on having reproducible configs. If your image is created using an arbitrary new set of repos specified on the command line, that makes it harder to pass out the configs for other people to build on top of your images.
Another hard thing about --repo lines is how do you express things like excludes or includes from that repo, or eventually the cost of that repo for the sake of download choice?
That's why I'd only be up for doing it as an explicit "override the URL for this repo" and not any of the more general pieces. Then all of the other stuff has to be in the config, you're just saying "hey, for fedora development, I have something that's a lot easier to get at located at file:///home/katzj/rawhide/i386"
Jeremy
On Fri, 07 Sep 2007 10:36:45 -0400 Jeremy Katz katzj@redhat.com wrote:
That's why I'd only be up for doing it as an explicit "override the URL for this repo" and not any of the more general pieces. Then all of the other stuff has to be in the config, you're just saying "hey, for fedora development, I have something that's a lot easier to get at located at file:///home/katzj/rawhide/i386"
Yeah, I think that makes a lot of sense.
On Wed, 2007-09-05 at 09:58 -0400, Jeremy Katz wrote:
I think that some of the above will go a long way towards making things look nicer which is going to make me more amenable to it. I still don't necessarily _like_ it because I still think that it makes anaconda depend a bit too much on a lot of the details; but I guess as long as it can fall back cleanly in the absence of the bits, that just means a larger testing matrix.
The extra details that anaconda will know about is "if there is an osmin.img, use that to create a snapshot device on top of os.img".
That does raise an interesting question, I think ... there is a "protocol" between livecd-tools and anaconda, which would suggest that the kickstart configs should require[1] a minimal version of anaconda and, also, that anaconda needs to continue supporting older versions of the protocol for a while.
Maybe a good way to express it would be for the anaconda RPM to "Provides: livecd-protocol = 1" and have livecd-creator check whether any of the repos provide the appropriate version before creating the livecd?
(In this case, though, we probably wouldn't bump the protocol number since the fallbacks ensure continued compatibility)
Just a thought ...
Cheers, Mark.
[1] - Yeah, I know kickstart can't do that.
livecd@lists.fedoraproject.org