Hey all,
Here's a first attempt at the "compressed anaconda runtime" patch I've been messing with for a while now. Consider it a Request for Comments, or a simple proof-of-concept patch.
 src/pylorax/__init__.py    |   11 +++-
 src/pylorax/constants.py   |    2 +
 src/pylorax/installtree.py |  133 ++++++++++++++++++++++++++++++++++++++++----
 3 files changed, 133 insertions(+), 13 deletions(-)
The problem I was trying to solve is this: our initrd.img is ~360MB uncompressed in RAM. This means we can no longer boot and run on systems with 512MB RAM; see bug 682555 (and its various duplicates) for example reports.
Idea #1: Let's use squashfs!
The main problem with this is that squashfs is read-only, and anaconda (specifically loader) will crash and burn if it can't write to various parts of its root filesystem - /etc, /var, /tmp, and possibly others.
(We could probably fix this with some effort - especially once we've gotten rid of anaconda-init and loader - but I needed a low-effort proof of concept first.)
Idea #2: Our Live images are compressed with squashfs, and they use a device-mapper overlay to make themselves read-write - so let's do that!
This is basically what the patch does: it builds the anaconda runtime image into something very similar to the Live image, and uses the same dracut startup scripts as the Live images to get itself set up and running.
Here's how the btrsquash initrd build process works:
- populate installtree as usual
- chroot into install tree and create dracut initramfs
- clean installtree as usual (note: this removes dracut)
- [other normal lorax steps...]
- create initrd
  - create empty btrfs image "LiveOS/rootfs.img" and mount it
  - copy installtree into rootfs
  - unmount rootfs
  - make squashfs.img with "LiveOS/rootfs.img" inside
  - place squashfs.img in a cpio archive with "/etc/cmdline"
    - /etc/cmdline tells dracut where to find squashfs.img
  - concatenate dracut initramfs and squashfs.cpio
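For the last two steps, here is a minimal pure-Python sketch of what `find . | cpio -c -o` produces (with GNU cpio, `-c` selects the SVR4 "newc" format, which is the format the kernel's initramfs unpacker expects). The function names are hypothetical, and the sketch assumes every entry is owned by root with mtime 0. Concatenation itself needs no special tooling: the kernel unpacks each (optionally compressed) cpio archive in the initrd one after another, so `cat initramfs.img squashfs.cpio > initrd.img` is enough.

```python
import stat


def _pad4(buf):
    """Zero-pad bytes to a multiple of 4, as the newc format requires."""
    return buf + b"\0" * (-len(buf) % 4)


def _newc_record(name, data, mode, ino, nlink):
    """One newc ("070701") record: 110-byte ASCII header, name, data."""
    namebytes = name.encode() + b"\0"      # namesize counts the NUL
    fields = [ino, mode, 0, 0, nlink, 0,   # ino mode uid gid nlink mtime
              len(data), 0, 0, 0, 0,       # filesize devmaj devmin rdevmaj rdevmin
              len(namebytes), 0]           # namesize check
    header = b"070701" + b"".join(b"%08X" % f for f in fields)
    return _pad4(header + namebytes) + _pad4(data)


def newc_archive(files):
    """Build a newc cpio archive from {relative path: file contents}.

    Parent directories are emitted before their contents, and the archive
    ends with the mandatory TRAILER!!! record.
    """
    dirs = set()
    for path in files:
        parts = path.split("/")[:-1]
        for i in range(1, len(parts) + 1):
            dirs.add("/".join(parts[:i]))
    records, ino = [], 1
    for d in sorted(dirs):  # a parent sorts before anything under it
        records.append(_newc_record(d, b"", stat.S_IFDIR | 0o755, ino, 2))
        ino += 1
    for path in sorted(files):
        records.append(_newc_record(path, files[path],
                                    stat.S_IFREG | 0o644, ino, 1))
        ino += 1
    records.append(_newc_record("TRAILER!!!", b"", 0, 0, 1))
    return b"".join(records)
```

For example, `newc_archive({"etc/cmdline": b"root=live:/squashfs.img\n", "squashfs.img": squash_bytes})` mirrors the archive the patch builds, where `squash_bytes` stands in for the real squashfs image.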
As noted in the bug report, this does require some changes to anaconda and dracut. The relevant changes are:
anaconda: dac6c6ec 6f4a1a3b (should be in anaconda-16.2 or 16.3)
dracut: up to fe17f4e8 (should be in dracut-009)
I've tested the resulting images in KVM - systems with 512MB RAM happily boot, run and install. See the following screenshot, taken just after the betanag screen in anaconda: http://wwoods.fedorapeople.org/screenshots/btrsquash/btrsquash-dracut.png
There are a few known problems. First, the dracut initramfs will only contain the modules for one kernel - we need to run dracut once for each kernel and merge all the images together. Second, the resulting image is ~132MB, which is still too big for PPC netboot. Finally, I'm not sure how this will affect driver disks, updates images, or other weird things that might involve writing to the initramfs.
Keep in mind that the image is actually two parts again - initramfs and squashfs. If we keep those parts separate, PPC users could use dracut's networking stuff to fetch the runtime image. We can also save RAM on media installs (and boot.iso and USB-based installs) by leaving the squashfs image on the media. And for every other case, we can just use the concatenated Big Image like we do now.
So there we have it. Feedback welcomed. If we come up with a good plan for how to handle compressed images in the future I'll try to port this to lorax master.
Oh, one last thing - I have some scripts to convert current, existing images into working btrsquash-style images. I might put those somewhere public if people are interested, but they're pretty hacky.
-w
This adds a "ramdisk" section to lorax.conf, which contains a "style" key. This key is passed to LoraxInstallTree to control what style of ramdisk should be built. The default is "initramfs", which is the current One Big Image style of ramdisk. More may be added later.
---
 src/pylorax/__init__.py    |    6 +++++-
 src/pylorax/installtree.py |   33 ++++++++++++++++++++-------------
 2 files changed, 25 insertions(+), 14 deletions(-)
diff --git a/src/pylorax/__init__.py b/src/pylorax/__init__.py
index 1ce411c..de74b28 100644
--- a/src/pylorax/__init__.py
+++ b/src/pylorax/__init__.py
@@ -96,6 +96,9 @@ class Lorax(BaseLoraxClass):
         self.conf.add_section("templates")
         self.conf.set("templates", "ramdisk", "ramdisk.ltmpl")

+        self.conf.add_section("ramdisk")
+        self.conf.set("ramdisk", "style", "initramfs")
+
         # read the config file
         if os.path.isfile(conf_file):
             self.conf.read(conf_file)
@@ -197,7 +200,8 @@ class Lorax(BaseLoraxClass):
         # set up install tree
         logger.info("setting up install tree")
         self.installtree = LoraxInstallTree(self.yum, self.basearch,
-                                            self.libdir, self.workdir)
+                                            self.libdir, self.workdir,
+                                            self.conf.get("ramdisk", "style"))

         # set up required build parameters
         logger.info("setting up build parameters")
diff --git a/src/pylorax/installtree.py b/src/pylorax/installtree.py
index 9883bba..9704769 100644
--- a/src/pylorax/installtree.py
+++ b/src/pylorax/installtree.py
@@ -39,16 +39,33 @@ from sysutils import *


 class LoraxInstallTree(BaseLoraxClass):

-    def __init__(self, yum, basearch, libdir, workdir):
+    def __init__(self, yum, basearch, libdir, workdir, style):
         BaseLoraxClass.__init__(self)
         self.yum = yum
         self.root = self.yum.installroot
         self.basearch = basearch
         self.libdir = libdir
         self.workdir = workdir
+        self.style = style

         self.lcmds = constants.LoraxRequiredCommands()

+        if self.style == 'initramfs':
+            self.make_initrd = self.make_initramfs
+
+    def compress(self, initrd, kernel, compression="xz"):
+        start = time.time()
+        logger.debug("creating {0}-style initrd".format(self.style))
+        # move corresponding modules to the tree
+        shutil.move(joinpaths(self.workdir, kernel.version),
+                    joinpaths(self.root, "modules"))
+        result = self.make_initrd(initrd, kernel, compression)
+        # move modules out of the tree again
+        shutil.move(joinpaths(self.root, "modules", kernel.version),
+                    self.workdir)
+        elapsed = time.time() - start
+        return result, elapsed
+
     def remove_locales(self):
         chroot = lambda: os.chroot(self.root)
@@ -506,13 +523,8 @@ class LoraxInstallTree(BaseLoraxClass):
         dst = joinpaths(self.root, "sbin")
         shutil.copy2(src, dst)

-    def compress(self, initrd, kernel):
+    def make_initramfs(self, initrd, kernel, type="xz"):
         chdir = lambda: os.chdir(self.root)
-        start = time.time()
-
-        # move corresponding modules to the tree
-        shutil.move(joinpaths(self.workdir, kernel.version),
-                    joinpaths(self.root, "modules"))

         find = subprocess.Popen([self.lcmds.FIND, "."], stdout=subprocess.PIPE,
                                 preexec_fn=chdir)
@@ -525,13 +537,8 @@ class LoraxInstallTree(BaseLoraxClass):
         gzipped.write(cpio.stdout.read())
         gzipped.close()

-        # move modules out of the tree again
-        shutil.move(joinpaths(self.root, "modules", kernel.version),
-                    self.workdir)
-
-        elapsed = time.time() - start
+        return True

-        return True, elapsed

     @property
     def kernels(self):
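The ramdisk style plumbing boils down to "default in code, override from lorax.conf, dispatch on the result". Below is a minimal Python 3 sketch of that pattern. It assumes a standalone `configparser` rather than lorax's own config object, and `InstallTreeSketch` and its placeholder builders are hypothetical. One design difference from the patch: an unknown style is rejected up front. In the patch, `make_initrd` is simply never assigned, so the error only appears as an `AttributeError` once `compress()` runs.

```python
import configparser


def load_ramdisk_style(conf_text=None):
    """Default the [ramdisk] style in code, then let the config override it."""
    conf = configparser.ConfigParser()
    conf.add_section("ramdisk")
    conf.set("ramdisk", "style", "initramfs")  # built-in default
    if conf_text:
        conf.read_string(conf_text)            # values from the file win
    return conf.get("ramdisk", "style")


class InstallTreeSketch:
    """Hypothetical stand-in for LoraxInstallTree's style dispatch."""

    def __init__(self, style):
        builders = {
            "initramfs": self.make_initramfs,
            "btrsquash": self.make_live_squashfs,
        }
        try:
            self.make_initrd = builders[style]
        except KeyError:
            # Fail at construction time, not later inside compress().
            raise ValueError("unknown ramdisk style: %r" % style) from None
        self.style = style

    def make_initramfs(self, initrd, kernel, compression="xz"):
        return "initramfs"  # placeholder for the find | cpio | compress step

    def make_live_squashfs(self, initrd, kernel, compression="xz"):
        return "btrsquash"  # placeholder for the btrfs + squashfs + cpio steps
```

With a dict of builders, adding a new style is one entry plus one method, and a typo in lorax.conf fails immediately with the offending value in the message.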
This adds a "btrsquash" style for ramdisks - a squashfs-compressed btrfs image which gets loaded by the same dracut 'dmsquash-live' module used by our LiveOS images.
This saves a good ~256MB RAM, which allows us to install on systems with 512MB RAM again.
---
 src/pylorax/__init__.py    |    5 ++
 src/pylorax/constants.py   |    2 +
 src/pylorax/installtree.py |  102 ++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 109 insertions(+), 0 deletions(-)
diff --git a/src/pylorax/__init__.py b/src/pylorax/__init__.py
index de74b28..a575031 100644
--- a/src/pylorax/__init__.py
+++ b/src/pylorax/__init__.py
@@ -283,6 +283,11 @@ class Lorax(BaseLoraxClass):
         logger.info("moving stubs")
         self.installtree.move_stubs()

+        # if needed: create initramfs (before modules get shuffled around)
+        if self.conf.get("ramdisk", "style") == "btrsquash":
+            logger.info("creating dracut initramfs")
+            self.installtree.make_squash_initramfs()
+
         # get the list of required modules
         logger.info("getting list of required modules")
         modules = [f[1:] for f in template if f[0] == "module"]
diff --git a/src/pylorax/constants.py b/src/pylorax/constants.py
index 547a938..c46699f 100644
--- a/src/pylorax/constants.py
+++ b/src/pylorax/constants.py
@@ -47,7 +47,9 @@ class LoraxRequiredCommands(dict):
         self["LOCALEDEF"] = "localedef"
         self["LOSETUP"] = "losetup"
         self["MKDOSFS"] = "mkdosfs"
+        self["MKFS_BTRFS"] = "mkfs.btrfs"
         self["MKISOFS"] = "mkisofs"
+        self["MKSQUASHFS"] = "mksquashfs"
         self["MODINFO"] = "modinfo"
         self["MOUNT"] = "mount"
         self["PARTED"] = "parted"
diff --git a/src/pylorax/installtree.py b/src/pylorax/installtree.py
index 9704769..60d6470 100644
--- a/src/pylorax/installtree.py
+++ b/src/pylorax/installtree.py
@@ -26,6 +26,7 @@ import sys
 import os
 import shutil
 import gzip
+import lzma
 import re
 import glob
 import time
@@ -52,6 +53,9 @@ class LoraxInstallTree(BaseLoraxClass):

         if self.style == 'initramfs':
             self.make_initrd = self.make_initramfs
+        if self.style == 'btrsquash':
+            self._mkfs = [self.lcmds.MKFS_BTRFS, "-L", "Anaconda"]
+            self.make_initrd = self.make_live_squashfs

     def compress(self, initrd, kernel, compression="xz"):
         start = time.time()
@@ -539,6 +543,104 @@ class LoraxInstallTree(BaseLoraxClass):

         return True

+    def make_squash_initramfs(self):
+        outfile = "/tmp/initramfs.img"
+        logger.debug("chrooting into installtree to create initramfs.img")
+        # NOTE: ext[34] gets included by default, but we have to ask for btrfs
+        subprocess.check_call(["chroot", self.root,
+                               "/sbin/dracut", "--nomdadmconf", "--nolvmconf",
+                               "--modules", "base btrfs dmsquash-live",
+                               outfile, self.kernels[0].version])
+        # move output file into installtree workdir, and repack with xz.
+        # NOTE: for some reason the concatenated image will fail to boot if we
+        # leave it gzipped? Weird.
+        self.initramfs = joinpaths(self.workdir, "initramfs.img")
+        gzip_in = gzip.open(joinpaths(self.root, outfile))
+        xz_out = lzma.LZMAFile(self.initramfs, "w",
+                               options={'format':'xz', 'level':9})
+        xz_out.write(gzip_in.read())
+
+
+    def make_live_squashfs(self, initrd, kernel, compression):
+        '''This is a little complicated, but dracut wants to find a squashfs
+        image named "squashfs.img" which contains a filesystem image named
+        "LiveOS/rootfs.img".
+        Placing squashfs.img inside a cpio image and concatenating that
+        with the existing initramfs.img will make squashfs.img appear inside
+        initramfs at boot time.'''
+        # These exact names are required by dracut
+        squashname = "squashfs.img"
+        imgname = "LiveOS/rootfs.img"
+
+        # Create fs image of installtree
+        fsimage = joinpaths(self.workdir, "installtree.img")
+        open(fsimage, "wb").truncate(2*1024**3)
+        mountpoint = joinpaths(self.workdir, "rootfs")
+        os.mkdir(mountpoint, 0755)
+        logger.debug("formatting rootfs image: %s %s",
+                     " ".join(self._mkfs), fsimage)
+        subprocess.check_call(self._mkfs + [fsimage], stdout=subprocess.PIPE)
+        logger.debug("mounting rootfs image at %s", mountpoint)
+        subprocess.check_call([self.lcmds.MOUNT, "-o", "loop",
+                               fsimage, mountpoint])
+        try:
+            logger.info("copying installtree into rootfs image")
+            srcfiles = [joinpaths(self.root, f) for f in os.listdir(self.root)]
+            subprocess.check_call(["cp", "-a"] + srcfiles + [mountpoint])
+        finally:
+            logger.debug("unmounting rootfs image")
+            rc = subprocess.call([self.lcmds.UMOUNT, mountpoint])
+            if rc != 0:
+                logger.critical("umount %s failed (returncode %i)", mountpoint, rc)
+                sys.exit(rc)
+            os.rmdir(mountpoint)
+
+        # Make squashfs with rootfs image inside
+        logger.info("creating %s containing %s", squashname, imgname)
+        squashtree = joinpaths(self.workdir, "squashfs")
+        os.makedirs(joinpaths(squashtree, os.path.dirname(imgname)))
+        shutil.move(fsimage, joinpaths(squashtree, imgname))
+        squashimage = joinpaths(self.workdir, squashname)
+        subprocess.check_call([self.lcmds.MKSQUASHFS, squashtree, squashimage,
+                               "-comp", compression])
+        shutil.rmtree(squashtree)
+
+        # Put squashimage in a new cpio image with dracut config
+        logger.debug("creating cpio image containing %s", squashname)
+        initramfsdir = joinpaths(self.workdir, "initramfs")
+        squash_cpio = joinpaths(self.workdir, "squashfs.cpio")
+        cmdline = joinpaths(initramfsdir, "etc/cmdline")
+        os.makedirs(os.path.dirname(cmdline))
+        # write boot cmdline for dracut
+        with open(cmdline, "wb") as fobj:
+            fobj.write("root=live:/{0}\n".format(squashname))
+            if self.style == "btrsquash":
+                fobj.write("rootflags=compress\n")
+        # add squashimage
+        shutil.move(squashimage, initramfsdir)
+        # create cpio container
+        chdir = lambda: os.chdir(initramfsdir)
+        find = subprocess.Popen([self.lcmds.FIND, "."], stdout=subprocess.PIPE,
+                                preexec_fn=chdir)
+        cpio = subprocess.Popen([self.lcmds.CPIO, "--quiet", "-c", "-o"],
+                                stdin=find.stdout,
+                                stdout=open(squash_cpio, "wb"),
+                                preexec_fn=chdir)
+        cpio.communicate()
+        shutil.rmtree(initramfsdir)
+
+        # create final image
+        logger.debug("concatenating initramfs.img and squashfs cpio")
+        logger.debug("initramfs.img size = %i", os.stat(self.initramfs).st_size)
+        with open(initrd.fpath, "wb") as output:
+            with open(self.initramfs, "rb") as fobj:
+                output.write(fobj.read())
+            with open(squash_cpio, "rb") as fobj:
+                output.write(fobj.read())
+        os.remove(self.initramfs)
+        os.remove(squash_cpio)
+
+        return True

     @property
     def kernels(self):
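One thing worth tightening in `make_squash_initramfs`: neither `gzip_in` nor `xz_out` is ever closed. Whether the end of the xz stream gets flushed therefore depends on when, and whether, the object's cleanup runs, yet `make_live_squashfs` later reads `self.initramfs` back.

Below is a sketch of the repack step that uses context managers so both files are closed before anyone reads the result. It uses Python 3's standard `lzma` module, not the `LZMAFile(..., options=...)` binding the patch uses. Requesting a CRC32 integrity check follows the kernel's XZ documentation, which says the in-kernel decoder only handles streams with no check or CRC32. The helper name is hypothetical.

```python
import gzip
import lzma
import shutil


def repack_gzip_as_xz(gz_path, xz_path, preset=9):
    """Recompress a gzip'd initramfs as xz; both files are closed on return.

    Uses a CRC32 integrity check, per the kernel's XZ documentation.
    """
    with gzip.open(gz_path, "rb") as src, \
            lzma.open(xz_path, "wb", format=lzma.FORMAT_XZ,
                      check=lzma.CHECK_CRC32, preset=preset) as dst:
        # Stream in chunks instead of read()-ing the whole image at once.
        shutil.copyfileobj(src, dst)
```

`preset=9` matches the `'level':9` in the patch. `copyfileobj` also avoids holding the full uncompressed initramfs in memory at once.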
Chris asked me about this on IRC so I thought I'd mention it here:
On Fri, 2011-03-25 at 14:35 -0400, Will Woods wrote:
> Keep in mind that the image is actually two parts again - initramfs and squashfs.
I should be more careful about talking about splitting the images back up! Let me be clear: with this proposed image layout, loader is still very, very dead. The layout is like this:
Old: [loader] -> [stage2]
F15: [anaconda runtime]
New: [dracut] -> [anaconda runtime]
Note that we are *not* responsible for any of the code outside the anaconda runtime - which means we're still no longer responsible for anything that involves locating or mounting the anaconda runtime.
All of that is completely up to dracut, which has far better debugging / tracing tools than loader ever did, plus a builtin shell, udev, LVM assembly tools, raid assembly tools, fairly robust networking, iSCSI setup, etc.
> We can also save RAM on media installs (and boot.iso and USB-based installs) by leaving the squashfs image on the media. And for every other case, we can just use the concatenated Big Image like we do now.
This is pretty simple to set up - boot.iso would contain LiveOS/squashfs.img and the initramfs would have an /etc/cmdline that directs it to look for (e.g.) root=live:CDLABEL="Fedora 15". This would save another ~128MB RAM. The same technique could be applied for performing installs from external drives (USB sticks and the like).
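The two cmdline variants can be made concrete with a small helper. This is a sketch only, and `dracut_cmdline` is a hypothetical name. The embedded variant reproduces what `make_live_squashfs` writes for btrsquash images. The on-media variant just plugs in the CDLABEL example from the paragraph above; how dracut expects labels containing spaces to be quoted is not verified here.

```python
def dracut_cmdline(root_spec, rootflags=None):
    """Build the text of /etc/cmdline for dracut's dmsquash-live module.

    root_spec is whatever follows "root=live:", e.g. "/squashfs.img" for an
    image embedded in the initramfs (what make_live_squashfs writes) or
    'CDLABEL="Fedora 15"' for an image left on the boot media.
    """
    lines = ["root=live:" + root_spec]
    if rootflags:
        # make_live_squashfs adds "rootflags=compress" for btrsquash images
        lines.append("rootflags=" + rootflags)
    return "".join(line + "\n" for line in lines)
```

`dracut_cmdline("/squashfs.img", "compress")` gives the same text the patch writes. Switching to the on-media layout only changes the `root_spec` argument.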
> If we keep those parts separate, PPC users could use dracut's networking stuff to fetch the runtime image.
And this would be up to the sysadmin to set up, not us. We can default to booting from the combined image for the normal network case, but sysadmins are an ingenious lot; as long as we provide them separate images[1] they'll figure out how to make the corner cases work using dracut's capabilities.
-w
[1] Or a way to split the combined image(s) into separate parts. All you really need is the length of the dracut initramfs and:
    dd if=initrd.img of=dracut.img bs=$DRACUT_LENGTH count=1
    dd if=initrd.img of=anaconda.img bs=$DRACUT_LENGTH skip=1
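If nobody records `$DRACUT_LENGTH`, it can be recovered from the combined image itself. This assumes the dracut part is a single xz stream (as the patch writes it) and that the squashfs cpio follows it directly with no padding. Python's `lzma.LZMADecompressor` stops at the end of the first xz stream and reports the leftover bytes in `unused_data`. The function names below are hypothetical, and this version decompresses the whole dracut initramfs in memory.

```python
import lzma


def xz_initramfs_length(image):
    """Return the size in bytes of the xz stream at the start of `image`.

    Assumes the dracut part is exactly one xz stream and that the bytes
    after it (the squashfs cpio) are not xz data.
    """
    dec = lzma.LZMADecompressor(format=lzma.FORMAT_XZ)
    dec.decompress(image)  # stops at the end of the first stream
    if not dec.eof:
        raise ValueError("image does not start with a complete xz stream")
    # Everything the decoder did not consume belongs to the second part.
    return len(image) - len(dec.unused_data)


def split_initrd(path, dracut_out, anaconda_out):
    """Split a combined initrd, like the two dd commands in the footnote."""
    with open(path, "rb") as f:
        image = f.read()
    length = xz_initramfs_length(image)
    with open(dracut_out, "wb") as f:
        f.write(image[:length])
    with open(anaconda_out, "wb") as f:
        f.write(image[length:])
    return length
```

`split_initrd("initrd.img", "dracut.img", "anaconda.img")` then does the same job as the two `dd` commands, and its return value is the `$DRACUT_LENGTH` they need.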
> I should be more careful about talking about splitting the images back up! Let me be clear: with this proposed image layout, loader is still very, very dead. The layout is like this:
> Old: [loader] -> [stage2]
> F15: [anaconda runtime]
> New: [dracut] -> [anaconda runtime]
> Note that we are *not* responsible for any of the code outside the anaconda runtime - which means we're still no longer responsible for anything that involves locating or mounting the anaconda runtime.
Yeah, I was a little concerned about all this stuff based on the terminology, but this more in-depth mail took care of everything. So far, it looks like a good plan to me.
- Chris
On Mon, 2011-03-28 at 14:02 -0400, Will Woods wrote:
> Chris asked me about this on IRC so I thought I'd mention it here:
> On Fri, 2011-03-25 at 14:35 -0400, Will Woods wrote:
>> Keep in mind that the image is actually two parts again - initramfs and squashfs.
This looks like a stripped-down live CD to me. How big does squashfs.img become?
> I should be more careful about talking about splitting the images back up! Let me be clear: with this proposed image layout, loader is still very, very dead. The layout is like this:
> Old: [loader] -> [stage2]
> F15: [anaconda runtime]
> New: [dracut] -> [anaconda runtime]
> Note that we are *not* responsible for any of the code outside the anaconda runtime - which means we're still no longer responsible for anything that involves locating or mounting the anaconda runtime.
> All of that is completely up to dracut, which has far better debugging / tracing tools than loader ever did, plus a builtin shell, udev, LVM assembly tools, raid assembly tools, fairly robust networking, iSCSI setup, etc.
>> We can also save RAM on media installs (and boot.iso and USB-based installs) by leaving the squashfs image on the media. And for every other case, we can just use the concatenated Big Image like we do now.
Yes please, we low-end users need a break.
> This is pretty simple to set up - boot.iso would contain LiveOS/squashfs.img and the initramfs would have an /etc/cmdline that directs it to look for (e.g.) root=live:CDLABEL="Fedora 15". This would save another ~128MB RAM. The same technique could be applied for performing installs from external drives (USB sticks and the like).
Again, great: anything that saves RAM helps. This is very much like the old run-from-RAM option on the older live CDs, except now you don't have to pass that option at the boot prompt. In the case of a media-backed squashfs.img, would using a UUID or label on the cmdline work the same way a live CD/USB does now?
>> If we keep those parts separate, PPC users could use dracut's networking stuff to fetch the runtime image.
> And this would be up to the sysadmin to set up, not us. We can default to booting from the combined image for the normal network case, but sysadmins are an ingenious lot; as long as we provide them separate images[1] they'll figure out how to make the corner cases work using dracut's capabilities.
> -w
Great idea,
Jerry
On Fri, 2011-03-25 at 14:35 -0400, Will Woods wrote:
> Idea #2: Our Live images are compressed with squashfs, and they use a device-mapper overlay to make themselves read-write - so let's do that!
> This is basically what the patch does: it builds the anaconda runtime image into something very similar to the Live image, and uses the same dracut startup scripts as the Live images to get itself set up and running.
Dude. Awesome.
Jon.
anaconda-devel@lists.fedoraproject.org