xz compression makes the initrd 33% smaller (136M -> 90M). The extra memory overhead at decompression time is negligible: testing showed that any system with enough RAM to use the gzip-compressed initrd was also able to load the xz-compressed initrd with no trouble.
Note that '--check=crc32' is needed because the kernel doesn't know how to perform the default xz integrity check (crc64).
---
 src/pylorax/constants.py   |    2 ++
 src/pylorax/installtree.py |   24 ++++++++++++------------
 2 files changed, 14 insertions(+), 12 deletions(-)

diff --git a/src/pylorax/constants.py b/src/pylorax/constants.py
index 547a938..4967450 100644
--- a/src/pylorax/constants.py
+++ b/src/pylorax/constants.py
@@ -41,6 +41,7 @@ class LoraxRequiredCommands(dict):
         self["DMSETUP"] = "dmsetup"
         self["FIND"] = "find"
         self["GCONFTOOL"] = "gconftool-2"
+        self["GZIP"] = "gzip"
         self["IMPLANTISOMD5"] = "implantisomd5"
         self["ISOHYBRID"] = "isohybrid"
         self["LDCONFIG"] = "ldconfig"
@@ -52,6 +53,7 @@ class LoraxRequiredCommands(dict):
         self["MOUNT"] = "mount"
         self["PARTED"] = "parted"
         self["UMOUNT"] = "umount"
+        self["XZ"] = "xz"

     def __getattr__(self, attr):
         return self[attr]
diff --git a/src/pylorax/installtree.py b/src/pylorax/installtree.py
index 9883bba..e1a0bdb 100644
--- a/src/pylorax/installtree.py
+++ b/src/pylorax/installtree.py
@@ -25,7 +25,6 @@ logger = logging.getLogger("pylorax.installtree")
 import sys
 import os
 import shutil
-import gzip
 import re
 import glob
 import time
@@ -49,6 +48,14 @@ class LoraxInstallTree(BaseLoraxClass):

         self.lcmds = constants.LoraxRequiredCommands()

+    def compress_initrd_pipe(self, stdin, stdout):
+        return subprocess.Popen([self.lcmds.XZ, '--check=crc32', '-9', '-c'],
+                                stdin=stdin, stdout=stdout)
+
+    def compress_module(self, filename):
+        return subprocess.call([self.lcmds.GZIP, '-9', '-f',
+                                filename])
+
     def remove_locales(self):
         chroot = lambda: os.chroot(self.root)

@@ -310,14 +317,7 @@ class LoraxInstallTree(BaseLoraxClass):
         for root, _, fnames in os.walk(moddir):
             for fname in filter(lambda f: f.endswith(".ko"), fnames):
                 path = os.path.join(root, fname)
-                with open(path, "rb") as fobj:
-                    data = fobj.read()
-
-                gzipped = gzip.open("{0}.gz".format(path), "wb")
-                gzipped.write(data)
-                gzipped.close()
-
-                os.unlink(path)
+                self.compress_module(path)

     def run_depmod(self, kernel):
         systemmap = "System.map-{0.version}".format(kernel)
@@ -521,9 +521,9 @@ class LoraxInstallTree(BaseLoraxClass):
                                 stdin=find.stdout, stdout=subprocess.PIPE,
                                 preexec_fn=chdir)

-        gzipped = gzip.open(initrd.fpath, "wb")
-        gzipped.write(cpio.stdout.read())
-        gzipped.close()
+        compress = self.compress_initrd_pipe(stdin=cpio.stdout,
+                                             stdout=open(initrd.fpath,"wb"))
+        compress.communicate()

         # move modules out of the tree again
         shutil.move(joinpaths(self.root, "modules", kernel.version),
--
1.7.4
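For readers who want to see what '--check=crc32' changes in the output, here is a minimal sketch. It assumes a current Python 3 with the standard lzma module, which did not exist when this patch was posted, so it is not the code path used by the patch. The helper names xz_compress and xz_check_id are hypothetical. The integrity check type is recorded in the stream header that follows the 6-byte .xz magic, which is what the decompressor has to understand.

```python
import lzma

XZ_MAGIC = b"\xfd7zXZ\x00"  # 6-byte magic at the start of every .xz stream


def xz_compress(data: bytes, check: int = lzma.CHECK_CRC32, preset: int = 6) -> bytes:
    """Compress data into an .xz stream using the given integrity check.

    The patch uses 'xz -9'; preset 6 is used here only to keep the sketch light.
    """
    return lzma.compress(data, format=lzma.FORMAT_XZ, check=check, preset=preset)


def xz_check_id(blob: bytes) -> int:
    """Return the check ID stored in the stream flags of an .xz header."""
    if blob[:6] != XZ_MAGIC:
        raise ValueError("not an .xz stream")
    # Stream flags are two bytes after the magic; the low nibble of the
    # second byte is the check ID (1 = CRC32, 4 = CRC64).
    return blob[7] & 0x0F


payload = b"initrd contents stand-in " * 1000
crc32_blob = xz_compress(payload)                            # like --check=crc32
crc64_blob = xz_compress(payload, check=lzma.CHECK_CRC64)    # xz default check

print(xz_check_id(crc32_blob), xz_check_id(crc64_blob))
```

Both streams decompress to the same payload; only the recorded check differs.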
If it's okay with mgracik, it's okay with me. I'm happy to see the initrd get below 100 MB, which was my original goal.
- Chris
Why do you want to remove the gzip python module calls, and replace them with an external call to gzip?
--
Martin Gracik
Anaconda-devel-list mailing list Anaconda-devel-list@redhat.com https://www.redhat.com/mailman/listinfo/anaconda-devel-list
On Wed, 2011-02-16 at 03:22 -0500, Martin Gracik wrote:
Why do you want to remove the gzip python module calls, and replace them with an external call to gzip?
Mostly just for consistency with the other compressor call - plus it let me remove an 'import' and a few lines of code, plus it would make it easier to replace gzip with another compression program like xz. Except module-init-tools only supports .gz files. So.. yeah. Not that useful.
I'll send a much smaller patch that just does the xz compression.
-w
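As an illustration of the alternative being discussed (staying with the Python gzip module rather than calling the gzip binary), here is a minimal sketch. The function name compress_module mirrors the patch, but the body is an assumption, not code from lorax. It streams the file instead of reading it fully into memory and, like 'gzip -f', replaces the original file with a .gz copy.

```python
import gzip
import os
import shutil


def compress_module(path: str, level: int = 9) -> str:
    """Gzip `path` to `path + '.gz'` in-process, then remove the original."""
    gz_path = path + ".gz"
    with open(path, "rb") as src, gzip.open(gz_path, "wb", compresslevel=level) as dst:
        # copyfileobj copies in fixed-size chunks, so large modules are
        # never held in memory all at once.
        shutil.copyfileobj(src, dst)
    os.unlink(path)
    return gz_path
```

Usage would look like compress_module("/path/to/driver.ko"), which leaves driver.ko.gz behind.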
On Wed, 2011-02-16 at 16:38 -0500, Will Woods wrote:
I'll send a much smaller patch that just does the xz compression.
-w
I tried to use python modules if we have them instead of calling some binaries. I hope there will be the PyLZMA module in fedora soon, so we can change the xz call in the future too.
I have the lzma code in my local branch, I was trying it before we went to Tempe, so you don't need to bother with it.
But, does it require any changes in anaconda? Or can we push the patch to the repo and start using lzma right away?
On Thu, 2011-02-17 at 08:50 +0100, Martin Gracik wrote:
I tried to use python modules if we have them instead of calling some binaries. I hope there will be the PyLZMA module in fedora soon, so we can change the xz call in the future too.
I have the lzma code in my local branch, I was trying it before we went to Tempe, so you don't need to bother with it.
But, does it require any changes in anaconda? Or can we push the patch to the repo and start using lzma right away?
http://koji.fedoraproject.org/koji/packageinfo?packageID=10571
there's pyliblzma, is that gonna work?
-sv
yes, that should do it, thanks
--
Martin Gracik
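A minimal sketch of what in-process xz compression could look like, assuming the lzma module from the current Python 3 standard library rather than pyliblzma (whose API is not shown in this thread). The function name xz_stream is hypothetical. It reads from any file-like object, such as the stdout pipe of cpio, and writes an .xz stream with a CRC32 check.

```python
import io
import lzma


def xz_stream(src, dst, chunk_size: int = 1 << 20, preset: int = 6) -> None:
    """Copy src to dst as an .xz stream with a CRC32 integrity check.

    src and dst are binary file-like objects. The patch uses 'xz -9';
    preset 6 is the default here only to keep memory use modest.
    """
    comp = lzma.LZMACompressor(
        format=lzma.FORMAT_XZ, check=lzma.CHECK_CRC32, preset=preset
    )
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            break
        dst.write(comp.compress(chunk))
    # flush() emits any buffered data plus the stream footer.
    dst.write(comp.flush())


source = io.BytesIO(b"cpio archive stand-in " * 50000)
target = io.BytesIO()
xz_stream(source, target)
```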
On Thu, 2011-02-17 at 08:50 +0100, Martin Gracik wrote:
I have the lzma code in my local branch, I was trying it before we went to Tempe, so you don't need to bother with it.
But, does it require any changes in anaconda? Or can we push the patch to the repo and start using lzma right away?
It can be pushed immediately. initrd decompression is handled by the kernel, and the F15 kernel already supports lzma/xz, so no anaconda changes are necessary.
The most significant reason we might want to use xz instead of lzma is integrity checking - gzip and xz use crc32, lzma has none.
-w
On 02/17/2011 06:40 AM, Will Woods wrote:
The most significant reason we might want to use xz instead of lzma is integrity checking - gzip and xz use crc32, lzma has none.
That's not really true. If the header is OK and if lzma decompression reaches EOF on input with the expected state (0==accumulator && bytes_written==original_length), then that is an integrity check that is broadly equivalent to crc32. lzma decompression is equivalent to an "arithmetic long division" of the input encoded representation; crc32 is a "polynomial long division" of the bitstring.
The value added by crc32 is low. Because crc32 is orthogonal to the algorithmic check, the probability that crc32 catches an otherwise-undetected error is 2**-32.
The cost of crc32 is high. crc32 pollutes the data cache, often equivalent to flushing a major portion of L1. In the name of speed, common implementations use many kilobytes of tables. The adler32 checksum is *MUCH* better: no tables, less code, faster, no cache pollution. adler32 is about 1/4096 less powerful (65521/65536) in detecting impostors. crc32 is trivial in hardware and has mindshare. But in software, crc32 should be replaced by adler32.
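To make the "no tables" point concrete, here is a minimal sketch of adler32 written from its definition and compared against zlib's implementation. It is an illustration only and says nothing about relative speed. The function name adler32 and the sample data are chosen for this example.

```python
import zlib

MOD_ADLER = 65521  # largest prime below 2**16, the modulus mentioned above


def adler32(data: bytes) -> int:
    """Compute the Adler-32 checksum: two running sums, no lookup table."""
    a, b = 1, 0
    for byte in data:
        a = (a + byte) % MOD_ADLER   # sum of all bytes, plus one
        b = (b + a) % MOD_ADLER      # sum of the successive values of a
    return (b << 16) | a


sample = b"initrd.img stand-in data"
print(hex(adler32(sample)), hex(zlib.adler32(sample)), hex(zlib.crc32(sample)))
```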
On Thu, 2011-02-17 at 07:54 -0800, John Reiser wrote:
That's not really true. If the header is OK and if lzma decompression reaches EOF on input with the expected state (0==accumulator && bytes_written==original_length), then that is an integrity check that is broadly equivalent to crc32. lzma decompression is equivalent to an "arithmetic long division" of the input encoded representation; crc32 is a "polynomial long division" of the bitstring.
Okay then - no real useful difference between xz and lzma, at least for our purposes here.
The value added by crc32 is low. Because crc32 is orthogonal to the algorithmic check, the probability that crc32 catches an otherwise-undetected error is 2**-32.
The cost of crc32 is high. crc32 pollutes the data cache, often equivalent to flushing a major portion of L1. In the name of speed, common implementations use many kilobytes of tables. The adler32 checksum is *MUCH* better: no tables, less code, faster, no cache pollution. adler32 is about 1/4096 less powerful (65521/65536) in detecting impostors. crc32 is trivial in hardware and has mindshare. But in software, crc32 should be replaced by adler32.
Okay, sure. As soon as there's a --check=adler32 switch for xz, and the kernel will handle the resulting image, we'll switch.
-w
On 02/14/2011 10:05 AM, Will Woods wrote:
xz compression makes the initrd 33% smaller (136M -> 90M).
Another 1.96MB can be squeezed from the kernel drivers (lib/modules/.../*.ko...).
From a recent F15 alpha RC of install DVD:
10.608MB  tree of *.ko.gz-9
23.608MB  tree of *.ko
 8.984MB  tree of *.ko.xz-9
23.192MB  cpio *.ko
 7.022MB  (cpio *.ko).xz-9
11.236MB  tree of *.ko.gz-2
The 1.96MB is obtained from (8.98MB - 7.02MB). Build the initrd as the xz-compressed cpio archive containing the _uncompressed_ *.ko. The kernel applies "xz -d", "cpio --extract", and "gzip -c -2", getting a tree of *.ko.gz-2 that is (11.236MB - 10.608MB) = 0.628MB larger than before, at a cost of about 1 second for re-compressing while constructing the initramfs. But in the meantime initrd.img is about 2MB smaller.
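The arithmetic above can be checked with a few lines of Python. The figures are copied from the size listing and from the 136M -> 90M numbers in the original patch description; the variable names are chosen for this example.

```python
# Sizes in MB, copied from the F15 alpha RC measurements above.
sizes_mb = {
    "tree of *.ko.gz-9": 10.608,
    "tree of *.ko": 23.608,
    "tree of *.ko.xz-9": 8.984,
    "cpio *.ko": 23.192,
    "(cpio *.ko).xz-9": 7.022,
    "tree of *.ko.gz-2": 11.236,
}

# Saving in initrd.img from compressing one cpio of raw modules
# instead of compressing each module separately.
initrd_saving = round(sizes_mb["tree of *.ko.xz-9"] - sizes_mb["(cpio *.ko).xz-9"], 3)

# Growth of the unpacked module tree when re-compressed with gzip -2
# instead of gzip -9.
gz2_growth = round(sizes_mb["tree of *.ko.gz-2"] - sizes_mb["tree of *.ko.gz-9"], 3)

# Overall initrd reduction from the patch description (136M -> 90M).
initrd_reduction_pct = round(100 * (136 - 90) / 136, 1)

print(initrd_saving, gz2_growth, initrd_reduction_pct)
```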