Hi everyone, I've got some strange behavior with bzip2 and gzip. When I compress a small folder of around 225 MB, gzip brings it down to 49 MB and bzip2 to 46 MB.
When I compress a bigger folder of around 5 GB, gzip brings it down to 2.6 GB but bzip2 only to 3.1 GB.
So bzip2 compresses better on the small folder and gzip compresses better on the big one. Has anyone seen the same effect?
Maybe gzip didn't compress everything; how can I check this? Should I be worried? It's mostly office documents. Happy new year,
Franck
On Sun, 2006-01-01 at 19:38 -0500, Franck Y wrote:
Hi everyone, I've got some strange behavior with bzip2 and gzip. When I compress a small folder of around 225 MB, gzip brings it down to 49 MB and bzip2 to 46 MB.
When I compress a bigger folder of around 5 GB, gzip brings it down to 2.6 GB but bzip2 only to 3.1 GB.
So bzip2 compresses better on the small folder and gzip compresses better on the big one. Has anyone seen the same effect?
Maybe gzip didn't compress everything; how can I check this? Should I be worried? It's mostly office documents. Happy new year,
It really depends on what it is you are compressing. If you compress a bunch of files like text files with a lot of "empty" space inside them, you will get really good compression ratios. If you compress a bunch of files that are already compressed (like PNG or JPEG images), then you won't get very good compression at all.
On my system, I used gzip and bzip2 to compress /etc just as a test:
[root@ml110 ~]# tar zcf etc.tgz /etc/
tar: Removing leading `/' from member names
[root@ml110 ~]# tar jcf etc.tar.bz2 /etc/
tar: Removing leading `/' from member names
[root@ml110 ~]# ls -lh etc.t*
-rw-r--r-- 1 root root 6.0M Jan  1 18:49 etc.tar.bz2
-rw-r--r-- 1 root root 8.4M Jan  1 18:48 etc.tgz
So on my systems, bzip2 beat the heck out of gzip.
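To answer Franck's "how can I check this" question: one quick sanity check, as a rough sketch assuming GNU gzip/tar and the archive names from the test above, is to test the archive and look at the reported compression ratio:

gzip -t etc.tgz && echo "archive is intact"   # exits non-zero if the data is corrupt or truncated
gzip -l etc.tgz                               # reports compressed size, uncompressed size and ratio
tar tzf etc.tgz > /dev/null                   # walks every member; errors out if anything is unreadable

If gzip -t is happy and the ratio looks sane for the kind of files you archived, gzip did compress everything; it just can't shrink pre-compressed content very much.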
Thomas
On Sun, 2006-01-01 at 21:06 -0600, Thomas Cameron wrote:
On my system, I used gzip and bzip2 to compress /etc just as a test:
[root@ml110 ~]# tar zcf etc.tgz /etc/
tar: Removing leading `/' from member names
[root@ml110 ~]# tar jcf etc.tar.bz2 /etc/
tar: Removing leading `/' from member names
[root@ml110 ~]# ls -lh etc.t*
-rw-r--r-- 1 root root 6.0M Jan  1 18:49 etc.tar.bz2
-rw-r--r-- 1 root root 8.4M Jan  1 18:48 etc.tgz
So on my systems, bzip2 beat the heck out of gzip.
Yeah, it really depends. If I remember correctly, I was playing with bzip2 and gzip for man pages on my LFS system, and for the man pages gzip was generally both faster to decompress (when man wanted to read them) and smaller; but for info pages, bzip2 saved a LOT of space. I think the more data you have (text files, anyway), the better compression you get out of bzip2.
I pretty much just use gzip for everything, though; the extra space isn't that much, and gzip is pretty standard everywhere.
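If anyone wants to reproduce that kind of per-file comparison, here is a minimal sketch (the file name is just a placeholder; point it at any large-ish text file):

f=some-big-text-file.txt        # placeholder: any large text file
wc -c < "$f"                    # original size in bytes
gzip -9 -c "$f" | wc -c         # size after gzip
bzip2 -9 -c "$f" | wc -c        # size after bzip2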
On 2 Jan 2006 at 2:16, Michael A. Peters wrote:
From: "Michael A. Peters" mpeters@mac.com To: fedora-list@redhat.com Date sent: Mon, 02 Jan 2006 02:16:23 -0800 Subject: Re: Gzip better than Bz2 ? Normal? Send reply to: For users of Fedora Core releases fedora-list@redhat.com mailto:fedora-list-request@redhat.com?subject=unsubscribe mailto:fedora-list-request@redhat.com?subject=subscribe
On Sun, 2006-01-01 at 21:06 -0600, Thomas Cameron wrote:
On my system, I used gzip and bzip2 to compress /etc just as a test:
[root@ml110 ~]# tar zcf etc.tgz /etc/
tar: Removing leading `/' from member names
[root@ml110 ~]# tar jcf etc.tar.bz2 /etc/
tar: Removing leading `/' from member names
[root@ml110 ~]# ls -lh etc.t*
-rw-r--r-- 1 root root 6.0M Jan  1 18:49 etc.tar.bz2
-rw-r--r-- 1 root root 8.4M Jan  1 18:48 etc.tgz
So on my systems, bzip2 beat the heck out of gzip.
Yeah, it really depends. If I remember correctly, I was playing with bzip2 and gzip for man pages on my LFS system, and for the man pages gzip was generally both faster to decompress (when man wanted to read them) and smaller; but for info pages, bzip2 saved a LOT of space. I think the more data you have (text files, anyway), the better compression you get out of bzip2.
I pretty much just use gzip for everything, though; the extra space isn't that much, and gzip is pretty standard everywhere.
To add another compressor to the list, you might want to look at lzop. I use it with g4l to create disk images; g4l has the option of using gzip, bzip2, or lzop, and I find that lzop is faster to compress than gzip. With an 80 GB drive and 3 OSes, it takes about 2 hours for gzip to compress the drive, and only about 1 hour for lzop. Both seem to decompress in about 50 minutes, so the difference is in compression time. I tried to use bzip2 for the compression, but it was just taking way longer than gzip.
So, if compression time is an issue, lzop can be much faster. It does create an image that is about 10% to 12% larger than with gzip, but for me, halving the time was more of an issue.
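For reference, done by hand such an image pipeline would look roughly like this (a sketch only, not what g4l actually runs; /dev/sda and the file names are placeholders):

dd if=/dev/sda bs=1M | lzop -c > sda.img.lzo      # read the raw disk and compress it on the fly
lzop -dc sda.img.lzo | dd of=/dev/sda bs=1M       # decompress and write it back to restore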
Michael D. Setzer II - Computer Science Instructor, Guam Community College Computer Center
On Sunday 01 January 2006 8:38 pm, Franck Y wrote:
Maybe gzip didn't compress everything; how can I check this? Should I be worried? It's mostly office documents. Happy new year,
Check this article:
Hello Franck,
On Sun, 1 Jan 2006 19:38:13 -0500 Franck Y franck110@gmail.com wrote:
Hi everyone, I've got some strange behavior with bzip2 and gzip. When I compress a small folder of around 225 MB, gzip brings it down to 49 MB and bzip2 to 46 MB.
When I compress a bigger folder of around 5 GB, gzip brings it down to 2.6 GB but bzip2 only to 3.1 GB.
So bzip2 compresses better on the small folder and gzip compresses better on the big one. Has anyone seen the same effect?
Maybe gzip didn't compress everything; how can I check this? Should I be worried? It's mostly office documents. Happy new year,
Define "better"? This might be a bit subjective, and I guess that you deal w/ the size criteria. Another critical factor in compression is.. time. bzip2 is *really* slower than gzip.
Here, etc.tar.bz2 is 5.8MB but takes 17sec, and etc.tar.gz is 8MB.. in 4sec.
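Something like this shows both numbers at once (assuming GNU tar and the same /etc test as earlier in the thread):

time tar czf etc.tar.gz /etc      # gzip: larger archive, but much faster
time tar cjf etc.tar.bz2 /etc     # bzip2: smaller archive, but noticeably slower
ls -lh etc.tar.gz etc.tar.bz2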
Regards,
Franck Y wrote:
Hi everyone, I've got some strange behavior with bzip2 and gzip. When I compress a small folder of around 225 MB, gzip brings it down to 49 MB and bzip2 to 46 MB.
When I compress a bigger folder of around 5 GB, gzip brings it down to 2.6 GB but bzip2 only to 3.1 GB.
[snip]
Not surprising at all. It is a theorem that, for any compression algorithm which makes the compressed version of some file smaller than the uncompressed version, there is a file for which the compressed version is *larger* than the uncompressed version. Compression algorithms all make assumptions about the form of the input data. How well any given collection of files matches those assumptions determines how well the algorithm works on that collection.
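You can see this in action on data that is already effectively random (a quick sketch; the 1 MB size is arbitrary). There are 2^n possible files of n bits but fewer than 2^n outputs shorter than n bits, so some input has to grow, and incompressible input like /dev/urandom is where it shows up:

head -c 1000000 /dev/urandom > random.bin
gzip -c random.bin > random.bin.gz
bzip2 -c random.bin > random.bin.bz2
ls -l random.bin random.bin.gz random.bin.bz2     # both "compressed" files end up slightly larger than the original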
Mike
Mike McCarty wrote:
Franck Y wrote:
Hi everyone, I've got some strange behavior with bzip2 and gzip. When I compress a small folder of around 225 MB, gzip brings it down to 49 MB and bzip2 to 46 MB.
When I compress a bigger folder of around 5 GB, gzip brings it down to 2.6 GB but bzip2 only to 3.1 GB.
[snip]
Not surprising at all. It is a theorem that, for any compression algorithm which makes the compressed version of some file
Let me amend that...
"... for any *lossless* compression algorithm..."
Mike
It is a theorem that, for any [lossless] compression algorithm which makes the compressed version of some file smaller than the uncompressed version, there is a file for which the compressed version is *larger* than the uncompressed version.
It's also true that the amount of expansion in those cases never has to be more than one additional bit. -- Deron Meranda
Deron Meranda wrote:
It is a theorem that, for any [lossless] compression algorithm which makes the compressed version of some file smaller than the uncompressed version, there is a file for which the compressed version is *larger* than the uncompressed version.
It's also true that the amount of expansion in those cases never has to be more than one additional bit. -- Deron Meranda
Umm, I suppose that you mean that there is *another* compression algorithm which produces, for any file which is actually shrunk, an output which is the same except for being one bit longer, and which, for all the files which the original actually grows, produces an output which is only one bit longer than the uncompressed file.
There are lossless compression algorithms for which what you said is not true.
Mike
Mike McCarty mike.mccarty@sbcglobal.net writes:
There are lossless compression algorithms for which what you said is not true.
I can't believe that. You can have a one-bit flag to indicate compression on or off. If the compression algorithm produces more output than input, then you can save the flag "0" plus the original file contents. If the compression is an improvement then you save the flag "1" plus the compressed data. So successfully compressed files are one bit larger than they could have been, with the benefit that uncompressible files do not get bloated by more than one bit.
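That wrapper is easy to sketch in shell, using a whole marker byte instead of a single bit for simplicity (the file names are assumed, and stat -c%s is GNU stat):

f=input.dat
gzip -c "$f" > "$f.gz"
if [ "$(stat -c%s "$f.gz")" -lt "$(stat -c%s "$f")" ]; then
    { printf '1'; cat "$f.gz"; } > "$f.wrapped"    # flag 1: payload is the compressed data
else
    { printf '0'; cat "$f"; } > "$f.wrapped"       # flag 0: payload is the original, stored as-is
fi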
Donald Arseneau wrote:
Mike McCarty mike.mccarty@sbcglobal.net writes:
There are lossless compression algorithms for which what you said is not true.
I can't believe that. You can have a one-bit flag to indicate compression on or off. If the compression algorithm produces more output than input, then you can save the flag "0" plus the original file contents. If the compression is an improvement then you save the flag "1" plus the compressed data. So successfully compressed files are one bit larger than they could have been, with the benefit that uncompressible files do not get bloated by more than one bit.
But you just *changed* algorithms. If you read carefully what I wrote, in full, you'll see that I said that there is *another* algorithm producing the same output, except one bit longer, for files which actually get smaller, and one bit longer than the original for files which get larger.
Read carefully what I wrote, and I think you'll see things in a different light.
Surely you don't claim that one single algorithm can produce different outputs when run successively with the same input?
Mike
Mike McCarty mike.mccarty@sbcglobal.net writes:
But you just *changed* algorithms. If you read carefully what I wrote, in full,
I did; understood; but not quoted in full.
you'll see that I said that there is *another* algorithm producing the same output,
Well, Deron said that, and you clarified what he said ("I suppose that you mean that there is *another* compression algorithm which...") and then you made a claim about this thesis: "There are lossless compression algorithms for which what you said is not true." So your written, or as read, claim was that "There are lossless compression algorithms for which it is not true that there exists another algorithm which..."
:-)
Yes, I see now what you were getting at, but it was a straw man: Deron never claimed that *all* algorithms are limited to bloating by 1 bit! (Algorithm: copy contents and append one byte.) He said there never has to be more than one bit added.
Or maybe he did, though I doubt he meant it that way. I'm content that everyone is in agreement.
Thanks for clarifying, Mike, although Donald was correct in pointing out that I didn't make the claim for *all* algorithms. But I wasn't as explicit as perhaps I should have been: given any lossless algorithm A, it's always possible to trivially derive an algorithm A2 which adds one bit but prevents bloating.
Of course, the 1-bit maximum expansion claim assumes an O(n)-space algorithm, meaning it's not always practical when compressing very large data sets, or for stream compressors, since the program would have to attempt to compress all (or most) of the data before it can tell whether the output is going to be bigger than the input.
For streaming compressors, it's still possible to get by with O(1) (constant) space by using a blocking method: you divide the input into blocks of K bits, and then you only need to add one flag bit per block. The potential extra bloat is then no longer a single bit overall, but it is still a very small, constant percentage of the total data size. There are surely other ways to do this too.
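A per-block version of the same wrapper might be sketched like this (again a byte-sized flag per block rather than a single bit, with 64 KB blocks chosen arbitrarily; a real format would also have to record each block's length so it can be decoded):

split -b 65536 input.dat block.
for b in block.*; do
    gzip -c "$b" > "$b.gz"
    if [ "$(stat -c%s "$b.gz")" -lt "$(stat -c%s "$b")" ]; then
        { printf '1'; cat "$b.gz"; } >> output.wrapped
    else
        { printf '0'; cat "$b"; } >> output.wrapped
    fi
done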
In a practical sense, though, what all this means is that file bloating should not be a concern with any modern compression program, unlike, for instance, the old Unix compress(1) program, which can bloat quite badly. -- Deron Meranda