Compressing files (gz versus bz2 versus xz)

Stephen John Smoogen smooge at gmail.com
Thu Aug 26 23:35:48 UTC 2010


I was asked to look at compression on the log servers and to see whether
switching to xz would save us some space. My test is not comprehensive
(two log files, each compressed at -9), but it shows what might happen.

Basic summary: xz may save us up to 2% more space than we are saving
now, but its real advantage over bzip2 is decompression speed (about 12
seconds versus about 62 seconds in this test). Compression may also be
faster for some files.

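Sizes are as reported by du -s (1 KiB blocks with GNU du's default);
G%, B% and X% are the percent of space saved, computed from those sizes.

File         |  Size   | Gzip  | G%   | Bzip2 | B%   | XZ    | X%
messages.log |  644568 | 10992 | 98.3 |  4856 | 99.2 |  5940 | 99.1
mail.log     |  610816 | 65060 | 89.3 | 40836 | 93.3 | 35536 | 94.2
TOTAL        | 1255384 | 76052 | 93.9 | 45692 | 96.4 | 41476 | 96.7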
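
The percent-saved columns can be rechecked from the du sizes alone. Here
is a minimal Python sketch, assuming the sizes are copied by hand from the
du output in this post. The names SIZES, percent_saved and totals are made
up for illustration and are not part of any tool used here.

```python
# Sizes as reported by du -s in this post (same units in every column).
SIZES = {
    "messages.log": {"orig": 644568, "gzip": 10992, "bzip2": 4856, "xz": 5940},
    "mail.log":     {"orig": 610816, "gzip": 65060, "bzip2": 40836, "xz": 35536},
}
TOOLS = ("gzip", "bzip2", "xz")


def percent_saved(original, compressed):
    """Percent of the original size removed by compression."""
    return 100.0 * (1.0 - compressed / original)


def totals(sizes):
    """Sum each column across all files."""
    out = {key: 0 for key in ("orig",) + TOOLS}
    for row in sizes.values():
        for key in out:
            out[key] += row[key]
    return out


if __name__ == "__main__":
    rows = dict(SIZES, TOTAL=totals(SIZES))
    for name, row in rows.items():
        saved = ", ".join(
            f"{tool} {percent_saved(row['orig'], row[tool]):.1f}%" for tool in TOOLS
        )
        print(f"{name:13s} {saved}")
```

Summing the sizes first and then taking the percentage gives a
size-weighted total, which is not the same as averaging the per-file
percentages.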

Times are wall-clock ("real") for both files together, all at level -9.

Program      | Compression Time  | Decompression Time
GZIP         | 00m43.416s        |  00m10.033s
BZIP2        | 10m42.296s        |  01m02.525s
XZ           | 10m15.937s        |  00m12.565s
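
To compare the tools as ratios rather than raw times, the strings printed
by time can be converted to seconds. This is only a sketch: the TIMES
table is copied from the numbers above, and the helpers to_seconds and
relative_to are hypothetical names.

```python
import re

# Wall-clock ("real") times copied from the table in this post.
TIMES = {
    "gzip":  {"compress": "00m43.416s", "decompress": "00m10.033s"},
    "bzip2": {"compress": "10m42.296s", "decompress": "01m02.525s"},
    "xz":    {"compress": "10m15.937s", "decompress": "00m12.565s"},
}


def to_seconds(stamp):
    """Convert a string such as '10m42.296s' to seconds."""
    match = re.fullmatch(r"(\d+)m(\d+(?:\.\d+)?)s", stamp)
    if match is None:
        raise ValueError(f"unrecognised time string: {stamp!r}")
    minutes, seconds = match.groups()
    return int(minutes) * 60 + float(seconds)


def relative_to(baseline, phase, times=TIMES):
    """How many times longer each tool took than the baseline tool."""
    base = to_seconds(times[baseline][phase])
    return {tool: to_seconds(t[phase]) / base for tool, t in times.items()}


if __name__ == "__main__":
    for phase in ("compress", "decompress"):
        for tool, ratio in relative_to("gzip", phase).items():
            print(f"{phase:10s} {tool:5s} {ratio:5.1f}x gzip")
```

Calling relative_to("xz", "decompress") instead puts bzip2's decompression
time in terms of xz's, which is the comparison the summary cares about.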


Raw data below

[root at log01 smooge-b]# du -s messages.log mail.log
644568  messages.log
610816  mail.log
[root at log01 smooge-b]# time gzip -v -9 messages.log mail.log
messages.log:    98.3% -- replaced with messages.log.gz
mail.log:        89.3% -- replaced with mail.log.gz

real    0m43.416s
user    0m41.335s
sys     0m1.736s
[root at log01 smooge-b]# du -s messages.log.gz mail.log.gz
10992   messages.log.gz
65060   mail.log.gz
[root at log01 smooge-b]# time gunzip -v messages.log.gz mail.log.gz
messages.log.gz:         98.3% -- replaced with messages.log
mail.log.gz:     89.3% -- replaced with mail.log

real    0m10.033s
user    0m6.948s
sys     0m3.004s

[root at log01 smooge-b]# time bzip2 -v -9 messages.log mail.log
  messages.log: 133.043:1,  0.060 bits/byte, 99.25% saved, 659381328 in, 4956148 out.
  mail.log:     14.961:1,  0.535 bits/byte, 93.32% saved, 624854215 in, 41766136 out.

real    10m42.296s
user    10m36.948s
sys     0m1.608s
[root at log01 smooge-b]# du -sc messages.log.bz2 mail.log.bz2
4856    messages.log.bz2
40836   mail.log.bz2
45692   total
[root at log01 smooge-b]# time bunzip2 -v messages.log.bz2 mail.log.bz2
  messages.log.bz2: done
  mail.log.bz2:     done

real    1m2.525s
user    0m44.779s
sys     0m4.956s

[root at log01 smooge-b]# time xz -v -9 messages.log mail.log
messages.log (1/2)
  100.0 %             5,923.6 KiB / 628.8 MiB = 0.009   3.1 MiB/s         3:21

mail.log (2/2)
  100.0 %                34.7 MiB / 595.9 MiB = 0.058   1.4 MiB/s         6:53

real    10m15.937s
user    10m8.550s
sys     0m3.552s
[root at log01 smooge-b]# du -s messages.log.xz mail.log.xz
5940    messages.log.xz
35536   mail.log.xz
[root at log01 smooge-b]# time unxz -v messages.log.xz mail.log.xz
messages.log.xz (1/2)
  100.0 %             5,923.6 KiB / 628.8 MiB = 0.009   140 MiB/s         0:04

mail.log.xz (2/2)
  100.0 %                34.7 MiB / 595.9 MiB = 0.058    74 MiB/s         0:08

real    0m12.565s
user    0m8.709s
sys     0m3.636s
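
For anyone who wants to repeat this kind of comparison on other log files
without shelling out, here is a rough Python sketch using the standard
gzip, bz2 and lzma modules. It is not what was run above. It works on data
held in memory through Python's bindings, so the timings will not match
the command-line tools, and the function name bench is made up. Note that
xz at preset 9 needs several hundred MiB of memory to compress.

```python
import bz2
import gzip
import lzma
import sys
import time

# (name, compress, decompress); each compress callable takes (data, level).
CODECS = [
    ("gzip", lambda d, lvl: gzip.compress(d, compresslevel=lvl), gzip.decompress),
    ("bzip2", lambda d, lvl: bz2.compress(d, compresslevel=lvl), bz2.decompress),
    ("xz", lambda d, lvl: lzma.compress(d, preset=lvl), lzma.decompress),
]


def bench(data, level=9):
    """Time one compress/decompress round trip per codec on in-memory data."""
    results = []
    for name, compress, decompress in CODECS:
        start = time.perf_counter()
        packed = compress(data, level)
        mid = time.perf_counter()
        restored = decompress(packed)
        end = time.perf_counter()
        if restored != data:
            raise RuntimeError(f"{name} round trip changed the data")
        results.append({
            "codec": name,
            "size": len(packed),
            "saved_pct": 100.0 * (1 - len(packed) / len(data)) if data else 0.0,
            "compress_s": mid - start,
            "decompress_s": end - mid,
        })
    return results


# Usage (the script name is arbitrary): python bench.py /path/to/some.log
if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], "rb") as handle:
        payload = handle.read()
    for row in bench(payload):
        print("{codec:6s} {size:12d} bytes  {saved_pct:5.1f}% saved  "
              "compress {compress_s:8.3f}s  decompress {decompress_s:8.3f}s"
              .format(**row))
```

The whole file is read into memory, so this is only suitable for files
that fit comfortably in RAM. Passing a lower level to bench() is a quick
way to see how much of the compression time comes from the -9 setting.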



-- 
Stephen J Smoogen.
“The core skill of innovators is error recovery, not failure avoidance.”
Randy Nelson, President of Pixar University.
"We have a strategic plan. It's called doing things."
— Herb Kelleher, founder Southwest Airlines

