On Fri, 25 Sep 2020 12:10:22 +0200, Zbigniew Jędrzejewski-Szmek wrote:
I'm missing some good statistics.
I have 1.6TB of statistics, ask me anything. They are calculated by my scripts:
https://git.jankratochvil.net/?p=massrebuild.git;a=tree
git clone git://git.jankratochvil.net/massrebuild
> * DWZ advantage: On the whole Fedora distro it saves 3.3% (5GB of the
> 157GB distribution size)
What is this comparing? Is this the size of binary rpm or the
installation-on-disk footprint?
I am usually talking about *-debuginfo.rpm size.
Another possible number is separate *.debug files download (DWZ is then 6%
bigger than -fdebug-types-section due to the associated DWZ common files).
I would love to see a comparison of numbers for three things:
- raw debuginfo without dwz or -fdebug-types-section
Oops, I do not have this number. I can run a new massrebuild; it takes about
4 days (depending on the availability of beefy machines).
- debuginfo with dwz (current approach)
rpm size: 35186079102
disk size: 177913332940
- debuginfo with -fdebug-types-section
rpm size: 37570327765
disk size: 214927514757
= DWZ rpm size is smaller by 6.35% (-fdebug-types-section is bigger by 6.78%)
= DWZ on-disk size is smaller by 17.2% (-fdebug-types-section is bigger by 20.8%)
It is based on 22080 Fedora Rawhide packages rebuilt on 2020-08-24.
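
The percentages can be re-derived from the byte counts above. This is only a
minimal sketch; the helper names are mine and are not part of massrebuild.
Note that the result depends on which of the two sizes is used as the baseline.

```python
# Byte counts quoted above (22080 Fedora Rawhide packages, 2020-08-24).
DWZ_RPM, DWZ_DISK = 35186079102, 177913332940
TYPES_RPM, TYPES_DISK = 37570327765, 214927514757  # -fdebug-types-section


def pct_smaller(small: int, big: int) -> float:
    """Saving of `small` relative to `big`, in percent of `big`."""
    return (big - small) / big * 100.0


def pct_bigger(big: int, small: int) -> float:
    """Excess of `big` over `small`, in percent of `small`."""
    return (big - small) / small * 100.0


if __name__ == "__main__":
    for label, dwz, types in (("rpm", DWZ_RPM, TYPES_RPM),
                              ("disk", DWZ_DISK, TYPES_DISK)):
        print(f"{label}: DWZ is {pct_smaller(dwz, types):.2f}% smaller; "
              f"-fdebug-types-section is {pct_bigger(types, dwz):.2f}% bigger")
```
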
For each of those three categories both measures (rpm size and on-disk size)
would be useful.
Another big variable is that F-34 should hopefully use DWARF-5 (F-33 uses
DWARF-4), which will change the numbers a bit (I do not know which way).
Currently DWZ is not yet ported to DWARF-5, so there is no way to compare it.
Also, DWZ does not plan to support LLVM DWARF-5, so that will also skew such
a comparison even after its port.
The on-disk size will change again with the btrfs compression in F-33, which
should reduce the size by about 50% (which makes any
DWZ/-fdebug-types-section differences pointless). It will obviously also make
the on-disk size difference smaller (than the current 20.8%).
And finally, the on-disk size depends a lot on which *-debuginfo packages you
have installed, and that varies a lot: the stddev of the per-package saving is
twice the average DWZ saving.
Could you provide numbers like this for some subset of packages
(20-30 packages that produce debuginfo would be enough to get a good measure).
The problem with these numbers is that they depend too much on the chosen set
of rpms, so 20-30 packages do not say anything.
Compared to -fdebug-types-section, DWZ saves 6.35% of the total size for the
whole of Rawhide. When averaged per package it is 5.44% (that means DWZ saves
more on bigger-than-median packages), but the stddev of the saving is +/-11%.
Packages where -fdebug-types-section is smallest (by difference in bytes; the
first column is the -fdebug-types-section size as a percentage of the DWZ size):
70.11: julia-1.5.0-1.fc33.src.rpm -fdebug-types-section size=866936043 DWZ size=1236511762
74.43: nodejs-14.7.0-1.fc33.src.rpm -fdebug-types-section size=921485027 DWZ size=1238008099
77.84: mozjs78-78.1.0-1.fc33.src.rpm -fdebug-types-section size=623280098 DWZ size=800743010
Packages where DWZ is smallest (by difference in bytes):
508.93: kea-1.7.9-3.fc33.src.rpm -fdebug-types-section size=1379013840 DWZ size=270963319
143.07: paraview-5.8.1-1.fc33.src.rpm -fdebug-types-section size=11462175974 DWZ size=8011695061
196.49: hpx-1.4.1-4.fc33.src.rpm -fdebug-types-section size=10981369919 DWZ size=5588742102
All these sizes are for *-debuginfo.rpm.
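
As an illustration of why the aggregate saving and the per-package average
differ, here is a rough Python sketch over just the six packages listed above.
The function names are mine, not taken from massrebuild, and these six packages
are the extremes, so the results say nothing about Rawhide as a whole. The
first function reproduces the percentage column of the listing.

```python
import statistics

# (source package, -fdebug-types-section bytes, DWZ bytes) as listed above.
PACKAGES = [
    ("julia-1.5.0-1.fc33", 866936043, 1236511762),
    ("nodejs-14.7.0-1.fc33", 921485027, 1238008099),
    ("mozjs78-78.1.0-1.fc33", 623280098, 800743010),
    ("kea-1.7.9-3.fc33", 1379013840, 270963319),
    ("paraview-5.8.1-1.fc33", 11462175974, 8011695061),
    ("hpx-1.4.1-4.fc33", 10981369919, 5588742102),
]


def types_vs_dwz_pct(types_size: int, dwz_size: int) -> float:
    """-fdebug-types-section size as a percentage of the DWZ size."""
    return types_size / dwz_size * 100.0


def dwz_saving_pct(types_size: int, dwz_size: int) -> float:
    """DWZ saving in percent of the -fdebug-types-section size."""
    return (types_size - dwz_size) / types_size * 100.0


def aggregate_saving_pct(rows) -> float:
    """Saving computed on summed bytes; big packages dominate."""
    total_types = sum(t for _, t, _ in rows)
    total_dwz = sum(d for _, _, d in rows)
    return dwz_saving_pct(total_types, total_dwz)


def per_package_savings(rows) -> list:
    """Saving of each package, every package weighted equally."""
    return [dwz_saving_pct(t, d) for _, t, d in rows]


if __name__ == "__main__":
    for name, t, d in PACKAGES:
        print(f"{types_vs_dwz_pct(t, d):7.2f}: {name}")
    savings = per_package_savings(PACKAGES)
    print(f"aggregate saving:   {aggregate_saving_pct(PACKAGES):.2f}%")
    print(f"per-package mean:   {statistics.mean(savings):.2f}%")
    print(f"per-package stddev: {statistics.stdev(savings):.2f}%")
```
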
The savings depend strongly on the chosen subset of packages:
For example, for an ELN-like (*) distro the saving is not 6.35% but only 0.28%.
For the Fedora 32 packages on my personal machines it is not 6.35% but 0.72%.
(*) I used the subset of Fedora Rawhide packages that are also present in
CentOS-8.2.
There is also an opportunity for a new non-DWZ optimization (orthogonal to
DWZ/-fdebug-types-section) which can save 5.96% of *-debuginfo.rpm size with
a clang-only draft implementation. It requires no modification of DWARF
consumers and it is easier to implement than to upstream and maintain the DWZ
support for LLDB.
I find that 3.3% number strange — it would mean that dwz is
essentially useless, but maybe I'm misunderstanding how it's defined.
F-32 x86_64 has 157GB total, debug/ is 82GB (6GB is *-debugsource):
6.35% * (82-6) / 157 = 3.07%
approx., the 3.3% was calculated with more exact distro size numbers.
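
The same estimate as a small Python sketch, using only the rounded GB figures
above (the function and parameter names are mine): the DWZ saving applies only
to the *-debuginfo part of the distro, so it is scaled by that part's share.

```python
def distro_saving_pct(debuginfo_saving_pct: float, debug_gb: float,
                      debugsource_gb: float, distro_gb: float) -> float:
    """Scale the *-debuginfo.rpm saving down to the whole distribution."""
    # *-debugsource packages are not affected by DWZ, so exclude them.
    debuginfo_gb = debug_gb - debugsource_gb
    return debuginfo_saving_pct * debuginfo_gb / distro_gb


if __name__ == "__main__":
    # F-32 x86_64: 157GB total, debug/ is 82GB, of which 6GB is *-debugsource.
    print(f"{distro_saving_pct(6.35, 82, 6, 157):.2f}%")
```
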
I think we need to get some better understanding what the effects of
various approaches are before discussing which to pick.
Thanks for this discussion.
Jan