Better repodata performance
seth vidal
skvidal at phy.duke.edu
Sun Jan 30 21:14:52 UTC 2005
> I hope you're really not saying that, if I request to install package
> foo, that depends on bar, it will also download headers for baz, a
> totally unrelated package. I can see that we'd need headers for foo
> and bar, but not for baz. I thought the point of the xml files and
> the info on provides, filelists, etc, was precisely to enable the
> depsolver to avoid having to download the headers for every package.
Just so we don't go off into deeply uninformed space:
yum 2.0.X downloaded all the headers in the headers directory that it
did NOT have installed. It figured this out by reading header.info. This
file stored the NEVRA (name, epoch, version, release, arch) and rpm
location of each package. So yum 2.0.X downloaded this file to
see what new headers it needed, downloaded them, then got on with the
process at hand.
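
As a rough sketch of that flow (not yum's actual code): assuming header.info has one `nevra=rpm-location` entry per line, which is an assumption about the file layout made for illustration, the headers still to fetch are simply the entries not already on hand. The function names and sample data are hypothetical.

```python
def parse_header_info(text):
    """Map each NEVRA to its rpm location (assumed 'nevra=location' lines)."""
    entries = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or "=" not in line:
            continue  # skip blank or malformed lines
        nevra, location = line.split("=", 1)
        entries[nevra] = location
    return entries


def headers_to_fetch(header_info_text, already_have):
    """Return the NEVRAs listed in header.info that are not already on hand."""
    entries = parse_header_info(header_info_text)
    return sorted(nevra for nevra in entries if nevra not in already_have)


# Hypothetical sample data, for illustration only.
sample = """\
0:foo-1.0-1.i386=foo-1.0-1.i386.rpm
0:bar-2.1-3.i386=bar-2.1-3.i386.rpm
0:baz-0.9-7.noarch=baz-0.9-7.noarch.rpm
"""
needed = headers_to_fetch(sample, already_have={"0:foo-1.0-1.i386"})
print(needed)
```

The point of the sketch is that the cost of this scheme is one small index file plus only the headers that are actually missing.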
> I'm wondering if it would be possible for a depsolver to create a
> (smaller) .hdr file out of info in the .xml files, and feed that to
> rpmlib for transaction-verification purposes. This would enable it to
> skip the download-header step before downloading the entire package.
Talk to Paul Nasrat - he was working on that a while ago but I think he
got stuck in some rabbit hole debugging something.
> Definitely. But couldn't we perhaps do it by intelligently filtering
> information out of the rpm header and, say, generating a single
> archive containing all of the info needed for depsolving and for
> rpmlib's transaction verification?
You can't do that because file conflicts CAN NOT be calculated via rpm
without having the full header and/or all the file information present.
> I was expecting depsolving wouldn't require all the headers. And from
> what I gather from your reply, it indeed doesn't.
it requires all the headers of the packages involved, yes.
> Let's consider two scenarios: 1) using up2date with yum-2.0 (headers/)
> repos (whoever claimed up2date supported rpmmd repodata/ misled me :-)
> and 2) using yum-2.1 (repodata/) repos.
>
> 1) yum 2.0
>
> 16MiB) initial download, distro's and empty updates' hdrs
>
> 8MiB) daily (on average) downloads of header.info for updates,
> downloaded by rhn-applet, considering an average size of almost
> 30KiB, for 40 weeks. (both FC2 and FC3 updates for i386 have a
> header.info this big right now)
>
> 16MiB) .hdr files for updates, downloaded by the update installer.
> Current FC2 i386 headers/ holds 9832KiB, whereas FC3 i386
> headers/ holds 8528KiB, but that doesn't count superseded
> updates, whose .hdr files are removed. The assumption is that
> each header is downloaded once. 16MiB is a guesstimate, that I
> believe to be inflated. It doesn't take into account the
> duplicate downloads of header.info for updates, under the
> assumption that a web proxy would avoid downloading again what
> rhn-applet has already downloaded.
>
> ----
>
> 40MiB) just in metadata over a period of 9 months, total
>
> 2) yum 2.1
>
> 2.7MiB) initial download, distro's and empty updates'
> primary.xml.gz and filelists.xml.gz
>
> 68MiB) daily (on average) downloads of primary.xml.gz, downloaded by
> rhn-applet, considering an average size of 250KiB (FC2 updates'
> is 240KiB, whereas FC3's is 257KiB, plus about 1KiB for
> repomd.xml)
>
> 16MiB) .hdr files for updates, downloaded by the update installer
> (same as in case 1)
>
> 192MiB) filelists.xml.gz for updates, downloaded twice a week on
> average by the update installer, to solve filename dep.
>
> ----
>
> 278.7MiB) just in metadata over a period of 9 months, total
>
>
> Looks like a waste of at least 238.7 MiB per user per 9-month install.
> Sure, it's not a lot, only 26.5MiB a month, but it's almost 6 times as
> much data being transferred for the very same purpose. How is that a
> win? Multiply that by the number of users pounding on your mirrors
> and it adds up to hundreds of GiB a month.
> Another factor is that you probably won't need filelists.xml.gz for
> every update. Maybe I don't quite understand how often it is needed,
> but even if I have to download it only once a month, that's still
> 64MiB over 9 months, more than the 40MiB total metadata downloaded
> over 9 months by yum 2.0.
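
For reference, the totals in the quoted estimate can be reproduced with a few lines of arithmetic. This is only a sketch of the quoted poster's assumptions (figures in MiB, copied from the quote), not a measurement of what either yum version actually transfers; the variable names are made up for illustration.

```python
# Figures in MiB, copied from the quoted estimate (9 months, about 40 weeks).
yum_2_0 = {
    "initial headers": 16,
    "daily header.info": 8,
    "update .hdr files": 16,
}
yum_2_1 = {
    "initial primary + filelists": 2.7,
    "daily primary.xml.gz": 68,
    "update .hdr files": 16,
    "filelists.xml.gz twice a week": 192,
}

total_2_0 = sum(yum_2_0.values())
total_2_1 = round(sum(yum_2_1.values()), 1)
difference = round(total_2_1 - total_2_0, 1)
per_month = round(difference / 9, 1)

# Share of the yum 2.1 total that comes from the filelists assumption alone.
filelists_share = yum_2_1["filelists.xml.gz twice a week"] / total_2_1

print(total_2_0, total_2_1, difference, per_month)
print(f"filelists share of the 2.1 total: {filelists_share:.0%}")
```

Most of the 2.1 total comes from the single assumption that filelists.xml.gz is fetched twice a week, so the comparison stands or falls on how often that file is really needed.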
yum 2.1.x ONLY DOWNLOADS THE XML FILES WHEN IT NEEDS THEM.
go read the code and stop guessing.
it downloads repomd.xml every time - that's < 1K.
it downloads primary.xml.gz if the file has changed - that's typically < 1M.
it downloads filelists.xml.gz only when there is a file dep that it
cannot resolve with primary.xml.gz.
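
Those three rules can be written down as a small decision function. This is a minimal sketch, not yum's code: it assumes "changed" is detected by comparing a cached checksum for primary.xml.gz against the one advertised in repomd.xml, and the function and parameter names are hypothetical.

```python
def metadata_to_fetch(cached_primary_checksum, repomd_primary_checksum,
                      unresolved_file_deps):
    """Return the metadata files a run would need to download.

    cached_primary_checksum: checksum of the local primary.xml.gz, or None
    repomd_primary_checksum: checksum advertised in the fresh repomd.xml
    unresolved_file_deps: file deps primary.xml.gz could not satisfy
    """
    fetch = ["repomd.xml"]  # always fetched, it is tiny
    if cached_primary_checksum != repomd_primary_checksum:
        fetch.append("primary.xml.gz")  # only when it has changed
    if unresolved_file_deps:
        fetch.append("filelists.xml.gz")  # only as a last resort
    return fetch


# Nothing changed and no stray file deps: only repomd.xml is downloaded.
print(metadata_to_fetch("abc123", "abc123", []))
# New updates pushed, plus a file dep primary.xml.gz cannot answer.
print(metadata_to_fetch("abc123", "def456", ["/usr/share/foo/data"]))
```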
> I don't know how yum 2.0 did it, but up2date surely won't even try to
> download a .hdr file if it already has it in /var/spool/up2date, so
> this is not an issue.
yum 2.0.x certainly DID NOT download a .hdr file it already had. Sheesh,
go read the code, stop making suppositions based on anecdotes.
> repodata helps the initial download, granted, but it loses terribly in
> the long run.
only as the number of file deps outside of /etc/* and *bin/* increases.
if you keep the file deps in those paths then repodata is a huge win.
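
To make the path rule concrete, here is a small sketch that sorts file deps into those answerable without filelists.xml.gz and those that would force the extra download. It assumes primary.xml.gz carries file entries matching /etc/* and *bin/*, which is what the paragraph above implies; the glob patterns and names are illustrative, not taken from yum or createrepo.

```python
from fnmatch import fnmatchcase

# Paths assumed to be listed in primary.xml.gz (per the rule of thumb above).
PRIMARY_FILE_PATTERNS = ("/etc/*", "*bin/*")


def covered_by_primary(file_dep):
    """True if the file dep falls under a path assumed to be in primary.xml.gz."""
    return any(fnmatchcase(file_dep, pattern) for pattern in PRIMARY_FILE_PATTERNS)


# Hypothetical file deps, for illustration only.
deps = ["/bin/sh", "/usr/bin/perl", "/etc/init.d", "/usr/share/foo/data"]
needs_filelists = [dep for dep in deps if not covered_by_primary(dep)]
print(needs_filelists)
```

A packager can use the same check in reverse: any Requires on a path that lands in `needs_filelists` is one that pushes users toward the larger download.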
-sv