Back Again

Sam Varshavchik mrsam at courier-mta.com
Wed Aug 1 03:30:15 UTC 2007


Todd Zullinger writes:

> Thank you for the detailed reply Sam. :)
> 
> Excuse me for asking more questions (potentially dumb questions at
> that).
> 
> Sam Varshavchik wrote:
>> I don't know if you've ever upgraded Fedora from one release to the
>> next.  The upgrade process is as slow as molasses, even though all
>> the metadata is right there.
> 
> No, I avoid upgrades.  I always do fresh installs as a matter of
> habit.  Point taken though.  I have read a lot of complaints of slow
> upgrades at the dependency resolution stage.
> 
>> A few years ago the base distro was much smaller than it is now. The
>> size of a typical Linux distro has really ballooned. Some of the
>> algorithms in rpm scale horribly. It wasn't such a big deal when a
>> typical linux distro was only a few hundred packages, but now it's a
>> few thousand packages, with dependencies that are much more
>> complicated, and rpm is now really blowing apart at the seams.
> 
> I haven't looked at the code, but is it rpm or yum that's really
> bogging down?  Or aren't you making much of a distinction when you say
> rpm?

I'd break it down as about 70% yum vs. 30% rpm.  Yum is really taking its 
sweet time figuring out what it needs to do. But even after it's done that, 
and downloaded everything, rpm still tends to spin its wheels, if it has a 
large list of packages to chew through.

>> Furthermore, rpm, as is, does not implement remote repositories.
> 
> Does it need to?  Does dpkg do this?
> 
>> With a large repository, like Fedora, even a compressed XML file is
>> going to end up being rather huge. Then, you have to uncompress it
>> and parse it.  And, XML parsing is also not exactly a light task.
> 
> But somehow or another you need to deal with a sizable chunk of data
> to make reasonable decisions regarding dependencies.  The tough part
> about rpm development is trying to be backward compatible and still
> make forward progress.  I don't envy the guys hacking on rpm.

You do /not/ need that much info in the first step. All you need is just a 
list of names of packages available on the remote repository. You reconcile 
that against the list of packages you already have downloaded the metadata 
for, and you then know what's new.
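
A minimal sketch of that reconciliation step, assuming the repository publishes a plain list of package identifiers (here name plus version) and the client keeps the set of identifiers whose metadata it has already cached. The function name and the sample data are hypothetical; this is not something yum or rpm provides.

```python
def reconcile(remote_ids, cached_ids):
    """Compare the remote package list against locally cached metadata.

    Returns (to_fetch, to_drop): identifiers whose metadata still has to
    be downloaded, and cached identifiers that left the repository.
    """
    remote = set(remote_ids)
    cached = set(cached_ids)
    return sorted(remote - cached), sorted(cached - remote)


# Hypothetical sample data: only grub changed since the last sync.
remote = ["bash-3.2-9", "grub-0.97-13", "yum-3.2.1-1"]
cached = ["bash-3.2-9", "grub-0.97-12", "yum-3.2.1-1"]

to_fetch, to_drop = reconcile(remote, cached)
print(to_fetch)  # ['grub-0.97-13']
print(to_drop)   # ['grub-0.97-12']
```

Only the entries in to_fetch would need a metadata download; everything else is already on disk.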

Meanwhile, primary.xml.gz is actually a voluminous XML file that contains 
not just each package's name and version, but also all sorts of extra info. 
And you have to download the whole thing every time. And, the current 
version of yum, sqlite-based, does not help. I see that primary.sqlite.bz2 
is about twice as large as primary.xml.gz.

So, all this talk of a database-based yum, and it turns out that you end up 
having to download /twice/ as much data as you used to before? Someone 
explain to me what we're supposed to be doing here.

Let's look at repodata.  Right now, for fedora updates, 7/i386/repodata, we 
have this:

total 16904
drwxr-xr-x 2 root root    4096 2007-07-30 12:23 .
drwxr-xr-x 3 root root  159744 2007-07-30 12:23 ..
-rw-r--r-- 1 root root 2676161 2007-07-30 12:23 filelists.sqlite.bz2
-rw-r--r-- 1 root root 2703076 2007-07-30 12:22 filelists.xml.gz
-rw-r--r-- 1 root root 4603154 2007-07-30 12:23 other.sqlite.bz2
-rw-r--r-- 1 root root 5249048 2007-07-30 12:22 other.xml.gz
-rw-r--r-- 1 root root 1122990 2007-07-30 12:23 primary.sqlite.bz2
-rw-r--r-- 1 root root  732021 2007-07-30 12:22 primary.xml.gz
-rw-r--r-- 1 root root    1953 2007-07-30 12:23 repomd.xml

From what I see yum doing, it downloads the primary file, the other file, and 
possibly filelists, /every/ time a single package gets added to the 
repository. Even though 99% of the content is the same as before.

This, in my opinion, does not really seem like an optimal design. You 
should /not/ have to download /everything/ every time a single package 
changes.
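
To put a rough number on "everything", the sizes from the repodata listing above can be totalled per format. This is only arithmetic on that one listing; other repositories or dates may give different figures.

```python
# Sizes in bytes, copied from the repodata listing above.
sizes = {
    "filelists.sqlite.bz2": 2676161,
    "filelists.xml.gz": 2703076,
    "other.sqlite.bz2": 4603154,
    "other.xml.gz": 5249048,
    "primary.sqlite.bz2": 1122990,
    "primary.xml.gz": 732021,
}


def total(suffix):
    """Sum the sizes of all files in the listing with the given suffix."""
    return sum(size for name, size in sizes.items() if name.endswith(suffix))


xml_total = total(".xml.gz")
sqlite_total = total(".sqlite.bz2")
primary_ratio = sizes["primary.sqlite.bz2"] / sizes["primary.xml.gz"]

print(xml_total, sqlite_total, round(primary_ratio, 2))
# 8684145 8402305 1.53
```

So a client that pulls all three files in either format transfers a bit over 8 MB for this repository, however small the actual change was.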

>> Remote package repositories could've been implemented much better.
>> When I had some free time some time ago, I quickly hacked up a
>> package manager for some of my internally-developed software. I
>> found that I could do similar kind of package metadata
>> synchronization much more efficiently than yum/rpm.
> 
> Isn't the harder part doing this in a way that doesn't completely
> break backward compatibility though?  And then you have to spend a
> bunch of years adding new code to deal with the odd sorts of deps that
> packagers come up with in the wild (versioned obsoletes on a multilib
> system sounds fun :).
> 
> Someone posted to the fedora-devel list a month or so ago saying
> they'd created a super fast depsolver using php and mysql.  Once all
> of the various cases they'd missed were explained, things didn't go
> much further.  (And no, I'm not at all suggesting that applies to your
> work -- it's obvious that you know more than that and that you
> actually created a working system. :)

In my case, I had no intention of bending over backwards in order to stay 
compatible with rpm. The whole point was to do this better, have a clean start, 
and a clean design, and then provide later a shim layer that imports rpm's 
dependencies. And my design has a far more sophisticated dependency model 
than rpm. All the extra hackery that's done now with kernel packages, which 
support third party out-of-tree kernel modules using a yum plug in -- all of 
that is broomed away and the additional logic becomes incorporated in the 
overall design, rather than an aftermarket add-on hack. Ditto for the epoch 
hack -- my solution fixes the original underlying reason for having an 
epoch in the first place.

And, of course, php+mysql will always have a lot of overhead. No matter what you 
do there, you will always be left in the dust by carefully-designed, 
compiled C++ code. No matter how you twist and turn, you'll always have to: 
compile php code, interpret php code, generate SQL, send the SQL over a 
communication channel to the mysql db engine, have your SQL parsed, a query 
plan formed and then finally processed by the mysql engine, and the resulting 
data returned. The C++ equivalent: run already-compiled code. 
Done.

>> metadata file you want to download, you can use HTTP 1.1 partial
>> chunk request feature to download just the bits of the metadata file
>> that you want.
> 
> Perhaps you should bribe someone to implement this in yum as a proof
> of concept?

Well, I can point them to how HTTP 1.1 partial (byte-range) requests work, and how to gracefully 
autodetect if the HTTP server supports them, and the logic to
gracefully fall back to "Plan B", if the repository's HTTP server is running 
old Apache without HTTP 1.1 support, and what to do next. That's about all I 
can do. I won't write the code, I have plenty of other coding work that 
keeps me busy.
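
For what it's worth, here is a rough sketch of the detection and fallback logic, assuming the "partial chunk request" feature means the HTTP 1.1 Range header. A server that honors it answers 206 Partial Content; one that ignores it answers 200 with the whole file. The function name fetch_range and the choice of "Plan B" (take the full body and slice it locally) are assumptions made for illustration, not how yum works.

```python
import urllib.request


def fetch_range(url, start, end, opener=urllib.request.urlopen):
    """Fetch bytes start..end (inclusive) of url.

    Sends a Range header. If the server answers 206, the body is exactly
    the requested slice. If it answers 200, it ignored the header and
    sent the whole file, so fall back to slicing the full body locally.
    `opener` is injectable so the logic can be exercised without a network.
    """
    request = urllib.request.Request(
        url, headers={"Range": f"bytes={start}-{end}"}
    )
    with opener(request) as response:
        body = response.read()
        if response.status == 206:
            return body
        # Plan B (assumed): full download, keep only the wanted bytes.
        return body[start:end + 1]


# Hypothetical usage against a made-up mirror URL:
# first_kb = fetch_range("http://mirror.example/repodata/primary.xml.gz", 0, 1023)
```

A real client would presumably remember that a given server ignores Range, so that it pays for the full download only once per sync rather than once per slice.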

>> But then, after all is said and done, no amount of tweaks to rpm can
>> compensate for stupid and broken packaging. Right now, due to
>> indirect dependencies, grub requires *GTK* runtime libraries to be
>> installed. On my headless machine, I now have to plop down a
>> crapload of x.org and GTK RPMs, because grub requires them, due to
>> its intermediate dependencies.
> 
> Yeah.  This was caused by policy more than by incompetence.  The folks
> at Red Hat's legal department asked that all of the trademarked logos
> be kept in one package, for easier tracking and removal by downstream
> users of Fedora's packages (or something like that).

It's not that trademarked logos must be kept in one package. It's just that 
the package, for some reason that I still can't fathom, must depend on gtk2 
code libraries. Why would a package that supposedly contains nothing more 
than a bunch of logo image files, have a needed dependency on a package that 
contains system libraries? That just does not compute.

I haven't really looked at it, but the probable story is that gtk2-engine or 
the gnome-themes package also includes some shell script that the logos 
package needs for some reason, so rather than separating it out into a 
subpackage, which would be the proper thing to do, you have to install the 
whole bloody thing, and because gtk2 requires all xorg core libraries, that 
ends up getting sucked down the drain as well.
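
The effect is easy to see with a toy dependency graph. This is a sketch only: the edges below are hypothetical, modelled on the guess above rather than on the actual Fedora spec files, and the function name is made up.

```python
from collections import deque


def install_closure(package, requires):
    """Return every package pulled in by installing `package`.

    Walks the `requires` mapping breadth-first and collects all direct
    and indirect dependencies (the package itself is excluded).
    """
    seen = {package}
    queue = deque([package])
    while queue:
        current = queue.popleft()
        for dep in requires.get(current, ()):
            if dep not in seen:
                seen.add(dep)
                queue.append(dep)
    seen.discard(package)
    return sorted(seen)


# Hypothetical dependency edges, for illustration only.
requires = {
    "grub": ["logos"],
    "logos": ["gtk2-engine"],
    "gtk2-engine": ["gtk2"],
    "gtk2": ["xorg-libs"],
}

print(install_closure("grub", requires))
# ['gtk2', 'gtk2-engine', 'logos', 'xorg-libs']
```

If the one script the logos package needs lived in its own small subpackage with no gtk2 requirement, the closure for grub would stop there.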

Although this does not have any direct relevance to the overall issue of 
rpm's design, it is demonstrative of the same kind of inefficient 
inattention to details.
