Back Again

Sam Varshavchik mrsam at courier-mta.com
Wed Aug 1 02:59:35 UTC 2007


David Boles writes:

> on 7/31/2007 7:03 PM, Sam Varshavchik wrote:
>> Todd Zullinger writes:
>> 
>>> Sam Varshavchik wrote:
>>>> Nah, it's not closer. It's just that rpm is getting crappier every
>>>> year, and is long overdue for replacement.
>>> I could easily be mistaken, but AFAIK, the main difference in speed
>>> that end users notice between yum and apt is due to the fact that apt
>>> caches it's metadata.  In between runs of apt-get update, calls to
>>> apt-get use the data on disk without hitting the network.  With yum,
>>> the update and upgrade steps from apt-get are both done in the update.
>> 
>> I don't know if you've ever upgraded Fedora from one release to the next. 
>> The upgrade process is as slow as molasses, even though all the metadata is 
>> right there.
> 
> 
> Do you know just why an upgrade of a system 6 months old, or more, takes
> longer than a fresh install of a new release? You should study that
> situation. Start with package dependencies and then think about just what
> you might have changed and added from third party sites. Then think some more.

Well, I did think. The system does not have anything beyond Fedora and 
Fedora Extras, plus my own RPMs. But why does it matter, anyway? Why does 
the presence of a foreign RPM cause such a nervous breakdown? At most it 
should result in an unsatisfied dependency. Why should this result in rpm 
spinning its wheels to such an extent?

> Care for a really stupid example? Take a 2006 automobile. Examine it very
> closely. Then with a garage full of new 2007 parts make it a 2007
> automobile. All the time making sure that everything fits and still works.
> 
> Email us when you're finished.

No matter which parts you do have in your automobile and where they came 
from, when you have to compare its parts with a fixed list of two thousand 
other parts, from a reference model, it should take the exact same amount of 
time whether all your parts are OEM or aftermarket. It's the same number of 
parts in your car, whether original or replacement, after all. So why would 
it matter?

At most, the complexity of what RPM has to do would be O(N), and it should 
really be O(log N). Judging unscientifically, though, RPM's actual 
complexity seems to be at least O(N^2).
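
To make the complexity point concrete, here is a minimal Python sketch. It 
is only an illustration: the package dictionaries and both function names 
are made up, and it says nothing about how RPM actually does its lookups. 
It contrasts a naive check, which rescans every package for every 
requirement, with an indexed one that builds a single set of provided 
resources and does one lookup per requirement.

```python
# Hypothetical data model: each package is a dict with "name",
# "provides" and "requires" lists. This does not mirror RPM internals.

def unsatisfied_quadratic(packages):
    """Naive check: for each requirement, scan every package's provides."""
    missing = []
    for pkg in packages:
        for req in pkg["requires"]:
            if not any(req in other["provides"] for other in packages):
                missing.append((pkg["name"], req))
    return missing


def unsatisfied_indexed(packages):
    """Indexed check: build one set of provides, then one lookup per requirement."""
    provided = set()
    for pkg in packages:
        provided.update(pkg["provides"])
    return [(pkg["name"], req)
            for pkg in packages
            for req in pkg["requires"]
            if req not in provided]


packages = [
    {"name": "app", "provides": ["app"], "requires": ["libfoo.so", "libbar.so"]},
    {"name": "foo", "provides": ["foo", "libfoo.so"], "requires": []},
]

print(unsatisfied_indexed(packages))
```

Both functions report the same unsatisfied dependencies. The difference is 
only in how much work they do: the indexed version grows roughly linearly 
with the total number of resources, assuming set lookups take constant time 
on average.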

I'll tell you this. I mentioned before that I use my own package management 
tool internally to manage some homebrewed software. I have a compatibility 
shim that sucks out pretty much the entire contents of the system RPM 
database, and imports all of the dependencies into my internal package 
database. This is to allow my own packages, which might have, say, a 
dependency on something.so, to have those dependencies satisfied by an RPM.

Basically, I read all RPM resources, and create a dummy package that 
provides those resources, then install the dummy package, so my internal 
package database contains all the RPM-provided resources. Each time I update 
some RPMs, I rerun the import script and upgrade the old dummy RPM 
compatibility package to a new one.
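
The import step described above might look roughly like the following 
Python sketch. Everything in it is an assumption made for illustration: the 
input is a plain list of dictionaries standing in for whatever the shim 
reads out of the RPM database, and the package name "rpm-compat-dummy" and 
the function name are invented. It does not call rpm at all.

```python
def build_dummy_package(rpm_packages, version):
    """Collapse every resource provided by any RPM into one placeholder package."""
    provides = set()
    for pkg in rpm_packages:
        provides.update(pkg["provides"])
    return {
        "name": "rpm-compat-dummy",    # hypothetical package name
        "version": version,            # bumped on every re-import
        "provides": sorted(provides),  # sorted for a stable, diffable listing
        "requires": [],                # the dummy itself needs nothing
    }


# Stand-in for data read from the system RPM database.
rpm_packages = [
    {"name": "glibc", "provides": ["libc.so.6", "glibc"]},
    {"name": "bash", "provides": ["/bin/sh", "bash"]},
]

dummy = build_dummy_package(rpm_packages, version=1)
print(dummy["provides"])
```

Rerunning the function with a higher version number after an RPM update 
yields the replacement dummy package.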

This operation, you understand of course, is analogous to your example -- 
taking an old snapshot of the entire RPM database, comparing it to a new 
one, and reconciling any differences against resources required by my 
internal packages, to make sure that they don't break. This operation is 
also equivalent to what Anaconda has to do when it's about to upgrade the 
Fedora distro -- take the current RPM database, and reconcile it with the 
RPM database from the release you're updating to.
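
As a rough sketch of that reconciliation, assuming each snapshot is just a 
list of provided resource names and each internal package lists what it 
requires (the names and data below are hypothetical, and this is neither 
Anaconda's nor my tool's actual code):

```python
def broken_requirements(old_provides, new_provides, internal_packages):
    """Return (package, resource) pairs that the old snapshot satisfied
    but the new snapshot no longer does."""
    dropped = set(old_provides) - set(new_provides)
    return [(pkg["name"], req)
            for pkg in internal_packages
            for req in pkg["requires"]
            if req in dropped]


# Hypothetical snapshots taken before and after an RPM update.
old_snapshot = ["libfoo.so.1", "libbar.so.2", "/bin/sh"]
new_snapshot = ["libfoo.so.2", "libbar.so.2", "/bin/sh"]

internal = [
    {"name": "mytool", "requires": ["libfoo.so.1", "/bin/sh"]},
    {"name": "myscript", "requires": ["/bin/sh"]},
]

print(broken_requirements(old_snapshot, new_snapshot, internal))
```

The set difference is computed once, so the cost is one pass over each 
snapshot plus one lookup per requirement.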

It takes me, oh, maybe a minute or so to crunch everything together. The 
analogous step in Anaconda -- "Preparing transaction" -- takes about 5-10 
minutes.

And I actually have more work to do. RPM has, I believe, three resource 
classes to reconcile against each other -- provided resources, required 
resources, and conflicting resources.  My internal package database has six 
resource classes to reconcile.
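
For the conflicting-resources class, the same indexing idea applies. The 
Python sketch below is illustrative only: the "conflicts" field, the 
package names and the function name are assumptions, not RPM's data model. 
It builds one map from resource to providers and then does a single lookup 
per declared conflict.

```python
def find_conflicts(packages):
    """Return (package, conflicting resource, provider) triples."""
    providers = {}
    for pkg in packages:
        for res in pkg["provides"]:
            providers.setdefault(res, []).append(pkg["name"])

    hits = []
    for pkg in packages:
        for res in pkg.get("conflicts", []):
            for owner in providers.get(res, []):
                if owner != pkg["name"]:   # a package cannot conflict with itself
                    hits.append((pkg["name"], res, owner))
    return hits


# Hypothetical packages: mta-a declares a conflict with the resource "mta-b".
packages = [
    {"name": "mta-a", "provides": ["smtp-daemon"], "conflicts": ["mta-b"]},
    {"name": "mta-b", "provides": ["mta-b", "smtp-daemon"], "conflicts": []},
]

print(find_conflicts(packages))
```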

The performance degradation that I see in Anaconda is far more pronounced on 
less-robust hardware. On my laptop, which is less than a year old and has a 
fairly speedy Pentium and 2 GB of RAM, Anaconda is about 2-3 times slower 
than my homegrown code. On an old box that I have, running a pair of 
decade-old (approx) 500 MHz Celerons, with 256MB RAM, rpm is dreadfully slow -- about 
10-15 times slower than my homegrown code. There's something terribly 
inefficient in the way that Anaconda goes about its business. It should 
/not/ take that long to do its duty.

Some of it might be due to Anaconda being Python code, and my homegrown code 
being C++.

