sorting yum/dnf metadata and metadata diffs

Zdenek Kabelac zkabelac at redhat.com
Fri Feb 13 08:35:54 UTC 2015


On 13.2.2015 at 09:21, Marcin Juszkiewicz wrote:
> On 13.02.2015 08:11, Casey Jao wrote:
>> How feasible would it be to keep the listings in primary.xml and
>> filelists.xml sorted by package name and arch? Doing so could open the door
>> to simple and efficient diffs of repository metadata.
>
> Something like pdiffs in Debian?
>
>> Those two are by far the largest metadata files. If the observed
>> improvements are typical, then keeping those files in order and hosting the
>> diffs between the present and the previous few days (and modifying dnf to
>> look for those diffs) could substantially reduce the amount of data that
>> users must download every time a repository is updated, which for a
>> fast-moving OS like Fedora could happen nearly every day.
>
> If only the amount of downloaded data matters, then why not compress
> primary.xml and filelists.xml with xz?
>
>   11646147 primary.xml.gz
>    8676976 primary.xml.xz
>   30607019 filelists.xml.gz
>   23661236 filelists.xml.xz
>
> But yeah, it can make dnf/yum use more CPU power to decompress them each
> time they need that data.
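A minimal sketch of why sorted listings make metadata diffs cheap. It assumes a heavily simplified, hypothetical package record (`Pkg` with name, arch, and an opaque epoch:version-release string; real primary.xml entries carry many more fields) and is not dnf's or createrepo's actual code or format. Sorting by name and arch, with the version string as a tiebreaker so the order is total, lets the diff be computed in one merge pass over both listings.

```python
from typing import NamedTuple


class Pkg(NamedTuple):
    """Hypothetical, simplified stand-in for one <package> entry in primary.xml."""
    name: str
    arch: str
    evr: str  # epoch:version-release, treated as an opaque string here


def sort_key(p: Pkg) -> tuple[str, str, str]:
    # Order proposed in the thread (name, then arch); evr only breaks ties.
    return (p.name, p.arch, p.evr)


def diff_sorted(old: list[Pkg], new: list[Pkg]) -> list[tuple[str, Pkg]]:
    """Single merge pass over two listings that are already sorted by sort_key.

    Emits ("-", pkg) for entries only in `old` and ("+", pkg) for entries
    only in `new`; a version bump shows up as a "-"/"+" pair.
    """
    ops: list[tuple[str, Pkg]] = []
    i = j = 0
    while i < len(old) and j < len(new):
        if old[i] == new[j]:  # unchanged entry, nothing to send
            i += 1
            j += 1
        elif sort_key(old[i]) < sort_key(new[j]):
            ops.append(("-", old[i]))
            i += 1
        else:
            ops.append(("+", new[j]))
            j += 1
    ops.extend(("-", p) for p in old[i:])
    ops.extend(("+", p) for p in new[j:])
    return ops


def apply_diff(old: list[Pkg], ops: list[tuple[str, Pkg]]) -> list[Pkg]:
    """Client side: rebuild the new sorted listing from the old one plus the diff."""
    removed = {p for op, p in ops if op == "-"}
    added = [p for op, p in ops if op == "+"]
    return sorted([p for p in old if p not in removed] + added, key=sort_key)
```

Only the changed entries travel over the wire; a client holding yesterday's listing applies one or more daily diffs in sequence, in the spirit of Debian's pdiffs.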

IMHO you are solving this problem at the wrong end.

How about using some better data structures than this 'xml'?

Even splitting the per-language descriptions into separate files would be a big win.
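A rough sketch of that split, assuming hypothetical input records that carry their descriptions as a `{language: text}` map and that `name.arch` is unique in the listing. The text-free core listing would be fetched by everyone, while each per-language map would be a separate file that a client downloads only for its own locale. The record layout and key scheme are illustrative assumptions, not the real repodata schema.

```python
from collections import defaultdict


def split_by_language(records):
    """Split records into a text-free core listing plus one description map per language.

    Each record is a hypothetical dict such as
    {"name": "bash", "arch": "x86_64", "version": "4.3",
     "descriptions": {"en": "...", "cs": "..."}}.
    Assumes "name.arch" uniquely identifies a package in this listing.
    """
    core = []
    per_lang = defaultdict(dict)  # language -> {"name.arch": description}
    for rec in records:
        key = f"{rec['name']}.{rec['arch']}"
        # The core entry keeps every field except the bulky translated text.
        core.append({k: v for k, v in rec.items() if k != "descriptions"})
        for lang, text in rec.get("descriptions", {}).items():
            per_lang[lang][key] = text
    return core, dict(per_lang)
```

Each map in the returned `per_lang` dict could then be written to its own compressed file.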

But changes like these would really save a massive amount of CPU and space.

XML at this size is highly inefficient, and since the metadata is already
distributed in compressed (and thus unreadable) form, it no longer matters
which format it uses.
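To make the format comparison concrete, a small sketch that serializes the same synthetic records both as simplified repomd-like XML and as one tab-separated line per package, then reports raw, gzip and xz sizes (the two compressors compared earlier in the thread). The XML layout, the field list and the TSV format are assumptions for illustration only; run it on real data rather than trusting any particular ratio.

```python
import gzip
import lzma
import xml.etree.ElementTree as ET


def to_xml(records):
    """Serialize records as simplified, hypothetical repomd-like XML bytes."""
    root = ET.Element("metadata")
    for rec in records:
        pkg = ET.SubElement(root, "package", type="rpm")
        for field, value in rec.items():
            ET.SubElement(pkg, field).text = value
    return ET.tostring(root, encoding="utf-8")


def to_tsv(records, fields):
    """Serialize the same records as a header plus one tab-separated line per package.

    Assumes no field value contains a tab or a newline.
    """
    lines = ["\t".join(fields)]
    lines += ["\t".join(rec[f] for f in fields) for rec in records]
    return ("\n".join(lines) + "\n").encode("utf-8")


def sizes(blob):
    """Return (raw, gzip, xz) sizes in bytes for one serialized blob."""
    return len(blob), len(gzip.compress(blob)), len(lzma.compress(blob))
```

For example, `sizes(to_xml(records))` versus `sizes(to_tsv(records, ["name", "arch", "version", "release"]))` on the same list shows how much of the raw XML is tag overhead and how much of that survives compression.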


Zdenek


