sorting yum/dnf metadata and metadata diffs

Casey Jao casey.jao at gmail.com
Fri Feb 13 07:11:33 UTC 2015


How feasible would it be to keep the listings in primary.xml and
filelists.xml sorted by package name and arch? Doing so could open the door
to simple and efficient diffs of repository metadata.

I recently ran some quick tests using Python and ElementTree. The F21
primary.xml files from February 7 and February 9 each weigh around 2.6 MB
compressed and ~18 MB uncompressed. After sorting both and running a simple
line-by-line comparison, the diff came to ~500 KB, which compressed down to
~70 KB. A similar procedure on the 8 MB filelists.xml yielded a diff that
compressed to ~200 KB.
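
For anyone who wants to try something similar, here is a minimal sketch of that kind of test. It is not the exact script I ran. It assumes primary.xml has already been decompressed and read into a string, that packages are keyed by their <name> and <arch> child elements, and that gzip is the compressor. The function names sorted_package_lines and diff_sizes are made up for illustration. filelists.xml uses a different namespace and layout, so it would need its own variant.

```python
import difflib
import gzip
import xml.etree.ElementTree as ET

# XML namespace used by primary.xml in the repodata format.
COMMON_NS = "{http://linux.duke.edu/metadata/common}"


def sorted_package_lines(xml_text):
    """Return the lines of every <package>, with packages sorted by (name, arch)."""
    root = ET.fromstring(xml_text)
    keyed = []
    for pkg in root.iter(COMMON_NS + "package"):
        name = pkg.findtext(COMMON_NS + "name", default="")
        arch = pkg.findtext(COMMON_NS + "arch", default="")
        text = ET.tostring(pkg, encoding="unicode")
        # The serialized text is a tie-breaker so the order is deterministic
        # when several versions of the same package are listed.
        keyed.append((name, arch, text))
    keyed.sort()
    lines = []
    for _, _, text in keyed:
        # Keep one entry per non-empty line, ignoring indentation.
        lines.extend(ln.strip() for ln in text.splitlines() if ln.strip())
    return lines


def diff_sizes(old_xml, new_xml):
    """Return (raw_bytes, gzipped_bytes) of a line diff between sorted listings."""
    old_lines = sorted_package_lines(old_xml)
    new_lines = sorted_package_lines(new_xml)
    # n=0 drops context lines, so only the changed lines are counted.
    diff = "\n".join(
        difflib.unified_diff(old_lines, new_lines, lineterm="", n=0)
    )
    raw = diff.encode("utf-8")
    return len(raw), len(gzip.compress(raw))
```

Because the listings are sorted before comparison, two files that differ only in package order produce an empty diff. Note that difflib can be slow on inputs of this size, so an external diff tool may be a better fit for the real files.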

Those two are by far the largest metadata files. If the observed
improvements are typical, then keeping those files sorted and hosting
diffs between the current metadata and that of the previous few days (with
dnf modified to look for those diffs) could substantially reduce the amount
of data that users must download each time a repository is updated. For a
fast-moving OS like Fedora, that could be nearly every day.