On Wed, 10 Apr 2013 10:33:43 -0500
Chris Adams <cmadams(a)hiwaay.net> wrote:
> The metadata starts in XML before being loaded into an SQLite DB file,
> and the XML is in the repodata directory with the DB. However, both
> are compressed, as they are large. For example, the current
> updates/18/x86_64 XML is over 34M (5M gzip compressed), and the DB is
> 41M (9M bzip2 compressed). I'm guessing there are historical reasons
> why different compression is used; both could be made noticeably
> smaller with xz (XML to just over 3M, DB to 7M), but that's still a
> lot of data to download (and there are also other metadata files that
> have to be downloaded sometimes, especially the filelists.xml.gz,
> which is 10M gzip compressed).
>
> I'm not sure when the XML is downloaded instead of (or in addition to)
> the DB, but it does appear to happen (I see one example in my mirror
> server web logs this morning for example).
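As an aside, the gzip/bzip2/xz comparison is easy to reproduce with the stdlib compressors. This is only an illustration on synthetic data, so the ratios won't match real primary.xml, but the ordering of the three formats is the point:

```python
import bz2
import gzip
import lzma
import random

# Synthetic, low-entropy stand-in for repo metadata; real XML will
# compress differently, this just compares the three formats yum-era
# repodata has used.
random.seed(0)
data = "".join(random.choice("abcdefgh ") for _ in range(500_000)).encode()

sizes = {
    "raw": len(data),
    "gzip": len(gzip.compress(data, 9)),
    "bzip2": len(bz2.compress(data, compresslevel=9)),
    "xz": len(lzma.compress(data, preset=9)),
}
for name, size in sizes.items():
    print(f"{name:5s} {size}")
```

All three are invoked at their maximum compression level, which is roughly what a mirror-facing createrepo run would use.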
Here's how it works.
The xml metadata was put together over a decade ago. It is the canonical
representation of the metadata.
The sqlite was added maybe 8ish years ago as a way of more quickly
reading the same data and not eating up so much memory. At the time
bzip2 was the new hotness so we used it instead of gz.
The primary, filelists and other xml should never be downloaded at this
point unless you hit a mirror that is badly out of sync.
The only xml files that should be getting downloaded are:
1. repomd.xml - it's fairly small and the index for everything else
2. comps.xml (or groups.xml) - which is where comps is stored per-repo
3. updateinfo.xml - the security/update info describing updates
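To make the "index for everything else" point concrete, a trimmed sketch of what repomd.xml looks like (the checksums, hrefs and timestamps here are made up):

```xml
<repomd xmlns="http://linux.duke.edu/metadata/repo">
  <revision>1365606000</revision>
  <data type="primary_db">
    <checksum type="sha256">abc123</checksum>
    <location href="repodata/abc123-primary.sqlite.bz2"/>
    <timestamp>1365606000</timestamp>
  </data>
  <data type="filelists_db">
    <location href="repodata/def456-filelists.sqlite.bz2"/>
  </data>
  <!-- one <data> entry per metadata file in the repo -->
</repomd>
```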
Yum will grab repomd.xml and check whether it is newer than what it
already has, then go from there to update the rest of the metadata.
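That flow can be sketched roughly as follows. This is an illustration of the idea, not yum's actual code, and the embedded repomd.xml sample is made up:

```python
import hashlib
import xml.etree.ElementTree as ET

# The real namespace used by repomd.xml.
NS = "{http://linux.duke.edu/metadata/repo}"

# Hypothetical repomd.xml; a client fetches this from
# <mirror>/repodata/repomd.xml on every metadata check.
REPOMD = """<?xml version="1.0"?>
<repomd xmlns="http://linux.duke.edu/metadata/repo">
  <revision>1365606000</revision>
  <data type="primary_db">
    <checksum type="sha256">abc123</checksum>
    <location href="repodata/abc123-primary.sqlite.bz2"/>
  </data>
</repomd>
"""

def metadata_locations(repomd_xml):
    """Map each data type listed in repomd.xml to its (href, checksum)."""
    root = ET.fromstring(repomd_xml)
    out = {}
    for data in root.findall(f"{NS}data"):
        href = data.find(f"{NS}location").get("href")
        csum = data.find(f"{NS}checksum").text
        out[data.get("type")] = (href, csum)
    return out

def needs_refresh(cached_xml, fetched_xml):
    """If the small index file is unchanged, nothing else is re-downloaded."""
    old = hashlib.sha256(cached_xml.encode()).hexdigest()
    new = hashlib.sha256(fetched_xml.encode()).hexdigest()
    return old != new

print(metadata_locations(REPOMD)["primary_db"])
# -> ('repodata/abc123-primary.sqlite.bz2', 'abc123')
```

The design win is that only repomd.xml, which is tiny, has to be fetched on every check; the large sqlite/xml files are pulled only when their entry in the index changes.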
Hope that helps explain it a bit more.
-sv