Software Management call for RFEs
alex at declera.com
Tue May 28 20:04:05 UTC 2013
On 28.05.2013 21:18, seth vidal wrote:
> On Tue, 28 May 2013 20:42:13 +0300
> Alek Paunov <alex at declera.com> wrote:
>> So, it seems that yum already have the "filelists on demand"
>> optimization implemented. Why you are asking for removing a feature,
>> which do not make the things worse ... ?
> I'm not.
> But when you download the filelists - it is A LOT of data.
It is, of course :-). It is big and slow now, but it implements one more
distinguishing and convenient Fedora feature ... and with a careful
schema and encoding, it can be scaled down several times in both space and
Actually, every "positive" (install, update) yum operation implies
access to the repos. Repos contain everything. If our software were
perfectly optimized, not only filelists but all other parts of the
database (including primary.files, which you cited initially)
would be lazily synced, right?
> I'd rather not have filedeps so it doesn't get pulled in for other
> things in depsolving.
Sorry, I do not know how this amount of data will impact libsolv in the
future. IMO, for yum (I mean in the sqlite-based solution) it is a
matter of optimization.
>> I have a few questions:
>> * What is the reasoning behind the splitting of the database across
>> many .sqlite files?
> many? it's 3 afaik. primary, filelists, other.
> how do you mean 'many?
Multiplied by the number of repos. That is what I am trying to
understand: why not just a single .sqlite file for the whole yum database?
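To make the question concrete, here is a minimal sketch of what a single-file cache could look like. The repo table, the repo_id column and the helper names are my assumptions for illustration only; this is not the schema that yum or createrepo actually use.

```python
import sqlite3

def open_cache(path=":memory:"):
    """Open one cache file that holds the metadata of every repo (hypothetical schema)."""
    conn = sqlite3.connect(path)
    conn.executescript("""
        CREATE TABLE IF NOT EXISTS repo (
            id   INTEGER PRIMARY KEY,
            name TEXT NOT NULL UNIQUE
        );
        CREATE TABLE IF NOT EXISTS packages (
            pkg_key INTEGER PRIMARY KEY,
            repo_id INTEGER NOT NULL REFERENCES repo(id),
            name    TEXT NOT NULL,
            arch    TEXT NOT NULL
        );
        CREATE INDEX IF NOT EXISTS packages_name ON packages(name);
    """)
    return conn

def add_package(conn, repo_name, name, arch):
    """Register the repo on first use, then store the package under its id."""
    conn.execute("INSERT OR IGNORE INTO repo (name) VALUES (?)", (repo_name,))
    repo_id = conn.execute(
        "SELECT id FROM repo WHERE name = ?", (repo_name,)
    ).fetchone()[0]
    conn.execute(
        "INSERT INTO packages (repo_id, name, arch) VALUES (?, ?, ?)",
        (repo_id, name, arch),
    )

def find(conn, name):
    """One query over all repos instead of one query per repo file."""
    return conn.execute(
        "SELECT r.name, p.name, p.arch FROM packages p "
        "JOIN repo r ON r.id = p.repo_id "
        "WHERE p.name = ? ORDER BY r.name",
        (name,),
    ).fetchall()

conn = open_cache()
add_package(conn, "fedora", "bash", "x86_64")
add_package(conn, "updates", "bash", "x86_64")
add_package(conn, "fedora", "zsh", "x86_64")
print(find(conn, "bash"))
```

With this layout a cross-repo lookup is a single indexed query. The trade-off is that dropping or refreshing one repo becomes a DELETE by repo_id rather than replacing a file.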
>> * Why the sql schema is so denormalized (IMO, leads to both
>> bandwidth and disk overspending without speed benefits)?. For
>> example: Why provides and requires tables do not use the common
>> domain table?
> B/c it was designed 8yrs ago and we were going for compressable space
> and making it as quick as possible to search?
In the provides and requires example, we gain no space or speed
benefit from the missing common domain (dependency +
dependency_evr tables). In the current situation we have fat and slow
text duplication and indexes instead of integer references to the domain
subnodes (dependencies are the biggest domain in primary). Yes, in
a bunch of cases a little denormalization is inevitable when we fight for
speed, but IMO this and a few other space flaws have a negative impact
on speed too.
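A rough sketch of the common domain I have in mind follows. Only the table names dependency and dependency_evr come from the text above; every column name, the intern_dep and providers_for helpers, and the sample packages are assumptions for illustration. The match below is by dependency name only; real depsolving would also compare the EVR ranges.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Each dependency name is stored once.
    CREATE TABLE dependency (
        id   INTEGER PRIMARY KEY,
        name TEXT NOT NULL UNIQUE
    );
    -- Each distinct (name, flags, epoch, version, release) is stored once.
    -- Empty strings instead of NULL keep the UNIQUE constraint effective.
    CREATE TABLE dependency_evr (
        id            INTEGER PRIMARY KEY,
        dependency_id INTEGER NOT NULL REFERENCES dependency(id),
        flags   TEXT NOT NULL DEFAULT '',
        epoch   TEXT NOT NULL DEFAULT '',
        version TEXT NOT NULL DEFAULT '',
        release TEXT NOT NULL DEFAULT '',
        UNIQUE (dependency_id, flags, epoch, version, release)
    );
    -- provides/requires hold only integer references.
    CREATE TABLE provides (pkg_key INTEGER NOT NULL,
                           dep_evr_id INTEGER NOT NULL REFERENCES dependency_evr(id));
    CREATE TABLE requires (pkg_key INTEGER NOT NULL,
                           dep_evr_id INTEGER NOT NULL REFERENCES dependency_evr(id));
    CREATE INDEX provides_dep ON provides(dep_evr_id);
    CREATE INDEX requires_pkg ON requires(pkg_key);
""")

def intern_dep(conn, name, flags="", epoch="", version="", release=""):
    """Return the dependency_evr id for this entry, creating rows as needed."""
    conn.execute("INSERT OR IGNORE INTO dependency (name) VALUES (?)", (name,))
    dep_id = conn.execute(
        "SELECT id FROM dependency WHERE name = ?", (name,)
    ).fetchone()[0]
    key = (dep_id, flags, epoch, version, release)
    conn.execute(
        "INSERT OR IGNORE INTO dependency_evr "
        "(dependency_id, flags, epoch, version, release) VALUES (?, ?, ?, ?, ?)",
        key,
    )
    return conn.execute(
        "SELECT id FROM dependency_evr WHERE dependency_id = ? AND flags = ? "
        "AND epoch = ? AND version = ? AND release = ?",
        key,
    ).fetchone()[0]

def providers_for(conn, pkg_key):
    """Packages providing any dependency name that pkg_key requires (EVR ignored)."""
    rows = conn.execute(
        "SELECT DISTINCT p.pkg_key FROM requires r "
        "JOIN dependency_evr re ON re.id = r.dep_evr_id "
        "JOIN dependency_evr pe ON pe.dependency_id = re.dependency_id "
        "JOIN provides p ON p.dep_evr_id = pe.id "
        "WHERE r.pkg_key = ? ORDER BY p.pkg_key",
        (pkg_key,),
    )
    return [row[0] for row in rows]

# Hypothetical sample data: package 1 provides what package 2 requires.
for table, pkg, dep in [
    ("provides", 1, ("libfoo.so.1",)),
    ("provides", 1, ("foo", "EQ", "0", "1.0", "1")),
    ("requires", 2, ("libfoo.so.1",)),
    ("requires", 2, ("foo", "GE", "0", "0.9", "")),
]:
    conn.execute(f"INSERT INTO {table} VALUES (?, ?)", (pkg, intern_dep(conn, *dep)))

print(providers_for(conn, 2))
```

The name text "libfoo.so.1" appears once in the whole database here, however many packages provide or require it, and the joins run over integer keys.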
>> * Why the incremental update mechanism (eg. applying xml diffs to
>> the sqlite database) was not been considered from the very beginning?
> It wasn't necessary? There was a massively smaller number of pkgs to
Indeed. Also, 8 years ago the possibilities and the number of ideas to
reuse were definitely different :-)