Software Management call for RFEs

Alek Paunov alex at declera.com
Tue May 28 20:04:05 UTC 2013


On 28.05.2013 21:18, seth vidal wrote:
> On Tue, 28 May 2013 20:42:13 +0300
> Alek Paunov <alex at declera.com> wrote:
>> So, it seems that yum already have the "filelists on demand"
>> optimization implemented. Why you are asking for removing a feature,
>> which do not make the things worse ... ?
>
> I'm not.
>
> But when you download the filelists - it is A LOT of data.

It is, of course :-). It is big and slow now, but it implements one more
distinguishing and convenient Fedora feature ... and with a careful
schema and encoding, it can be shrunk several times over in both space
and query time.

Actually, every "positive" (install, update) yum operation implies
access to the repos. Repos contain everything. If our software were
perfectly optimized, not only filelists but all other parts of the
database (including primary.files, which you cited initially) would be
lazily synced, right?

>
> I'd rather not have filedeps so it doesn't get pulled in for other
> things in depsolving.
>

Sorry, I do not know how this amount of data will impact libsolv in the
future. IMO, for yum (I mean the sqlite-based solution), it is a matter
of optimization.

>> I have a few questions:
>>
>>    * What is the reasoning behind the splitting of the database across
>> many .sqlite files?
>
> many? it's 3 afaik. primary, filelists, other.
>
> how do you mean 'many?

Multiplied by the number of repos. That is what I am trying to
understand: why not just a single .sqlite file for the whole yum database?
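
For illustration only, a minimal sketch of folding per-repo files into
one database with Python's sqlite3 module. The repo names, file names,
and the one-column "packages" table are hypothetical stand-ins, not the
real yum metadata layout.

```python
import os
import sqlite3
import tempfile

tmp = tempfile.mkdtemp()

# Hypothetical per-repo databases, each with a tiny "packages" table.
repo_contents = {"fedora": ["bash", "yum"], "updates": ["yum"]}
repo_files = {}
for repo, names in repo_contents.items():
    path = os.path.join(tmp, repo + "-primary.sqlite")
    db = sqlite3.connect(path)
    db.execute("CREATE TABLE packages (name TEXT NOT NULL)")
    db.executemany("INSERT INTO packages VALUES (?)", [(n,) for n in names])
    db.commit()
    db.close()
    repo_files[repo] = path

# One merged file: the repo becomes a column instead of a separate file.
merged = sqlite3.connect(os.path.join(tmp, "yum.sqlite"))
merged.execute("CREATE TABLE packages (repo TEXT NOT NULL, name TEXT NOT NULL)")
for repo, path in repo_files.items():
    merged.execute("ATTACH DATABASE ? AS src", (path,))
    merged.execute("INSERT INTO packages SELECT ?, name FROM src.packages", (repo,))
    merged.commit()  # DETACH is not allowed inside an open transaction
    merged.execute("DETACH DATABASE src")

rows = merged.execute(
    "SELECT repo, name FROM packages ORDER BY repo, name"
).fetchall()
print(rows)
```

With a single file, a query across all repos is one SELECT rather than
one per repo file; the cost is that a per-repo refresh has to delete and
reinsert that repo's rows.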

>>    * Why the sql schema is so denormalized (IMO, leads to both
>> bandwidth and disk overspending without speed benefits)?. For
>> example: Why provides and requires tables do not use the common
>> domain table?
>
> B/c it was designed 8yrs ago and we were going for compressable space
> and making it as quick as possible to search?

In the provides and requires example, we gain no space or speed benefit
from the missing common domain (dependency + dependency_evr tables). As
it stands, we have fat and slow text duplication and text indexes
instead of integer references to the domain subnodes (dependencies are
the biggest domain in primary). Yes, in a bunch of cases a little
denormalization is inevitable when we fight for speed, but IMO this and
a few other space flaws hurt speed too.
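
A minimal sketch of the common-domain idea, using Python's sqlite3
module. Only the dependency and dependency_evr table names come from the
text above; the column names, the intern_dep helper, and the sample data
are assumptions for illustration, not the actual yum schema. Matching
on an exact dependency_evr id is also a simplification: real depsolving
compares version ranges.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dependency (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL UNIQUE          -- each name stored exactly once
);
CREATE TABLE dependency_evr (
    id            INTEGER PRIMARY KEY,
    dependency_id INTEGER NOT NULL REFERENCES dependency(id),
    flags         TEXT NOT NULL DEFAULT '',
    epoch         TEXT NOT NULL DEFAULT '',
    version       TEXT NOT NULL DEFAULT '',
    release       TEXT NOT NULL DEFAULT '',
    UNIQUE (dependency_id, flags, epoch, version, release)
);
-- provides and requires carry only integer references (hypothetical columns)
CREATE TABLE provides (pkg_key INTEGER NOT NULL,
                       dep_evr_id INTEGER NOT NULL REFERENCES dependency_evr(id));
CREATE TABLE requires (pkg_key INTEGER NOT NULL,
                       dep_evr_id INTEGER NOT NULL REFERENCES dependency_evr(id));
CREATE INDEX provides_dep ON provides (dep_evr_id);
CREATE INDEX requires_dep ON requires (dep_evr_id);
""")

def intern_dep(conn, name, flags="", epoch="", version="", release=""):
    """Return the dependency_evr id for this tuple, inserting it if new."""
    conn.execute("INSERT OR IGNORE INTO dependency (name) VALUES (?)", (name,))
    dep_id = conn.execute(
        "SELECT id FROM dependency WHERE name = ?", (name,)).fetchone()[0]
    key = (dep_id, flags, epoch, version, release)
    conn.execute(
        "INSERT OR IGNORE INTO dependency_evr "
        "(dependency_id, flags, epoch, version, release) VALUES (?, ?, ?, ?, ?)",
        key)
    return conn.execute(
        "SELECT id FROM dependency_evr WHERE dependency_id = ? AND flags = ? "
        "AND epoch = ? AND version = ? AND release = ?", key).fetchone()[0]

# Package 1 provides libfoo.so.1; packages 2 and 3 require it.
conn.execute("INSERT INTO provides VALUES (1, ?)", (intern_dep(conn, "libfoo.so.1"),))
for pkg in (2, 3):
    conn.execute("INSERT INTO requires VALUES (?, ?)",
                 (pkg, intern_dep(conn, "libfoo.so.1")))

# The dependency name is stored once, however many rows reference it.
name_rows = conn.execute("SELECT COUNT(*) FROM dependency").fetchone()[0]

# Who provides what package 2 requires? The join compares integers only.
providers = conn.execute(
    "SELECT DISTINCT p.pkg_key FROM requires r "
    "JOIN provides p ON p.dep_evr_id = r.dep_evr_id "
    "WHERE r.pkg_key = ?", (2,)).fetchall()
print(name_rows, providers)
```

The point of the sketch is that the text of each dependency, and its
text index, exists once in the domain table, while provides and requires
shrink to pairs of integers with integer indexes.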

>
>>    * Why the incremental update mechanism (eg. applying xml diffs to
>> the sqlite database) was not been considered from the very beginning?
>
> It wasn't necessary? There was a massively smaller number of pkgs to
> consider.
>

Indeed. Also, 8 years ago the possibilities and the number of ideas to 
reuse were definitely different :-)

Thank you,
Alek


