Re: Supporting EPEL Builds in Koji

Monday, 6 October 2008

Mike Bonnet wrote:
...
 On Fri, 2008-07-18 at 11:38 -0400, Mike McLean wrote:
> Mike Bonnet wrote:
>> On Thu, 2008-07-17 at 13:54 -0400, Mike McLean wrote:
>>> If the remote_repo_url data is going to be inherited (and I tend to
>>> think it should be), then I think it should be in a separate table. 
...
...
>> I don't have any problem with this, though it does mean we'll need to
>> duplicate quite a bit of the inheritance-walking code, ...
...
> Walking inheritance is just a matter of determining the inheritance
> order and scanning data on the parent tags in sequence. ...
...
 Sorry, I was referring to walking tag_inheritance.  I'd rather have one
 place that walks the inheritance hierarchy and aggregates data from it,
 than two places that are doing almost the same thing.
We're talking about inherently different data. External repos to be 
merged in are quite different from builds in the system.

...
 Each tag has a set of builds associated with it.  We walk the
 inheritance hierarchy, aggregating the builds from each tag in the
 hierarchy into a flat list, and then pass that list to createrepo.  We
 would do essentially the same thing for external repos.  When walking
 the hierarchy, if a tag has an external repo associated with it, we
 would append that repo url to a flat list, and pass that list to
 mergerepo.  In both cases we're working with collections of packages
 that are associated with a tag, just in different formats. 
Sure, we can do this with one call to readFullInheritance, and traverse 
both the build table and external repo table from the given order.
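
A minimal sketch of that single walk, for illustration only. It assumes the inheritance order has already been computed (for example from readFullInheritance) and is handed in as a plain list of tag names; the function name, the two lookup dicts, and the example.com URLs are hypothetical stand-ins for the build table and the external repo table, not Koji code.

```python
def aggregate_from_inheritance(tag_order, builds_by_tag, external_repos_by_tag):
    """Walk the tag order once, collecting builds and external repo URLs.

    tag_order: tag names in inheritance order (closest tag first).
    builds_by_tag: {tag: [(package_name, nvr), ...]}  (hypothetical shape)
    external_repos_by_tag: {tag: [repo_url, ...]}     (hypothetical shape)
    """
    builds = []
    seen_packages = set()
    repo_urls = []
    for tag in tag_order:
        for package, nvr in builds_by_tag.get(tag, []):
            # First match wins: a package found in an earlier tag is kept.
            if package not in seen_packages:
                seen_packages.add(package)
                builds.append(nvr)
        for url in external_repos_by_tag.get(tag, []):
            # Keep external repos in the order the walk encounters them.
            if url not in repo_urls:
                repo_urls.append(url)
    return builds, repo_urls


tag_order = ["dist-custom-build", "dist-custom",
             "dist-f9-updates-external", "dist-f9-ga-external"]
builds_by_tag = {
    "dist-custom-build": [("foo", "foo-1.1-1")],
    "dist-custom": [("foo", "foo-1.0-1"), ("bar", "bar-2.0-1")],
}
external_repos_by_tag = {
    "dist-f9-updates-external": ["http://example.com/f9/updates/"],
    "dist-f9-ga-external": ["http://example.com/f9/ga/"],
}
builds, repo_urls = aggregate_from_inheritance(
    tag_order, builds_by_tag, external_repos_by_tag)
```

Here the build list would go to createrepo and the URL list to mergerepo; both come out of the same pass over the hierarchy.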

...
 In discussing this with Jesse, I think we want external repos to be
 inherited.  This is probably the easiest way to deal with having
 multiple external repos getting pulled in to a single buildroot, which
 is essential for Fedora (think F9 GA and F9 Updates).

 The idea was that, by convention, we would have external-repo-only tags,
 each with only a single external repo associated with it and no
 packages/builds associated.  These external-repo-only tags could then be
 inserted into the build hierarchy where appropriate.  An ordered list of
 external repos could then be constructed by performing the current
 depth-first search of the inheritance hierarchy.  The ordered list would
 then be passed to mergerepo, which would ensure that packages in repos
 earlier in the list supersede packages (by srpm name) in repos later in
 the list.  This would preserve the "first-match-wins" inheritance policy
 that Koji currently implements, and that admins expect.  For example:

 dist-custom-build
   ├─dist-custom
   └─dist-f9-updates-external
       └─dist-f9-ga-external

 would result in mergerepo creating a single repo that would only contain
 packages from dist-f9-ga-external if they did not exist in the
 Koji-generated repo (dist-custom-build + dist-custom),
 dist-f9-updates-external, or the blacklist of blocked packages.  This is
 consistent with how Koji package inheritance currently works, and I
 think is the most intuitive approach. 
It is similar, but different in potentially confusing ways. External 
repos do not have build structure, so we can't really have the same sort 
of inheritance behavior with a combination of external repo tags and 
normal tags.

We order the external repos in inheritance order, but ultimately those 
repos are merged with the internal one in a way that does not honor 
inheritance in the way that the admin might expect.

Using tags to represent external repos fails intuition because external 
repos are very much not like tags. When we get to supporting external 
koji systems, we can do something like this, but for external repos the 
"bolted-on" nature needs to be clear. This is why I'd prefer to have the 
data a little more removed.
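
To make the proposal under discussion concrete, here is a rough sketch of the ordering and the "first-match-wins" merge by srpm name, using the example hierarchy quoted above. The function names, the dict-based repo contents, and the package names are all made up for illustration; this is not how mergerepo is implemented.

```python
def depth_first_order(tag, parents):
    """Return tags in depth-first inheritance order, starting at `tag`."""
    order = []

    def visit(current):
        if current in order:
            return
        order.append(current)
        for parent in parents.get(current, []):
            visit(parent)

    visit(tag)
    return order


def merge_first_match_wins(repos_in_order, blocked=()):
    """Merge repos so that earlier repos supersede later ones by srpm name.

    repos_in_order: list of (repo_name, {srpm_name: [rpm, ...]}).
    blocked: srpm names that must never appear in the result.
    """
    merged = {}
    for repo_name, packages in repos_in_order:
        for srpm_name, rpms in packages.items():
            if srpm_name in blocked or srpm_name in merged:
                continue
            merged[srpm_name] = (repo_name, rpms)
    return merged


# The hierarchy from the example: each tag maps to the tags it inherits from.
parents = {
    "dist-custom-build": ["dist-custom", "dist-f9-updates-external"],
    "dist-f9-updates-external": ["dist-f9-ga-external"],
}
order = depth_first_order("dist-custom-build", parents)

# The Koji-generated repo (dist-custom-build + dist-custom) goes in first,
# as one unit, followed by the external repos in inheritance order.
merged = merge_first_match_wins(
    [
        ("koji-local", {"bash": ["bash-local.rpm"]}),
        ("dist-f9-updates-external", {"bash": ["bash-upd.rpm"],
                                      "kernel": ["kernel-upd.rpm"]}),
        ("dist-f9-ga-external", {"bash": ["bash-ga.rpm"],
                                 "kernel": ["kernel-ga.rpm"],
                                 "glibc": ["glibc-ga.rpm"],
                                 "badpkg": ["badpkg-ga.rpm"]}),
    ],
    blocked={"badpkg"},
)
```

Note that the local repo enters the merge as a single flat unit, so where an external-repo tag sits relative to ordinary tags in the hierarchy has no effect on the outcome in this sketch.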

...
> I see all that, and I'm almost convinced. The flipside is that by
> default all the code will treat these external rpms the same as the 
> local ones, which will not be correct for a number of cases. 

 Personally I'd prefer adding a few special cases to the existing code,
 rather than maintain a whole heap of almost-but-not-quite-the-same code
 to manage external rpms.  I think that conceptually they're alike enough
 that the number of special cases will be minimal. 
I think I'm ok with using the rpminfo table.

...
 I think that synthesizing builds for the sake of maintaining the
 not-null constraint is more pain than it's worth, and would make
 enforcing our nvr-uniqueness constraints (which we definitely want to do
 for local builds) more difficult.  Having locally-built rpms always
 associated with a build, and external rpms not, makes sense to me. 
Ok, agreed.

...
> Also, I'm thinking we need to have some sort of rpm_origin table so that
> all these references can be managed cleanly.

 That sounds reasonable to me.  Note that we may end up with a lot of
 rows in this table, since we're allowing variable substitution in the
 external_repo_url (tag name and arch).  But I don't see that as a
 problem. 
I'm thinking the only substitution we should support is arch. Anything 
else sort of constitutes a different repo.

If we use an origin table like this we can abstract out the arch. 
Something like:

create table external_repo (
	id SERIAL PRIMARY KEY,
	name TEXT
);
create table external_repo_config (
	external_repo_id INTEGER NOT NULL REFERENCES external_repo (id),
	url TEXT NOT NULL
	-- plus versioning fields
	-- ...
);

This way if upstream repo changes url scheme or moves to a different 
host, you can keep some notion of connectedness. External rpms would 
simply reference external_repo_id.
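
A small sketch of how that might behave, using an in-memory SQLite database so it runs standalone. SQLite's INTEGER PRIMARY KEY stands in for SERIAL, the versioning fields are left out, and the `$arch` placeholder syntax, the `repo_url` helper, and the example URLs are assumptions made for illustration, not an agreed format.

```python
import sqlite3
from string import Template

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE external_repo (
        id INTEGER PRIMARY KEY,   -- stands in for SERIAL
        name TEXT
    );
    CREATE TABLE external_repo_config (
        external_repo_id INTEGER NOT NULL REFERENCES external_repo (id),
        url TEXT NOT NULL
        -- versioning fields omitted in this sketch
    );
""")


def repo_url(conn, repo_name, arch):
    """Look up a repo's configured URL and substitute the arch (only)."""
    row = conn.execute(
        "SELECT c.url FROM external_repo r "
        "JOIN external_repo_config c ON c.external_repo_id = r.id "
        "WHERE r.name = ?",
        (repo_name,),
    ).fetchone()
    if row is None:
        raise KeyError(repo_name)
    return Template(row[0]).substitute(arch=arch)


# Register one external repo with an arch placeholder in its URL.
cur = conn.execute("INSERT INTO external_repo (name) VALUES (?)",
                   ("f9-updates",))
repo_id = cur.lastrowid
conn.execute(
    "INSERT INTO external_repo_config (external_repo_id, url) VALUES (?, ?)",
    (repo_id, "http://mirror.example.com/f9/updates/$arch/"),
)
before = repo_url(conn, "f9-updates", "x86_64")

# Upstream moves to a different host: only the config row changes, so
# anything that references external_repo_id keeps pointing at the same repo.
conn.execute(
    "UPDATE external_repo_config SET url = ? WHERE external_repo_id = ?",
    ("http://archive.example.org/pub/f9/updates/$arch/", repo_id),
)
after = repo_url(conn, "f9-updates", "i386")
```

With real versioning fields the URL change would presumably add a new config row rather than update in place; the sketch only shows that the repo id stays stable.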

...
> In the same vein, what happens when an external repo has an nvra+sigmd5
> matching a /local/ rpm?  Maybe it doesn't matter, though I guess 
> technically we want to record the origin properly when it gets into a 
> buildroot via external repo vs internal tag.

 Right, we would record the origin as the remote repo it came from (by
 parsing the merged repodata and looking at the baseurl). 
So where do we draw the line between code that we add to koji and code 
that we add to createrepo (or some external merge-repo tool)?
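
For reference, the Koji-side part of "looking at the baseurl" could be fairly small. The sketch below assumes the merged primary.xml records each external package's base URL as an `xml:base` attribute on its `<location>` element; the trimmed sample document, the `origins_by_package` helper, and the URL-to-id mapping are all hypothetical.

```python
import xml.etree.ElementTree as ET

COMMON_NS = "http://linux.duke.edu/metadata/common"
XML_NS = "http://www.w3.org/XML/1998/namespace"

# Heavily trimmed, made-up sample of merged primary.xml metadata.
SAMPLE_PRIMARY = """<metadata xmlns="http://linux.duke.edu/metadata/common" packages="2">
  <package type="rpm">
    <name>bash</name>
    <arch>x86_64</arch>
    <location xml:base="http://mirror.example.com/f9/updates/x86_64/"
              href="bash-3.2-23.fc9.x86_64.rpm"/>
  </package>
  <package type="rpm">
    <name>foo</name>
    <arch>x86_64</arch>
    <location href="foo-1.0-1.x86_64.rpm"/>
  </package>
</metadata>
"""


def origins_by_package(primary_xml, known_repos):
    """Map package name -> external repo id, or None for local packages.

    known_repos: {base_url: external_repo_id} (hypothetical lookup).
    """
    root = ET.fromstring(primary_xml)
    origins = {}
    for pkg in root.findall("{%s}package" % COMMON_NS):
        name = pkg.findtext("{%s}name" % COMMON_NS)
        location = pkg.find("{%s}location" % COMMON_NS)
        base = None
        if location is not None:
            base = location.get("{%s}base" % XML_NS)
        # No xml:base (or an unknown one) is treated as "not external".
        origins[name] = known_repos.get(base)
    return origins


origins = origins_by_package(
    SAMPLE_PRIMARY,
    {"http://mirror.example.com/f9/updates/x86_64/": 1},
)
```

Under that assumption the merge itself could stay in the external tool, with Koji only reading the result.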

...
>> However, we will already be parsing the remote repodata, which contains
>> information like the srpm name for each rpm, so we could do something
>> more sophisticated here.
> -snipsnip-
> ...
>> The repomerge tool seems like it solves the problem better, and would be
>> more useful in general.
> If we're going to have our fingers in the repodata, we'll probably want 
> to have them in the merge too. Perhaps we can get createrepo and/or this 
> repomerge tool usefully libified?

 I was thinking we would probably just call out to the tool the way we do
 for createrepo, but I'm certainly not against using an API.  I'm a
 little concerned about memory usage when doing the create/mergerepo
 in-process, since we know python and mod_python have garbage-collection
 issues, but that may be a "cross the bridge when we come to it" problem.
 Seth, is it feasible to provide an API to mergerepo that we could use
 directly? 
I don't think I even saw a reply from Seth on this. Where does the 
mergerepo code stand now?
