Mike Bonnet wrote:
On Thu, 2008-07-17 at 13:54 -0400, Mike McLean wrote:
> If the remote_repo_url data is going to be inherited (and I tend to
> think it should be), then I think it should be in a separate table. I'd
> like to reserve tag_config for data that is local to individual tags.
> This will also make it easier to represent multiple remote repos.
I don't have any problem with this, though it does mean we'll need to
duplicate quite a bit of the inheritance-walking code, or make it
configurable as to which inheritance it's walking. This new table would
also have to be versioned, the same way the tag_config table is.
Walking inheritance is just a matter of determining the inheritance
order and scanning data on the parent tags in sequence. Currently,
nothing scans tag_config in this way because no data in tag_config is
inherited. (Well, in a sense tag_changed_since_event() does walk
tag_config, but that's a little different.)
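
To make "determine the inheritance order and scan the parent tags in sequence" concrete, here is a minimal Python sketch. The names `parents`, `inheritance_order` and `first_inherited_value` are hypothetical and are not Koji's API. The walk assumes a plain depth-first order, which ignores the extra rules real tag inheritance has.

```python
def inheritance_order(tag, parents):
    """Return `tag` followed by its ancestors, depth-first, without repeats.

    `parents` is a hypothetical mapping of tag name -> ordered list of
    parent tag names.
    """
    order, seen = [], set()

    def visit(t):
        if t in seen:
            return
        seen.add(t)
        order.append(t)
        for parent in parents.get(t, []):
            visit(parent)

    visit(tag)
    return order


def first_inherited_value(tag, parents, data):
    """Scan the tag and then its ancestors; return the first value found."""
    for t in inheritance_order(tag, parents):
        if t in data:
            return data[t]
    return None
```

With `parents = {"A": ["B"]}` and a value stored only on B, a lookup on A falls through to B's value.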
We need to figure out how we'll deal with multiplicity for the external
repos. If tag A uses repo X and inherits from tag B which uses repo Y,
then does tag A use both X and Y, or does the X entry override it?
A (+repo X)
+- B (+repo Y)
My inclination is that it should override, because I think we'll want
some way to do overrides, and that mechanism seems easiest.
Also, I think we'll probably want to allow multiple external repos per
tag, something which will be much easier to represent in an external
table. We can include an explicit priority field to make a sane
uniqueness condition (and to provide a clear ordering for the repo merge).
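
A rough sketch of the override behaviour with a priority field, in Python. It assumes the inheritance order has already been computed, and that each tag maps to a list of `(priority, repo)` pairs. Both the data layout and the function name are made up for illustration.

```python
def effective_external_repos(order, tag_repos):
    """Return the repos of the first tag in `order` that defines any.

    `order` is the tag followed by its ancestors. `tag_repos` maps a tag
    name to a list of (priority, repo) pairs. The nearest tag with entries
    overrides everything further up; its repos come back sorted by
    priority, lowest number first.
    """
    for tag in order:
        entries = tag_repos.get(tag)
        if entries:
            return [repo for _priority, repo in sorted(entries)]
    return []
```

For the A/B example above, A's own entry for repo X wins and repo Y is ignored. If A had no entries, the lookup would fall through to B and return repo Y.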
The big win here is that the methods and tools that query rpminfo
information about what was present in the buildroot at build time
I see all that, and I'm almost convinced. The flipside is that by
default all the code will treat these external rpms the same as the
local ones, which will not be correct for a number of cases. Obviously,
part of this will involve changing code to behave differently for the
external ones; I'm just worried about how much we might have to change,
or what we might miss.
Yes, I realize that the "not null" constraint should exist now, and in
fact all rpms in the Fedora database do reference builds. However, I
think logically having a remote rpm not reference a local build makes
sense. The alternative is to create the build object from the srpm info
in the repodata (along with some namespacing similar to rpminfo).
However, this would significantly clutter the build table with
information that is pretty non-essential.
The idea of grouping them into builds appeals to me, but I don't think
it's possible in general (though maybe we could fake it well enough
somehow). The only data we're (mostly) guaranteed to have to work with
is the sourcerpm header field. The catch is that in case of an
nvr-collision we can't determine which build it belongs to (or indeed if
we should create a new build of same nvr).
I'm open to suggestions on how to modify the uniqueness constraints to
handle this case. We care about ensuring that a locally-built rpm
doesn't have the same n-v-r as another locally-built rpm. I don't think
we care at all about n-v-r uniqueness amongst remote rpms. However, we
probably want to avoid creating 2 rpminfo entries when the same remote
rpm is used in 2 different buildroots. Using the sigmd5 is a good way
to avoid that.
Agreed. same sigmd5 ==> same rpm.
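
A minimal sketch of that rule in Python, assuming an in-memory dict keyed by sigmd5 stands in for the rpminfo table. The function and field names are illustrative only.

```python
def get_or_create_external_rpm(registry, nvra, sigmd5):
    """Return the existing entry for `sigmd5`, creating one if needed.

    `registry` is a dict keyed by sigmd5. Two buildroots that pull in
    the same remote rpm therefore share a single entry.
    """
    entry = registry.get(sigmd5)
    if entry is None:
        entry = {"id": len(registry) + 1, "nvra": nvra, "sigmd5": sigmd5}
        registry[sigmd5] = entry
    return entry
```

Note that this keys on sigmd5 alone, so two remote rpms with the same n-v-r-a but different sigmd5s end up as separate entries.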
However, what happens if a remote rpm with the same
n-v-r and sigmd5 gets pulled in from 2 different remote repos?
This gets into part of what bugs me about this and why I'm somewhat
inclined to keep the ext repo data a step removed. It's so potentially
dirty. Koji has all these consistency constraints that an external repo
(much less many of them in aggregate) lacks.
It's quite possible that an external repo might respin a package keeping
the same nvr, so we don't even need 2 external repos to hit this
the "origin" field should be pushed down to the buildroot_listing table,
so the buildroots can reference the same rpminfo object, but indicate
that it came from a different repo in each buildroot?
Interesting. Yeah, I think that is probably the right answer.
Also, I'm thinking we need to have some sort of rpm_origin table so that
all these references can be managed cleanly.
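
Here is one way the rpm_origin idea could look, sketched with Python's sqlite3 module and an in-memory database. The column layout is an assumption for illustration and is not the actual Koji schema. The point is only that the origin lives on the buildroot_listing row, so two buildroots can share one rpminfo row while recording different origins.

```python
import sqlite3

# Hypothetical, simplified schema; not the real Koji tables.
SCHEMA = """
CREATE TABLE rpm_origin (
    id INTEGER PRIMARY KEY,
    name TEXT UNIQUE NOT NULL
);
CREATE TABLE rpminfo (
    id INTEGER PRIMARY KEY,
    nvra TEXT NOT NULL,
    sigmd5 TEXT UNIQUE NOT NULL
);
CREATE TABLE buildroot_listing (
    buildroot_id INTEGER NOT NULL,
    rpm_id INTEGER NOT NULL REFERENCES rpminfo (id),
    origin_id INTEGER NOT NULL REFERENCES rpm_origin (id),
    PRIMARY KEY (buildroot_id, rpm_id)
);
"""


def demo():
    db = sqlite3.connect(":memory:")
    db.executescript(SCHEMA)
    db.executemany("INSERT INTO rpm_origin (id, name) VALUES (?, ?)",
                   [(1, "repo X"), (2, "repo Y")])
    db.execute("INSERT INTO rpminfo (id, nvra, sigmd5) "
               "VALUES (1, 'foo-1.0-1.noarch', 'abc123')")
    # Same rpminfo row in two buildroots, with a different origin in each.
    db.executemany("INSERT INTO buildroot_listing VALUES (?, ?, ?)",
                   [(100, 1, 1), (101, 1, 2)])
    return db.execute(
        "SELECT bl.buildroot_id, r.nvra, o.name "
        "FROM buildroot_listing bl "
        "JOIN rpminfo r ON r.id = bl.rpm_id "
        "JOIN rpm_origin o ON o.id = bl.origin_id "
        "ORDER BY bl.buildroot_id").fetchall()
```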
Also, what happens when we find 2 remote rpms with the same n-v-r but
different sigmd5s? Should that be an error?
Certainly we have to allow the possibility that two origins might have
overlapping nvras. Within a single origin, I'm not so sure. I suppose we
can get away with some small consistency demands. As long as we're only
enforcing unique nvra for local builds and indexing by sigmd5/similar, I
don't think we /have/ to make this an error condition.
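
As a sketch of "unique nvra for local builds only, indexed by sigmd5", a partial unique index would do it. The example below uses Python's sqlite3 with an in-memory database. The `external` flag column and the index names are assumptions, not existing Koji schema.

```python
import sqlite3


def make_db():
    db = sqlite3.connect(":memory:")
    db.executescript("""
    CREATE TABLE rpminfo (
        id INTEGER PRIMARY KEY,
        nvra TEXT NOT NULL,
        sigmd5 TEXT NOT NULL,
        external INTEGER NOT NULL DEFAULT 0  -- hypothetical flag column
    );
    -- Uniqueness of nvra is enforced for local rpms only.
    CREATE UNIQUE INDEX rpminfo_local_nvra
        ON rpminfo (nvra) WHERE external = 0;
    -- Plain (non-unique) index for lookups by sigmd5.
    CREATE INDEX rpminfo_sigmd5 ON rpminfo (sigmd5);
    """)
    return db


def add_rpm(db, nvra, sigmd5, external):
    """Insert a row; return False if the local-nvra constraint rejects it."""
    try:
        with db:  # commits on success, rolls back on error
            db.execute(
                "INSERT INTO rpminfo (nvra, sigmd5, external) VALUES (?, ?, ?)",
                (nvra, sigmd5, int(external)))
        return True
    except sqlite3.IntegrityError:
        return False
```

With this layout a second local rpm with the same nvra is rejected, while any number of external rpms may share that nvra, so overlapping nvras across origins are not an error.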
In the same vein, what happens when an external repo has an nvra+sigmd5
matching a /local/ rpm? Maybe it doesn't matter, though I guess
technically we want to record the origin properly when it gets into a
buildroot via external repo vs internal tag.
> First, I'd like to be able to support external koji servers
(or rather a
I agree that this is a desirable goal. I believe this is more the
domain of the Koji secondary-arch daemon. It would be talking directly
Well, it has some similarities to 2nd arch, but it's still quite different.
The more I think about it, the more I think that supporting an external
koji server will probably be much different from the ext repo
business. Most of the issues with rpminfo will carry over, but with a
koji server we will be able to determine build data and can probably
actually pull off something like "inherit from tag X on koji server Y."
The tag content may be managed by build, but when it's time for it to
actually get used (in the form of a yum repo) it gets unfolded into a
big list of rpms. And what gets associated with a buildroot is simply a
big list of rpms. Conceptually I don't really have a problem with the
idea of a tag as a big list of rpms, that we happen to group by srpm
within Koji because it's more convenient for us. So adding the external
repo information to tag_config is just an extension of the big list of
Yeah, I almost wish I hadn't made the build structure quite the way I did.
However, we will already be parsing the remote repodata, which contains
information like the srpm name for each rpm, so we could do something
more sophisticated here.
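
For illustration, grouping the parsed rpms by their source rpm could be as simple as the Python sketch below. It assumes each rpm has already been reduced to a dict with hypothetical `nvra` and `sourcerpm` keys, and it does nothing about nvr collisions between different srpms of the same name.

```python
from collections import defaultdict


def group_by_srpm(rpms):
    """Group rpm nvras by their sourcerpm value.

    `rpms` is an iterable of dicts with hypothetical keys "nvra" and
    "sourcerpm". Rpms lacking a sourcerpm are grouped under None.
    """
    groups = defaultdict(list)
    for rpm in rpms:
        groups[rpm.get("sourcerpm")].append(rpm["nvra"])
    return dict(groups)
```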
The repomerge tool seems like it solves the problem better, and would
be more useful in general.
If we're going to have our fingers in the repodata, we'll probably want
to have them in the merge too. Perhaps we can get createrepo and/or this
repomerge tool usefully libified?