I talked a bit with Jesse on IRC about this during mayhem week, and said
I'd post here about it at some point.
Some of us were using the old Makefile.common feature for tracking upstream
sources. It was a dismal idiosyncratic kludge like everything else there,
but it was handy. I'll briefly explain how it works as background, which
is not to suggest that we shouldn't solve the same problem for package
maintainers in an entirely different way now.
In the old scheme, you'd set up these things in your package module:
* a "mirrors" file containing a list of URLs
* a list of upstream file names, either in an "upstream" file
or in the Makefile variable UPSTREAM_FILES
* an "upstream-key.gpg" file
* an UPSTREAM_CHECKS variable setting in Makefile
Those things are stored in the pkgs CVS. For example, the kernel uses an
"upstream" file that gets edited and committed when the base version
changes; whereas elfutils just has in Makefile:
    UPSTREAM_FILES = $(NAME)-$(VERSION).tar.bz2
    upstream:;
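For concreteness, a kernel-style setup might contain something like this
(the contents and the "#" annotations are invented here, just to show the
shape of the two files):

```
# mirrors -- one candidate directory URL per line
http://ftp.kernel.org/pub/linux/kernel/v2.6
http://ftp.kernel.org/pub/linux/kernel/v2.6/testing

# upstream -- the upstream file names currently in use
linux-2.6.35.tar.bz2
patch-2.6.36-rc1.bz2
```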
Then you get new make targets:
* make download
This downloads the files in upstream file list if they don't exist
locally, and then checks their signatures against the local .sig or .sign
or whatever files (as UPSTREAM_CHECKS is set).
* make rebase
This does 'make download' and then 'make upload FILES=...' with the
upstream files list. It then does 'cvs add' on all the signature files
just downloaded/checked.
* make new-base
Same, but does new-sources instead of upload, i.e. it wipes the old
sources file first.
So, for example, my procedure with elfutils, say when going from 0.147 to
0.148, is:

    # edit elfutils.spec, change Version to 0.148
    make new-base
    cvs rm -f elfutils-0.147.tar.bz2.sig
    cvs commit -m'Update to 0.148'
et voila. For the kernel, we have some scripts that edit the "upstream"
file automagically, so it's similarly seamless. (For me in elfutils,
that "cvs rm" is the only real seam.)
The "mirrors" file lists URLs of directories where downloads of the files
in the list are attempted. For elfutils, I just have a list of one. For
the kernel, we have a list that is just several subdirectories of
http://ftp.kernel.org/pub/linux/kernel/v2.6, just so that we don't have
to keep track of which tarballs and patch files get stashed where up
there. Frankly, I think we could make the kernel.spec URLs for files
exactly correct with some more care and macro/script magic. I am not
aware of any case where someone really needs to look at multiple mirrors
to refresh their upstream files.
So, what about that makes any sense?
I see the use in upstream-key.gpg being kept in the repo along with the
settings of upstream URLs and such. That makes it easy for a new
maintainer to pick things up cold and not need to intuit whose key to go
fetch from a keyserver.
I don't see any benefit to keeping detached signature files in the package
repo. Once you've checked the signatures and decided that your local files
are what you want, then you commit the sources file with their md5sums. So
anyone later wanting to make sure they have the right thing can just do
'md5sum -c sources' and know what files were meant when the pkgs commit was
made. It doesn't really matter if there is some upstream authority's
signature saying they are kosher too; they are the files Fedora is using.
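A toy run (file names invented) shows the whole contract: record the md5
at commit time, verify it any time later.

```shell
# Stand-in tarball; in real life this is what 'make download' fetched.
tmp=$(mktemp -d); cd "$tmp"
echo demo-content > elfutils-0.148.tar.bz2
md5sum elfutils-0.148.tar.bz2 > sources   # what the pkgs commit records
md5sum -c sources                         # prints "elfutils-0.148.tar.bz2: OK"
```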
So, that's all background. Here is what I think would be nice to have in
the new world order:
* AFAICT current rpm handles URLs in PatchN: lines as well as SourceN:
lines. We can just put proper URLs for every upstream file in our .spec
files. For most packages where this is worth bothering with, upstream's
layout is sensible enough that this is really easy with %{version}
macros, and mostly we already have these URLs in there.
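For instance (the version and paths here are invented), a spec header
could carry the complete URLs and track new releases through the macros
alone:

```spec
Name:     elfutils
Version:  0.148
Source0:  http://fedorahosted.org/releases/e/l/elfutils/%{version}/elfutils-%{version}.tar.bz2
Patch1:   http://fedorahosted.org/releases/e/l/elfutils/%{version}/elfutils-portability.patch
```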
I don't see any need for an alternate URL list off hand (no "mirrors"
file). It might be worthwhile to take a survey via grep on the final
/cvs/pkgs state for all the pkgs with upstream-key.gpg files, upstream
files, UPSTREAM_{FILES,CHECKS} variable settings in Makefile, mirrors
files. We can see how much this is actually used, and if anybody
actually has a mirrors file listing multiple alternate download sites for
the same file.
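The survey itself would only be a few lines of shell; a sketch, with a
fabricated fixture standing in for a checkout of the final /cvs/pkgs
state:

```shell
# PKGS would point at a real checkout; this fixture just makes the loop
# concrete.
PKGS=$(mktemp -d)
mkdir -p "$PKGS/elfutils/devel" "$PKGS/kernel/devel"
touch "$PKGS/kernel/devel/mirrors" "$PKGS/kernel/devel/upstream" \
      "$PKGS/elfutils/devel/upstream-key.gpg"
printf 'UPSTREAM_FILES = $(NAME)-$(VERSION).tar.bz2\n' \
      > "$PKGS/elfutils/devel/Makefile"

# How many packages carry each of the old upstream-tracking files?
for f in upstream upstream-key.gpg mirrors; do
    echo "$f: $(find "$PKGS" -name "$f" | wc -l)"
done
# Packages still setting UPSTREAM_FILES/UPSTREAM_CHECKS in a Makefile:
grep -rlE 'UPSTREAM_(FILES|CHECKS)' "$PKGS" | wc -l
```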
* rpmdev-download script
This could be a fedpkg subcommand, but it's just the local portion and
actually doesn't really have anything to do with Fedora specifically.
I'm suspicious that there must already be something that makes use of the
URLs in Source/Patch .spec headers in some useful fashion, but I can't
really find anything off hand that does.
This would look at the (macro-expanded) .spec file and list all the files
that are given with a URL rather than just a local file name. For each
of those files, it tries to download it from the URL. If there is a
local upstream-key.gpg file (or with a switch, or whatever), then it also
tries to download a URL.{sig,sign,asc} file and verify that the file just
downloaded matches its signature and that it's signed by a public key in
the upstream-key.gpg keyring--if not, it removes the download and fails.
If there is no upstream-key.gpg file, it can look for a URL.md5 file and
check the download against that. (The old Makefile.common supports .md5
files, though I actually only know of uses with .sign and .sig files.) If all
is downloaded and verified, then it emits an md5sum/name list of all the
files that had URLs in the .spec file, on stdout or perhaps directly
writing "sources".
A 'fedpkg new-base' subcommand could run "rpmdev-download foo.spec >
sources".
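The core of such a script could be as small as this sketch -- the
function name, the curl/gpgv choices, and the assumption that the URL
list was already extracted from the macro-expanded .spec are all mine:

```shell
# For each URL: download if missing, optionally verify a detached
# signature against ./upstream-key.gpg, emit the md5sum line for "sources".
download_and_pin() {
    url=$1; file=${url##*/}
    [ -e "$file" ] || curl -sfL -o "$file" "$url" || return 1
    if [ -e upstream-key.gpg ]; then
        ok=
        for suf in sig sign asc; do
            curl -sfL -o "$file.$suf" "$url.$suf" || continue
            gpgv --keyring ./upstream-key.gpg "$file.$suf" "$file" && ok=1
            break
        done
        # No verifiable signature: remove the download and fail.
        [ "$ok" ] || { rm -f "$file"; return 1; }
    fi
    md5sum "$file"
}

# Demo with a file:// URL standing in for the upstream server.
tmp=$(mktemp -d); cd "$tmp"
mkdir upstream; echo demo > upstream/demo-1.0.tar.bz2
download_and_pin "file://$tmp/upstream/demo-1.0.tar.bz2" > sources
md5sum -c sources
```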
* pre-update hook on
pkgs.fedoraproject.org repos
(The gitolite "custom hooks" documentation suggests it is easy to write a
script and make all existing/new repos point to it to run after the ACL
enforcement hook succeeds.)
This can start out independent of the upstream-download stuff.
First, it checks each new head to validate its tree's "sources" file.
If the format is invalid, or any md5sum+name is not already present in
the lookaside cache, refuse the push.
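A sketch of that first check, assuming the lookaside cache stores files
as <name>/<md5>/<name> (that layout is my recollection -- treat it as a
guess):

```shell
LOOKASIDE=$(mktemp -d)          # stand-in for the real cache directory
check_sources() {
    while read -r sum name; do
        # Each line must be "<32 hex digits>  <file name>".
        expr "$sum" : '[0-9a-f]\{32\}$' >/dev/null || return 1
        [ -n "$name" ] || return 1
        # Refuse the push unless the file is already in the cache.
        [ -e "$LOOKASIDE/$name/$sum/$name" ] || return 1
    done
}

# Fixture: one cached file, then a valid and a malformed "sources" line.
sum=$(echo demo | md5sum | cut -d' ' -f1)
mkdir -p "$LOOKASIDE/demo.tar/$sum"
echo demo > "$LOOKASIDE/demo.tar/$sum/demo.tar"
printf '%s  demo.tar\n' "$sum" | check_sources && echo accept
printf 'not-a-checksum  demo.tar\n' | check_sources || echo reject
```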
Second, it can macro-expand and parse the .spec file (I assume there is
some python for this already), with appropriate dist macros for the
branch. This is the same step rpmdev-download does, but here we get the
complete list of Source and Patch files, both those with URLs and those
without. Check that the tree contains the file if it's a local file name,
and that the sources file contains the file name if it's a URL.
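That second check might look like this, assuming the macro-expanded
Source/Patch values have already been extracted, one per line on stdin
(rpm's python bindings, or 'rpmspec -P' plus a grep, would do the real
parsing):

```shell
check_spec_files() {
    while read -r src; do
        case $src in
            *://*)  # URL: its basename must be pinned in "sources"
                grep -q "[[:space:]]${src##*/}\$" sources || return 1 ;;
            *)      # plain file name: must be committed in the tree
                [ -e "$src" ] || return 1 ;;
        esac
    done
}

# Fixture (names invented): one pinned tarball, one committed patch.
tmp=$(mktemp -d); cd "$tmp"
printf '%s  %s\n' 0123456789abcdef0123456789abcdef foo-1.0.tar.bz2 > sources
touch local.patch
printf '%s\n' http://example.org/foo-1.0.tar.bz2 local.patch \
    | check_spec_files && echo push-allowed
```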
Of course, we may want to consider various other .spec checks at some
point that would go here too. But that's beside the point here.
* Auto-downuploading!
Automating the process of downloading big tarballs from well-connected
servers to my humble home workstation and then uploading them to other
well-connected servers was not what I actually wanted to do. (I used to
usually log into a machine in an RH office to do this step so as not to
suffer the slow upload delay from home, but between current Fedora
infrastructure's separate location and slightly better ISP service at
home, I tend to use the easy stuff at home and just sit and wait a little
while.) When I'm downloading from
fedorahosted.org to upload, it's
especially ironic, because I'm pretty sure the two servers are in the
same rack, if not the same machine!
What that pre-update hook should really do after it gets the list of
files named by URLs in .spec and verifies they are listed in the sources
file, is check if each one is already in the lookaside cache, and if it's
not, go get it. You have the upstream URL from .spec and you have the
md5sum from sources. You can fetch the file and verify it's what the
committer meant. Just do it!
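Concretely, the fetch step could be something like this (every path here
is hypothetical):

```shell
LOOKASIDE=$(mktemp -d)          # stand-in for the lookaside cache
fetch_into_lookaside() {
    url=$1; file=${url##*/}
    # The expected md5 comes from the committed "sources" file.
    sum=$(awk -v f="$file" '$2 == f {print $1}' sources)
    [ -n "$sum" ] || return 1                   # not pinned: reject
    dest=$LOOKASIDE/$file/$sum
    [ -e "$dest/$file" ] && return 0            # already cached
    curl -sfL -o "$file.tmp" "$url" || return 1
    # Verify it is what the committer meant before caching it.
    echo "$sum  $file.tmp" | md5sum -c --quiet - \
        || { rm -f "$file.tmp"; return 1; }
    mkdir -p "$dest" && mv "$file.tmp" "$dest/$file"
}

# Demo against a file:// "upstream".
tmp=$(mktemp -d); cd "$tmp"
mkdir upstream; echo demo > upstream/bar-2.0.tar.bz2
md5sum upstream/bar-2.0.tar.bz2 | sed 's|upstream/||' > sources
fetch_into_lookaside "file://$tmp/upstream/bar-2.0.tar.bz2" && echo cached
```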
This could be structured so the download from upstream is done on the
lookaside cache server itself, which is obviously optimal. Since they
are all closely colocated, I assume it isn't really any worse overall to
have the git server machine or some other worker (a koji job?) do the
download and upload via the existing lookaside server mechanism. I don't
care about the implementation details.
To avoid dangers either from abuse or from strange errors, there could be
an administrative list of acceptable upstream URL patterns. (The dangers
would be provoking strange CGI hits on the web from Fedora servers, and
that sort of thing.) This can include all
fedorahosted.org and
kernel.org and whatever the normal sourceforge download URL schema is
this week, etc. It's probably sufficient to hand-administer this URL
filter list and let package maintainers use rel-eng tickets for
additions. If there turns out to be an actual need that the "mirrors"
file was serving, then this pattern list could instead be a table to
rewrite a canonical URL to a list of alternate URLs to try downloading
from (with rewrite-to-self on matching
fedorahosted.org and the like,
and error on no match).
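The filter itself is trivial; e.g. (the patterns are invented examples of
what the hand-administered list might contain):

```shell
url_allowed() {
    case $1 in
        *://fedorahosted.org/*)          return 0 ;;
        *://*.kernel.org/*)              return 0 ;;
        *://downloads.sourceforge.net/*) return 0 ;;
        *)                               return 1 ;;
    esac
}

url_allowed http://ftp.kernel.org/pub/linux/kernel/v2.6/linux-2.6.35.tar.bz2 \
    && echo allowed
url_allowed http://evil.example.net/x.tar.gz || echo refused
```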
All that seems like it shouldn't be all that much work (for someone else ;-).
And it would be not merely a restoration of what we've been used to, but
both much cleaner and much better!
Thanks,
Roland