I think this just broke two build jobs:
| Executing /usr/sbin/mock-helper yum --installroot /var/lib/mock/fedora-5-i386-core-e4c5db6c7c922589d8995bf76b50bcc08e68b049/root install 'zlib-devel' 'texi2html' 'compat-gcc-32' 'SDL-devel'
| http://buildsys.fedoraproject.org/plague-results/fedora-5-extras/nx/1.5.0-9....: [Errno 4] IOError: HTTP Error 404: Not Found
| Trying other mirror.
| Error: failure: nx/1.5.0-9.fc5/i386/nx-1.5.0-9.fc5.i386.rpm from local: [Errno 256] No more mirrors to try.
| Cleaning up...
Am I misinterpreting this, or did this build job fail because the package had been moved away shortly before the job was queued?
When I logged into extras64 to sign and push pending builds, the build-status page did not list any running jobs. However, the time it takes to run repomanage, createrepo, repoview and to sync the results to the master repository is far too long. It looks like the above "qemu" job was started shortly after I had started the push. The push is still in progress while I type this. Repoview in particular takes a lot of time, and it is also run for debuginfo packages. (Why? Is this really worthwhile?)
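For illustration, here is a rough sketch of the push pipeline as I understand it; the directory layout, the arch list and the rsync target are made up, but it shows that repomanage, createrepo and repoview run serially for every tree, including the debuginfo ones, before anything is synced:

# Rough sketch of the push pipeline; layout, arch list and rsync target
# are invented for illustration, not taken from the real push script.
import os
import subprocess

MASTER = "/srv/extras-master/5"                 # hypothetical local tree for one dist
ARCH_DIRS = ["i386", "x86_64", "ppc", "SRPMS",
             "i386/debug", "x86_64/debug", "ppc/debug"]   # repoview runs for debug too

for arch_dir in ARCH_DIRS:
    repo = os.path.join(MASTER, arch_dir)
    # remove superseded packages reported by repomanage
    old = subprocess.run(["repomanage", "--old", repo],
                         capture_output=True, text=True, check=True)
    for pkg in old.stdout.split():
        os.remove(pkg)
    subprocess.run(["createrepo", repo], check=True)   # rebuild repodata/
    subprocess.run(["repoview", repo], check=True)     # regenerate HTML pages (slow)

# Only after all trees are done does anything reach the master repository.
subprocess.run(["rsync", "-a", MASTER + "/", "master::extras/5/"], check=True)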
Which repositories do the build servers pull packages from?
What options do we have to improve this?
- We could modify the push script to _copy_ files instead of moving them. Successfully copied packages would be marked [in their package root directory] and would not be removed prior to a successful sync.
- I've heard there is a new createrepo version which makes backups of other files in the repodata directory. With it, we would not lose the repoview directory and could sync twice (once after running createrepo, a second time after updating the repoview pages); see the sketch after this list. The first sync would cause the new packages to show up in the master repository much sooner.
- One good thing about the new push script is that it mails the build report only after a successful sync, i.e. when the master repository is up-to-date. Is this the only notification for packagers that they can use to decide when to re-queue their build jobs?
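To make the second option concrete, a minimal sketch of a two-stage sync, assuming a createrepo that preserves the repoview output; the path and the rsync target are placeholders:

# Minimal sketch of the "sync twice" idea: push metadata first, repoview later.
# Path and rsync target are placeholders, not the real push setup.
import subprocess

REPO = "/srv/extras-master/5/i386"      # hypothetical local repo
DEST = "master::extras/5/i386/"         # hypothetical rsync module

def run(*cmd):
    subprocess.run(cmd, check=True)

# Stage 1: regenerate the metadata and sync right away, so new packages
# appear in the master repository as soon as possible.
run("createrepo", REPO)
run("rsync", "-a", REPO + "/", DEST)

# Stage 2: the slow repoview pass, followed by a second sync for the HTML pages.
run("repoview", REPO)
run("rsync", "-a", REPO + "/", DEST)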
On Thu, 2006-06-08 at 12:45 +0200, Michael Schwendt wrote:
I think this just broke two build jobs:
Well, we had similar problems before and even opened a bug report to track it (but it got forgotten): https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=185846
[...]
- We could modify the push script to _copy_ files instead of moving
them. Successfully copied packages would be marked [in their package root directory] and would not be removed prior to a successful sync.
Sounds like a good workaround (or is this even the proper solution?).
CU thl
On Thu, 2006-06-08 at 14:32 +0200, Thorsten Leemhuis wrote:
On Thu, 2006-06-08 at 12:45 +0200, Michael Schwendt wrote:
I think this just broke two build jobs:
Well, we had similar problems before and even opened a bug report to track it (but it got forgotten): https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=185846
[...]
- We could modify the push script to _copy_ files instead of moving
them. Successfully copied packages would be marked [in their package root directory] and would not be removed prior to a successful sync.
Sounds like a good workaround (or is this even the proper solution?).
Any idea how long each push takes, for a reasonable # of packages? Dave Woodhouse just ran into a repo timeout, where the repo was locked longer than 30 minutes and the builder killed the job (intentionally and correctly) because the server hadn't told it to unlock yet.
If the potential window is > 30m, we should increase the timeout in the builders to account for that. I just picked 30m as a default with no particular idea if it was correct or not.
Timeout should account for both the server's copy+createrepo run, and the Extras push script operations.
Dan
On Thu, 08 Jun 2006 10:00:59 -0400, Dan Williams wrote:
Any idea how long each push takes, for a reasonable # of packages?
In total, a lot longer than 30 minutes. :(
Dave Woodhouse just ran into a repo timeout, where the repo was locked longer than 30 minutes and the builder killed the job (intentionally and correctly) because the server hadn't told it to unlock yet.
That doesn't sound as if it's related. The push script does not lock anything for such a long period. The new locks below repodir are held only for as long as it takes to type in the passphrase, sign and move the packages, and clean up empty dirs. For an average push, I believe that takes somewhere below one minute in total per dist.
Only after that are the time-consuming operations performed, and by then no repo below repodir is locked anymore, as the locks are no longer needed.
If the potential window is > 30m, we should increase the timeout in the builders to account for that. I just picked 30m as a default with no particular idea if it was correct or not.
Timeout should account for both the server's copy+createrepo run, and the Extras push script operations.
Reading the bugzilla ticket Thorsten pointed to, all this smells very much like we should try the copy-and-mark-as-done approach.
In pseudo-code, for each dist X:
1 - lock plague-results repo for dist X
2 - examine available new builds which are not marked as PUSHED
3 - copy available builds to local master repo
    - keep track of what packages we create in there
4 - sign all new packages in local master repo
    - in case of error, rollback, i.e. revert to state before (3) by removing the packages we tracked
    - if successful, mark all copied builds as PUSHED
5 - unlock plague-results repo for dist X
If this 5-step transaction is successful, the usual repomanage, createrepo, and repoview stuff is run. A cleanup job can remove old packages that are marked as PUSHED from the plague-results repo after one or two days.
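To make this concrete, here is a minimal Python sketch of steps 1-5; the directory layout (builds as name/version dirs below plague-results, e.g. nx/1.5.0-9.fc5), the lock file, the PUSHED marker file and the rpm --addsign call are assumptions for illustration, not the actual push script:

# Sketch of the copy-and-mark-as-PUSHED transaction (steps 1-5 above).
# Layout, lock file and marker name are assumptions for illustration only.
import fcntl
import shutil
import subprocess
from pathlib import Path

def push_dist(results: Path, master: Path) -> None:
    lock = open(results / ".push.lock", "w")
    fcntl.flock(lock, fcntl.LOCK_EX)                      # 1 - lock plague-results repo
    copied = []                                           # what we create in the master repo
    try:
        new_builds = [d for d in results.glob("*/*")      # e.g. nx/1.5.0-9.fc5
                      if d.is_dir() and not (d / "PUSHED").exists()]   # 2 - not yet pushed
        for build in new_builds:                          # 3 - copy builds to the master repo
            for rpm in build.rglob("*.rpm"):
                dest = master / rpm.name
                shutil.copy2(rpm, dest)
                copied.append(dest)
        if copied:                                        # 4 - sign only the copies
            subprocess.run(["rpm", "--addsign"] + [str(p) for p in copied], check=True)
        for build in new_builds:                          # success: mark builds as PUSHED
            (build / "PUSHED").touch()
    except Exception:
        for dest in copied:                               # error: roll back to the state before (3)
            if dest.exists():
                dest.unlink()
        raise
    finally:
        fcntl.flock(lock, fcntl.LOCK_UN)                  # 5 - unlock plague-results repo
        lock.close()

if __name__ == "__main__":
    push_dist(Path("/srv/plague-results/fedora-5-extras"),   # hypothetical paths
              Path("/srv/extras-master/5/i386"))

If signing fails, nothing in plague-results has been modified and the partially copied packages are removed again, which is the rollback in step 4; repomanage, createrepo, repoview and the sync run afterwards, outside the lock.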
With this approach, we remove builds from the plague-results repo only when they are available in the master repository. And we don't sign/modify rpms in the plague-results repo, so this doesn't confuse the build servers either.