This is some nitty-gritty but super informative "sausage making" stuff here.
Could be pulled wholesale, and then provide a link to copr?
---------- Forwarded message ----------
From: Miroslav Suchý <msuchy(a)redhat.com>
Date: Thu, Nov 12, 2015 at 10:16 AM
To: Cool Other Package Repositories <copr-devel(a)lists.fedorahosted.org>
For the last two days we had a problem processing the queue. This is a
post-mortem of what happened.

Mmraka sent several thousand builds to Copr. That is fine: it was
discussed with me in advance, and in fact I encourage such tests and
rebuilds. However, this triggered one bug: the user was unable to get the
list of builds, because we have an inefficient SQL query on that page. As
a result, Michal (and very likely somebody else too) tried to delete
several hundred builds at once.
This resulted in bad JobGrabber behaviour: it fetched a few dozen tasks
and then stopped without any error. When I debugged it (on production (!),
because it did not happen in stage), it processed the first round of
builds and the first action, and then stopped.

I then learned that JobGrabber was waiting for a lock left behind by a
previously killed JobGrabber. After I removed it, I found that there was a
big number of tasks to be executed. And our code in JobGrabber looks like:
    if some builds:
        put builds in queue
    if some tasks:
        execute them immediately
That is because previously users sent only a few tasks at once, and those
operations are basically very cheap (usually just an unlink, followed by a
quick createrepo_c --update). However, the repositories these actions
belong to are big (several GB), and even createrepo_c ran for more than a
minute. So it effectively blocked the next fetch of builds from the
frontend for several
Right now the task queue is empty, so builds are processed in a timely
manner, and our code in master has already been changed to be resistant to
such behaviour.
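
To illustrate the kind of change that makes such a loop resistant to a
large action backlog, here is a minimal sketch. It is not the actual fix
in Copr master. The function name run_cycle, its parameters, and the
time-budget approach are all assumptions made for illustration. The idea
is that builds are fetched on every cycle, and actions run only until a
time budget is spent, with the remainder left for the next cycle.

```python
import time
from collections import deque


def run_cycle(fetch_builds, fetch_actions, build_queue, pending_actions,
              run_action, action_budget_s=5.0, clock=time.monotonic):
    """Run one grabber cycle; every name here is hypothetical.

    Returns the number of actions executed in this cycle.
    """
    # Builds are queued first on every cycle, so a long action backlog
    # cannot block the next fetch of builds.
    build_queue.extend(fetch_builds())
    pending_actions.extend(fetch_actions())

    deadline = clock() + action_budget_s
    executed = 0
    # Execute actions only while the time budget lasts; whatever is left
    # stays in pending_actions and is picked up by the next cycle.
    while pending_actions and clock() < deadline:
        run_action(pending_actions.popleft())
        executed += 1
    return executed
```

With a budget like this, an action that takes a minute (for example a
createrepo_c run on a large repository) can delay the next fetch of
builds by roughly the budget plus one action at most, instead of by the
whole backlog.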
I am really sorry if you had to wait for your build in the past two days.
We learned a lesson from this massive usage of Copr, and we identified
some other potential performance issues; fixing them will result in an
even better service in the upcoming days.
Adam is fixing the code right now.
Miroslav Suchy, RHCA
Red Hat, Senior Software Engineer, #brno, #devexp, #fedora-buildsys
Fedora Community Lead & Council