On Mon, Aug 29, 2016 at 3:33 PM, Pavel Raiskup <praiskup@redhat.com> wrote:
> On Monday, August 29, 2016 1:04:58 PM CEST Michal Novotny wrote:
>> On Fri, Aug 26, 2016 at 6:22 PM, Pavel Raiskup <praiskup@redhat.com> wrote:
>>> Does it spawn builders even if there is no build queue yet?
>>
>> Yes, it does. I completely cut off our backend dev instance from the
>> frontend and repeated the fresh-start experiment. This time the build
>> queue was definitely empty, and the builders were still being spawned.
>> I also confirmed in the code that this is the expected behavior. These
>> parts (around VmMaster) weren't touched.

> Perfect, thanks.
>
> I've done a quick review of the patch now, and I pretty much like the
> backend's "take-one-task"-only approach. That way you can control the
> queue on the frontend (with atomicity provided by PostgreSQL), while it
> is still a pull-based approach from the backend's side.
>
> There is one drawback, however: the ugly DEFER_BUILD_SECONDS workaround.
> The problem is that you now put all builds (for all architectures) into
> one build queue, which would lead to starvation if you weren't using
> DEFER_BUILD_SECONDS.
>
> I would suggest adding one more argument to the /backend/waiting/
> backend API: the requested architecture.

Good idea.
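To make the starvation argument concrete, here is a minimal in-memory sketch (plain Python, not Copr code; `Job`, `take_oldest` and `take_oldest_for_arch` are hypothetical names) contrasting one mixed queue with a queue filtered by the requested architecture, as proposed for /backend/waiting/. In the real frontend this filter would presumably be a WHERE clause in the SQL query rather than a Python loop.

```python
from collections import deque
from dataclasses import dataclass
from typing import Optional


@dataclass
class Job:
    build_id: int
    arch: str


def take_oldest(queue: deque) -> Optional[Job]:
    """Mixed queue: hand out the oldest job, whatever its architecture."""
    return queue.popleft() if queue else None


def take_oldest_for_arch(queue: deque, arch: str) -> Optional[Job]:
    """Proposed variant: hand out the oldest job the caller can actually build."""
    for i, job in enumerate(queue):
        if job.arch == arch:
            del queue[i]  # safe: we return right away, iteration does not continue
            return job
    return None


def make_queue() -> deque:
    # Oldest job first; the backend asking has only a free x86_64 VM.
    return deque([Job(1, "ppc64le"), Job(2, "x86_64"), Job(3, "x86_64")])


if __name__ == "__main__":
    # Mixed queue: the x86_64 backend receives job 1, which it cannot build,
    # so it has to defer it (the DEFER_BUILD_SECONDS workaround).
    print(take_oldest(make_queue()))
    # Per-arch queue: the same backend receives job 2 directly.
    print(take_oldest_for_arch(make_queue(), "x86_64"))
```

With the arch argument, a backend never receives a job it has no VM for, so the head of the queue cannot block everything behind it.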
 
>   * Then you can remove everything related to the "defer" action on both
>     the BE and the FE.
>
>   * You can reduce the BE<->FE traffic and significantly lower IO on the
>     frontend, because you can first allocate an appropriate VM and only
>     then assign a job (not vice versa: take a job, then try to get a VM,
>     and possibly defer the job).

You wouldn't exactly allocate an appropriate VM. Rather, some VMs would be
preallocated (ideally taking all the available resources, with all of them
busy building), and you would acquire a new build job of the matching arch
whenever one of them became free. I am not sure we can completely remove job
deferring, because it is useful when a backend has taken a job it cannot
currently handle. In theory that shouldn't happen with this approach (taking
jobs only if there is an available VM of the job's required arch), but you
never know, and deferring is better than dropping.
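The VM-first flow described above (a preallocated pool, a job requested only when a VM of a given arch is free, and deferring kept as a rare safety net) could look roughly like this sketch. Everything in it is hypothetical: `Frontend.waiting()` stands in for the /backend/waiting/ call with the proposed arch argument, and `can_start` stands in for whatever last-minute check might still make a backend refuse a job.

```python
from collections import deque
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class Job:
    build_id: int
    arch: str


@dataclass
class Vm:
    name: str
    arch: str
    busy: bool = False


class Frontend:
    """Hypothetical stand-in for the frontend behind /backend/waiting/."""

    def __init__(self, jobs: List[Job]):
        self.queue = deque(jobs)
        self.deferred: List[Job] = []

    def waiting(self, arch: str) -> Optional[Job]:
        # Oldest pending job of the requested arch, or None.
        for i, job in enumerate(self.queue):
            if job.arch == arch:
                del self.queue[i]
                return job
        return None

    def defer(self, job: Job) -> None:
        # Return the job to the queue instead of dropping it.
        self.queue.append(job)
        self.deferred.append(job)


def schedule_round(
    fe: Frontend, vms: List[Vm], can_start: Callable[[Vm, Job], bool]
) -> Dict[str, int]:
    """VM first, job second: only ask for work a free VM can actually build."""
    started: Dict[str, int] = {}
    for vm in vms:
        if vm.busy:
            continue
        job = fe.waiting(vm.arch)
        if job is None:
            continue  # nothing queued for this arch; the VM stays ready
        if can_start(vm, job):
            vm.busy = True
            started[vm.name] = job.build_id
        else:
            fe.defer(job)  # rare safety net: better than dropping the job
    return started


if __name__ == "__main__":
    fe = Frontend([Job(1, "ppc64le"), Job(2, "x86_64"), Job(3, "x86_64")])
    vms = [Vm("x86-a", "x86_64"), Vm("x86-b", "x86_64"), Vm("ppc-a", "ppc64le", busy=True)]
    print(schedule_round(fe, vms, can_start=lambda vm, job: True))
    # Job 1 simply stays queued until the ppc64le VM is free; nothing is deferred.
```

In the normal path `defer()` is never called; it only matters when `can_start` unexpectedly refuses a job.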

> Also, the 'take-build' method (load_job() in particular) should be atomic:
> it should automatically move the action into the 'running' state, and the
> "starting" state should be removed completely, since it has zero
> informational value anyway (users know, or should know, the build queue
> priority).

That's probably right.
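An atomic load_job() along those lines, going straight from 'pending' to 'running' with no 'starting' state in between, is sketched here with SQLite from the Python standard library so that it runs anywhere. The table and column names are assumptions, not Copr's schema. With PostgreSQL one would more likely use `SELECT ... FOR UPDATE SKIP LOCKED` (available since 9.5) so that concurrent callers skip rows already being claimed.

```python
import sqlite3
from typing import Optional


def init_db() -> sqlite3.Connection:
    # isolation_level=None: autocommit mode, so we control BEGIN/COMMIT ourselves.
    conn = sqlite3.connect(":memory:", isolation_level=None)
    conn.execute(
        "CREATE TABLE build ("
        " id INTEGER PRIMARY KEY,"
        " arch TEXT NOT NULL,"
        " state TEXT NOT NULL)"  # 'pending' -> 'running'; no 'starting'
    )
    return conn


def load_job(conn: sqlite3.Connection, arch: str) -> Optional[int]:
    """Claim the oldest pending build of `arch` and mark it running, atomically."""
    # BEGIN IMMEDIATE takes SQLite's write lock up front, so no other
    # connection can claim the same row between our SELECT and UPDATE.
    conn.execute("BEGIN IMMEDIATE")
    try:
        row = conn.execute(
            "SELECT id FROM build WHERE state = 'pending' AND arch = ?"
            " ORDER BY id LIMIT 1",
            (arch,),
        ).fetchone()
        if row is not None:
            conn.execute("UPDATE build SET state = 'running' WHERE id = ?", (row[0],))
        conn.execute("COMMIT")
    except Exception:
        conn.execute("ROLLBACK")
        raise
    return row[0] if row is not None else None


if __name__ == "__main__":
    conn = init_db()
    conn.executemany(
        "INSERT INTO build (id, arch, state) VALUES (?, ?, 'pending')",
        [(1, "ppc64le"), (2, "x86_64"), (3, "x86_64")],
    )
    print(load_job(conn, "x86_64"), load_job(conn, "x86_64"), load_job(conn, "x86_64"))
```

Because the claim and the state change commit together, a crash between them cannot leave a build in an in-between state.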
 

>   * Then we could much more easily implement "multiple-backends"
>     support; I've wanted something like this for a long time.

I can't judge this part yet, but it sounds good.
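A rough illustration of why an atomic take-one-task call could make multiple backends straightforward: several pullers ask the same frontend concurrently, and each job is still handed out exactly once. In this sketch the backends are modeled as threads and a `threading.Lock` stands in for the database's transactional guarantees; `Frontend`, `run_backend` and `simulate` are hypothetical names, and real backends would of course be separate machines talking to the frontend over HTTP.

```python
import threading
from collections import deque
from typing import Dict, List, Optional, Tuple


class Frontend:
    """Hypothetical frontend queue; the lock plays the role of DB atomicity."""

    def __init__(self, jobs: List[Tuple[int, str]]):
        self._lock = threading.Lock()
        self._queue = deque(jobs)  # (build_id, arch), oldest first

    def take(self, arch: str) -> Optional[int]:
        # Find-and-remove happens under one lock, so two backends
        # can never receive the same build.
        with self._lock:
            for i, (build_id, job_arch) in enumerate(self._queue):
                if job_arch == arch:
                    del self._queue[i]
                    return build_id
            return None


def run_backend(fe: Frontend, arch: str, taken: List[int]) -> None:
    """One backend: keep pulling builds of its arch until none are left."""
    while True:
        build_id = fe.take(arch)
        if build_id is None:
            return
        taken.append(build_id)


def simulate(n_jobs: int = 200, n_backends: int = 4) -> Dict[str, List[int]]:
    # Odd build IDs are x86_64, even ones ppc64le.
    jobs = [(i, "x86_64" if i % 2 else "ppc64le") for i in range(n_jobs)]
    fe = Frontend(jobs)
    results: Dict[str, List[int]] = {f"backend-{k}": [] for k in range(n_backends)}
    threads = []
    for k, taken in enumerate(results.values()):
        arch = "x86_64" if k % 2 else "ppc64le"
        t = threading.Thread(target=run_backend, args=(fe, arch, taken))
        threads.append(t)
        t.start()
    for t in threads:
        t.join()
    return results


if __name__ == "__main__":
    results = simulate()
    claimed = sorted(b for ids in results.values() for b in ids)
    print(claimed == list(range(200)))  # every build claimed exactly once
```

The backends need no coordination among themselves; all of it lives in the frontend's single atomic claim.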
 
> Pavel