On Friday, June 19, 2020 12:04:09 PM CEST Pavel Raiskup wrote:
> On Friday, June 19, 2020 11:39:57 AM CEST Iñaki Ucar wrote:
> > And it's happening again.
>
> Seems like something related to:
>
> https://pagure.io/fedora-infrastructure/issue/9051
>
> I'm looking at it, and I'll keep you updated.
Ok, I can confirm the cause: we fail to spawn VMs because we rely on
several repositories being available (the Koji infra repos and the
Fedora repos). The Fedora repos don't work all the time because they
are in AWS, and librepo/libdnf/dnf don't fall back to other mirrors:
https://bugzilla.redhat.com/show_bug.cgi?id=1819188
Sometimes a different mirror is picked, and then we succeed.
The infra repos are down too, but they also seem to respond from time
to time:
https://kojipkgs.fedoraproject.org/repos-dist/f30-infra/latest/
On top of that, our VM allocation scripting did not detect this kind of
spawning failure, so we often did not retry the allocation (I was at
least able to fix that bug).
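For illustration only, a minimal sketch of the detect-and-retry idea in
Python. It assumes hypothetical spawn() and is_healthy() callbacks and
is not our actual allocation scripting; the point is that a VM which
"starts" but fails provisioning (e.g. on repo download errors) must be
treated as a failure and retried:

```python
import time
from typing import Callable, Optional


class SpawnError(Exception):
    """Raised by the (hypothetical) spawn callback when a VM fails to start."""


def allocate_with_retry(spawn: Callable[[], str],
                        is_healthy: Callable[[str], bool],
                        max_attempts: int = 20,
                        delay: float = 0.0) -> Optional[str]:
    """Spawn VMs until one passes the health check.

    Returns the VM identifier, or None if all attempts fail.
    """
    for _ in range(max_attempts):
        try:
            vm = spawn()
        except SpawnError:
            vm = None
        # A VM that booted but failed provisioning (for example because
        # a repository was unreachable) counts as a failure as well.
        # A real implementation would also delete such a broken VM here.
        if vm is not None and is_healthy(vm):
            return vm
        time.sleep(delay)  # back off before the next attempt
    return None
```

The key change compared to the buggy behavior is the explicit health
check: without it, a half-provisioned VM would be reported as a success
and no retry would happen.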
The overall result is that we are able to start ~1 machine out of 10
attempts, which means roughly 1 machine every few minutes. Hmm, better
than nothing, but not enough to process all the incoming requests.
I cannot promise a timely resolution; hopefully everything gets back
into good shape once
https://pagure.io/fedora-infrastructure/issue/9051 is fixed.