[fedora-arm] Status Update: ARM Koji Build System Outage

Peter Robinson pbrobinson at gmail.com
Tue Mar 13 15:31:51 UTC 2012


Hi Max et al,

On Tue, Mar 13, 2012 at 2:52 PM, M Abed <maxamaxim at me.com> wrote:
> Hello Everyone!
>
> As you are aware, we had *some* issues with the build farm due to a
> power outage yesterday. We didn't have too many chances to send updates. So
> here it is:

I've asked for this on a number of occasions. It only takes 30 seconds
to send a quick IRC or email update to keep people in the loop of the
status even if it's a "koji still down, still working on it" style of
message.

> After the power outage, all the critical servers and the builders were up
> and running. However, due to the power outage there were some other networking
> issues at Seneca. A short while later some of the systems rebooted and we
> started dealing with it as soon as we noticed.
>
> At that point some of the management servers were still down. There were
> issues with Apache, SELinux or networking on one server or another. After
> those issues were dealt with, there was the NFS Locking issue and it turned
> out to be due to some Firewall rules. All that got fixed last night.

I hope all the firewall rules and other such changes have now been
committed to config files so that the changes persist across a
reboot. It's not the first time firewalls have caused issues on the
reboot of servers.

> However, we are still not 100%. Some of the builders are still timing out as
> of this morning and that issue is still under investigation. Please consider
> the build farm to be out of commission till further notice. You can submit
> jobs and that may or may not go through.

Can you please disable the problematic builders while you work on them
and only re-enable them once they are known good? That way you can work
on them to your heart's content and others can continue to work
uninterrupted.

> We are going to bring the farm back up as fast as we can.  We apologize for
> the inconvenience and thank you for your patience.
>
> Today *will* be a better day and we will try to provide updates
> more frequently. If you have any question please reply to this email or
> express it in the channel.

I'm fully aware of what it can be like working in outage situations
(that is my job after all) so please try and provide regular updates.

Thanks for your work.

Peter

