Freeze break: db-koji01 and bvirthost09 reboot
Pierre-Yves Chibon
pingou at pingoured.fr
Fri Apr 10 18:13:29 UTC 2015
On Fri, Apr 10, 2015 at 11:04:55AM -0600, Kevin Fenzi wrote:
> I was going to wait until after freeze for this, but with us slipping a
> week I think it might be worth doing now.
>
> For the last few weeks we have been having issues with db-koji01.
> The problem started when I moved its backend storage from one iscsi/pv
> to another iscsi/pv. The load has been high since then and it's not as
> performant as it was.
>
> Effects:
>
> * koji alerts in nagios make us need to restart httpd on koji01 (which
> we can do without outage, but means a human has to wake up and go do
> it).
>
> * If koji01 httpd isn't restarted, kojira will sometimes time out and
> not launch newrepos. (We worked around this by increasing the
> timeout, but it's only a matter of time before it hits this again).
>
> * Pages on koji that need lots of db access are slower than they
> were/need to be.
>
> Cause:
>
> Not entirely sure what the base cause is. lvdisplay shows the guest is
> on the right iscsi volume, and there are no iscsi errors or the like. The
> host did have stale lvm data due to lvmetad running, but that shouldn't
> have affected the running guest(s). I can only think there's something
> still trying to hit the old, no-longer-used iscsi volume and causing
> extra load.
>
> What I would like to do:
>
> * Stop postgres on db-koji01. This will cause the hub to show db down
> to anyone looking.
>
> * rsync /var/lib/pgsql off to backup03. This should take less than
> 10min.
>
> * Shut down db-koji01 and dhcp01.
>
> * Reboot bvirthost09
>
> * See if the issue clears up. If something happens and db-koji01
> doesn't come back up right, we can make a new one and
> sync /var/lib/pgsql back to it and be back up pretty quickly.
> Hopefully it won't come to that.
>
> I'd like to schedule this possibly over the weekend, off hours, when koji
> isn't all that busy.
>
> Thoughts? +1s?
+1 for me and fingers crossed :)
Pierre
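
For reference, the sequence proposed in the quoted message could be written down as a dry-run checklist along these lines. This is only a sketch: it prints the steps and runs nothing. The host names and /var/lib/pgsql come from the thread; the exact commands (`service postgresql stop`, `virsh shutdown`, `reboot`) and the backup destination path are assumptions made for illustration, not what was actually run.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    host: str     # machine the command would be run on
    command: str  # illustrative command (assumed, not taken from the thread)
    note: str     # what the thread says about this step


def build_plan(
    db_guest="db-koji01",
    other_guest="dhcp01",
    virthost="bvirthost09",
    backup_host="backup03",
    data_dir="/var/lib/pgsql",
    # Hypothetical destination directory; the thread only names the host.
    backup_dir="/srv/backup/db-koji01/pgsql",
):
    """Return the maintenance steps in the order given in the proposal."""
    return [
        Step(db_guest, "service postgresql stop",
             "hub shows db down from this point"),
        Step(db_guest, f"rsync -a {data_dir}/ {backup_host}:{backup_dir}/",
             "copy the data directory off before touching the host"),
        Step(virthost, f"virsh shutdown {db_guest}",
             "clean guest shutdown"),
        Step(virthost, f"virsh shutdown {other_guest}",
             "clean guest shutdown"),
        Step(virthost, "reboot",
             "then check whether the load issue clears up"),
    ]


def render(plan):
    """Format the plan as a numbered checklist for review."""
    return "\n".join(
        f"{i}. [{s.host}] {s.command}  # {s.note}"
        for i, s in enumerate(plan, 1)
    )


if __name__ == "__main__":
    print(render(build_plan()))
```

The point of keeping it as a printed checklist is that the copy to backup03 stays ahead of any shutdown, so the fallback of building a new guest and syncing /var/lib/pgsql back remains available.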