Freeze break: db-koji01 and bvirthost09 reboot

Pierre-Yves Chibon pingou at pingoured.fr
Fri Apr 10 18:13:29 UTC 2015


On Fri, Apr 10, 2015 at 11:04:55AM -0600, Kevin Fenzi wrote:
> I was going to wait until after freeze for this, but with us slipping a
> week I think it might be worth doing now. 
> 
> For the last few weeks we have been having issues with db-koji01. 
> The problem started when I moved its backend storage from one iscsi/pv
> to another iscsi/pv. The load has been high since then and it's not as
> performant as it was. 
> 
> Effects: 
> 
> * koji alerts in nagios make us need to restart httpd on koji01 (which
>   we can do without outage, but means a human has to wake up and go do
>   it). 
> 
> * If koji01 httpd isn't restarted, kojira sometimes will time out and
>   not launch newrepos. (We worked around this by increasing the
>   timeout, but it's only a matter of time before it hits this again). 
> 
> * Pages on koji that need lots of db access are slower than they
>   were/need to be. 
> 
> Cause: 
> 
> Not entirely sure what the base cause is. lvdisplay shows the guest is
> on the right iscsi volume, and there are no iscsi errors or the like. The
> host did have stale lvm data due to lvmetad running, but that shouldn't
> have affected the running guest(s). I can only think there's something
> still trying to hit the old, no-longer-used iscsi volume and causing
> extra load. 
> 
> What I would like to do: 
> 
> * Stop postgres on db-koji01. This will cause the hub to show db down
>   to anyone looking. 
> 
> * rsync /var/lib/pgsql off to backup03. This should take less than
>   10min. 
> 
> * Shut down db-koji01 and dhcp01.
> 
> * Reboot bvirthost09.
> 
> * See if the issue clears up. If something happens and db-koji01
>   doesn't come back up right, we can make a new one and
>   sync /var/lib/pgsql back to it and be back up pretty quickly.
>   Hopefully it won't come to that. 
> 
> I'd like to schedule this possibly over the weekend off hours when koji
> isn't all that busy. 
> 
> Thoughts? +1s?

+1 for me and fingers crossed :)
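
For reference, the order of operations proposed above could be written
down as a dry-run checklist. This is only a sketch: the host names come
from the proposal, but the exact command lines (the systemctl service
name, the rsync flags) and the destination path on backup03 are
assumptions for illustration, and nothing here is executed.

```python
# Dry-run sketch of the proposed maintenance order. Nothing is executed.
# Host names come from the proposal; every command line and the backup
# destination path are assumptions for illustration.

BACKUP_DEST = "backup03:/srv/backup/db-koji01/pgsql/"  # hypothetical path


def build_plan(backup_dest=BACKUP_DEST):
    """Return the ordered (host, command) steps of the proposal."""
    return [
        ("db-koji01", "systemctl stop postgresql"),  # assumed service name
        ("db-koji01", "rsync -a /var/lib/pgsql/ " + backup_dest),
        ("db-koji01", "shutdown -h now"),
        ("dhcp01", "shutdown -h now"),
        ("bvirthost09", "reboot"),
    ]


def render(plan):
    """Format each step as a numbered line for review before running."""
    return ["%d. [%s] %s" % (i, host, cmd)
            for i, (host, cmd) in enumerate(plan, 1)]


if __name__ == "__main__":
    print("\n".join(render(build_plan())))
```

Keeping the steps as data rather than running them directly makes it
easy to review the order (postgres stopped before the rsync, guests
shut down before the host reboot) before anyone touches the machines.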


Pierre

