On Fri, Apr 10, 2015 at 11:04:55AM -0600, Kevin Fenzi wrote:
I was going to wait until after freeze for this, but with us slipping a
week I think it might be worth doing now.
For the last few weeks we have been having issues with db-koji01.
The problem started when I moved its backend storage from one iscsi/pv
to another iscsi/pv. The load has been high since then and it's not as
performant as it was.
* koji alerts in nagios make us need to restart httpd on koji01 (which
we can do without outage, but means a human has to wake up and go do
it).
* If koji01 httpd isn't restarted, kojira sometimes will time out and
not launch newrepos. (We worked around this by increasing the
timeout, but it's only a matter of time before it hits this again).
* Pages on koji that need lots of db access are slower than they
were/need to be.
Not entirely sure what the base cause is. lvdisplay shows the guest is
on the right iscsi volume, and there are no iscsi errors or the like. The
host did have stale lvm data due to lvmetad running, but that shouldn't
have affected the running guest(s). I can only think there's something
still trying to hit the old, no-longer-used iscsi volume and causing
What I would like to do:
* Stop postgres on db-koji01. This will cause the hub to show db down
to anyone looking.
* rsync /var/lib/pgsql off to backup03. This should take less than
* Shut down db-koji01 and dhcp01.
* Reboot bvirthost09
* See if the issue clears up. If something happens and db-koji01
doesn't come back up right, we can make a new one and
sync /var/lib/pgsql back to it and be back up pretty quickly.
Hopefully it won't come to that.
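
For reference, the steps above could be sketched roughly as the dry-run
script below. This is only an illustration, not the actual procedure: the
hostnames are from the plan, but the postgresql service name, the use of
virsh on bvirthost09 (which assumes both guests live on that host), and
the backup destination path are all assumptions.

```python
import shlex
import subprocess

# Hypothetical destination; the plan only says "off to backup03".
BACKUP_DEST = "backup03:/srv/backup/db-koji01/pgsql/"


def build_plan(backup_dest=BACKUP_DEST):
    """Return the maintenance steps as (host, argv) pairs, in order."""
    return [
        # Assumed service name; older hosts may need `service postgresql stop`.
        ("db-koji01", ["systemctl", "stop", "postgresql"]),
        # Copy the data directory only after postgres has stopped.
        ("db-koji01", ["rsync", "-a", "/var/lib/pgsql/", backup_dest]),
        # Assumes both guests are libvirt domains on bvirthost09.
        ("bvirthost09", ["virsh", "shutdown", "db-koji01"]),
        ("bvirthost09", ["virsh", "shutdown", "dhcp01"]),
        # virsh shutdown returns before the guest is down, so a real run
        # would wait for both guests to stop before this step.
        ("bvirthost09", ["reboot"]),
    ]


def run_plan(plan, dry_run=True):
    """Print each step; only execute it over ssh when dry_run is False."""
    lines = []
    for host, argv in plan:
        line = f"{host}: {shlex.join(argv)}"
        lines.append(line)
        print(line)
        if not dry_run:
            subprocess.run(["ssh", host, shlex.join(argv)], check=True)
    return lines


if __name__ == "__main__":
    run_plan(build_plan())
```

With the default dry_run=True this only prints the commands, so the order
can be reviewed before anyone runs anything by hand.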
I'd like to schedule this possibly over the weekend off hours when koji
isn't all that busy.
+1 for me and fingers crossed :)