Freeze break: db-koji01 and bvirthost09 reboot

Friday, 10 April 2015

I was going to wait until after freeze for this, but with us slipping a
week I think it might be worth doing now. 

For the last few weeks we have been having issues with db-koji01.
The problem started when I moved its backend storage from one iscsi/pv
to another iscsi/pv. The load has been high since then and it's not as
performant as it was.

Effects: 

* koji alerts in nagios mean we have to restart httpd on koji01 (which
  we can do without an outage, but it means a human has to wake up and
  go do it).

* If koji01 httpd isn't restarted, kojira will sometimes time out and
  not launch newrepos. (We worked around this by increasing the
  timeout, but it's only a matter of time before it hits this again.)

* Pages on koji that need lots of db access are slower than they
  were/need to be. 

Cause: 

I'm not entirely sure what the base cause is. lvdisplay shows the guest
is on the right iscsi volume, and there are no iscsi errors or the like.
The host did have stale lvm data due to lvmetad running, but that
shouldn't have affected the running guest(s). I can only think that
something is still trying to hit the old, no-longer-used iscsi volume
and causing extra load.

What I would like to do: 

* Stop postgres on db-koji01. This will cause the hub to show db down
  to anyone looking. 

* rsync /var/lib/pgsql off to backup03. This should take less than
  10min. 

* Shut down db-koji01 and dhcp01.

* Reboot bvirthost09.

* See if the issue clears up. If something happens and db-koji01
  doesn't come back up right, we can make a new one and
  sync /var/lib/pgsql back to it and be back up pretty quickly.
  Hopefully it won't come to that. 
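
To make the ordering explicit, here is a rough dry-run sketch of the
steps above. It only prints the plan and never runs anything. The exact
command lines are my assumptions, not tested on these hosts: the
postgres service name, the use of systemctl and virsh, and especially
the destination path on backup03 are hypothetical placeholders.

```python
from typing import List, NamedTuple


class Step(NamedTuple):
    host: str     # machine the command would be run on
    command: str  # command line, displayed only; never executed here
    note: str     # what to expect at this point


def build_plan(backup_dest: str = "backup03:/srv/db-koji01-pgsql/") -> List[Step]:
    """Return the outage steps in order.

    backup_dest is a hypothetical rsync target; only the host name
    backup03 comes from the plan itself.
    """
    return [
        # Assumed service name and init system; adjust for the real host.
        Step("db-koji01", "systemctl stop postgresql",
             "hub shows db down from here on"),
        Step("db-koji01", f"rsync -a /var/lib/pgsql/ {backup_dest}",
             "should take less than 10min"),
        # Assumes the guests are managed with libvirt on bvirthost09.
        Step("bvirthost09", "virsh shutdown db-koji01", "clean guest shutdown"),
        Step("bvirthost09", "virsh shutdown dhcp01", "clean guest shutdown"),
        Step("bvirthost09", "reboot", "then see if the load issue clears up"),
    ]


def render(plan: List[Step]) -> str:
    """Format the plan as a numbered checklist."""
    return "\n".join(
        f"{i}. [{s.host}] {s.command}  # {s.note}"
        for i, s in enumerate(plan, 1)
    )


if __name__ == "__main__":
    print(render(build_plan()))
```

The point of writing it down this way is just that the copy of
/var/lib/pgsql is taken with postgres stopped and before either guest
or the host goes down, so there is a consistent copy to restore from.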

I'd like to schedule this for off hours, possibly over the weekend,
when koji isn't all that busy.

Thoughts? +1s?

kevin
