On Wed, Jan 12, 2011 at 08:06, seth vidal <skvidal(a)fedoraproject.org> wrote:
I took some notes while we were rebooting boxes, and I wanted to share
them with everyone for future outages.
Ordering of the bounces:
1. xen14: puppet is on there, and if that is back up first we have a
place to stand for pushing out any changes (DNS changes via puppet, for
example). xen14 takes about 4 minutes to restart/POST.
Most of the new IBM hardware can take 4-6 minutes to reboot. I don't
know if there are some flags I should have set on it, but it is deadly
Overall things to think about for the future:
1. dumping a complete virsh list - including how much memory is actually
being used per vm per server before we start reboots
2. checking what disks need fscks because of mounted time and doing
those earlier or separately.
3. verifying that all running vms:
a. are intended to be running
b. have a config file
c. are set to autostart
4. verifying that all NOT running vms:
a. are intended to be off
b. are NOT set to autostart
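
A rough sketch of how items 1, 3 and 4 above could be scripted per host. It is not something that was run during this outage. It assumes the `virsh` command-line tool is available, that `virsh list --all` prints the usual Id/Name/State table, and that `virsh dominfo` reports "Used memory", "Persistent" and "Autostart" fields (older libvirt versions may not print all of them). The function names and the `intended_running` set are hypothetical; the list of vms that are supposed to be up has to come from somewhere else.

```python
import subprocess


def parse_virsh_list(text):
    """Turn `virsh list --all` output into a {name: state} dict."""
    states = {}
    for line in text.splitlines():
        parts = line.split(None, 2)
        # Skip blank lines, the dashed separator, and the header row.
        if len(parts) < 3 or parts[0] == "Id":
            continue
        states[parts[1]] = parts[2].strip()
    return states


def parse_dominfo(text):
    """Turn `virsh dominfo NAME` output ("Key:   value" lines) into a dict."""
    info = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            info[key.strip()] = value.strip()
    return info


def audit(states, infos, intended_running):
    """Return a list of problems for the checks in items 3 and 4."""
    problems = []
    for name in sorted(states):
        info = infos.get(name, {})
        autostart = info.get("Autostart") == "enable"
        if states[name] == "running":
            if name not in intended_running:
                problems.append(f"{name}: running but not intended to be")
            # Assumption: a missing "Persistent" field is treated as no config.
            if info.get("Persistent") != "yes":
                problems.append(f"{name}: running without a config file")
            if not autostart:
                problems.append(f"{name}: running but not set to autostart")
        else:
            if name in intended_running:
                problems.append(f"{name}: intended to run but is {states[name]}")
            if autostart:
                problems.append(f"{name}: off but set to autostart")
    return problems


def collect():
    """Gather live data from virsh on this host (requires virsh installed)."""
    def run(*args):
        return subprocess.run(["virsh", *args], capture_output=True,
                              text=True, check=True).stdout
    states = parse_virsh_list(run("list", "--all"))
    infos = {name: parse_dominfo(run("dominfo", name)) for name in states}
    return states, infos
```

Calling `states, infos = collect()` on each xen host before the reboots and saving the result would cover item 1, since `infos[name].get("Used memory")` holds the per-vm memory figure. `audit(states, infos, intended_running)` then lists anything that fails items 3 or 4, which is where a ghost vm would show up.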
Looks good. I thought koji2 was running before the reboots, but it may
have been a ghost vm.
Stephen J Smoogen.
"The core skill of innovators is error recovery, not failure avoidance."
Randy Nelson, President of Pixar University.
"Let us be kind, one to another, for most of us are fighting a hard
battle." -- Ian MacLaren