Outage notes

Wed Jan 12 23:38:53 UTC 2011

On Wed, 2011-01-12 at 14:13 -0700, Stephen John Smoogen wrote:
> On Wed, Jan 12, 2011 at 08:06, seth vidal <skvidal at fedoraproject.org> wrote:
> > Hi Everyone,
> >  I took some notes while we were rebooting boxes and wanted to share them
> > with everyone for future outages.
> >
> > Ordering of the bounces:
> > 1. xen14: puppet is on there and if that is back up first we have a
> > place to stand for pushing out any changes (dns changes for example via
> > puppet) - xen14 takes about 4 minutes to restart/POST
> 
> Most of the new IBM hardware can take 4-6 minutes to reboot. I don't
> know if there are some flags I should have set on it, but it is deadly
> slow.
> 

In the past I have seen IBM Intel boxes that were not configured for fast
POST. Could that be the cause of the slow reboot times, especially with
the installed system memory being checked during POST?

> 
> > Overall things to think about for the future:
> > 1. dumping a complete virsh list - including how much memory is actually
> > being used per vm per server before we start reboots
> > 2. checking which disks will need fscks because of mount count or time
> > since the last check, and doing those earlier or separately.
> > 3. verifying that all running vms:
> >   a. are intended to be running
> >   b. have a config file
> >   c. are set to autostart
> > 4. verifying that all NOT running vms:
> >   a. are intended to be off
> >   b. are NOT set to autostart
> 
> Looks good. I thought koji2 was running before the reboots but it may
> have been a ghost vm.
> 
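The checks in items 1, 3 and 4 above could be scripted before the next
outage. What follows is only a rough sketch, not something that was run
during this outage. It assumes that `virsh list --all` prints Id/Name/State
columns and that `virsh dominfo` prints "Used memory", "Persistent" and
"Autostart" fields, with "Persistent: yes" standing in for "has a config
file". The function names and the `intended` set (the vms you expect to be
running on that host) are hypothetical.

```python
import subprocess


def parse_virsh_list(text):
    """Parse `virsh list --all` output into {name: state}."""
    states = {}
    for line in text.splitlines():
        parts = line.strip().split(None, 2)
        # Skip blank lines, the dashed separator, and the header row.
        if len(parts) < 3 or parts[0] == "Id":
            continue
        states[parts[1]] = parts[2]
    return states


def parse_dominfo(text):
    """Parse `virsh dominfo NAME` output into {field: value}."""
    info = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            info[key.strip()] = value.strip()
    return info


def audit(states, infos, intended):
    """Return a list of mismatches between actual and intended vm state.

    `intended` is a hypothetical, admin-supplied set of vm names that
    are supposed to be running on this host.
    """
    problems = []
    for name, state in sorted(states.items()):
        info = infos.get(name, {})
        autostart = info.get("Autostart") == "enable"
        persistent = info.get("Persistent") == "yes"
        if state == "running":
            if name not in intended:
                problems.append(f"{name}: running but not intended to be")
            if not persistent:
                problems.append(f"{name}: running without a config file")
            if not autostart:
                problems.append(f"{name}: running but not set to autostart")
        else:
            if name in intended:
                problems.append(f"{name}: intended to run but is {state}")
            if autostart:
                problems.append(f"{name}: not running but set to autostart")
    return problems


def run_virsh(*args):
    """Run a virsh subcommand and return its stdout as text."""
    result = subprocess.run(
        ["virsh", *args], capture_output=True, text=True, check=True
    )
    return result.stdout


def collect(intended):
    """Gather state and dominfo for every vm on this host, then audit."""
    states = parse_virsh_list(run_virsh("list", "--all"))
    infos = {name: parse_dominfo(run_virsh("dominfo", name)) for name in states}
    # Record memory use per vm so there is a baseline before the reboots.
    for name in sorted(infos):
        print(name, states[name], infos[name].get("Used memory", "unknown"))
    return audit(states, infos, intended)
```

Calling `collect({"koji1", "puppet1"})` on each xen host (those vm names
are placeholders) would print the per-vm memory figures and return the
list of mismatches. A vm like koji2 that someone remembers as running, but
that has no autostart flag, would show up in that list before the reboot
rather than after it.
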
> > thoughts welcome.
> > -sv
> >
> > _______________________________________________
> > infrastructure mailing list
> > infrastructure at lists.fedoraproject.org
> > https://admin.fedoraproject.org/mailman/listinfo/infrastructure
> >