On Fri, 2008-04-18 at 15:33 -0500, Mike McGrath wrote:
The only reason we haven't done this already is the inability to tell whether
the box is already up somewhere (which is something we need already)
Consider this scenario:
app1 running on xen1 (which is having high load from koji1 also on xen1)
People complain about the wiki.
We move app1 to a more free box, xen7.
high load causes CRASH
xen1 reboots. Attempts to bring app1 up (already up on xen7)
Two machines try to write to the same disk - DOOM.
There is a bit of hope in this. 1) it's happened before and it seems
that the second guest sees the disk is already mounted and gets stuck at
an fsck shell. As long as we realize that that condition potentially
means the box is already up and needs to be checked... we're fine. If
someone tries to type the root password and fsck the disk... DOOM.
This is all a sign of a larger problem with the lack of open source
management tools for virtualization on more than one host at a time. I'm
a huge fan of automation so in general I'd like to
see the plan above implemented, but I think we need to alter the xm
creation scripts (I'm not sure what this involves) to make sure hosts
don't come up on the wrong xen host.
Okay so maybe we need a really-xen-startup init script which:
1. happens AFTER network, etc are up so iscsi items work
2. provides a locking capability so it can talk to 'something else' to
find out which domains are already locked and allocated, and from that
determine if it should start them (stale locks left behind by a crash
are easy to work around with a good db)
3. notifies on restart.
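
To make item 2 a bit more concrete, here is a rough sketch of what the
lock check might look like. Everything in it is an assumption for
illustration only: the domain_locks table, the try_claim and release
helpers, the LOCK_TTL value, and the use of sqlite as the 'something
else' (a real setup would need a db every xen host can reach). It does
not call xm and leaves out the notification step.

```python
import sqlite3
import time

LOCK_TTL = 300  # hypothetical: seconds without a heartbeat before a lock counts as stale


def open_db(path=":memory:"):
    # isolation_level=None puts the connection in autocommit mode,
    # so the explicit BEGIN/COMMIT below control the transaction.
    db = sqlite3.connect(path, isolation_level=None)
    db.execute(
        "CREATE TABLE IF NOT EXISTS domain_locks ("
        " domain TEXT PRIMARY KEY,"
        " host TEXT NOT NULL,"
        " heartbeat REAL NOT NULL)"
    )
    return db


def try_claim(db, domain, host, now=None):
    """Return True if `host` may start `domain`, recording or refreshing the lock."""
    now = time.time() if now is None else now
    db.execute("BEGIN IMMEDIATE")  # take the write lock before reading
    try:
        row = db.execute(
            "SELECT host, heartbeat FROM domain_locks WHERE domain = ?",
            (domain,),
        ).fetchone()
        # Refuse only if another host holds a lock that is still fresh.
        held_elsewhere = (
            row is not None and row[0] != host and now - row[1] < LOCK_TTL
        )
        if not held_elsewhere:
            db.execute(
                "INSERT OR REPLACE INTO domain_locks (domain, host, heartbeat)"
                " VALUES (?, ?, ?)",
                (domain, host, now),
            )
        db.execute("COMMIT")
    except Exception:
        db.execute("ROLLBACK")
        raise
    return not held_elsewhere


def release(db, domain, host):
    """Drop the lock, e.g. when the domain is deliberately moved off `host`."""
    db.execute(
        "DELETE FROM domain_locks WHERE domain = ? AND host = ?",
        (domain, host),
    )
```

In the scenario above, xen1 would release app1 when it is moved, xen7
would claim it, and after the crash xen1's startup script would get
False back from try_claim and skip app1. One caveat: a stale heartbeat
only suggests the other host is gone, it does not prove it, so taking
over a stale lock is probably a good moment for the notification in
item 3.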
just a few thoughts...