xen proposal

Mike McGrath mmcgrath at redhat.com
Fri Apr 18 20:33:22 UTC 2008


On Fri, 18 Apr 2008, seth vidal wrote:

> So xen1 went down today and I was helping bring things back up. I didn't
> know to look in /var/log/messages for the messages from
> xenGuestsRunning.sh. I was wondering this:
>
> would it make sense to have xenGuestsRunning run every hour and re-make
> the symlinks in /etc/xen/auto for the guests which should be running on
> the machine? Also - if for some reason the xen guests can't be started
> up automatically due to other complexities (iscsi, memory over commit,
> etc) we could have xenGuestsRunning auto-generate a script which can be
> run to re-make the xen guests which should be running.
>
> I'd be willing to put the script together, I just wanted to ask if there
> was a good reason NOT to do this, so I don't waste time if I've missed
> something.
>
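
To make the symlink half of that concrete, here is a rough Python sketch. It
is not what xenGuestsRunning.sh does today. It assumes each guest's config is
a file named after the guest in /etc/xen, and that something else supplies
the list of guests expected on this host. The name sync_auto_links and its
parameters are made up for illustration. Dropping links for guests that are
no longer expected here goes beyond what was proposed, so treat that part as
optional.

```python
import os


def sync_auto_links(expected_guests, config_dir="/etc/xen", auto_dir="/etc/xen/auto"):
    """Make auto_dir hold one symlink per expected guest; return (created, removed).

    Assumption: each guest's config is a file named after the guest in config_dir.
    """
    os.makedirs(auto_dir, exist_ok=True)
    expected = set(expected_guests)

    # Drop symlinks for guests that should no longer start on this host.
    removed = []
    for name in sorted(os.listdir(auto_dir)):
        path = os.path.join(auto_dir, name)
        if os.path.islink(path) and name not in expected:
            os.unlink(path)
            removed.append(name)

    # Create (or repair) symlinks for guests that should start on this host.
    created = []
    for name in sorted(expected):
        target = os.path.join(config_dir, name)
        link = os.path.join(auto_dir, name)
        if not os.path.isfile(target):
            continue  # no config file for this guest, nothing to link
        if os.path.islink(link):
            if os.readlink(link) == target:
                continue  # already correct
            os.unlink(link)  # points somewhere else, re-make it
        elif os.path.exists(link):
            continue  # a real file is in the way, leave it for a human
        os.symlink(target, link)
        created.append(name)
    return created, removed
```

Run from cron every hour, this would only touch links that are missing, stale
or wrong, so repeated runs are harmless.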

The only reason we haven't done this already is that we can't detect
whether the box is already up on another host (which is something we need
anyway). Consider this scenario:

app1 running on xen1 (which is having high load from koji1 also on xen1)

People complain about the wiki.

We move app1 to a more free box, xen7.

High load causes xen1 to CRASH.

xen1 reboots and attempts to bring app1 up (it is already up on xen7).

Two machines try to write to the same disk - DOOM.


There is a bit of hope in this.  It's happened before, and it seems that
the second guest sees the disk is already mounted and gets stuck at an
fsck shell.  As long as we realize that this condition potentially means
the box is already up elsewhere and needs to be checked... we're fine.  If
someone types the root password and fscks the disk... DOOM.

This is all a sign of a larger problem: the lack of open source
management tools for virtualization on more than one host at a time.  I'm
a huge fan of automation, so in general I'd like to see the plan above
implemented, but I think we need to alter the xm creation scripts (I'm not
sure what this involves) to make sure guests don't come up on the wrong
xen host.
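
As a rough idea of the kind of check I mean, here is a Python sketch. It is
only an illustration, not a change to the real xm scripts. It assumes we can
somehow collect `xm list` output from every xen host (how is left open) and
that the output has the usual layout: a header line, then one row per domain
with the name in the first column. The names parse_xm_list and safe_to_start
are made up. A host we could not query is recorded as None and treated as
"might be running the guest", which is a deliberately cautious choice.

```python
def parse_xm_list(text):
    """Return guest names from `xm list`-style output.

    Assumption: first line is a header, first column is the domain name.
    Domain-0 is the host itself, so it is skipped.
    """
    names = set()
    for line in text.splitlines()[1:]:
        fields = line.split()
        if fields and fields[0] != "Domain-0":
            names.add(fields[0])
    return names


def safe_to_start(guest, running_by_host, this_host):
    """True only if no other host is known or suspected to be running guest.

    running_by_host maps host name -> set of running guests, or None when
    the host could not be queried.
    """
    for host, guests in running_by_host.items():
        if host == this_host:
            continue
        if guests is None:
            return False  # unknown state: refuse rather than risk two writers
        if guest in guests:
            return False  # already up elsewhere
    return True
```

In the scenario above, xen1 would come back from its reboot, see app1 in
xen7's list and skip it instead of starting a second copy against the same
disk.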

	-Mike



