* Dan Kenigsberg <danken(a)redhat.com> [2012-09-09 12:52]:
On Fri, Sep 07, 2012 at 03:54:10PM -0400, Alon Bar-Lev wrote:
>
>
> ----- Original Message -----
> > From: "Ryan Harper" <ryanh(a)us.ibm.com>
> > To: "Alon Bar-Lev" <alonbl(a)redhat.com>
> > Cc: "Ryan Harper" <ryanh(a)us.ibm.com>, vdsm-devel(a)lists.fedorahosted.org
> > Sent: Friday, September 7, 2012 10:47:10 PM
> > Subject: Re: Change in vdsm[master]: bootstrap: perform reboot asynchronously
> >
> > * Alon Bar-Lev <alonbl(a)redhat.com> [2012-09-07 14:45]:
> > >
> > >
> > > ----- Original Message -----
> > > > From: "Ryan Harper" <ryanh(a)us.ibm.com>
> > > > To: "Alon Bar-Lev" <alonbl(a)redhat.com>
> > > > Cc: vdsm-devel(a)lists.fedorahosted.org
> > > > Sent: Friday, September 7, 2012 10:30:18 PM
> > > > Subject: Re: Change in vdsm[master]: bootstrap: perform reboot
> > > > asynchronously
> > > >
> > > > * Alon Bar-Lev <alonbl(a)redhat.com> [2012-09-05 16:11]:
> > > > > Alon Bar-Lev has uploaded a new change for review.
> > > > >
> > > > > Change subject: bootstrap: perform reboot asynchronously
> > > > >
> > > > > ......................................................................
> > > > >
> > > > > bootstrap: perform reboot asynchronously
> > > > >
> > > > > The use of /sbin/reboot may cause the reboot to be performed in the
> > > > > middle of script execution.
> > > > >
> > > > > Reboot should be delayed in the background so that the script has a
> > > > > fair chance to terminate properly.
> > > >
> > > > So, we fork and sleep 10 seconds? Is that really what we want to
> > > > do? Why is 10 seconds enough?
> > > >
> > > > Shouldn't the deployUtil be tracking the script execution and
> > > > waiting for the scripts to complete before rebooting?
> > >
> > > Hi,
> > >
> > > Reboot is called at the very end of the script, so 10 seconds is more
> > > than enough.
> >
> > I don't know how we can assert that... we're not the sole process on
> > the box.
> >
> > >
> > > You are right that we can track the pid of the bootstrap script's
> > > parent's parent's parent, but it will introduce more complexity that
> > > I am not sure is worth it.
> >
> > Why can't we just wait on the PID if we know it?
>
> Because if we want this to be precise, we need to track the following chain of
> processes.
>
> sshd->sh->python->python
>
> If we only track the last link in the chain, it is not enough: we have a race anyway
> and have to wait some extra seconds, as the sh is doing some more logic and cleanups.
>
> We can create the process tree which stops at either ssh or init... but even then, if
> this is run differently we have a problem.

Ryan, I suppose you are right - hard-coding 10 seconds rings all
fishiness bells. Still, the current behaviour, of running /sbin/reboot
and hoping that it is slow enough so that the calling process writes
what it needs to stdout, is even worse.

I trust Alon to think of a saner delayedReboot() in the near future, so
I'm taking the patch now, as a step in the right direction.

Understood, something is better than nothing. However, it certainly is
pushing the requirement for reworking delayedReboot() off a bit, since
it's "working" for now.
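
For the rework, one possible direction is to wait on the ancestor chain
(sshd->sh->python->python) with an upper bound, rather than sleeping a
fixed 10 seconds. The following is only a rough sketch of that idea, not
the existing delayedReboot(): the names pid_alive, parent_of, ancestors,
wait_until_gone and delayed_reboot are all made up for illustration, it
assumes Linux /proc, and it would still have to run from a detached
background process so that it outlives the bootstrap script.

```python
import os
import time


def pid_alive(pid):
    """Return True if a process with this PID currently exists."""
    try:
        os.kill(pid, 0)  # signal 0 only checks existence, nothing is delivered
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


def parent_of(pid):
    """Read the parent PID from /proc/<pid>/stat (Linux only)."""
    with open("/proc/%d/stat" % pid) as f:
        data = f.read()
    # The command name sits in parentheses and may contain spaces,
    # so only split what follows the last ')'.
    fields = data[data.rindex(")") + 2:].split()
    return int(fields[1])  # fields[0] is the state, fields[1] the ppid


def ancestors(pid):
    """Return the PIDs above `pid`, nearest first, stopping before init."""
    chain = []
    pid = parent_of(pid)
    while pid > 1:
        chain.append(pid)
        pid = parent_of(pid)
    return chain


def wait_until_gone(pids, timeout, poll=0.2, alive=pid_alive):
    """Poll until none of `pids` exist; return False if `timeout` expires."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if not any(alive(p) for p in pids):
            return True
        time.sleep(poll)
    return not any(alive(p) for p in pids)


def delayed_reboot(pids, timeout, reboot, poll=0.2, alive=pid_alive):
    """Wait for `pids` to exit (bounded by `timeout`), then call `reboot`.

    `reboot` is a caller-supplied callable (hypothetical here); the
    timeout keeps a stuck ancestor from blocking the reboot forever.
    """
    finished = wait_until_gone(pids, timeout, poll=poll, alive=alive)
    reboot()
    return finished
```

The timeout still acts as a fallback, so this does not remove the race
Alon describes when the script is launched some other way; it only makes
the common case wait for the processes instead of guessing how long they
need.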
--
Ryan Harper
Software Engineer; Linux Technology Center
IBM Corp., Austin, Tx
ryanh(a)us.ibm.com