Hi everyone,
while discussing the recent signal handling issues in LNST we also came
across timeout handling, which seems a bit unclear at the moment.
Every command or test has a timeout attribute that defines how much time
the command has to finish. E.g.
<run command="sleep 9" timeout="10"/>
Internally we use SIGALRM to notify the controller that "time is up".
The simplest case is a command/test that runs in the foreground (i.e. has
no bg_id attribute). For such a command the SIGALRM handler is set to raise
an exception and SIGALRM is scheduled based on the timeout attribute. If the
command does not finish in time, it is killed.
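To make the foreground case concrete, here is a minimal sketch of that
mechanism, assuming Python's standard signal/subprocess modules; the names
(run_foreground, CommandTimeout, _alarm_handler) are illustrative, not the
actual LNST API:

```python
import signal
import subprocess


class CommandTimeout(Exception):
    pass


def _alarm_handler(signum, frame):
    # Raised out of the blocking wait() when the alarm fires.
    raise CommandTimeout("time is up")


def run_foreground(cmd, timeout):
    signal.signal(signal.SIGALRM, _alarm_handler)
    signal.alarm(timeout)          # schedule SIGALRM per the timeout attribute
    proc = subprocess.Popen(cmd, shell=True)
    try:
        proc.wait()
        return proc.returncode
    except CommandTimeout:
        proc.kill()                # command did not finish in time
        proc.wait()
        return None
    finally:
        signal.alarm(0)            # cancel any pending alarm
```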
Another case is a command that is put in the background. In that case the
scheduled SIGALRM is immediately reset, so any timeout set for a background
command has no effect, since the SIGALRM is no longer scheduled (it will
actually be scheduled again for other commands in the queue, but that is
irrelevant to the background command).
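A sketch of the background case, again with illustrative names rather than
the real LNST code (run_background and the deadline bookkeeping are
assumptions on my part):

```python
import signal
import subprocess
import time


def run_background(cmd, timeout):
    signal.alarm(timeout)            # alarm scheduled based on the timeout...
    proc = subprocess.Popen(cmd, shell=True)
    remaining = signal.alarm(0)      # ...and immediately reset for bg commands
    # The timeout is remembered as a deadline, but nothing enforces it
    # until a later wait() (or not at all, with intr()/kill()).
    return proc, time.time() + remaining
```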
For background commands there are two scenario variants:
a. intr()/kill() is called; here we don't actually care about the command's
timeout, since the recipe is written so that it terminates the background
process itself
b. wait() is called; in this case SIGALRM is scheduled for the wait
command based on the remaining time of the command running in the
background, so the timeout is properly handled and valid
All of the above is just a summary and things are working reliably, but
I'd still like to start a discussion on the whole subject, and in
particular on the following.
If a user specifies a timeout for a background command that is accompanied
by intr() or kill(), should we reject it? Should we just notify the user
that the specified timeout makes no difference? Or should we implement it
so that the timeout works in this case as well?
Ondrej also mentioned that if the bg command runs for too long it can be
killed because of a socket timeout, which might be unexpected behavior.
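For context, a minimal illustration of that kind of socket timeout (the
sockets and the 1-second value here are arbitrary, not the actual
controller/slave connection): a read blocks for at most the configured
timeout and then raises, regardless of any command timeout.

```python
import socket

# Listening server that never sends anything.
srv = socket.socket()
srv.bind(("127.0.0.1", 0))
srv.listen(1)

cli = socket.socket()
cli.settimeout(1.0)                # connection-level timeout
cli.connect(srv.getsockname())

try:
    cli.recv(1024)                 # nothing ever arrives -> times out
    timed_out = False
except socket.timeout:
    timed_out = True

cli.close()
srv.close()
```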
Let me know what you think or if you have more ideas.
-Jan