Mon, Apr 25, 2016 at 03:28:40PM CEST, olichtne(a)redhat.com wrote:
On Fri, Apr 22, 2016 at 03:07:57PM +0200, Jan Tluka wrote:
> While working on graceful kill I noticed that when we call interrupt on
> a command, the forked child is never reaped, so the pid_exists() check
> always returns true and the graceful kill times out.
> Adding join() call after sending the interrupt solves this problem.
> Signed-off-by: Jan Tluka <jtluka(a)redhat.com>
> lnst/Common/NetTestCommand.py | 1 +
> 1 file changed, 1 insertion(+)
> diff --git a/lnst/Common/NetTestCommand.py b/lnst/Common/NetTestCommand.py
> index f09e402..6573ffd 100644
> --- a/lnst/Common/NetTestCommand.py
> +++ b/lnst/Common/NetTestCommand.py
> @@ -212,6 +212,7 @@ class NetTestCommand:
> logging.debug("Interrupting command with id \"%s\", pid \"%d\"" % (self._id, self._pid))
> os.killpg(os.getpgid(self._pid), signal.SIGINT)
> + self._process.join()
Ah, I remember this one! I spent 3 weeks looking for this deadlock
bug. Take a look at this commit:
This needs to be checked before applying this patch. The commit
message mentions large amounts of data being sent over the communication
PIPE. I'm not sure exactly how much data that is at the moment, but I
suspect the limit is connected to the memory page size of the system.
Also I think we saw this bug when using tcpdump, probably from the
Yes, the deadlock is there. Scratch this patch set until we find better
Thanks for catching this, Ondrej!
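
For anyone reading the archive later, here is a minimal sketch of the
general failure mode discussed above. It is not the LNST code: it assumes
Python 3 on a Unix host, and the names send_payload and join_before_drain
are hypothetical. The child writes to a multiprocessing pipe and the parent
joins it before reading anything. A timeout on join() is used so the sketch
reports the stall instead of hanging.

```python
import multiprocessing

# Assumption: a Unix host, as in the thread; "fork" is not available on Windows.
ctx = multiprocessing.get_context("fork")


def send_payload(conn, size):
    """Child: write `size` bytes to the pipe, then exit."""
    conn.send(b"x" * size)
    conn.close()


def join_before_drain(size, timeout):
    """Join the child before reading its pipe, with a bounded wait.

    Returns (stuck, received_bytes, reaped).
    """
    reader, writer = ctx.Pipe(duplex=False)
    proc = ctx.Process(target=send_payload, args=(writer, size))
    proc.start()
    writer.close()  # the parent only reads

    # Wait for the child first. An unbounded join() would hang here
    # if the child were blocked in send() on a full pipe.
    proc.join(timeout)
    stuck = proc.is_alive()

    # Draining the pipe lets a blocked child finish its write and exit.
    data = reader.recv()
    proc.join(5.0)
    reaped = not proc.is_alive()
    reader.close()
    return stuck, len(data), reaped


if __name__ == "__main__":
    for size in (16, 1 << 20):
        print(size, join_before_drain(size, timeout=0.5))
```

With a payload well above the kernel pipe buffer, the first join() should
time out with the child still alive, and the child only exits once the
parent has read the data. Whether the real command output crosses that
threshold depends on the system, so draining the pipe before joining (or
joining with a timeout) looks like the safer ordering.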
>> self._control_cmd = cmd
>> def kill(self, cmd):