This patch set improves the handling of command timeouts. We've noticed that, for example, netperf run from the Netperf test module might not be able to connect to the netperf server. By default, Linux tries to send 6 SYN packets before giving up on a TCP connection, so it can take up to 2 minutes before netperf terminates.
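For reference, the "up to 2 minutes" figure can be reproduced with a little arithmetic. This is only a sketch under assumptions that are not stated above: a 1 second initial retransmission timeout that doubles after every attempt, and 6 retransmissions after the initial SYN (the net.ipv4.tcp_syn_retries default on current kernels). The function name is made up for illustration.

```python
def syn_connect_timeout(retries=6, initial_rto=1.0):
    """Seconds until connect() gives up, assuming the RTO doubles per attempt."""
    # One wait after the initial SYN plus one wait after each retransmission.
    return sum(initial_rto * 2 ** attempt for attempt in range(retries + 1))


print(syn_connect_timeout())  # 127.0
```

Under these assumptions the connection attempt fails after 127 seconds, which matches the roughly 2 minutes observed.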
When the timeout for the Netperf module expires, the current approach is simply to send SIGKILL to the process. This works, but it also means that all of the command's output is lost, so the user can't tell what happened to netperf other than that it was killed.
To overcome this limitation I've added a graceful kill for when the timeout occurs. The slave first sends SIGINT to the process and then waits up to 5 seconds for the process to end. If the process has not ended by then, it is killed with SIGKILL.
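The escalation described above can be sketched roughly as follows. This is not the code from NetTestSlave.kill_command; it is a minimal standalone illustration that assumes the process is held as a subprocess.Popen object. The name graceful_kill and its return values are hypothetical.

```python
import signal
import subprocess
import sys


def graceful_kill(proc, grace=5.0):
    """Stop proc with SIGINT, escalating to SIGKILL after `grace` seconds.

    Returns "exited" if the process had already ended, "interrupted" if it
    ended within the grace period, and "killed" if SIGKILL was needed.
    """
    if proc.poll() is not None:
        return "exited"
    proc.send_signal(signal.SIGINT)
    try:
        # Give the process a chance to clean up and flush its output.
        proc.wait(timeout=grace)
        return "interrupted"
    except subprocess.TimeoutExpired:
        proc.kill()  # SIGKILL cannot be caught or ignored
        proc.wait()  # reap the process
        return "killed"


if __name__ == "__main__":
    # Hypothetical child that reports what it has when interrupted.
    child_code = (
        "import time\n"
        "print('started', flush=True)\n"
        "try:\n"
        "    time.sleep(60)\n"
        "except KeyboardInterrupt:\n"
        "    print('interrupted, partial results follow', flush=True)\n"
    )
    child = subprocess.Popen([sys.executable, "-c", child_code],
                             stdout=subprocess.PIPE, text=True)
    child.stdout.readline()  # wait until the child is actually running
    outcome = graceful_kill(child)
    # The child's output is small, so reading the pipe afterwards is safe.
    print(outcome, repr(child.stdout.read()))
```

The point of the extra step is that a process which handles SIGINT can still print its output before exiting, so the caller gets something more useful than "killed".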
I tested this as thoroughly as I could and also ran the regression tests. Two of the tests had to be modified because of the new reporting of the graceful kill. Apart from that, everything works as before, and timed-out commands now report their output.
Jan Tluka (7):
  NetTestCommand: add pid_exists method to NetTestCommand
  NetTestCommand: log interrupt of foreground and background command separately
  NetTestCommand: added graceful kill flag
  NetTestSlave: add graceful termination to kill_command
  Machine: use graceful kill_command on process timeout
  NetTestCommand: add missing join on interrupt
  regression-tests: update tests to match graceful termination on timeout

 lnst/Common/NetTestCommand.py    | 26 +++++++++++++++++++++++---
 lnst/Controller/Machine.py       |  9 +++++++--
 lnst/Slave/NetTestSlave.py       | 22 ++++++++++++++++++++--
 regression-tests/tests/24/run.sh |  2 +-
 regression-tests/tests/27/run.sh |  4 ++--
 5 files changed, 53 insertions(+), 10 deletions(-)