I'd like to make a feature request for mock: the ability for it to determine a job has taken too long and kill it. mock --timeout N (with N in minutes) is the UI I'm picturing.
I've been doing these mass rebuilds for a while. Every so often we'll wind up with a package with the halting problem - it continues to run, or not, but it doesn't ever finish building either. Ever. Several days later, it's still going, but not making progress.
Sure, some jobs, like the kernel, openoffice, glibc, etc. can take a good number of hours. But even those don't run for days. The latest culprits were a few perl modules that ran for >2 days with no end in sight.
Thanks, Matt
On Mon, Apr 16, 2007 at 07:50:15AM -0500, Matt Domsch wrote:
> The latest culprits were a few perl modules that ran for >2 days with no end in sight.
Did you find out why it was not ending? Was it a package bug?
On Mon, Apr 16, 2007 at 03:25:21PM +0200, Axel Thimm wrote:
> Did you find out why it was not ending? Was it a package bug?
It's usually not a packaging bug, but a package bug. :-)
The perl 'make test' routines are the usual culprits.
Matt Domsch wrote:
> I'd like to make a feature request for mock: the ability for it to determine a job has taken too long and kill it. mock --timeout N (with N in minutes) is the UI I'm picturing.
File an RFE BZ on it and I'll look at it.
Clark
On Mon, Apr 16, 2007 at 12:50:11PM -0500, Clark Williams wrote:
> File an RFE BZ on it and I'll look at it.
I have some python code from another project that does exactly this. I can drop it in. I'll send a patch for review. -- Michael
On Mon, Apr 16, 2007 at 01:07:52PM -0500, Michael E Brown wrote:
> I have some python code from another project that does exactly this. I can drop it in. I'll send a patch for review.
Here is my previous code to call an external command with a timeout. I'll do a patch when I get a minute, but this should give a good idea:
import math
import os
import signal
import time

# helper class & functions for executeCommand()
# User should handle this if they specify a timeout
class commandTimeoutExpired(Exception): pass

# the problem with os.system() is that the command that is run gets any
# keyboard input and/or signals. This means that <CTRL>-C interrupts the
# sub-program instead of the python program. This helper function fixes
# that.
# It also allows us to set up a maximum timeout before all children are
# killed
def executeCommand(cmd, timeout=0):
    class alarmExc(Exception): pass
    def alarmhandler(signum,stackframe):
        raise alarmExc("timeout expired")

    pid = os.fork()
    if pid:
        #parent
        rpid = ret = 0
        oldhandler=signal.signal(signal.SIGALRM,alarmhandler)
        starttime = time.time()
        prevTimeout = signal.alarm(timeout)
        try:
            (rpid, ret) = os.waitpid(pid, 0)
            signal.alarm(0)
            signal.signal(signal.SIGALRM,oldhandler)
            if prevTimeout:
                passed = time.time() - starttime
                # signal.alarm() wants an integer; re-arming with 0 would
                # cancel the caller's alarm, so use at least 1 second
                signal.alarm(max(int(math.ceil(prevTimeout - passed)), 1))
        except alarmExc:
            os.kill(-pid, signal.SIGTERM)
            time.sleep(1)
            os.kill(-pid, signal.SIGKILL)
            (rpid, ret) = os.waitpid(pid, 0)
            signal.signal(signal.SIGALRM,oldhandler)
            if prevTimeout:
                passed = time.time() - starttime
                signal.alarm(max(int(math.ceil(prevTimeout - passed)), 1))
            raise commandTimeoutExpired( "Specified timeout of %s seconds expired before command finished. Command was: %s"
                    % (timeout, cmd)
                    )

        # mask and return just return value
        return (ret & 0xFF00) >> 8
    else:
        #child
        os.setpgrp() # become process group leader so that we can kill all our children
        ret = os.system(cmd)
        os._exit( (ret & 0xFF00) >> 8 )
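The `(ret & 0xFF00) >> 8` mask above pulls the exit code out of the wait status that os.waitpid() and os.system() hand back on POSIX. A minimal sketch of what that mask does, assuming a POSIX system and a current Python; the helper name decode_exit is made up for illustration:

```python
import os


def decode_exit(status):
    """Return the exit code kept in the high byte of a POSIX wait status."""
    return (status & 0xFF00) >> 8


# On POSIX, os.system() returns a wait()-style status, like os.waitpid().
status = os.system("exit 3")

print(decode_exit(status))      # manual mask, as in the code above
print(os.WIFEXITED(status))     # true when the child exited on its own
print(os.WEXITSTATUS(status))   # stdlib helper for the same high byte
```

One thing to keep in mind: a child that was killed by a signal has a zero high byte, so the mask alone reports 0 for it. os.WIFSIGNALED() can tell the two cases apart where that matters.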
On Thu, Apr 19, 2007 at 01:14:32AM -0500, Michael E Brown wrote:
> Matt Domsch wrote:
> > I'd like to make a feature request for mock: the ability for it to determine a job has taken too long and kill it. mock --timeout N (with N in minutes) is the UI I'm picturing.
Here is my patch. I've tested:
  -- builds without timeout
  -- builds with timeout, successful build
  -- builds with timeout, takes too long
I accidentally developed the patch against 0.6.11. I think it is a one-line change to port to 0.6.12. Will do tomorrow if needed.
I don't think this changes any behaviour, but could somebody please check the logging?
-- Michael
Patch for implementing a hard timeout for 'rpmbuild' command, to avoid builds that hang.
To activate:
    commandline: --rpmbuild_timeout=N
    config file: config_opts["rpmbuild_timeout"]=N
where N==seconds to wait before timing out
signed-off-by: Michael E Brown <michael_e_brown@dell.com>
--- mock.py-ORIGINAL 2007-04-23 23:28:30.000000000 -0500
+++ mock.py-PATCHED 2007-04-24 00:46:56.000000000 -0500
@@ -26,9 +26,11 @@
     import rpmUtils.transaction
 import rpm
 import glob
+import popen2
 import shutil
 import types
 import grp
+import signal
 import stat
 import time
 from exceptions import Exception
@@ -58,6 +60,8 @@
     def __str__(self):
         return self.msg
 
+class commandTimeoutExpired(Error): pass
+
 class YumError(Error):
     def __init__(self, msg):
         Error.__init__(self, msg)
@@ -379,11 +383,14 @@
 
         self.state("build")
 
-        (retval, output) = self.do_chroot(cmd)
-
-        if retval != 0:
-            error(output)
-            raise BuildError, "Error building package from %s, See build log" % srpmfn
+        try:
+            (retval, output) = self.do_chroot(cmd, timeout=self.config['rpmbuild_timeout'])
+
+            if retval != 0:
+                error(output)
+                raise BuildError, "Error building package from %s, See build log" % srpmfn
+        except commandTimeoutExpired:
+            raise BuildError, "Error building package from %s. Exceeded rpmbuild_timeout which was set to %s seconds." % (srpmfn, self.config['rpmbuild_timeout'])
 
         bd_out = self.rootdir + self.builddir
         rpms = glob.glob(bd_out + '/RPMS/*.rpm')
@@ -490,13 +497,15 @@
         # poof, no more file
         if os.path.exists(mf):
             os.unlink(mf)
-
 
-    def do(self, command):
+    def do(self, command, timeout=0):
         """execute given command outside of chroot"""
+        class alarmExc(Exception): pass
+        def alarmhandler(signum,stackframe):
+            raise alarmExc("timeout expired")
 
         retval = 0
-        msg = "Executing %s" % command
+        msg = "Executing timeout(%s): %s" % (timeout, command)
         self.debug(msg)
         self.root_log(msg)
 
@@ -507,25 +516,62 @@
         if self.state() == "build":
             logfile = self._build_log
 
-        pipe = os.popen('{ ' + command + '; } 2>&1', 'r')
-        output = ""
-        for line in pipe:
-            logfile.write(line)
-            if self.config['debug']:
-                print line[:-1]
-                sys.stdout.flush()
-            logfile.flush()
-            output += line
-        status = pipe.close()
-        if status is None:
-            status = 0
-
-        if os.WIFEXITED(status):
-            retval = os.WEXITSTATUS(status)
+        output=""
+        (r,w) = os.pipe()
+        pid = os.fork()
+        if pid: #parent
+            rpid = ret = 0
+            os.close(w)
+            oldhandler=signal.signal(signal.SIGALRM,alarmhandler)
+            starttime = time.time()
+            # timeout=0 means disable alarm signal. no timeout
+            signal.alarm(timeout)
 
-        return (retval, output)
+            try:
+                # read output from child
+                r = os.fdopen(r, "r")
+                for line in r:
+                    logfile.write(line)
+                    if self.config['debug']:
+                        print line[:-1]
+                        sys.stdout.flush()
+                    logfile.flush()
+                    output += line
+
+                # close read handle, get child return status, etc
+                r.close()
+                (rpid, ret) = os.waitpid(pid, 0)
+                signal.alarm(0)
+                signal.signal(signal.SIGALRM,oldhandler)
+
+            except alarmExc:
+                os.kill(-pid, signal.SIGTERM)
+                time.sleep(1)
+                os.kill(-pid, signal.SIGKILL)
+                (rpid, ret) = os.waitpid(pid, 0)
+                signal.signal(signal.SIGALRM,oldhandler)
+                raise commandTimeoutExpired( "Timeout(%s) exceeded for command: %s" % (timeout, command))
+
+            # mask and return just return value, plus child output
+            return ((ret & 0xFF00) >> 8, output)
+
+        else: #child
+            os.close(r)
+            # become process group leader so that our parent
+            # can kill our children
+            os.setpgrp()
+
+            child = popen2.Popen4(command)
+            child.tochild.close()
+
+            w = os.fdopen(w, "w")
+            for line in child.fromchild:
+                w.write(line)
+            w.close()
+            os._exit( (retval & 0xFF00) >> 8 )
+
 
-    def do_chroot(self, command, fatal = False, exitcode=None):
+    def do_chroot(self, command, fatal = False, exitcode=None, timeout=0):
         """execute given command in root"""
         cmd = ""
 
@@ -539,7 +585,7 @@
                                                  self.rootdir,
                                                  self.config['runuser'],
                                                  command)
-        (ret, output) = self.do(cmd)
+        (ret, output) = self.do(cmd, timeout=timeout)
         if (ret != 0) and fatal:
             self.close()
             if exitcode:
@@ -778,6 +824,8 @@
                       default=False, help="Turn on build-root caching")
     parser.add_option("--rebuildcache", action ="store_true", dest="rebuild_cache",
                       default=False, help="Force rebuild of build-root cache")
+    parser.add_option("--rpmbuild_timeout", action="store", dest="rpmbuild_timeout", type="int",
+                      default=None, help="Fail build if rpmbuild takes longer than 'timeout' seconds ")
 
     return parser.parse_args()
 
@@ -789,6 +837,7 @@
     config_opts['rm'] = '/usr/sbin/mock-helper rm'
     config_opts['mknod'] = '/usr/sbin/mock-helper mknod'
     config_opts['yum'] = '/usr/sbin/mock-helper yum'
+    config_opts['rpmbuild_timeout'] = 0
     config_opts['runuser'] = '/sbin/runuser'
     config_opts['chroot_setup_cmd'] = 'install buildsys-build'
     config_opts['chrootuser'] = 'mockbuild'
@@ -845,6 +894,8 @@
         config_opts['statedir'] = options.statedir
     if options.uniqueext:
         config_opts['unique-ext'] = options.uniqueext
+    if options.rpmbuild_timeout is not None:
+        config_opts['rpmbuild_timeout'] = options.rpmbuild_timeout
 
 def do_clean(config_opts, init=0):
     my = None
@@ -972,7 +1023,7 @@
 
     # cmdline options override config options
     set_config_opts_per_cmdline(config_opts, options)
-
+
     # do whatever we're here to do
     if args[0] == 'clean':
         # unset a --no-clean
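The patch registers --rpmbuild_timeout with default=None while the config default is 0, which lets the command line override the config file only when the flag was actually given. A minimal sketch of that precedence rule, using a stand-alone optparse parser; the function effective_timeout is a hypothetical helper and not part of mock:

```python
from optparse import OptionParser


def effective_timeout(argv, config_opts):
    """Apply the command line on top of the config, as the patch does."""
    parser = OptionParser()
    # default=None separates "flag not given" from "--rpmbuild_timeout=0"
    parser.add_option("--rpmbuild_timeout", action="store", type="int",
                      dest="rpmbuild_timeout", default=None)
    options, _args = parser.parse_args(argv)
    if options.rpmbuild_timeout is not None:
        config_opts["rpmbuild_timeout"] = options.rpmbuild_timeout
    return config_opts["rpmbuild_timeout"]


# Flag absent: the value from the config file stays in force.
print(effective_timeout([], {"rpmbuild_timeout": 7200}))
# Flag given as 0: the command line switches the timeout off.
print(effective_timeout(["--rpmbuild_timeout=0"], {"rpmbuild_timeout": 7200}))
# Flag given: the command line wins over the config default of 0.
print(effective_timeout(["--rpmbuild_timeout=600"], {"rpmbuild_timeout": 0}))
```

Testing `is not None` instead of truthiness is what allows an explicit 0 on the command line to disable a timeout that a config file turned on.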
On Tue, Apr 24, 2007 at 12:51:10AM -0500, Michael E Brown wrote:
> Matt Domsch wrote:
> > I'd like to make a feature request for mock: the ability for it to determine a job has taken too long and kill it. mock --timeout N (with N in minutes) is the UI I'm picturing.
And here is the same patch against (what I hope is) the mock-0-6-branch.
I ran:
$ git-format-patch origin/mock-0-6-branch
0001-Patch-for-implementing-a-hard-timeout-for-rpmbuild.patch
I don't have a machine set up to test this patch. It was a one-line fix from the previous submission.
I'm not too handy with git yet, so I hope this is right... :)
-- Michael
On Tue, Apr 24, 2007 at 01:28:03AM -0500, Michael E Brown wrote:
> And here is the same patch against (what I hope is) the mock-0-6-branch.
> I'm not too handy with git yet, so I hope this is right... :)
Obviously not too handy with mutt, either, as I forgot to add the patch.
-- Michael
On Tue, Apr 24, 2007 at 08:41:36AM -0500, Michael E Brown wrote:
> Obviously not too handy with mutt, either, as I forgot to add the patch.
OK, it wasn't me that time, as I had the patch there, honest! Looks like something stripped it off :( Let's try again a different way.
(sorry for all the extra traffic.) -- Michael
From 40724405ee7c38ec5a4ba7ccecb7ae0bc7889d65 Mon Sep 17 00:00:00 2001
From: Michael E Brown <mebrown@michaels-house.net>
Date: Tue, 24 Apr 2007 01:00:05 -0500
Subject: [PATCH] implementing a hard timeout for 'rpmbuild' command

To activate:
    commandline: --rpmbuild_timeout=N
    config file: config_opts["rpmbuild_timeout"]=N

where N==seconds to wait before timing out
---
 mock.py |  105 ++++++++++++++++++++++++++++++++++++++++++++++----------------
 1 files changed, 78 insertions(+), 27 deletions(-)
diff --git a/mock.py b/mock.py
index 497a95f..370f48c 100644
--- a/mock.py
+++ b/mock.py
@@ -26,9 +26,11 @@ except:
     import rpmUtils.transaction
 import rpm
 import glob
+import popen2
 import shutil
 import types
 import grp
+import signal
 import stat
 import time
 from exceptions import Exception
@@ -58,6 +60,8 @@ class Error(Exception):
     def __str__(self):
         return self.msg
 
+class commandTimeoutExpired(Error): pass
+
 class YumError(Error):
     def __init__(self, msg):
         Error.__init__(self, msg)
@@ -378,11 +382,14 @@ class Root:
 
         self.state("build")
 
-        (retval, output) = self.do_chroot(cmd)
-
-        if retval != 0:
-            error(output)
-            raise BuildError, "Error building package from %s, See build log" % srpmfn
+        try:
+            (retval, output) = self.do_chroot(cmd, timeout=self.config['rpmbuild_timeout'])
+
+            if retval != 0:
+                error(output)
+                raise BuildError, "Error building package from %s, See build log" % srpmfn
+        except commandTimeoutExpired:
+            raise BuildError, "Error building package from %s. Exceeded rpmbuild_timeout which was set to %s seconds." % (srpmfn, self.config['rpmbuild_timeout'])
 
         bd_out = self.rootdir + self.builddir
         rpms = glob.glob(bd_out + '/RPMS/*.rpm')
@@ -489,13 +496,15 @@ class Root:
         # poof, no more file
         if os.path.exists(mf):
             os.unlink(mf)
-
 
-    def do(self, command):
+    def do(self, command, timeout=0):
         """execute given command outside of chroot"""
+        class alarmExc(Exception): pass
+        def alarmhandler(signum,stackframe):
+            raise alarmExc("timeout expired")
 
         retval = 0
-        msg = "Executing %s" % command
+        msg = "Executing timeout(%s): %s" % (timeout, command)
         self.debug(msg)
         self.root_log(msg)
 
@@ -506,25 +515,62 @@ class Root:
         if self.state() == "build":
             logfile = self._build_log
 
-        pipe = os.popen('{ ' + command + '; } 2>&1', 'r')
-        output = ""
-        for line in pipe:
-            logfile.write(line)
-            if self.config['debug'] or self.config['verbose']:
-                print line[:-1]
-                sys.stdout.flush()
-            logfile.flush()
-            output += line
-        status = pipe.close()
-        if status is None:
-            status = 0
-
-        if os.WIFEXITED(status):
-            retval = os.WEXITSTATUS(status)
+        output=""
+        (r,w) = os.pipe()
+        pid = os.fork()
+        if pid: #parent
+            rpid = ret = 0
+            os.close(w)
+            oldhandler=signal.signal(signal.SIGALRM,alarmhandler)
+            starttime = time.time()
+            # timeout=0 means disable alarm signal. no timeout
+            signal.alarm(timeout)
 
-        return (retval, output)
+            try:
+                # read output from child
+                r = os.fdopen(r, "r")
+                for line in r:
+                    logfile.write(line)
+                    if self.config['debug'] or self.config['verbose']:
+                        print line[:-1]
+                        sys.stdout.flush()
+                    logfile.flush()
+                    output += line
+
+                # close read handle, get child return status, etc
+                r.close()
+                (rpid, ret) = os.waitpid(pid, 0)
+                signal.alarm(0)
+                signal.signal(signal.SIGALRM,oldhandler)
+
+            except alarmExc:
+                os.kill(-pid, signal.SIGTERM)
+                time.sleep(1)
+                os.kill(-pid, signal.SIGKILL)
+                (rpid, ret) = os.waitpid(pid, 0)
+                signal.signal(signal.SIGALRM,oldhandler)
+                raise commandTimeoutExpired( "Timeout(%s) exceeded for command: %s" % (timeout, command))
+
+            # mask and return just return value, plus child output
+            return ((ret & 0xFF00) >> 8, output)
+
+        else: #child
+            os.close(r)
+            # become process group leader so that our parent
+            # can kill our children
+            os.setpgrp()
+
+            child = popen2.Popen4(command)
+            child.tochild.close()
+
+            w = os.fdopen(w, "w")
+            for line in child.fromchild:
+                w.write(line)
+            w.close()
+            os._exit( (retval & 0xFF00) >> 8 )
+
 
-    def do_chroot(self, command, fatal = False, exitcode=None):
+    def do_chroot(self, command, fatal = False, exitcode=None, timeout=0):
         """execute given command in root"""
         cmd = ""
 
@@ -538,7 +584,7 @@ class Root:
                                                  self.rootdir,
                                                  self.config['runuser'],
                                                  command)
-        (ret, output) = self.do(cmd)
+        (ret, output) = self.do(cmd, timeout=timeout)
         if (ret != 0) and fatal:
             self.close()
             if exitcode:
@@ -785,6 +831,8 @@ def command_parse():
                       default=False, help="Turn on build-root caching")
     parser.add_option("--rebuildcache", action ="store_true", dest="rebuild_cache",
                       default=False, help="Force rebuild of build-root cache")
+    parser.add_option("--rpmbuild_timeout", action="store", dest="rpmbuild_timeout", type="int",
+                      default=None, help="Fail build if rpmbuild takes longer than 'timeout' seconds ")
 
     return parser.parse_args()
 
@@ -796,6 +844,7 @@ def setup_default_config_opts(config_opts):
     config_opts['rm'] = '/usr/sbin/mock-helper rm'
     config_opts['mknod'] = '/usr/sbin/mock-helper mknod'
     config_opts['yum'] = '/usr/sbin/mock-helper yum'
+    config_opts['rpmbuild_timeout'] = 0
     config_opts['runuser'] = '/sbin/runuser'
     config_opts['chroot_setup_cmd'] = 'install buildsys-build'
     config_opts['chrootuser'] = 'mockbuild'
@@ -852,6 +901,8 @@ def set_config_opts_per_cmdline(config_opts, options):
         config_opts['statedir'] = options.statedir
     if options.uniqueext:
         config_opts['unique-ext'] = options.uniqueext
+    if options.rpmbuild_timeout is not None:
+        config_opts['rpmbuild_timeout'] = options.rpmbuild_timeout
 
 def do_clean(config_opts, init=0):
     my = None
@@ -979,7 +1030,7 @@ def main():
 
     # cmdline options override config options
     set_config_opts_per_cmdline(config_opts, options)
-
+
     # do whatever we're here to do
     if args[0] == 'clean':
         # unset a --no-clean
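For readers on a current Python, where popen2 no longer exists, the same idea (stream the child's output to a log, arm SIGALRM, kill the whole process group on expiry) can be sketched with subprocess. This is only an illustration under stated assumptions, not mock's code: the names run_with_timeout and CommandTimeoutExpired are invented here, and it must run in the main thread on a POSIX system. As far as the diff shows, the child branch in the patch exits with retval, which is still 0 at that point; the sketch returns the command's own exit status instead.

```python
import os
import signal
import subprocess
import sys
import time


class CommandTimeoutExpired(Exception):
    """Raised when the command outlives its timeout."""


def run_with_timeout(command, timeout=0, logfile=sys.stdout):
    """Run a shell command, copy its output to logfile, enforce a deadline.

    timeout is in whole seconds; 0 means no alarm, as in the patch.
    Returns (exit_code, output).
    """
    def on_alarm(signum, frame):
        raise CommandTimeoutExpired(
            "Timeout(%s) exceeded for command: %s" % (timeout, command))

    # start_new_session=True puts the child in its own process group, so
    # signalling that group also reaches anything the child spawned.
    child = subprocess.Popen(command, shell=True, stdin=subprocess.DEVNULL,
                             stdout=subprocess.PIPE, stderr=subprocess.STDOUT,
                             universal_newlines=True, start_new_session=True)
    old_handler = signal.signal(signal.SIGALRM, on_alarm)
    signal.alarm(timeout)
    lines = []
    try:
        for line in child.stdout:
            logfile.write(line)
            lines.append(line)
        exit_code = child.wait()
    except CommandTimeoutExpired:
        # Ask politely first, then force, one second apart.
        for sig in (signal.SIGTERM, signal.SIGKILL):
            try:
                os.killpg(child.pid, sig)
            except ProcessLookupError:
                break  # the whole group is already gone
            time.sleep(1)
        child.wait()
        raise
    finally:
        signal.alarm(0)
        signal.signal(signal.SIGALRM, old_handler)
        child.stdout.close()
    return exit_code, "".join(lines)


code, output = run_with_timeout("echo building; exit 2", timeout=5)
print(code)
```

A caller would wrap the call in try/except CommandTimeoutExpired and turn it into a build failure, much as the build() hunk does with BuildError.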
"MD" == Matt Domsch Matt_Domsch@dell.com writes:
MD> The latest culprits were a few perl modules that ran for >2 days
MD> with no end in sight.
BTW, those are usually Module::Build-using modules which are running with unsatisfied build dependencies. They will stop and prompt for input when that happens.
- J<
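A build that stops to prompt can only hang if it has something to wait on. A minimal sketch, assuming a POSIX-like system and a current Python: giving the child /dev/null as stdin makes any read return end-of-file at once. The child below is a hypothetical stand-in written in Python, not a real Module::Build run, and whether a given Perl module then falls back to a default answer or fails outright is up to that module.

```python
import subprocess
import sys

# Stand-in for a build step that stops to ask a question (hypothetical).
prompting_child = [sys.executable, "-c",
                   "import sys; answer = sys.stdin.readline(); "
                   "print('got %r' % answer)"]

# With stdin attached to /dev/null the read returns end-of-file at once,
# so the child cannot sit waiting for a person to type an answer.
result = subprocess.run(prompting_child, stdin=subprocess.DEVNULL,
                        stdout=subprocess.PIPE, universal_newlines=True,
                        timeout=30)
print(result.stdout.strip())
```

This complements a hard timeout: the prompt fails fast, and the timeout remains as a backstop for builds that hang for other reasons.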
buildsys@lists.fedoraproject.org