Hello Dan Kenigsberg, Tomáš Došek,
I'd like you to do a code review. Please visit
to review the following change.
Change subject: Introduce a maximum time limit a migration may take ......................................................................
Introduce a maximum time limit a migration may take
There are cases where migrations can take too much time. To set a finite end to this we're now introducing a new configuration value migration_max_time_per_gib_mem which defined the maximum time the migration may take per GiB memory. The default value of 64 seconds is calculated by 50% of current default value for the maximum bandwidth. (1 GiB needs 32 seconds on the maximum bandwidth)
When migration_max_time_per_gib_mem is set to `0` the functionality is disabled.
This time tracking is now handled in the MigrationMonitorThread on the source host side of the migration.
Change-Id: Ifd2f76b9334fcb7d2db24c081cccae15e8fd0b0c Bug-Url: https://bugzilla.redhat.com/1069220 Signed-off-by: Vinzenz Feenstra vfeenstr@redhat.com Reviewed-on: http://gerrit.ovirt.org/21708 Reviewed-by: Dan Kenigsberg danken@redhat.com Reviewed-by: Tomáš Došek tdosek@redhat.com --- M lib/vdsm/config.py.in M vdsm/vm.py 2 files changed, 35 insertions(+), 9 deletions(-)
git pull ssh://gerrit.ovirt.org:29418/vdsm refs/changes/22/25022/1
diff --git a/lib/vdsm/config.py.in b/lib/vdsm/config.py.in index c85ab88..7d85232 100644 --- a/lib/vdsm/config.py.in +++ b/lib/vdsm/config.py.in @@ -60,6 +60,11 @@ 'Please note, that this is not overall migration timeout. ' 'Source waits twice as long (to avoid races).'),
+ ('migration_max_time_per_gib_mem', '64', + 'The maximum time in seconds per GiB memory a migration may take ' + 'before the migration will be aborted by the source host. ' + 'Setting this value to 0 will disable this feature.'), + ('migration_listener_timeout', '30', 'Time to wait (in seconds) for migration destination to start ' 'listening before migration begins.'), diff --git a/vdsm/vm.py b/vdsm/vm.py index f70b87c..4387daf 100644 --- a/vdsm/vm.py +++ b/vdsm/vm.py @@ -303,6 +303,7 @@
def run(self): try: + startTime = time.time() mstate = '' self._setupVdsConnection() self._setupRemoteMachineParams() @@ -320,7 +321,7 @@ 'dstparams': self._dstparams, 'dstqemu': self._dstqemu} self._vm.saveState() - self._startUnderlyingMigration() + self._startUnderlyingMigration(startTime) self._finishSuccessfully() except libvirt.libvirtError as e: if e.get_error_code() == libvirt.VIR_ERR_OPERATION_ABORTED: @@ -336,7 +337,7 @@ self._recover(str(e)) self.log.error("Failed to migrate", exc_info=True)
- def _startUnderlyingMigration(self): + def _startUnderlyingMigration(self, startTime): if self._mode == 'file': hooks.before_vm_hibernate(self._vm._dom.XMLDesc(0), self._vm.conf) try: @@ -378,7 +379,8 @@ self._vm._migrationTimeout() / 2)
if MigrationMonitorThread._MIGRATION_MONITOR_INTERVAL: - self._monitorThread = MigrationMonitorThread(self._vm) + self._monitorThread = MigrationMonitorThread(self._vm, + startTime) self._monitorThread.start()
try: @@ -755,10 +757,11 @@ _MIGRATION_MONITOR_INTERVAL = config.getint( 'vars', 'migration_monitor_interval') # seconds
- def __init__(self, vm): + def __init__(self, vm, startTime): super(MigrationMonitorThread, self).__init__() self._stop = threading.Event() self._vm = vm + self._startTime = startTime self.daemon = True self.data_progress = 0 self.mem_progress = 0 @@ -772,8 +775,13 @@
self._vm.log.debug('starting migration monitor thread')
+ memSize = int(self._vm.conf['memSize']) + maxTimePerGiB = config.getint('vars', + 'migration_max_time_per_gib_mem') + migrationMaxTime = (maxTimePerGiB * memSize + 1023) / 1024 lastProgressTime = time.time() smallest_dataRemaining = None + progress_timeout = config.getint('vars', 'migration_progress_timeout')
while not self._stop.isSet(): self._stop.wait(self._MIGRATION_MONITOR_INTERVAL) @@ -782,16 +790,29 @@ memTotal, memProcessed, memRemaining, fileTotal, fileProcessed, _) = self._vm._dom.jobInfo()
- if (smallest_dataRemaining is None or + remaining = dataRemaining + memRemaining + abort = False + now = time.time() + if 0 < migrationMaxTime < now - self._startTime: + self._vm.log.warn('The migration took %d seconds which is ' + 'exceeding the configured maximum time ' + 'for migrations of %d seconds. The ' + 'migration will be aborted.', + now - self._startTime, + migrationMaxTime) + abort = True + elif (smallest_dataRemaining is None or smallest_dataRemaining > dataRemaining): smallest_dataRemaining = dataRemaining - lastProgressTime = time.time() - elif (time.time() - lastProgressTime > - config.getint('vars', 'migration_timeout')): + lastProgressTime = now + elif (now - lastProgressTime) > progress_timeout: # Migration is stuck, abort self._vm.log.warn( 'Migration is stuck: Hasn't progressed in %s seconds. ' - 'Aborting.' % (time.time() - lastProgressTime)) + 'Aborting.' % (now - lastProgressTime)) + abort = True + + if abort: self._vm._dom.abortJob() self.stop() break
Yaniv Bronhaim has posted comments on this change.
Change subject: Introduce a maximum time limit a migration may take ......................................................................
Patch Set 4: Code-Review+2
Vinzenz Feenstra has posted comments on this change.
Change subject: Introduce a maximum time limit a migration may take ......................................................................
Patch Set 5: Verified+1
Vinzenz Feenstra has posted comments on this change.
Change subject: Introduce a maximum time limit a migration may take ......................................................................
Patch Set 5:
Please note that this is already in the 3.4 branch as http://gerrit.ovirt.org/gitweb?p=vdsm.git;a=commit;h=2d52f16076ba4b10c54cb15...
Yaniv Bronhaim has posted comments on this change.
Change subject: Introduce a maximum time limit a migration may take ......................................................................
Patch Set 5: Code-Review+2
Yaniv Bronhaim has submitted this change and it was merged.
Change subject: Introduce a maximum time limit a migration may take ......................................................................
Introduce a maximum time limit a migration may take
There are cases where migrations can take too much time. To set a finite end to this we're now introducing a new configuration value migration_max_time_per_gib_mem which defined the maximum time the migration may take per GiB memory. The default value of 64 seconds is calculated by 50% of current default value for the maximum bandwidth. (1 GiB needs 32 seconds on the maximum bandwidth)
When migration_max_time_per_gib_mem is set to `0` the functionality is disabled.
This time tracking is now handled in the MigrationMonitorThread on the source host side of the migration.
Change-Id: Ifd2f76b9334fcb7d2db24c081cccae15e8fd0b0c Bug-Url: https://bugzilla.redhat.com/1069731 Signed-off-by: Vinzenz Feenstra vfeenstr@redhat.com Reviewed-on: http://gerrit.ovirt.org/21708 Reviewed-by: Dan Kenigsberg danken@redhat.com Reviewed-by: Tomáš Došek tdosek@redhat.com Reviewed-on: http://gerrit.ovirt.org/25022 Reviewed-by: Yaniv Bronhaim ybronhei@redhat.com --- M lib/vdsm/config.py.in M vdsm/vm.py 2 files changed, 34 insertions(+), 9 deletions(-)
Approvals: Yaniv Bronhaim: Looks good to me, approved Vinzenz Feenstra: Verified
vdsm-patches@lists.fedorahosted.org