Vinzenz Feenstra has uploaded a new change for review.
Change subject: vdsm: Reduce the migration progress timeout
......................................................................
vdsm: Reduce the migration progress timeout
The migration progress check should detect a stuck migration well before the currently configured migration_timeout, which defaults to 300 seconds.
Half of that time should be more than enough for now.
This commit introduces the migration_progress_timeout configuration value so that this threshold can be adjusted.
Change-Id: I8f314d70b8f32cfff58f9776bcc2182a748a9b67
Signed-off-by: Vinzenz Feenstra <vfeenstr@redhat.com>
---
M lib/vdsm/config.py.in
M vdsm/vm.py
2 files changed, 7 insertions(+), 2 deletions(-)
git pull ssh://gerrit.ovirt.org:29418/vdsm refs/changes/02/16602/1
diff --git a/lib/vdsm/config.py.in b/lib/vdsm/config.py.in
index 2a5618a..d95c78d 100644
--- a/lib/vdsm/config.py.in
+++ b/lib/vdsm/config.py.in
@@ -52,6 +52,11 @@
             'Please note, that this is not overall migration timeout. '
             'Source waits twice as long (to avoid races).'),
 
+        ('migration_progress_timeout', '150',
+            'Maximum time the source host waits during a migration in case '
+            'that there is no progress. If the time has passed, the migration '
+            'will be aborted.'),
+
         ('migration_listener_timeout', '30',
             'Time to wait (in seconds) for migration destination to start '
             'listening before migration begins.'),
diff --git a/vdsm/vm.py b/vdsm/vm.py
index 281c584..309a1bd 100644
--- a/vdsm/vm.py
+++ b/vdsm/vm.py
@@ -746,6 +746,7 @@
 
         lastProgressTime = time.time()
         smallest_dataRemaining = None
+        progress_timeout = config.getint('vars', 'migration_progress_timeout')
 
         while not self._stop.isSet():
             self._stop.wait(self._MIGRATION_MONITOR_INTERVAL)
@@ -758,8 +759,7 @@
                     smallest_dataRemaining > dataRemaining):
                 smallest_dataRemaining = dataRemaining
                 lastProgressTime = time.time()
-            elif (time.time() - lastProgressTime >
-                    config.getint('vars', 'migration_timeout')):
+            elif (time.time() - lastProgressTime) > progress_timeout:
                 # Migration is stuck, abort
                 self._vm.log.warn(
                     'Migration is stuck: Hasn\'t progressed in %s seconds. '
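
For readers following the thread, the stall check in the vm.py hunk can be sketched in isolation. This is only an illustration under stated assumptions, not the vdsm implementation: the class name StallDetector, its method names, and the injectable clock are hypothetical, and the sketch only reports a stall rather than aborting anything.

```python
import time


class StallDetector:
    """Hypothetical helper mirroring the progress check in the diff.

    Progress means the amount of remaining data reached a new minimum.
    If no new minimum is seen for longer than progress_timeout seconds,
    the migration is reported as stuck.
    """

    def __init__(self, progress_timeout, clock=time.time):
        self._timeout = progress_timeout
        self._clock = clock  # injectable so the logic can be tested
        self._smallest = None
        self._last_progress = clock()

    def update(self, data_remaining):
        """Record one sample and return True if the migration looks stuck."""
        now = self._clock()
        if self._smallest is None or self._smallest > data_remaining:
            # New minimum: count it as progress and reset the timer.
            self._smallest = data_remaining
            self._last_progress = now
            return False
        return (now - self._last_progress) > self._timeout
```

With a timeout of 150, a sample that fails to beat the previous minimum is reported as stuck only once more than 150 seconds have passed since the last new minimum.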
oVirt Jenkins CI Server has posted comments on this change.
Patch Set 1:
Build Successful
http://jenkins.ovirt.org/job/vdsm_unit_tests_gerrit/3253/ : SUCCESS
http://jenkins.ovirt.org/job/vdsm_pep8_gerrit/3174/ : SUCCESS
http://jenkins.ovirt.org/job/vdsm_unit_tests_gerrit_el/2364/ : SUCCESS
Vinzenz Feenstra has posted comments on this change.
Patch Set 1: Verified
Michal Skrivanek has posted comments on this change.
Patch Set 1: Code-Review+1
Vinzenz Feenstra has posted comments on this change.
Patch Set 2: Verified+1
oVirt Jenkins CI Server has posted comments on this change.
Patch Set 2:
Build Successful
http://jenkins.ovirt.org/job/vdsm_unit_tests_gerrit/4399/ : SUCCESS
http://jenkins.ovirt.org/job/vdsm_unit_tests_gerrit_el/3502/ : SUCCESS
http://jenkins.ovirt.org/job/vdsm_pep8_gerrit/4318/ : SUCCESS
oVirt Jenkins CI Server has posted comments on this change.
Patch Set 3: Verified-1
Build Failed
http://jenkins.ovirt.org/job/vdsm_unit_tests_gerrit/4974/ : ABORTED
http://jenkins.ovirt.org/job/vdsm_unit_tests_gerrit_el/4088/ : SUCCESS
http://jenkins.ovirt.org/job/vdsm_pep8_gerrit/4898/ : SUCCESS
oVirt Jenkins CI Server has posted comments on this change.
Patch Set 4:
Build Successful
http://jenkins.ovirt.org/job/vdsm_unit_tests_gerrit_el/4710/ : SUCCESS
http://jenkins.ovirt.org/job/vdsm_pep8_gerrit/5510/ : SUCCESS
http://jenkins.ovirt.org/job/vdsm_unit_tests_gerrit/5589/ : SUCCESS
Vinzenz Feenstra has posted comments on this change.
Patch Set 4: Verified+1
Michal Skrivanek has posted comments on this change.
Patch Set 4:
(1 comment)
....................................................
File vdsm/vm.py
Line 747: if (smallest_dataRemaining is None or
Line 748: smallest_dataRemaining > dataRemaining):
Line 749: smallest_dataRemaining = dataRemaining
Line 750: lastProgressTime = time.time()
Line 751: elif (time.time() - lastProgressTime) > progress_timeout:
What about migration_timeout now?

Line 752: # Migration is stuck, abort
Line 753: self._vm.log.warn(
Line 754: 'Migration is stuck: Hasn\'t progressed in %s seconds. '
Line 755: 'Aborting.' % (time.time() - lastProgressTime))
Vinzenz Feenstra has posted comments on this change.
Patch Set 4:
(1 comment)
....................................................
File vdsm/vm.py
Line 747: if (smallest_dataRemaining is None or
Line 748: smallest_dataRemaining > dataRemaining):
Line 749: smallest_dataRemaining = dataRemaining
Line 750: lastProgressTime = time.time()
Line 751: elif (time.time() - lastProgressTime) > progress_timeout:
It is still used here:

3472 elif 'migrationDest' in self.conf:
3473     timeout = config.getint('vars', 'migration_timeout')
3474     self.log.debug("Waiting %s seconds for end of migration" % timeout)
3475     self._incomingMigrationFinished.wait(timeout)

and here:

2404 def _migrationTimeout(self):
2405     timeout = config.getint('vars', 'migration_timeout')
2406     mem = int(self.conf['memSize'])
2407     if mem > 2048:
2408         timeout = timeout * mem / 2048
2409     return timeout

Line 752: # Migration is stuck, abort
Line 753: self._vm.log.warn(
Line 754: 'Migration is stuck: Hasn\'t progressed in %s seconds. '
Line 755: 'Aborting.' % (time.time() - lastProgressTime))
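
The memory scaling in the quoted _migrationTimeout can be worked through with concrete numbers. This is a standalone sketch, assuming memSize is given in MiB and that the original ran under Python 2, where `/` on integers floors; the function name scaled_migration_timeout is hypothetical and `//` is used to reproduce the flooring on Python 3.

```python
def scaled_migration_timeout(base_timeout, mem_mib):
    """Scale the base timeout linearly with memory above 2048 MiB.

    Mirrors the quoted snippet: VMs at or below 2048 MiB keep the base
    timeout, larger VMs get base_timeout * mem_mib / 2048 (floored).
    """
    timeout = base_timeout
    if mem_mib > 2048:
        # Integer division, assumed to match the Python 2 behaviour.
        timeout = timeout * mem_mib // 2048
    return timeout


# With the default of 300 seconds:
small = scaled_migration_timeout(300, 1024)   # stays at the base value
large = scaled_migration_timeout(300, 4096)   # doubles for twice 2048 MiB
```

So with the default of 300 seconds, a 1 GiB VM keeps 300 seconds while a 4 GiB VM gets 600 seconds.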
Michal Skrivanek has posted comments on this change.
Patch Set 4:
(1 comment)
....................................................
File vdsm/vm.py
Line 747: if (smallest_dataRemaining is None or
Line 748: smallest_dataRemaining > dataRemaining):
Line 749: smallest_dataRemaining = dataRemaining
Line 750: lastProgressTime = time.time()
Line 751: elif (time.time() - lastProgressTime) > progress_timeout:
Yeah… but it seems to me it's not really canceling the migration anywhere.

Line 752: # Migration is stuck, abort
Line 753: self._vm.log.warn(
Line 754: 'Migration is stuck: Hasn\'t progressed in %s seconds. '
Line 755: 'Aborting.' % (time.time() - lastProgressTime))
Federico Simoncelli has posted comments on this change.
Patch Set 4:
(1 comment)
....................................................
File lib/vdsm/config.py.in
Line 50: 'recognized by kvm/qemu if a coma separated list given then a '
Line 51: 'NIC per device will be created.'),
Line 52:
Line 53: ('migration_timeout', '300',
Line 54: 'Maximum time the destination waits since migration is stalled. '
This description is no longer accurate.

Line 55: 'Please note, that this is not overall migration timeout. '
Line 56: 'Source waits twice as long (to avoid races).'),
Line 57:
Line 58: ('migration_progress_timeout', '150',
Michal Skrivanek has posted comments on this change.
Patch Set 4:
(1 comment)
....................................................
File lib/vdsm/config.py.in
Line 51: 'NIC per device will be created.'),
Line 52:
Line 53: ('migration_timeout', '300',
Line 54: 'Maximum time the destination waits since migration is stalled. '
Line 55: 'Please note, that this is not overall migration timeout. '
Please revert 6f52253540d9884b173cc5e91d275f38a4ea0c19

Line 56: 'Source waits twice as long (to avoid races).'),
Line 57:
Line 58: ('migration_progress_timeout', '150',
Line 59: 'Maximum time the source host waits during a migration in case '
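
The config.py.in entries under discussion are (name, default, description) tuples that are later read with config.getint('vars', ...). A rough sketch of how such tuples could feed a parser, assuming the standard-library configparser and a hypothetical build_config helper (the real vdsm loading code is not shown in this thread):

```python
import configparser

# Two of the entries from the diff; the list name VARS is hypothetical.
VARS = [
    ('migration_progress_timeout', '150',
        'Maximum time the source host waits during a migration in case '
        'that there is no progress. If the time has passed, the migration '
        'will be aborted.'),
    ('migration_listener_timeout', '30',
        'Time to wait (in seconds) for migration destination to start '
        'listening before migration begins.'),
]


def build_config(overrides=None):
    """Return a parser seeded with defaults, optionally overridden.

    overrides is INI-formatted text, standing in for a local config file.
    """
    cfg = configparser.ConfigParser()
    cfg.add_section('vars')
    for name, default, _description in VARS:
        cfg.set('vars', name, default)
    if overrides:
        cfg.read_string(overrides)
    return cfg


cfg = build_config("[vars]\nmigration_progress_timeout = 60\n")
progress_timeout = cfg.getint('vars', 'migration_progress_timeout')
```

Here the override lowers the progress timeout to 60 seconds while migration_listener_timeout keeps its default of 30.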
oVirt Jenkins CI Server has posted comments on this change.
Patch Set 5:
Build Successful
http://jenkins.ovirt.org/job/vdsm_pep8_gerrit/6101/ : SUCCESS
http://jenkins.ovirt.org/job/vdsm_unit_tests_gerrit_el/5314/ : SUCCESS
http://jenkins.ovirt.org/job/vdsm_unit_tests_gerrit/6204/ : SUCCESS
oVirt Jenkins CI Server has posted comments on this change.
Patch Set 6:
Build Successful
http://jenkins.ovirt.org/job/vdsm_pep8_gerrit/6102/ : SUCCESS
http://jenkins.ovirt.org/job/vdsm_unit_tests_gerrit_el/5315/ : SUCCESS
http://jenkins.ovirt.org/job/vdsm_unit_tests_gerrit/6205/ : SUCCESS
oVirt Jenkins CI Server has posted comments on this change.
Patch Set 7:
Build Successful
http://jenkins.ovirt.org/job/vdsm_pep8_gerrit/6108/ : SUCCESS
http://jenkins.ovirt.org/job/vdsm_unit_tests_gerrit_el/5321/ : SUCCESS
http://jenkins.ovirt.org/job/vdsm_unit_tests_gerrit/6211/ : SUCCESS
oVirt Jenkins CI Server has posted comments on this change.
Patch Set 8:
Build Successful
http://jenkins.ovirt.org/job/vdsm_unit_tests_gerrit/6497/ : SUCCESS
http://jenkins.ovirt.org/job/vdsm_unit_tests_gerrit_el/5604/ : SUCCESS
http://jenkins.ovirt.org/job/vdsm_pep8_gerrit/6410/ : SUCCESS
Vinzenz Feenstra has posted comments on this change.
Patch Set 8: Verified+1
Dan Kenigsberg has posted comments on this change.
Patch Set 8: Code-Review-1
(1 comment)
....................................................
File lib/vdsm/config.py.in
Line 54: 'recognized by kvm/qemu if a coma separated list given then a '
Line 55: 'NIC per device will be created.'),
Line 56:
Line 57: ('migration_timeout', '300',
Line 58: 'Migration Source Host: Uses at least 1/2 of this value for VMs '
Are we sure we want to document this crazy-looking logic? And why is it related to the subject of this patch?

Line 59: 'with less than 2 GiB of RAM otherwise 1/4 of this value per GiB '
Line 60: 'RAM for calculating the delay before setting/increasing the '
Line 61: 'migration_downtime.'
Line 62: 'Migration Destination Host: Maximum time the destination waits '
Vinzenz Feenstra has posted comments on this change.
Patch Set 8:
(1 comment)
....................................................
File lib/vdsm/config.py.in
Line 54: 'recognized by kvm/qemu if a coma separated list given then a '
Line 55: 'NIC per device will be created.'),
Line 56:
Line 57: ('migration_timeout', '300',
Line 58: 'Migration Source Host: Uses at least 1/2 of this value for VMs '
It's relevant to this patch because it was wrong before and I am modifying the other part of it. Across the list of patches, it would not be clear what I am removing if I left this wrong documentation in place.

And this crazy-looking logic is, for whatever reason, what has been implemented; I am only documenting it as it is right now. However, if you look at the "needed by" part of this patchset, you'll see that there is a bunch more which in the end will change all of this documentation, and I hope it will be clearer in the end.

Line 59: 'with less than 2 GiB of RAM otherwise 1/4 of this value per GiB '
Line 60: 'RAM for calculating the delay before setting/increasing the '
Line 61: 'migration_downtime.'
Line 62: 'Migration Destination Host: Maximum time the destination waits '
Dan Kenigsberg has posted comments on this change.
Patch Set 8: Code-Review+2
(1 comment)
....................................................
File lib/vdsm/config.py.in
Line 54: 'recognized by kvm/qemu if a coma separated list given then a '
Line 55: 'NIC per device will be created.'),
Line 56:
Line 57: ('migration_timeout', '300',
Line 58: 'Migration Source Host: Uses at least 1/2 of this value for VMs '
Not every bit of current logic should be documented. Particularly when we do NOT want people to depend on this over-complex craziness.

Anyway, I'll take this in - only since it's being fixed down the branch.

Line 59: 'with less than 2 GiB of RAM otherwise 1/4 of this value per GiB '
Line 60: 'RAM for calculating the delay before setting/increasing the '
Line 61: 'migration_downtime.'
Line 62: 'Migration Destination Host: Maximum time the destination waits '
Dan Kenigsberg has submitted this change and it was merged.
vdsm: Reduce the migration progress timeout
The migration progress check should detect a stuck migration well before the currently configured migration_timeout, which defaults to 300 seconds.
Half of that time should be more than enough for now.
This commit introduces the migration_progress_timeout configuration value so that this threshold can be adjusted.
Change-Id: I8f314d70b8f32cfff58f9776bcc2182a748a9b67
Signed-off-by: Vinzenz Feenstra <vfeenstr@redhat.com>
Reviewed-on: http://gerrit.ovirt.org/16602
Reviewed-by: Dan Kenigsberg <danken@redhat.com>
---
M lib/vdsm/config.py.in
M vdsm/vm.py
2 files changed, 14 insertions(+), 5 deletions(-)
Approvals:
  Vinzenz Feenstra: Verified
  Dan Kenigsberg: Looks good to me, approved
vdsm-patches@lists.fedorahosted.org