Hello Dan Kenigsberg, Milan Zamazal,
I'd like you to do a code review. Please visit
https://gerrit.ovirt.org/65814
to review the following change.
Change subject: vm: introduce a `monitorable' attribute
......................................................................
vm: introduce a `monitorable' attribute
In the change I3e61626625a2e0517d55dc61e361f3f5eb690c00 we fixed the stats_age reporting, in order to correctly report the real age of VM monitoring samples.
In the same change we acknowledged a possible change in behaviour: depending on the timing of operations and on the load of the host, VMs could be reported 'Unresponsive' for a short time while starting up or shutting down.
This information is technically correct: the monitoring subsystem dutifully reports unknown sampling data, which translates into outdated values and thus an unresponsive VM, but it is misleading for both Engine and users.
There are two root causes here:
1. creation, destruction and monitoring of a VM are all operations which are asynchronous to each other, for both performance and historical reasons - and both reasons are still relevant today.
2. the semantics of this reporting (VM unresponsive when stats are too old) are poorly defined, so the right thing is to keep backward compatibility, lacking a better definition.
To solve this issue we generalize the old VM.incomingMigrationPending attribute into a 'monitorable' attribute. A VM is monitorable if it is in a meaningful state: after the creation process has completed, and before the shutdown process is initiated.
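The shape of the new property can be sketched as follows (a simplified, self-contained illustration of the idea, using the names that appear in the patch to vdsm/virt/vm.py; the real code lives inside the much larger Vm class):

```python
# Sketch of the `monitorable' property: a VM created as a migration
# or restore destination is never monitorable; otherwise the flag
# managed by the creation/teardown code decides.

class Vm(object):
    def __init__(self, conf):
        self.conf = conf
        # Flipped to True by the creation thread once startup completes,
        # and back to False when resource release begins.
        self._monitorable = False

    @property
    def monitorable(self):
        if 'migrationDest' in self.conf or 'restoreState' in self.conf:
            return False
        return self._monitorable


vm = Vm({'migrationDest': 'somehost'})
vm._monitorable = True
print(vm.monitorable)  # False: still an incoming migration destination
```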
We should adjust responsiveness for stats too old only if a VM is monitorable, not unconditionally. Otherwise we can have misreports on slow startups or slow shutdowns.
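The gating logic amounts to this toy sketch (the 'Unresponsive' status and stats age follow the patch; the 10-second timeout is an arbitrary value chosen for illustration, not vdsm's actual threshold):

```python
# Toy sketch: only a monitorable VM may be declared Unresponsive when
# its newest monitoring sample is too old.  The timeout is made up.

STATS_TIMEOUT = 10  # seconds, illustrative only


def responsiveness(monitorable, stats_age):
    # A non-monitorable VM (still starting up, or shutting down) must
    # never be reported Unresponsive, whatever its sample age.
    if monitorable and stats_age > STATS_TIMEOUT:
        return 'Unresponsive'
    return 'Up'


print(responsiveness(monitorable=False, stats_age=60))  # Up
print(responsiveness(monitorable=True, stats_age=60))   # Unresponsive
```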
Please note that the `monitorable' state shares similarities with the VM state (VM.lastStatus attribute), but the two concepts overlap only partially. Lacking a precise definition, we use two independent variables to track those attributes.
Change-Id: Idb12277a5f0fdaf680032af12fa7c104c3bd6ffb
Backport-To: 4.0
Backport-To: 3.6
Bug-Url: https://bugzilla.redhat.com/1382578
Bug-Url: https://bugzilla.redhat.com/1382583
Signed-off-by: Francesco Romani <fromani@redhat.com>
Reviewed-on: https://gerrit.ovirt.org/65590
Reviewed-by: Milan Zamazal <mzamazal@redhat.com>
Continuous-Integration: Jenkins CI
Reviewed-by: Dan Kenigsberg <danken@redhat.com>
---
M lib/vdsm/virt/periodic.py
M lib/vdsm/virt/vmstats.py
M tests/periodicTests.py
M tests/vmStatsTests.py
M vdsm/virt/vm.py
5 files changed, 67 insertions(+), 27 deletions(-)
git pull ssh://gerrit.ovirt.org:29418/vdsm refs/changes/14/65814/1
diff --git a/lib/vdsm/virt/periodic.py b/lib/vdsm/virt/periodic.py
index 00229f4..661f9bf 100644
--- a/lib/vdsm/virt/periodic.py
+++ b/lib/vdsm/virt/periodic.py
@@ -290,7 +290,7 @@
     def required(self):
         # disable everything until migration destination VM
         # isn't fully started, to avoid false positives log spam.
-        return not self._vm.incomingMigrationPending()
+        return self._vm.monitorable
    @property
    def runnable(self):

diff --git a/lib/vdsm/virt/vmstats.py b/lib/vdsm/virt/vmstats.py
index ed424cc..eb27345 100644
--- a/lib/vdsm/virt/vmstats.py
+++ b/lib/vdsm/virt/vmstats.py
@@ -513,7 +513,7 @@
     try:
         yield
     except KeyError:
-        if vm_obj.incomingMigrationPending():
+        if not vm_obj.monitorable:
             # If a VM is migration destination,
             # libvirt doesn't give any disk stat.
             pass

diff --git a/tests/periodicTests.py b/tests/periodicTests.py
index 6c988cf..b48ed18 100644
--- a/tests/periodicTests.py
+++ b/tests/periodicTests.py
@@ -198,6 +198,15 @@
 VM_NUM = 5  # just a number, no special meaning
+VM_IDS = [
+    [()],
+    [((0,))],
+    [((0, 2))],
+    [((VM_NUM-1,))],
+    [((VM_NUM-2, VM_NUM-1))]
+]
+
+
 @expandPermutations
 class VmDispatcherTests(TestCaseBase):
@@ -208,31 +217,23 @@
_Visitor.VMS.clear()
-    @permutations([[()],
-                   [((0,))],
-                   [((0, 2))],
-                   [((VM_NUM-1,))],
-                   [((VM_NUM-2, VM_NUM-1))]])
+    @permutations(VM_IDS)
     def test_dispatch(self, failed_ids):
         for i in failed_ids:
             with self.cif.vmContainerLock:
                 vm_id = _fake_vm_id(i)
                 self.cif.vmContainer[vm_id].fail_required = True
-        op = periodic.VmDispatcher(
-            self.cif.getVMs, _FakeExecutor(), _Visitor, 0)
-        # we don't care about executor (hence the simplistic fake)
-        op()
+        self._check_dispatching(failed_ids)
-        vms = self.cif.getVMs()
-        expected = (
-            set(vms.keys()) -
-            set(_fake_vm_id(i) for i in failed_ids))
-        for vm_id in expected:
-            self.assertEqual(_Visitor.VMS.get(vm_id), 1)
+    @permutations(VM_IDS)
+    def test_skip_not_monitorable(self, unmonitorable_ids):
+        for i in unmonitorable_ids:
+            with self.cif.vmContainerLock:
+                vm_id = _fake_vm_id(i)
+                self.cif.vmContainer[vm_id].monitorable = False
-        for vm_id in failed_ids:
-            self.assertEqual(_Visitor.VMS.get(vm_id), None)
+        self._check_dispatching(unmonitorable_ids)
    def test_dispatch_fails(self):
        """
@@ -249,6 +250,24 @@
         self.assertEqual(set(skipped), set(self.cif.getVMs().keys()))
+
+    def _check_dispatching(self, skip_ids):
+        op = periodic.VmDispatcher(
+            self.cif.getVMs, _FakeExecutor(), _Visitor, 0)
+        # we don't care about executor (hence the simplistic fake)
+        op()
+
+        for vm_id in skip_ids:
+            self.assertNotIn(_fake_vm_id(vm_id), _Visitor.VMS)
+
+        vms = self.cif.getVMs()
+
+        expected = (
+            set(vms.keys()) -
+            set(_fake_vm_id(i) for i in skip_ids)
+        )
+        for vm_id in expected:
+            self.assertEqual(_Visitor.VMS.get(vm_id), 1)
    def _make_fake_vms(self):
        for i in range(VM_NUM):
@@ -306,13 +325,13 @@
     def required(self):
         if getattr(self._vm, 'fail_required', False):
             raise ValueError('required failed')
-        return True
+        return super(_Visitor, self).required
     @property
     def runnable(self):
         if getattr(self._vm, 'fail_runnable', False):
             raise ValueError('runnable failed')
-        return True
+        return super(_Visitor, self).runnable
     def _execute(self):
         _Visitor.VMS[self._vm.id] += 1
@@ -362,6 +381,10 @@
         self.name = vmName
         self.migrating = False
         self.lastStatus = vmstatus.UP
+        self.monitorable = True
+
+    def isDomainReadyForCommands(self):
+        return True
     def isMigrating(self):
         return self.migrating

diff --git a/tests/vmStatsTests.py b/tests/vmStatsTests.py
index 7c385ba..fa213a7 100644
--- a/tests/vmStatsTests.py
+++ b/tests/vmStatsTests.py
@@ -582,8 +582,9 @@
         self.drives = drives if drives is not None else []
         self.migrationPending = False
-    def incomingMigrationPending(self):
-        return self.migrationPending
+    @property
+    def monitorable(self):
+        return not self.migrationPending
     def getNicDevices(self):
         return self.nics

diff --git a/vdsm/virt/vm.py b/vdsm/virt/vm.py
index 07be2e6..395b85a 100644
--- a/vdsm/virt/vm.py
+++ b/vdsm/virt/vm.py
@@ -312,6 +312,13 @@
         self._numaInfo = {}
         self._vmJobs = None
         self._clientPort = ''
+        self._monitorable = False
+
+    @property
+    def monitorable(self):
+        if 'migrationDest' in self.conf or 'restoreState' in self.conf:
+            return False
+        return self._monitorable
     @property
     def start_time(self):
@@ -769,9 +776,6 @@
             if acquired:
                 self.log.debug('Releasing incoming migration semaphore')
                 migration.incomingMigrations.release()
-
-    def incomingMigrationPending(self):
-        return 'migrationDest' in self.conf or 'restoreState' in self.conf
     def stopDisksStatsCollection(self):
         self._volumesPrepared = False
@@ -1411,12 +1415,19 @@
             stats['migrationProgress'] = self.migrateStatus()['progress']
         try:
+            # Prevent races with the creation thread.
+            # Mimicing < 3.5 code, the creation thread marks the VM as
+            # monitorable only after the stats cache is initialized.
+            # Here we need to do the reverse: check first if a VM is
+            # monitorable, and only if it is, consider the stats_age.
+            monitorable = self._monitorable
             vm_sample = sampling.stats_cache.get(self.id)
             decStats = vmstats.produce(self,
                                        vm_sample.first_value,
                                        vm_sample.last_value,
                                        vm_sample.interval)
-            self._setUnresponsiveIfTimeout(stats, vm_sample.stats_age)
+            if monitorable:
+                self._setUnresponsiveIfTimeout(stats, vm_sample.stats_age)
         except Exception:
             self.log.exception("Error fetching vm stats")
         else:
@@ -1823,6 +1834,7 @@
         self._guestEventTime = self._startTime
         sampling.stats_cache.add(self.id)
+        self._monitorable = True
         try:
             self.guestAgent.start()
         except Exception:
@@ -4085,6 +4097,10 @@
             nic.portMirroring.remove(network)
         self.log.info('Release VM resources')
+        # this must be done *before* self._cleanupStatsCache() to preserve
+        # the invariant: if a VM is monitorable, it has a stats cache
+        # entry, to avoid false positives when reporting stats too old.
+        self._monitorable = False
         self.lastStatus = vmstatus.POWERING_DOWN
         # Terminate the VM's creation thread.
         self._incomingMigrationFinished.set()
gerrit-hooks has posted comments on this change.
Change subject: vm: introduce a `monitorable' attribute
......................................................................
Patch Set 1:
* Update Tracker::#1382578::IGNORE, not relevant for Red Hat classification
* Update Tracker::#1382583::IGNORE, not relevant for Red Hat classification
* Check Bug-Url::IGNORE, not relevant for 'Red Hat' classification
* Check Bug-Url::IGNORE, not relevant for 'Red Hat' classification
* Check Public Bug::#1382578::OK, public bug
* Check Public Bug::#1382583::OK, public bug
* Check merged to previous::OK, change not open on any previous branch
Milan Zamazal has posted comments on this change.
Change subject: vm: introduce a `monitorable' attribute
......................................................................
Patch Set 1: Code-Review+1
vdsm-patches@lists.fedorahosted.org