Hello Dan Kenigsberg, Milan Zamazal,
I'd like you to do a code review. Please visit
https://gerrit.ovirt.org/65814
to review the following change.
Change subject: vm: introduce a `monitorable' attribute
......................................................................
vm: introduce a `monitorable' attribute
In the change I3e61626625a2e0517d55dc61e361f3f5eb690c00 we fixed the stats_age reporting, in order to correctly report the real age of VM monitoring samples.
In the same change we acknowledged a possible change in behaviour: depending on the timing of operations and on the load of the host, VMs could be reported 'Unresponsive' for a short time while starting up or shutting down.
This information is technically correct: the monitoring subsystem dutifully reports unknown sampling data, which translates into outdated values and thus an unresponsive VM, but it is misleading for both Engine and users.
There are two root causes here:
1. creation, destruction and monitoring of a VM are all operations which are asynchronous to each other, for both performance and historical reasons - and both reasons are still relevant today.
2. the semantics of this reporting (VM unresponsive when stats are too old) are poorly defined, so the right thing is to keep backward compatibility, lacking a better definition.
To solve this issue we generalize the old VM.incomingMigrationPending attribute into a 'monitorable' attribute. A VM is monitorable if it is in a meaningful state: after the creation process has completed, and before the shutdown process is initiated.
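The shape of the new property can be sketched as follows (a simplified, self-contained illustration of the idea, using the names that appear in the patch to vdsm/virt/vm.py; the real code lives inside the much larger Vm class):

```python
# Sketch of the `monitorable' property: a VM created as a migration
# or restore destination is never monitorable; otherwise the flag
# managed by the creation/teardown code decides.

class Vm(object):
    def __init__(self, conf):
        self.conf = conf
        # Flipped to True by the creation thread once startup completes,
        # and back to False when resource release begins.
        self._monitorable = False

    @property
    def monitorable(self):
        if 'migrationDest' in self.conf or 'restoreState' in self.conf:
            return False
        return self._monitorable


vm = Vm({'migrationDest': 'somehost'})
vm._monitorable = True
print(vm.monitorable)  # False: still an incoming migration destination
```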
We should adjust responsiveness for stats too old only if a VM is monitorable, not unconditionally. Otherwise we can have misreports on slow startups or slow shutdowns.
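The gating logic amounts to this toy sketch (the 'Unresponsive' status and stats age follow the patch; the 10-second timeout is an arbitrary value chosen for illustration, not vdsm's actual threshold):

```python
# Toy sketch: only a monitorable VM may be declared Unresponsive when
# its newest monitoring sample is too old.  The timeout is made up.

STATS_TIMEOUT = 10  # seconds, illustrative only


def responsiveness(monitorable, stats_age):
    # A non-monitorable VM (still starting up, or shutting down) must
    # never be reported Unresponsive, whatever its sample age.
    if monitorable and stats_age > STATS_TIMEOUT:
        return 'Unresponsive'
    return 'Up'


print(responsiveness(monitorable=False, stats_age=60))  # Up
print(responsiveness(monitorable=True, stats_age=60))   # Unresponsive
```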
Please note that the `monitorable' state shares similarities with the VM state (VM.lastStatus attribute), but the two concepts overlap only partially. Lacking a precise definition, we use two independent variables to track those attributes.
Change-Id: Idb12277a5f0fdaf680032af12fa7c104c3bd6ffb
Backport-To: 4.0
Backport-To: 3.6
Bug-Url: https://bugzilla.redhat.com/1382578
Bug-Url: https://bugzilla.redhat.com/1382583
Signed-off-by: Francesco Romani <fromani@redhat.com>
Reviewed-on: https://gerrit.ovirt.org/65590
Reviewed-by: Milan Zamazal <mzamazal@redhat.com>
Continuous-Integration: Jenkins CI
Reviewed-by: Dan Kenigsberg <danken@redhat.com>
---
M lib/vdsm/virt/periodic.py
M lib/vdsm/virt/vmstats.py
M tests/periodicTests.py
M tests/vmStatsTests.py
M vdsm/virt/vm.py
5 files changed, 67 insertions(+), 27 deletions(-)
git pull ssh://gerrit.ovirt.org:29418/vdsm refs/changes/14/65814/1
diff --git a/lib/vdsm/virt/periodic.py b/lib/vdsm/virt/periodic.py
index 00229f4..661f9bf 100644
--- a/lib/vdsm/virt/periodic.py
+++ b/lib/vdsm/virt/periodic.py
@@ -290,7 +290,7 @@
     def required(self):
         # disable everything until migration destination VM
         # isn't fully started, to avoid false positives log spam.
-        return not self._vm.incomingMigrationPending()
+        return self._vm.monitorable
    @property
    def runnable(self):

diff --git a/lib/vdsm/virt/vmstats.py b/lib/vdsm/virt/vmstats.py
index ed424cc..eb27345 100644
--- a/lib/vdsm/virt/vmstats.py
+++ b/lib/vdsm/virt/vmstats.py
@@ -513,7 +513,7 @@
     try:
         yield
     except KeyError:
-        if vm_obj.incomingMigrationPending():
+        if not vm_obj.monitorable:
             # If a VM is migration destination,
             # libvirt doesn't give any disk stat.
             pass

diff --git a/tests/periodicTests.py b/tests/periodicTests.py
index 6c988cf..b48ed18 100644
--- a/tests/periodicTests.py
+++ b/tests/periodicTests.py
@@ -198,6 +198,15 @@
 VM_NUM = 5  # just a number, no special meaning
+VM_IDS = [
+    [()],
+    [((0,))],
+    [((0, 2))],
+    [((VM_NUM-1,))],
+    [((VM_NUM-2, VM_NUM-1))]
+]
+
+
 @expandPermutations
 class VmDispatcherTests(TestCaseBase):
@@ -208,31 +217,23 @@
_Visitor.VMS.clear()
-    @permutations([[()],
-                   [((0,))],
-                   [((0, 2))],
-                   [((VM_NUM-1,))],
-                   [((VM_NUM-2, VM_NUM-1))]])
+    @permutations(VM_IDS)
     def test_dispatch(self, failed_ids):
         for i in failed_ids:
             with self.cif.vmContainerLock:
                 vm_id = _fake_vm_id(i)
                 self.cif.vmContainer[vm_id].fail_required = True
-        op = periodic.VmDispatcher(
-            self.cif.getVMs, _FakeExecutor(), _Visitor, 0)
-        # we don't care about executor (hence the simplistic fake)
-        op()
+        self._check_dispatching(failed_ids)
-        vms = self.cif.getVMs()
-        expected = (
-            set(vms.keys()) -
-            set(_fake_vm_id(i) for i in failed_ids))
-        for vm_id in expected:
-            self.assertEqual(_Visitor.VMS.get(vm_id), 1)
+    @permutations(VM_IDS)
+    def test_skip_not_monitorable(self, unmonitorable_ids):
+        for i in unmonitorable_ids:
+            with self.cif.vmContainerLock:
+                vm_id = _fake_vm_id(i)
+                self.cif.vmContainer[vm_id].monitorable = False
-        for vm_id in failed_ids:
-            self.assertEqual(_Visitor.VMS.get(vm_id), None)
+        self._check_dispatching(unmonitorable_ids)
    def test_dispatch_fails(self):
        """
@@ -249,6 +250,24 @@
         self.assertEqual(set(skipped), set(self.cif.getVMs().keys()))
+
+    def _check_dispatching(self, skip_ids):
+        op = periodic.VmDispatcher(
+            self.cif.getVMs, _FakeExecutor(), _Visitor, 0)
+        # we don't care about executor (hence the simplistic fake)
+        op()
+
+        for vm_id in skip_ids:
+            self.assertNotIn(_fake_vm_id(vm_id), _Visitor.VMS)
+
+        vms = self.cif.getVMs()
+
+        expected = (
+            set(vms.keys()) -
+            set(_fake_vm_id(i) for i in skip_ids)
+        )
+        for vm_id in expected:
+            self.assertEqual(_Visitor.VMS.get(vm_id), 1)
    def _make_fake_vms(self):
        for i in range(VM_NUM):
@@ -306,13 +325,13 @@
     def required(self):
         if getattr(self._vm, 'fail_required', False):
             raise ValueError('required failed')
-        return True
+        return super(_Visitor, self).required
     @property
     def runnable(self):
         if getattr(self._vm, 'fail_runnable', False):
             raise ValueError('runnable failed')
-        return True
+        return super(_Visitor, self).runnable
     def _execute(self):
         _Visitor.VMS[self._vm.id] += 1
@@ -362,6 +381,10 @@
         self.name = vmName
         self.migrating = False
         self.lastStatus = vmstatus.UP
+        self.monitorable = True
+
+    def isDomainReadyForCommands(self):
+        return True
     def isMigrating(self):
         return self.migrating

diff --git a/tests/vmStatsTests.py b/tests/vmStatsTests.py
index 7c385ba..fa213a7 100644
--- a/tests/vmStatsTests.py
+++ b/tests/vmStatsTests.py
@@ -582,8 +582,9 @@
         self.drives = drives if drives is not None else []
         self.migrationPending = False
-    def incomingMigrationPending(self):
-        return self.migrationPending
+    @property
+    def monitorable(self):
+        return not self.migrationPending
     def getNicDevices(self):
         return self.nics

diff --git a/vdsm/virt/vm.py b/vdsm/virt/vm.py
index 07be2e6..395b85a 100644
--- a/vdsm/virt/vm.py
+++ b/vdsm/virt/vm.py
@@ -312,6 +312,13 @@
         self._numaInfo = {}
         self._vmJobs = None
         self._clientPort = ''
+        self._monitorable = False
+
+    @property
+    def monitorable(self):
+        if 'migrationDest' in self.conf or 'restoreState' in self.conf:
+            return False
+        return self._monitorable
     @property
     def start_time(self):
@@ -769,9 +776,6 @@
             if acquired:
                 self.log.debug('Releasing incoming migration semaphore')
                 migration.incomingMigrations.release()
-
-    def incomingMigrationPending(self):
-        return 'migrationDest' in self.conf or 'restoreState' in self.conf
     def stopDisksStatsCollection(self):
         self._volumesPrepared = False
@@ -1411,12 +1415,19 @@
             stats['migrationProgress'] = self.migrateStatus()['progress']
         try:
+            # Prevent races with the creation thread.
+            # Mimicing < 3.5 code, the creation thread marks the VM as
+            # monitorable only after the stats cache is initialized.
+            # Here we need to do the reverse: check first if a VM is
+            # monitorable, and only if it is, consider the stats_age.
+            monitorable = self._monitorable
             vm_sample = sampling.stats_cache.get(self.id)
             decStats = vmstats.produce(self,
                                        vm_sample.first_value,
                                        vm_sample.last_value,
                                        vm_sample.interval)
-            self._setUnresponsiveIfTimeout(stats, vm_sample.stats_age)
+            if monitorable:
+                self._setUnresponsiveIfTimeout(stats, vm_sample.stats_age)
         except Exception:
             self.log.exception("Error fetching vm stats")
         else:
@@ -1823,6 +1834,7 @@
         self._guestEventTime = self._startTime
         sampling.stats_cache.add(self.id)
+        self._monitorable = True
         try:
             self.guestAgent.start()
         except Exception:
@@ -4085,6 +4097,10 @@
             nic.portMirroring.remove(network)
         self.log.info('Release VM resources')
+        # this must be done *before* self._cleanupStatsCache() to preserve
+        # the invariant: if a VM is monitorable, it has a stats cache
+        # entry, to avoid false positives when reporting stats too old.
+        self._monitorable = False
         self.lastStatus = vmstatus.POWERING_DOWN
         # Terminate the VM's creation thread.
         self._incomingMigrationFinished.set()
gerrit-hooks has posted comments on this change.
Change subject: vm: introduce a `monitorable' attribute
......................................................................
Patch Set 1:
* Update Tracker::#1382578::IGNORE, not relevant for Red Hat classification
* Update Tracker::#1382583::IGNORE, not relevant for Red Hat classification
* Check Bug-Url::IGNORE, not relevant for 'Red Hat' classification
* Check Bug-Url::IGNORE, not relevant for 'Red Hat' classification
* Check Public Bug::#1382578::OK, public bug
* Check Public Bug::#1382583::OK, public bug
* Check merged to previous::OK, change not open on any previous branch
Milan Zamazal has posted comments on this change.
Change subject: vm: introduce a `monitorable' attribute
......................................................................
Patch Set 1: Code-Review+1
vdsm-patches@lists.fedorahosted.org