Nir Soffer has uploaded a new change for review.
Change subject: vm: Continue to sample after errors
......................................................................
vm: Continue to sample after errors
When a vm is running, we monitor disk usage, and if a disk becomes too
full, we extend it. This avoids pausing the vm after io errors.
However, when sampling a vm with multiple disks, an error when sampling
one disk exits the sampling function and skips the remaining disks, making
this mechanism useless.
This patch logs exceptions raised when sampling one disk and continues to
sample the others.
This patch is for ovirt-3.3.1 only - master patch must be different
because of recent refactoring in this area.
Change-Id: I8dbe60a4d3b216a5cd998d163407c09b12f2f28c
Signed-off-by: Nir Soffer <nsoffer(a)redhat.com>
---
M vdsm/vm.py
1 file changed, 14 insertions(+), 10 deletions(-)
git pull ssh://gerrit.ovirt.org:29418/vdsm refs/changes/75/22575/1
diff --git a/vdsm/vm.py b/vdsm/vm.py
index 90ab5a7..2d0cece 100644
--- a/vdsm/vm.py
+++ b/vdsm/vm.py
@@ -506,20 +506,24 @@
return
for vmDrive in self._vm._devices[DISK_DEVICES]:
- if not vmDrive.isExtendable():
- continue
+ try:
+ if not vmDrive.isExtendable():
+ continue
- capacity, alloc, physical = \
- self._vm._dom.blockInfo(vmDrive.path, 0)
+ capacity, alloc, physical = \
+ self._vm._dom.blockInfo(vmDrive.path, 0)
- if physical - alloc >= vmDrive.watermarkLimit:
- continue
+ if physical - alloc >= vmDrive.watermarkLimit:
+ continue
- self._log.info('%s/%s apparent: %s capacity: %s, alloc: %s, '
- 'phys: %s', vmDrive.domainID, vmDrive.volumeID,
- vmDrive.apparentsize, capacity, alloc, physical)
+ self._log.info('%s/%s apparent: %s capacity: %s, alloc: %s, '
+ 'phys: %s', vmDrive.domainID, vmDrive.volumeID,
+ vmDrive.apparentsize, capacity, alloc, physical)
- self._vm.extendDriveVolume(vmDrive)
+ self._vm.extendDriveVolume(vmDrive)
+ except Exception:
+ self._log.exception("%s/%s", vmDrive.domainID,
+ vmDrive.volumeID)
def _updateVolumes(self):
if not self._vm.isDisksStatsCollectionEnabled():
--
To view, visit http://gerrit.ovirt.org/22575
To unsubscribe, visit http://gerrit.ovirt.org/settings
Gerrit-MessageType: newchange
Gerrit-Change-Id: I8dbe60a4d3b216a5cd998d163407c09b12f2f28c
Gerrit-PatchSet: 1
Gerrit-Project: vdsm
Gerrit-Branch: ovirt-3.3
Gerrit-Owner: Nir Soffer <nsoffer(a)redhat.com>
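As a rough illustration of the "Continue to sample after errors" change above, the sketch below shows the per-item try/except pattern in isolation. It is not vdsm code: FakeDrive, free_space and drives_needing_extension are hypothetical names invented for this example, and the watermark check is reduced to a single comparison.

```python
import logging

log = logging.getLogger("sampling-sketch")


class FakeDrive:
    """Hypothetical stand-in for a drive object; not the vdsm class."""

    def __init__(self, name, free, limit, broken=False):
        self.name = name
        self.free = free      # stands in for (physical - alloc)
        self.limit = limit    # stands in for the watermark limit
        self.broken = broken

    def free_space(self):
        if self.broken:
            raise RuntimeError("cannot query %s" % self.name)
        return self.free


def drives_needing_extension(drives):
    """Return names of drives below the watermark, skipping failed ones."""
    needy = []
    for drive in drives:
        try:
            if drive.free_space() >= drive.limit:
                continue
            needy.append(drive.name)
        except Exception:
            # Log and move on, so one bad drive does not hide the others.
            log.exception("sampling failed for %s", drive.name)
    return needy


drives = [
    FakeDrive("vda", free=10, limit=100, broken=True),
    FakeDrive("vdb", free=10, limit=100),
    FakeDrive("vdc", free=500, limit=100),
]
result = drives_needing_extension(drives)
```

With the try block inside the loop, the failure on the first drive is logged and the second drive is still reported as needing extension.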
Nir Soffer has uploaded a new change for review.
Change subject: vm: Fix attribute error when accessing drive in sampling method
......................................................................
vm: Fix attribute error when accessing drive in sampling method
Due to a race between the end of migration and monitoring, a drive may
not have a format attribute when the monitor accesses it. This patch uses
getattr to avoid log spam.
Change-Id: Ia50e8af94b9c9b54332066a3f30999ce73e7a56f
Signed-off-by: Nir Soffer <nsoffer(a)redhat.com>
---
M vdsm/vm.py
1 file changed, 2 insertions(+), 1 deletion(-)
git pull ssh://gerrit.ovirt.org:29418/vdsm refs/changes/18/22518/1
diff --git a/vdsm/vm.py b/vdsm/vm.py
index bb4a7ec..7e2d220 100644
--- a/vdsm/vm.py
+++ b/vdsm/vm.py
@@ -506,7 +506,8 @@
return
for vmDrive in self._vm._devices[DISK_DEVICES]:
- if not vmDrive.blockDev or vmDrive.format != 'cow':
+ # Note: drive may not have a format attribute during migration
+ if not vmDrive.blockDev or getattr(vmDrive, 'format', None) != 'cow':
continue
capacity, alloc, physical = \
--
To view, visit http://gerrit.ovirt.org/22518
Gerrit-MessageType: newchange
Gerrit-Change-Id: Ia50e8af94b9c9b54332066a3f30999ce73e7a56f
Gerrit-PatchSet: 1
Gerrit-Project: vdsm
Gerrit-Branch: ovirt-3.3
Gerrit-Owner: Nir Soffer <nsoffer(a)redhat.com>
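The getattr idiom used in the "Fix attribute error" change above can be tried on its own. In this minimal sketch the Drive class and the is_cow_block_device helper are hypothetical; only the attribute names blockDev and format are taken from the patch.

```python
class Drive:
    """Hypothetical drive object whose 'format' attribute may be missing."""

    def __init__(self, block_dev, fmt=None):
        self.blockDev = block_dev
        if fmt is not None:
            # Only set when known, mimicking the window where it is absent.
            self.format = fmt


def is_cow_block_device(drive):
    # getattr with a default returns None instead of raising AttributeError.
    return bool(drive.blockDev) and getattr(drive, "format", None) == "cow"


with_format = is_cow_block_device(Drive(True, "cow"))
without_format = is_cow_block_device(Drive(True))
```

A drive without the attribute is simply treated as not matching, which is the same outcome as the "continue" branch in the patch.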
Dan Kenigsberg has uploaded a new change for review.
Change subject: sampling: use constants for counter bounds
......................................................................
sampling: use constants for counter bounds
When we report cpu and network usage, we take two samples of Linux
counters, and divide their difference by the elapsed time. If a sampled
counter wraps around its upper bound, we might report an invalid
negative value. To avoid that, we take the modulo of the difference.
For example, assume that the first sample was (2**64 - 10) jiffies and
30 jiffies have passed until the second sample. The difference would be
the hugely negative value (30 - 2**64). Taking modulo 2**64 returns the
correct value of 30 jiffies.
JIFFIES_BOUND is taken from the size of clock_t, and NETSTATS_BOUND
from the size of the fields of struct net_device_stats. I am not aware
of any programmatic way to acquire these values, but they are both of 64
bit size on x86_64 and ppc64.
Taking modulo 2**32 works perfectly well, since two subsequent samples
are unlikely to be that far apart, and it has the benefit of working well
on a 32 bit host, too.
Change-Id: I706000106c3bc31edf8541c980bce1f49464ebf8
Signed-off-by: Dan Kenigsberg <danken(a)redhat.com>
---
M vdsm/sampling.py
M vdsm/vm.py
2 files changed, 11 insertions(+), 8 deletions(-)
git pull ssh://gerrit.ovirt.org:29418/vdsm refs/changes/94/24194/1
diff --git a/vdsm/sampling.py b/vdsm/sampling.py
index 54b6381..a5492ce 100644
--- a/vdsm/sampling.py
+++ b/vdsm/sampling.py
@@ -42,6 +42,9 @@
if not os.path.exists(_THP_STATE_PATH):
_THP_STATE_PATH = '/sys/kernel/mm/redhat_transparent_hugepage/enabled'
+JIFFIES_BOUND = 2 ** 32
+NETSTATS_BOUND = 2 ** 32
+
class InterfaceSample:
"""
@@ -430,14 +433,14 @@
return stats
hs0, hs1 = self._samples[0], self._samples[-1]
interval = hs1.timestamp - hs0.timestamp
- jiffies = (hs1.pidcpu.user - hs0.pidcpu.user) % (2 ** 32)
+ jiffies = (hs1.pidcpu.user - hs0.pidcpu.user) % JIFFIES_BOUND
stats['cpuUserVdsmd'] = (jiffies / interval)
- jiffies = hs1.pidcpu.sys - hs0.pidcpu.sys % (2 ** 32)
+ jiffies = (hs1.pidcpu.sys - hs0.pidcpu.sys) % JIFFIES_BOUND
stats['cpuSysVdsmd'] = (jiffies / interval)
- jiffies = (hs1.totcpu.user - hs0.totcpu.user) % (2 ** 32)
+ jiffies = (hs1.totcpu.user - hs0.totcpu.user) % JIFFIES_BOUND
stats['cpuUser'] = jiffies / interval / self._ncpus
- jiffies = (hs1.totcpu.sys - hs0.totcpu.sys) % (2 ** 32)
+ jiffies = (hs1.totcpu.sys - hs0.totcpu.sys) % JIFFIES_BOUND
stats['cpuSys'] = jiffies / interval / self._ncpus
stats['cpuIdle'] = max(0.0,
100.0 - stats['cpuUser'] - stats['cpuSys'])
@@ -479,9 +482,9 @@
ifrate = ifrate or 1000
Mbps2Bps = (10 ** 6) / 8
thisRx = (hs1.interfaces[ifid].rx - hs0.interfaces[ifid].rx) % \
- (2 ** 32)
+ NETSTATS_BOUND
thisTx = (hs1.interfaces[ifid].tx - hs0.interfaces[ifid].tx) % \
- (2 ** 32)
+ NETSTATS_BOUND
rxRate = 100.0 * thisRx / interval / ifrate / Mbps2Bps
txRate = 100.0 * thisTx / interval / ifrate / Mbps2Bps
if txRate > 100 or rxRate > 100:
diff --git a/vdsm/vm.py b/vdsm/vm.py
index aae8bd6..07fb581 100644
--- a/vdsm/vm.py
+++ b/vdsm/vm.py
@@ -606,11 +606,11 @@
ifRxBytes = (100.0 *
(eInfo[nic.name][0] - sInfo[nic.name][0]) %
- 2 ** 32 /
+ sampling.NETSTATS_BOUND /
sampleInterval / ifSpeed / self.MBPS_TO_BPS)
ifTxBytes = (100.0 *
(eInfo[nic.name][4] - sInfo[nic.name][4]) %
- 2 ** 32 /
+ sampling.NETSTATS_BOUND /
sampleInterval / ifSpeed / self.MBPS_TO_BPS)
ifStats['rxRate'] = '%.1f' % ifRxBytes
--
To view, visit http://gerrit.ovirt.org/24194
Gerrit-MessageType: newchange
Gerrit-Change-Id: I706000106c3bc31edf8541c980bce1f49464ebf8
Gerrit-PatchSet: 1
Gerrit-Project: vdsm
Gerrit-Branch: master
Gerrit-Owner: Dan Kenigsberg <danken(a)redhat.com>
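The wraparound arithmetic from the "use constants for counter bounds" commit message above can be checked directly. This is a small sketch using the numbers from that message; COUNTER_BOUND and counter_delta are names made up for the example, not vdsm identifiers.

```python
COUNTER_BOUND = 2 ** 64  # assumed width of the wrapping counter


def counter_delta(first, second, bound=COUNTER_BOUND):
    """Difference between two samples of a counter that wraps at 'bound'."""
    return (second - first) % bound


first = 2 ** 64 - 10
second = (first + 30) % COUNTER_BOUND    # the counter wrapped around

naive = second - first                   # hugely negative
fixed = counter_delta(first, second)     # modulo 2**64
small = counter_delta(first, second, bound=2 ** 32)  # modulo 2**32
```

Here second is 20, naive equals 30 - 2**64, and both fixed and small are 30, since 2**64 is a multiple of 2**32. Note that % binds tighter than binary minus in Python, so the parentheses around the difference are required.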
Francesco Romani has uploaded a new change for review.
Change subject: vm: per-attribute monitor response check
......................................................................
vm: per-attribute monitor response check
The responsiveness of the monitor is reported through a single
instance variable, which is updated after each libvirt call.
If a single call times out while the others succeed, the
interplay between timeouts and the polling interval can make
the reported status bounce back and forth between 'Up'
and 'Not Responding'.
This patch addresses this behaviour by keeping track of
timeouts per dom attribute instead of per dom, and reports
the monitor as not responding as long as at least one attribute
timed out in its last call.
Change-Id: I32a98d34cde91fa9dc3d07f03c47a5f2f22da620
Signed-off-by: Francesco Romani <fromani(a)redhat.com>
---
M vdsm/vm.py
1 file changed, 12 insertions(+), 7 deletions(-)
git pull ssh://gerrit.ovirt.org:29418/vdsm refs/changes/38/23138/1
diff --git a/vdsm/vm.py b/vdsm/vm.py
index 55683f4..9d7ca03 100644
--- a/vdsm/vm.py
+++ b/vdsm/vm.py
@@ -853,11 +853,11 @@
def f(*args, **kwargs):
try:
ret = attr(*args, **kwargs)
- self._cb(False)
+ self._cb(False, name)
return ret
except libvirt.libvirtError as e:
if e.get_error_code() == libvirt.VIR_ERR_OPERATION_TIMEOUT:
- self._cb(True)
+ self._cb(True, name)
toe = TimeoutError(e.get_error_message())
toe.err = e.err
raise toe
@@ -1938,6 +1938,9 @@
if (self.arch not in ['ppc64', 'x86_64']):
raise RuntimeError('Unsupported architecture: %s' % self.arch)
+
+ self._attrTimeoutLock = threading.Lock()
+ self._attrTimeoutExperienced = {} # will keep track of timeout data
def _get_lastStatus(self):
PAUSED_STATES = ('Powering down', 'RebootInProgress', 'Up')
@@ -3631,11 +3634,13 @@
def _monitorDependentInit(self, timeout=None):
self.log.warning('unsupported by libvirt vm')
- def _timeoutExperienced(self, timeout):
- if timeout:
- self._monitorResponse = -1
- else:
- self._monitorResponse = 0
+ def _timeoutExperienced(self, timeout, attrName):
+ with self._attrTimeoutLock:
+ self._attrTimeoutExperienced[attrName] = timeout
+ if any(self._attrTimeoutExperienced.itervalues()):
+ self._monitorResponse = -1
+ else:
+ self._monitorResponse = 0
def _waitForIncomingMigrationFinish(self):
if 'restoreState' in self.conf:
--
To view, visit http://gerrit.ovirt.org/23138
Gerrit-MessageType: newchange
Gerrit-Change-Id: I32a98d34cde91fa9dc3d07f03c47a5f2f22da620
Gerrit-PatchSet: 1
Gerrit-Project: vdsm
Gerrit-Branch: master
Gerrit-Owner: Francesco Romani <fromani(a)redhat.com>
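The per-attribute bookkeeping described in the "per-attribute monitor response check" change above can be sketched outside of vdsm as follows. MonitorResponse and record are hypothetical names, the attribute names passed in are only illustrative, and the code is written for Python 3 (values() rather than itervalues()).

```python
import threading


class MonitorResponse:
    """Hypothetical tracker: unresponsive while any attribute timed out."""

    def __init__(self):
        self._lock = threading.Lock()
        self._timed_out = {}   # attribute name -> did its last call time out?
        self.response = 0      # 0: responsive, -1: not responding

    def record(self, attr_name, timed_out):
        with self._lock:
            self._timed_out[attr_name] = timed_out
            self.response = -1 if any(self._timed_out.values()) else 0


mon = MonitorResponse()
mon.record("blockInfo", True)
after_timeout = mon.response
mon.record("info", False)        # another call succeeds
after_other_ok = mon.response
mon.record("blockInfo", False)   # the slow call recovers
after_recovery = mon.response
```

A successful call on one attribute no longer clears a timeout recorded for another one, so the status stays at -1 until the attribute that timed out succeeds again.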
Francesco Romani has uploaded a new change for review.
Change subject: vm: make new timekeeping revertable
......................................................................
vm: make new timekeeping revertable
The commit Icb0752e54a4cb9ff609b8ddfaf5c8fe2ed5b9e72
implemented the new timekeeping options recommended
by QEMU developers.
In order to maximize backward compatibility and
to deal with possible regressions with old guests,
this patch makes the new timekeeping settings revertable
by exposing a new configuration variable.
The variable is enabled by default because those settings,
being recommended, are supposed to be safe.
Change-Id: I471be44454dcae6e73c46a473eb1eee19a5275ab
Signed-off-by: Francesco Romani <fromani(a)redhat.com>
---
M lib/vdsm/config.py.in
M vdsm/vm.py
2 files changed, 8 insertions(+), 3 deletions(-)
git pull ssh://gerrit.ovirt.org:29418/vdsm refs/changes/43/24443/1
diff --git a/lib/vdsm/config.py.in b/lib/vdsm/config.py.in
index 01590a1..075face 100644
--- a/lib/vdsm/config.py.in
+++ b/lib/vdsm/config.py.in
@@ -192,6 +192,9 @@
('transient_disks_repository', '@VDSMLIBDIR@/transient',
'Local path to the transient disks repository.'),
+
+ ('new_timekeeping_enable', 'true',
+ 'Enable the new recommended QEMU time keeping settings'),
]),
# Section: [ksm]
diff --git a/vdsm/vm.py b/vdsm/vm.py
index 9371049..d3baca8 100644
--- a/vdsm/vm.py
+++ b/vdsm/vm.py
@@ -970,10 +970,12 @@
m = XMLElement('clock', offset='variable',
adjustment=str(self.conf.get('timeOffset', 0)))
m.appendChildWithArgs('timer', name='rtc', tickpolicy='catchup')
- m.appendChildWithArgs('timer', name='pit', tickpolicy='delay')
- if self.arch == caps.Architecture.X86_64:
- m.appendChildWithArgs('timer', name='hpet', present='no')
+ if config.getboolean('vars', 'new_timekeeping_enable'):
+ m.appendChildWithArgs('timer', name='pit', tickpolicy='delay')
+
+ if self.arch == caps.Architecture.X86_64:
+ m.appendChildWithArgs('timer', name='hpet', present='no')
self.dom.appendChild(m)
--
To view, visit http://gerrit.ovirt.org/24443
Gerrit-MessageType: newchange
Gerrit-Change-Id: I471be44454dcae6e73c46a473eb1eee19a5275ab
Gerrit-PatchSet: 1
Gerrit-Project: vdsm
Gerrit-Branch: master
Gerrit-Owner: Francesco Romani <fromani(a)redhat.com>
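How a boolean option like the one in the "make new timekeeping revertable" change above can gate the generated timers is sketched below with the Python 3 standard configparser module. vdsm has its own config wrapper, so make_config and clock_timers are hypothetical helpers, and nesting the hpet timer under the same option is an assumption, since the indentation of the diff is not preserved in this archive.

```python
import configparser


def make_config(text=""):
    """Build a config with the default from the patch, then apply overrides."""
    cfg = configparser.ConfigParser()
    cfg.read_dict({"vars": {"new_timekeeping_enable": "true"}})
    cfg.read_string(text)
    return cfg


def clock_timers(cfg, arch="x86_64"):
    """Return (name, attributes) pairs for the timers that would be emitted."""
    timers = [("rtc", {"tickpolicy": "catchup"})]
    if cfg.getboolean("vars", "new_timekeeping_enable"):
        timers.append(("pit", {"tickpolicy": "delay"}))
        if arch == "x86_64":
            # Assumed to be part of the revertable settings.
            timers.append(("hpet", {"present": "no"}))
    return timers


default_names = [name for name, _ in clock_timers(make_config())]
reverted = make_config("[vars]\nnew_timekeeping_enable = false\n")
reverted_names = [name for name, _ in clock_timers(reverted)]
```

With the default the rtc, pit and hpet timers are listed. Setting the option to false leaves only the rtc timer.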
Adam Litke has uploaded a new change for review.
Change subject: virt: Add a function to execute a qemu monitor command
......................................................................
virt: Add a function to execute a qemu monitor command
Sometimes it is necessary to execute a monitor command directly (e.g. if
libvirt does not yet support a certain qemu feature). Libvirt will mark
the VM as 'tainted' once this is done for the first time. The reason is
that making modifications to a VM using this facility could cause a
split-brain situation. It is generally harmless to execute monitor
commands which act in a read-only fashion. This function is also useful
as a utility for developers who are working on new features.
Change-Id: I3e9e07ba0c236c0938b129ae90af825f18f0e644
Signed-off-by: Adam Litke <alitke(a)redhat.com>
---
M vdsm/virt/vm.py
1 file changed, 11 insertions(+), 0 deletions(-)
git pull ssh://gerrit.ovirt.org:29418/vdsm refs/changes/50/27950/1
diff --git a/vdsm/virt/vm.py b/vdsm/virt/vm.py
index 9cc9958..de27a53 100644
--- a/vdsm/virt/vm.py
+++ b/vdsm/virt/vm.py
@@ -30,9 +30,11 @@
import threading
import time
import xml.dom.minidom
+import json
# 3rd party libs imports
import libvirt
+import libvirt_qemu
# vdsm imports
from vdsm import constants
@@ -4948,6 +4950,15 @@
if 'tlsPort' in dev:
self.conf['displaySecurePort'] = dev['tlsPort']
+ def _internalQMPMonitorCommand(self, cmdDict):
+ """
+ Execute a qemu monitor command directly.
+ WARNING: This will cause libvirt to mark the VM as tainted.
+ """
+ jsonCmd = json.dumps(cmdDict)
+ ret = libvirt_qemu.qemuMonitorCommand(self._dom, jsonCmd, 0)
+ return json.loads(ret)
+
def _getNetworkIp(network):
try:
--
To view, visit http://gerrit.ovirt.org/27950
Gerrit-MessageType: newchange
Gerrit-Change-Id: I3e9e07ba0c236c0938b129ae90af825f18f0e644
Gerrit-PatchSet: 1
Gerrit-Project: vdsm
Gerrit-Branch: master
Gerrit-Owner: Adam Litke <alitke(a)redhat.com>
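The JSON round trip performed by the helper in the "execute a qemu monitor command" change above can be exercised without a hypervisor. In this sketch run_monitor_command and fake_send are hypothetical, and fake_send returns a canned reply in place of the libvirt_qemu.qemuMonitorCommand call used in the patch. query-status is a read-only QMP command, in line with the commit message's note about read-only use.

```python
import json


def run_monitor_command(send, cmd_dict):
    """Serialize a command dict, pass it to 'send', and parse the reply.

    'send' is any callable taking a JSON string and returning a JSON string.
    """
    return json.loads(send(json.dumps(cmd_dict)))


def fake_send(json_cmd):
    # Hypothetical stand-in for the real monitor; replies are canned.
    request = json.loads(json_cmd)
    if request.get("execute") == "query-status":
        return json.dumps({"return": {"status": "running", "running": True}})
    return json.dumps({"error": {"class": "CommandNotFound",
                                 "desc": "unknown command"}})


reply = run_monitor_command(fake_send, {"execute": "query-status"})
unknown = run_monitor_command(fake_send, {"execute": "no-such-command"})
```

Passing the sender as a parameter keeps the serialization logic testable. In the patch itself the sender is fixed to the libvirt call on the VM's domain.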