Change in vdsm[master]: clusterlock: Remove uneeded workaround
by Nir Soffer
Nir Soffer has uploaded a new change for review.
Change subject: clusterlock: Remove uneeded workaround
......................................................................
clusterlock: Remove uneeded workaround
Sanlock version XXXX had a off-by-one bug when calling get_hosts with a
host id, returning info for the next host. This bug is fixed in version
XXX. Now we can use the hostId parameter, making the call more efficient
and simpligying clusterlock code.
Change-Id: Ide75e749fbc2916540c2b526b78fedc247b5c6f9
Signed-off-by: Nir Soffer <nsoffer(a)redhat.com>
---
M vdsm.spec.in
M vdsm/storage/clusterlock.py
2 files changed, 5 insertions(+), 16 deletions(-)
git pull ssh://gerrit.ovirt.org:29418/vdsm refs/changes/62/31162/1
diff --git a/vdsm.spec.in b/vdsm.spec.in
index 5ba6fc6..fc39eb9 100644
--- a/vdsm.spec.in
+++ b/vdsm.spec.in
@@ -167,7 +167,7 @@
Requires: iscsi-initiator-utils >= 6.2.0.873-21
%endif
-Requires: sanlock >= 2.8, sanlock-python
+Requires: sanlock >= XXX, sanlock-python
%if 0%{?rhel}
Requires: python-ethtool >= 0.6-3
diff --git a/vdsm/storage/clusterlock.py b/vdsm/storage/clusterlock.py
index 24a5d81..17cdd53 100644
--- a/vdsm/storage/clusterlock.py
+++ b/vdsm/storage/clusterlock.py
@@ -265,26 +265,15 @@
return False
def getHostStatus(self, hostId):
- # Note: get_hosts has off-by-one bug when asking for particular host
- # id, so get all hosts info and filter.
- # See https://bugzilla.redhat.com/1111210
try:
- hosts = sanlock.get_hosts(self._sdUUID)
+ hosts = sanlock.get_hosts(self._sdUUID, hostId)
except sanlock.SanlockException as e:
self.log.debug("Unable to get host %d status in lockspace %s: %s",
hostId, self._sdUUID, e)
return HOST_STATUS_UNAVAILABLE
-
- for info in hosts:
- if info['host_id'] == hostId:
- status = info['flags']
- return self.STATUS_NAME[status]
-
- # get_hosts with host_id=0 returns only hosts with timestamp != 0,
- # which means that no host is using this host id now. If there a
- # timestamp, sanlock will return HOST_UNKNOWN and then HOST_LIVE or
- # HOST_FAIL.
- return HOST_STATUS_FREE
+ else:
+ status = hosts[0]['flags']
+ return self.STATUS_NAME[status]
# The hostId parameter is maintained here only for compatibility with
# ClusterLock. We could consider to remove it in the future but keeping it
--
To view, visit http://gerrit.ovirt.org/31162
To unsubscribe, visit http://gerrit.ovirt.org/settings
Gerrit-MessageType: newchange
Gerrit-Change-Id: Ide75e749fbc2916540c2b526b78fedc247b5c6f9
Gerrit-PatchSet: 1
Gerrit-Project: vdsm
Gerrit-Branch: master
Gerrit-Owner: Nir Soffer <nsoffer(a)redhat.com>
7 years, 10 months
Change in vdsm[master]: vmchannel: performance enhancment. changing thread name
by emarcian@redhat.com
Eldad Marciano has uploaded a new change for review.
Change subject: vmchannel: performance enhancment. changing thread name
......................................................................
vmchannel: performance enhancment.
changing thread name
Change-Id: I8489ec3a65ed9025eb91dfc2f408450b02fd7e0c
Signed-off-by: emarcian <emarcian(a)redhat.com>
---
M vdsm/virt/vmchannels.py
1 file changed, 2 insertions(+), 2 deletions(-)
git pull ssh://gerrit.ovirt.org:29418/vdsm refs/changes/42/36342/1
diff --git a/vdsm/virt/vmchannels.py b/vdsm/virt/vmchannels.py
index c143162..e0b305e 100644
--- a/vdsm/virt/vmchannels.py
+++ b/vdsm/virt/vmchannels.py
@@ -167,8 +167,8 @@
events = NoIntrPoll(self._epoll.poll, 1)
if events:
for (fileno, event) in events:
- _wait_for_events = threading.Thread(name='handle_events-%s' % fileno, target=self._handle_event, args=(fileno, event))
- _wait_for_events.start()
+ _handle_event = threading.Thread(name='handle_events-%s' % fileno, target=self._handle_event, args=(fileno, event))
+ _handle_event.start()
else:
self._update_channels()
if (self._timeout is not None) and (self._timeout > 0):
--
To view, visit http://gerrit.ovirt.org/36342
To unsubscribe, visit http://gerrit.ovirt.org/settings
Gerrit-MessageType: newchange
Gerrit-Change-Id: I8489ec3a65ed9025eb91dfc2f408450b02fd7e0c
Gerrit-PatchSet: 1
Gerrit-Project: vdsm
Gerrit-Branch: master
Gerrit-Owner: Eldad Marciano <emarcian(a)redhat.com>
7 years, 10 months
Change in vdsm[master]: tests: add tests for sampling.SampleVMs
by fromani@redhat.com
Francesco Romani has uploaded a new change for review.
Change subject: tests: add tests for sampling.SampleVMs
......................................................................
tests: add tests for sampling.SampleVMs
The SampleVMs class is responisble for handling bulk stats sampling,
and includes all the logic to deal with blocked domains.
However, this logic has the time among its variables, and it is
built on quite some assumptions, like running on an executor and so on.
All those factors make it difficult to test the bare logic without
some (minor) cheating and without some significant faking.
However, the nasty spots should be the ones more tested, not less
tested, so this patch adds the initial batch of unit tests for the
sampling logic.
Change-Id: Id66dbd420ca29d08ae4063dc83b858be34b8940f
Signed-off-by: Francesco Romani <fromani(a)redhat.com>
---
M tests/Makefile.am
A tests/bulkSamplingTests.py
2 files changed, 213 insertions(+), 0 deletions(-)
git pull ssh://gerrit.ovirt.org:29418/vdsm refs/changes/53/40053/1
diff --git a/tests/Makefile.am b/tests/Makefile.am
index 7fa1c09..4f984dd 100644
--- a/tests/Makefile.am
+++ b/tests/Makefile.am
@@ -26,6 +26,7 @@
alignmentScanTests.py \
blocksdTests.py \
bridgeTests.py \
+ bulkSamplingTests.py \
cPopenTests.py \
capsTests.py \
clientifTests.py \
diff --git a/tests/bulkSamplingTests.py b/tests/bulkSamplingTests.py
new file mode 100644
index 0000000..5c536ff
--- /dev/null
+++ b/tests/bulkSamplingTests.py
@@ -0,0 +1,212 @@
+#
+# Copyright 2015 Red Hat, Inc.
+#
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 2 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program; if not, write to the Free Software
+# Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
+#
+# Refer to the README and COPYING files for full details of the license
+#
+
+import virt.sampling as sampling
+
+from testlib import VdsmTestCase as TestCaseBase
+
+
+# these tests are complex and fragile. In order to maximize
+# readability and robustness, intentionally includes tailor-made
+# minimal Fakes to support those tests.
+
+
+class FakeStatsCache(object):
+ def __init__(self):
+ self.data = []
+ self._clock = 0 # private only for API clash
+
+ def clock(self):
+ return self._clock
+
+ def set_clock(self, value):
+ self._clock = value
+
+ def put(self, bulk_stats, timestamp):
+ self.data.append((bulk_stats, timestamp))
+
+
+class FakeDomain(object):
+ def __init__(self, name):
+ self._name = name
+ self._dom = self # yep, this is an ugly hack
+
+ def UUIDString(self):
+ # yes this is cheating
+ return self._name
+
+
+class FakeVM(object):
+ def __init__(self, vmid):
+ self.id = vmid
+ self._dom = FakeDomain(vmid)
+ self.ready_for_commands = True
+
+ def isDomainReadyForCommands(self):
+ return self.ready_for_commands
+
+
+class CollectMode(object):
+ NONE = 0
+ SLOW = 1
+ FAST = 2
+
+
+class FakeConnection(object):
+ def __init__(self, vms):
+ self.vms = vms
+ # testable fields
+ self.executions = []
+
+ def getVMs(self):
+ return self.vms
+
+ def getAllDomainStats(self, flags=0):
+ self.executions.append(CollectMode.FAST)
+ return [(vm._dom, {'vmid': vm._dom.UUIDString()})
+ for vm in self.vms.values()]
+
+ def domainListGetStats(self, doms, flags=0):
+ self.executions.append(CollectMode.SLOW)
+ return [(dom, {'vmid': dom.UUIDString()})
+ for dom in doms
+ if dom.UUIDString() in self.vms]
+
+
+class SampleVMsTests(TestCaseBase):
+
+ def test_collect_fast_path_as_default(self):
+ cache = FakeStatsCache()
+ vms = {
+ '1': FakeVM('1'),
+ '2': FakeVM('2')
+ }
+ conn = FakeConnection(vms)
+
+ sampler = sampling.SampleVMs(conn, conn.getVMs, cache)
+ sampler()
+
+ self.assertEqual(conn.executions[0], CollectMode.FAST)
+ self.assertVmsInBulkStats(vms.keys(), cache.data[0][0])
+
+ def test_collect_slow_path_after_blocked(self):
+ cache = FakeStatsCache()
+ vms = {
+ '1': FakeVM('1'),
+ '2': FakeVM('2')
+ }
+ conn = FakeConnection(vms)
+
+ sampler = sampling.SampleVMs(conn, conn.getVMs, cache)
+ # this is cheating, but setting up proper blocking is costly
+ # and fragile.
+ sampler._sampling = True
+ sampler()
+
+ self.assertEqual(conn.executions[0], CollectMode.SLOW)
+ # SampleVMs doesn't care in the slow path if all registed
+ # VMs are actually responsive or not. Quite the opposite, it is
+ # good sign if all VMs are now responsive, it means we can
+ # restart using getAllDomainStats on the next cycle.
+ self.assertVmsInBulkStats(vms.keys(), cache.data[0][0])
+
+ def test_collect_unresponsive_vm(self):
+ cache = FakeStatsCache()
+ vms = {
+ '1': FakeVM('1'),
+ '2': FakeVM('2'),
+ '3': FakeVM('3')
+ }
+ vms['2'].ready_for_commands = False
+ conn = FakeConnection(vms)
+
+ sampler = sampling.SampleVMs(conn, conn.getVMs, cache)
+ # this is cheating, but setting up proper blocking is costly
+ # and fragile.
+ sampler._sampling = True
+ sampler()
+
+ self.assertEqual(conn.executions[0], CollectMode.SLOW)
+ self.assertVmsInBulkStats(('1', '3'), cache.data[0][0])
+ self.assertVmStuck(sampler, '2')
+
+ def test_slow_collect_while_unresponsive_vm(self):
+ cache = FakeStatsCache()
+ vms = {
+ '1': FakeVM('1'),
+ '2': FakeVM('2'),
+ '3': FakeVM('3')
+ }
+ vms['2'].ready_for_commands = False
+ conn = FakeConnection(vms)
+
+ sampler = sampling.SampleVMs(conn, conn.getVMs, cache)
+ sampler._sampling = True # cheat to bootstrap slow collection
+ sampler()
+ sampler()
+ sampler()
+
+ self.assertSamplingExecutions(
+ conn.executions,
+ (CollectMode.SLOW, CollectMode.SLOW, CollectMode.SLOW))
+ for datum in cache.data:
+ self.assertVmsInBulkStats(('1', '3'), datum[0])
+ self.assertVmStuck(sampler, '2')
+
+ def test_fast_collect_once_vm_responsive(self):
+ cache = FakeStatsCache()
+ vms = {
+ '1': FakeVM('1'),
+ '2': FakeVM('2'),
+ '3': FakeVM('3')
+ }
+ vms['2'].ready_for_commands = False
+ conn = FakeConnection(vms)
+
+ sampler = sampling.SampleVMs(conn, conn.getVMs, cache)
+ sampler._sampling = True # cheat to bootstrap slow collection
+ sampler()
+ sampler()
+ vms['2'].ready_for_commands = True
+ sampler._skip_doms.clear() # cheating, again
+ sampler()
+
+ self.assertSamplingExecutions(
+ conn.executions,
+ (CollectMode.SLOW, CollectMode.SLOW, CollectMode.FAST))
+ for datum in cache.data[:-1]:
+ self.assertVmsInBulkStats(('1', '3'), datum[0])
+ self.assertVmsInBulkStats(vms.keys(), cache.data[-1][0])
+
+ def assertSamplingExecutions(self, executions, expected):
+ for done, exp in zip(executions[-len(expected):], expected):
+ self.assertEqual(done, exp)
+
+ def assertVmsInBulkStats(self, vm_ids, bulk_stats):
+ """
+ checks only for the presence of stats. Stats data is fake,
+ so ignore it as meaningless.
+ """
+ self.assertEqual(len(vm_ids), len(bulk_stats))
+ for vm_id in vm_ids:
+ self.assertIn(vm_id, bulk_stats)
+
+ def assertVmStuck(self, sampler, vm_id):
+ self.assertTrue(sampler._skip_doms.get(vm_id))
--
To view, visit https://gerrit.ovirt.org/40053
To unsubscribe, visit https://gerrit.ovirt.org/settings
Gerrit-MessageType: newchange
Gerrit-Change-Id: Id66dbd420ca29d08ae4063dc83b858be34b8940f
Gerrit-PatchSet: 1
Gerrit-Project: vdsm
Gerrit-Branch: master
Gerrit-Owner: Francesco Romani <fromani(a)redhat.com>
7 years, 10 months
Change in vdsm[master]: v2v: janitorial: use the response module
by fromani@redhat.com
Francesco Romani has uploaded a new change for review.
Change subject: v2v: janitorial: use the response module
......................................................................
v2v: janitorial: use the response module
This patch makes v2v.py use the response module.
Change-Id: I678e764a2895e7c34df6852759b1d43baeb687a1
Signed-off-by: Francesco Romani <fromani(a)redhat.com>
---
M vdsm/v2v.py
1 file changed, 4 insertions(+), 5 deletions(-)
git pull ssh://gerrit.ovirt.org:29418/vdsm refs/changes/47/38447/1
diff --git a/vdsm/v2v.py b/vdsm/v2v.py
index 4fbecf0..18f59d9 100644
--- a/vdsm/v2v.py
+++ b/vdsm/v2v.py
@@ -22,8 +22,8 @@
import libvirt
-from vdsm.define import errCode, doneCode
from vdsm import libvirtconnection
+from vdsm import response
import caps
@@ -39,7 +39,7 @@
def get_external_vms(uri, username, password):
if not supported():
- return errCode["noimpl"]
+ return response.error("noimpl")
try:
conn = libvirtconnection.open_connection(uri=uri,
@@ -47,8 +47,7 @@
passwd=password)
except libvirt.libvirtError as e:
logging.error('error connection to hypervisor: %r', e.message)
- return {'status': {'code': errCode['V2VConnection']['status']['code'],
- 'message': e.message}}
+ return response.error('V2VConnection', e.message)
with closing(conn):
vms = []
@@ -67,7 +66,7 @@
for disk in params['disks']:
_add_disk_info(conn, disk)
vms.append(params)
- return {'status': doneCode, 'vmList': vms}
+ return response.success(vmList=vms)
def _mem_to_mib(size, unit):
--
To view, visit https://gerrit.ovirt.org/38447
To unsubscribe, visit https://gerrit.ovirt.org/settings
Gerrit-MessageType: newchange
Gerrit-Change-Id: I678e764a2895e7c34df6852759b1d43baeb687a1
Gerrit-PatchSet: 1
Gerrit-Project: vdsm
Gerrit-Branch: master
Gerrit-Owner: Francesco Romani <fromani(a)redhat.com>
7 years, 10 months
Change in vdsm[master]: WIP
by fromani@redhat.com
Francesco Romani has uploaded a new change for review.
Change subject: WIP
......................................................................
WIP
Change-Id: I39c2e6e4bca286a513992b7231f1356e8dd871a1
Signed-off-by: Francesco Romani <fromani(a)redhat.com>
---
M vdsm/virt/sampling.py
1 file changed, 47 insertions(+), 37 deletions(-)
git pull ssh://gerrit.ovirt.org:29418/vdsm refs/changes/31/40431/1
diff --git a/vdsm/virt/sampling.py b/vdsm/virt/sampling.py
index 45afdfd..eec5ae5 100644
--- a/vdsm/virt/sampling.py
+++ b/vdsm/virt/sampling.py
@@ -551,51 +551,30 @@
return doms
-class HostStatsThread(threading.Thread):
- """
- A thread that periodically samples host statistics.
- """
- _CONNLOG = logging.getLogger('connectivity')
+class HostSampler(object):
+ _connlog = logging.getlogger('connectivity')
def __init__(self, log):
- self.startTime = time.time()
-
- threading.Thread.__init__(self)
- self.daemon = True
self._log = log
- self._stopEvent = threading.Event()
+ self._start_time = time.time()
self._samples = SampleWindow(size=5)
self._pid = os.getpid()
self._ncpus = max(os.sysconf('SC_NPROCESSORS_ONLN'), 1)
- self._sampleInterval = \
- config.getint('vars', 'host_sample_stats_interval')
+ self._sample_interval = config.getint(
+ 'vars', 'host_sample_stats_interval')
- def stop(self):
- self._stopEvent.set()
-
- def run(self):
- try:
- # wait a bit before starting to sample
- time.sleep(self._sampleInterval)
- while not self._stopEvent.isSet():
- try:
- sample = HostSample(self._pid)
- self._samples.append(sample)
- prev, last = self._samples.last_pair()
- if prev is None:
- self._CONNLOG.debug('%s', sample.to_connlog())
- else:
- diff = sample.connlog_diff(prev)
- if diff:
- self._CONNLOG.debug('%s', diff)
- except TimeoutError:
- self._log.exception("Timeout while sampling stats")
- self._stopEvent.wait(self._sampleInterval)
- except:
- if not self._stopEvent.isSet():
- self._log.exception("Error while sampling stats")
+ def __call__(self):
+ sample = HostSample(self._pid)
+ self._samples.append(sample)
+ prev, last = self._samples.last_pair()
+ if prev is None:
+ self._CONNLOG.debug('%s', sample.to_connlog())
+ else:
+ diff = sample.connlog_diff(prev)
+ if diff:
+ self._CONNLOG.debug('%s', diff)
@utils.memoized
def _boot_time(self):
@@ -625,7 +604,7 @@
hs0, hs1, _ = self._samples.stats()
interval = hs1.timestamp - hs0.timestamp
- stats.update(self._get_interfaces_stats(hs0, hs1, interval))
+ stats.update(_get_interfaces_stats(hs0, hs1, interval))
jiffies = (hs1.pidcpu.user - hs0.pidcpu.user) % (2 ** 32)
stats['cpuUserVdsmd'] = jiffies / interval
@@ -655,6 +634,37 @@
return stats
+
+class HostStatsThread(threading.Thread):
+ """
+ A thread that periodically samples host statistics.
+ """
+ _connlog = logging.getlogger('connectivity')
+
+ def __init__(self, log):
+ threading.Thread.__init__(self)
+ self._sampler = HostSampler(log)
+ self.daemon = True
+ self._stopEvent = threading.Event()
+
+ def stop(self):
+ self._stopEvent.set()
+
+ def run(self):
+ try:
+ # wait a bit before starting to sample
+ time.sleep(self._sampleInterval)
+ while not self._stopEvent.isSet():
+ try:
+ self._sampler()
+ except TimeoutError:
+ self._log.exception("Timeout while sampling stats")
+ self._stopEvent.wait(self._sampleInterval)
+ except:
+ if not self._stopEvent.isSet():
+ self._log.exception("Error while sampling stats")
+
+
def _get_cpu_core_stats(hs0, hs1):
"""
:returns: a dict that with the following formats:
--
To view, visit https://gerrit.ovirt.org/40431
To unsubscribe, visit https://gerrit.ovirt.org/settings
Gerrit-MessageType: newchange
Gerrit-Change-Id: I39c2e6e4bca286a513992b7231f1356e8dd871a1
Gerrit-PatchSet: 1
Gerrit-Project: vdsm
Gerrit-Branch: master
Gerrit-Owner: Francesco Romani <fromani(a)redhat.com>
7 years, 10 months
Change in vdsm[master]: WIP: rename stats cache to vm stats cache
by fromani@redhat.com
Francesco Romani has uploaded a new change for review.
Change subject: WIP: rename stats cache to vm stats cache
......................................................................
WIP: rename stats cache to vm stats cache
Future patch want to modernize HostStats* and
port it to periodic Operations infrastructure.
To make room for this change and reduce
ambiguity, re-introduced the ubiquitous
vm* prefix.
No changes besides naming.
Change-Id: If38c49f686dfc2bc0994d444ff24c7736f2e951b
Signed-off-by: Francesco Romani <fromani(a)redhat.com>
---
M tests/samplingTests.py
M tests/vmfakelib.py
M vdsm/virt/periodic.py
M vdsm/virt/sampling.py
M vdsm/virt/vm.py
5 files changed, 16 insertions(+), 16 deletions(-)
git pull ssh://gerrit.ovirt.org:29418/vdsm refs/changes/71/40371/1
diff --git a/tests/samplingTests.py b/tests/samplingTests.py
index 2624dd7..1f2a65c 100644
--- a/tests/samplingTests.py
+++ b/tests/samplingTests.py
@@ -250,13 +250,13 @@
self.assertTrue(self._sampleCount >= self.STOP_SAMPLE)
-class StatsCacheTests(TestCaseBase):
+class VmStatsCacheTests(TestCaseBase):
FAKE_CLOCK_STEP = 1
def setUp(self):
self.clock = 0
- self.cache = sampling.StatsCache(clock=self.fake_monotonic_time)
+ self.cache = sampling.VmStatsCache(clock=self.fake_monotonic_time)
def fake_monotonic_time(self):
self.clock += self.FAKE_CLOCK_STEP
diff --git a/tests/vmfakelib.py b/tests/vmfakelib.py
index d8ce6ce..544eb5a 100644
--- a/tests/vmfakelib.py
+++ b/tests/vmfakelib.py
@@ -201,7 +201,7 @@
fake._guestCpuRunning = runCpu
if status is not None:
fake._lastStatus = status
- sampling.stats_cache.add(fake.id)
+ sampling.vm_stats_cache.add(fake.id)
yield fake
diff --git a/vdsm/virt/periodic.py b/vdsm/virt/periodic.py
index a3f8cc6..00615c3 100644
--- a/vdsm/virt/periodic.py
+++ b/vdsm/virt/periodic.py
@@ -94,7 +94,7 @@
sampling.VMBulkSampler(
libvirtconnection.get(cif),
cif.getVMs,
- sampling.stats_cache),
+ sampling.vm_stats_cache),
config.getint('vars', 'vm_sample_interval')),
# we do this only until we get high water mark notifications
diff --git a/vdsm/virt/sampling.py b/vdsm/virt/sampling.py
index 6804406..89f109b 100644
--- a/vdsm/virt/sampling.py
+++ b/vdsm/virt/sampling.py
@@ -375,7 +375,7 @@
EMPTY_SAMPLE = StatsSample(None, None, None, None)
-class StatsCache(object):
+class VmStatsCache(object):
"""
Cache for bulk stats samples.
Provide facilities to retrieve per-vm samples, and the glue code to deal
@@ -391,7 +391,7 @@
VDSM has countermeasures for those cases. Stuck threads are replaced,
thanks to Executor. But still, before to be destroyed, a replaced
- thread can mistakenly try to add a sample to a StatsCache.
+ thread can mistakenly try to add a sample to a VmStatsCache.
Because of worker thread replacement, that sample from stuck thread
can be stale.
@@ -402,7 +402,7 @@
between a well behaving call and an unblocked stuck call.
"""
- _log = logging.getLogger("sampling.StatsCache")
+ _log = logging.getLogger("sampling.VmStatsCache")
def __init__(self, clock=utils.monotonic_time):
self._clock = clock
@@ -478,7 +478,7 @@
self._vm_last_timestamp[vmid] = monotonic_ts
-stats_cache = StatsCache()
+vm_stats_cache = VmStatsCache()
# this value can be tricky to tune.
@@ -492,18 +492,18 @@
class VMBulkSampler(object):
- def __init__(self, conn, get_vms, stats_cache,
+ def __init__(self, conn, get_vms, vm_stats_cache,
stats_flags=0, timeout=_TIMEOUT):
self._conn = conn
self._get_vms = get_vms
- self._stats_cache = stats_cache
+ self._vm_stats_cache = vm_stats_cache
self._stats_flags = stats_flags
self._skip_doms = ExpiringCache(timeout)
self._sampling = Stage()
self._log = logging.getLogger("sampling.VMBulkSampler")
def __call__(self):
- timestamp = self._stats_cache.clock()
+ timestamp = self._vm_stats_cache.clock()
# we are deep in the hot path. bool(ExpiringCache)
# *is* costly so we should avoid it if we can.
fast_path = (self._sampling.empty and not self._skip_doms)
@@ -514,7 +514,7 @@
# If everything's ok, we can skip all the costly checks.
bulk_stats = self._conn.getAllDomainStats(
self._stats_flags)
- self._stats_cache.put(_translate(bulk_stats), timestamp)
+ self._vm_stats_cache.put(_translate(bulk_stats), timestamp)
else:
# A previous call got stuck, or not every domain
# has properly recovered. Thus we must whitelist domains.
@@ -523,7 +523,7 @@
if doms:
bulk_stats = self._conn.domainListGetStats(
doms, self._stats_flags)
- self._stats_cache.put(_translate(bulk_stats), timestamp)
+ self._vm_stats_cache.put(_translate(bulk_stats), timestamp)
def _get_responsive_doms(self):
vms = self._get_vms()
diff --git a/vdsm/virt/vm.py b/vdsm/virt/vm.py
index dcc215b..2b7e3eb 100644
--- a/vdsm/virt/vm.py
+++ b/vdsm/virt/vm.py
@@ -1308,7 +1308,7 @@
decStats = {}
try:
- vm_sample = sampling.stats_cache.get(self.id)
+ vm_sample = sampling.vm_stats_cache.get(self.id)
decStats = vmstats.produce(self,
vm_sample.first_value,
vm_sample.last_value,
@@ -1675,7 +1675,7 @@
nic.name)
self._guestEventTime = self._startTime
- sampling.stats_cache.add(self.id)
+ sampling.vm_stats_cache.add(self.id)
try:
self.guestAgent.connect()
except Exception:
@@ -3454,7 +3454,7 @@
self.lastStatus = vmstatus.POWERING_DOWN
# Terminate the VM's creation thread.
self._incomingMigrationFinished.set()
- sampling.stats_cache.remove(self.id)
+ sampling.vm_stats_cache.remove(self.id)
self.guestAgent.stop()
if self._dom:
result = self._destroyVmGraceful()
--
To view, visit https://gerrit.ovirt.org/40371
To unsubscribe, visit https://gerrit.ovirt.org/settings
Gerrit-MessageType: newchange
Gerrit-Change-Id: If38c49f686dfc2bc0994d444ff24c7736f2e951b
Gerrit-PatchSet: 1
Gerrit-Project: vdsm
Gerrit-Branch: master
Gerrit-Owner: Francesco Romani <fromani(a)redhat.com>
7 years, 10 months
Change in vdsm[master]: sampling: introduce expensive checks
by fromani@redhat.com
Francesco Romani has uploaded a new change for review.
Change subject: sampling: introduce expensive checks
......................................................................
sampling: introduce expensive checks
To improve troubleshooting, introduce more
expensive sanity checks, with a default-off
tunable to enable them.
These sanity checks will make no attempt to recover,
will only add logs to report possible issues.
Change-Id: I7b3bb707dd60de194eedfc2e3de1efbf05574ff7
Signed-off-by: Francesco Romani <fromani(a)redhat.com>
---
M lib/vdsm/config.py.in
M vdsm/virt/sampling.py
2 files changed, 19 insertions(+), 2 deletions(-)
git pull ssh://gerrit.ovirt.org:29418/vdsm refs/changes/91/40391/1
diff --git a/lib/vdsm/config.py.in b/lib/vdsm/config.py.in
index e213839..b60e836 100644
--- a/lib/vdsm/config.py.in
+++ b/lib/vdsm/config.py.in
@@ -325,6 +325,10 @@
('periodic_task_per_worker', '100',
'Max number of tasks which can be queued on workers.'
' This is for internal usage and may change without warning'),
+
+ ('expensive_checks', 'false',
+ 'Perform additional sanity checks and does additional debug logs'
+ 'which are expensive performance-wise'),
]),
# Section: [devel]
diff --git a/vdsm/virt/sampling.py b/vdsm/virt/sampling.py
index b0628a4..3c1510f 100644
--- a/vdsm/virt/sampling.py
+++ b/vdsm/virt/sampling.py
@@ -504,6 +504,7 @@
self._skip_doms = ExpiringCache(timeout)
self._sampling = Stage()
self._log = logging.getLogger("sampling.VMBulkSampler")
+ self._extra_check = config.getboolean('sampling', 'expensive_checks')
def __call__(self):
timestamp = self._vm_stats_cache.clock()
@@ -517,7 +518,6 @@
# If everything's ok, we can skip all the costly checks.
bulk_stats = self._conn.getAllDomainStats(
self._stats_flags)
- self._vm_stats_cache.put(_translate(bulk_stats), timestamp)
else:
# A previous call got stuck, or not every domain
# has properly recovered. Thus we must whitelist domains.
@@ -526,7 +526,12 @@
if doms:
bulk_stats = self._conn.domainListGetStats(
doms, self._stats_flags)
- self._vm_stats_cache.put(_translate(bulk_stats), timestamp)
+ else:
+ bulk_stats = []
+
+ stats = _translate(bulk_stats)
+ self._vm_stats_cache.put(stats, timestamp)
+ self._log_missing_vms(self._get_vms(), stats, timestamp)
def _get_responsive_doms(self):
vms = self._get_vms()
@@ -541,6 +546,14 @@
doms.append(vm_obj._dom._dom)
return doms
+ def _log_missing_vms(self, expected_vms, retrieved_stats, timestamp):
+ # costly check. add another layer of check before to embark on it.
+ if self._extra_check:
+ for vm_id in expected_vms:
+ if vm_id not in retrieved_stats:
+ self._log.debug('VM %s not updated in bulk at %f',
+ vm_id, timestamp)
+
class HostStatsThread(threading.Thread):
"""
--
To view, visit https://gerrit.ovirt.org/40391
To unsubscribe, visit https://gerrit.ovirt.org/settings
Gerrit-MessageType: newchange
Gerrit-Change-Id: I7b3bb707dd60de194eedfc2e3de1efbf05574ff7
Gerrit-PatchSet: 1
Gerrit-Project: vdsm
Gerrit-Branch: master
Gerrit-Owner: Francesco Romani <fromani(a)redhat.com>
7 years, 10 months
Change in vdsm[master]: sampling: simplify flows
by fromani@redhat.com
Francesco Romani has uploaded a new change for review.
Change subject: sampling: simplify flows
......................................................................
sampling: simplify flows
Executor already has a fallback net for uncaught exceptions.
Failures in SampleVMs.__call__ are expected to be sporadic,
and in these cases we actually want to be as much noisy as we
can.
This patch removes redundant code in SampleVMs.__call__,
to make the flow less convoluted.
Change-Id: Ide103d0ed9a694cc9ddd9b0b382e2d81a1bd48c0
Signed-off-by: Francesco Romani <fromani(a)redhat.com>
---
M vdsm/virt/sampling.py
1 file changed, 17 insertions(+), 21 deletions(-)
git pull ssh://gerrit.ovirt.org:29418/vdsm refs/changes/26/40326/1
diff --git a/vdsm/virt/sampling.py b/vdsm/virt/sampling.py
index 09ce2b3..5a205f7 100644
--- a/vdsm/virt/sampling.py
+++ b/vdsm/virt/sampling.py
@@ -507,27 +507,23 @@
# we are deep in the hot path. bool(ExpiringCache)
# *is* costly so we should avoid it if we can.
fast_path = (self._sampling.empty and not self._skip_doms)
- try:
- with self._sampling.performer():
- if fast_path:
- # This is expected to be the common case.
- # If everything's ok, we can skip all the costly checks.
- bulk_stats = self._conn.getAllDomainStats(
- self._stats_flags)
- else:
- # A previous call got stuck, or not every domain
- # has properly recovered. Thus we must whitelist domains.
- doms = self._get_responsive_doms()
- self._log.debug('sampling %d domains', len(doms))
- if doms:
- bulk_stats = self._conn.domainListGetStats(
- doms, self._stats_flags)
- else:
- bulk_stats = []
- except Exception:
- self._log.exception("vm sampling failed")
- else:
- self._stats_cache.put(_translate(bulk_stats), timestamp)
+
+ with self._sampling.performer():
+ if fast_path:
+ # This is expected to be the common case.
+ # If everything's ok, we can skip all the costly checks.
+ bulk_stats = self._conn.getAllDomainStats(
+ self._stats_flags)
+ self._stats_cache.put(_translate(bulk_stats), timestamp)
+ else:
+ # A previous call got stuck, or not every domain
+ # has properly recovered. Thus we must whitelist domains.
+ doms = self._get_responsive_doms()
+ self._log.debug('sampling %d domains', len(doms))
+ if doms:
+ bulk_stats = self._conn.domainListGetStats(
+ doms, self._stats_flags)
+ self._stats_cache.put(_translate(bulk_stats), timestamp)
def _get_responsive_doms(self):
vms = self._get_vms()
--
To view, visit https://gerrit.ovirt.org/40326
To unsubscribe, visit https://gerrit.ovirt.org/settings
Gerrit-MessageType: newchange
Gerrit-Change-Id: Ide103d0ed9a694cc9ddd9b0b382e2d81a1bd48c0
Gerrit-PatchSet: 1
Gerrit-Project: vdsm
Gerrit-Branch: master
Gerrit-Owner: Francesco Romani <fromani(a)redhat.com>
7 years, 10 months
Change in vdsm[master]: hsm: prepareForShutdown - operations order
by laravot@redhat.com
Liron Aravot has uploaded a new change for review.
Change subject: hsm: prepareForShutdown - operations order
......................................................................
hsm: prepareForShutdown - operations order
Currently cleanupMasterMount() is executed before
taskMgr.prepareForShutdown() and before the domain monitor is stopped,
that causes to errors during the tasks abortion (becasue of failure to
access the path) and to erros when stopping the domain monitoring
thread on that domain.
The solution introduced in that patch is to change the order of the
operations - so that the cleanupMasterMount() will be executed only
after the other operations are executed.
Change-Id: I9edd84317b08a17db80e265053edaf69582c2a51
Bug-Url: https://bugzilla.redhat.com/show_bug.cgi?id=1161934
Signed-off-by: Liron Aravot <laravot(a)redhat.com>
---
M vdsm/storage/hsm.py
1 file changed, 1 insertion(+), 1 deletion(-)
git pull ssh://gerrit.ovirt.org:29418/vdsm refs/changes/62/36162/1
diff --git a/vdsm/storage/hsm.py b/vdsm/storage/hsm.py
index c8aaf93..8f71d71 100644
--- a/vdsm/storage/hsm.py
+++ b/vdsm/storage/hsm.py
@@ -3433,7 +3433,6 @@
# stop spm tasks if spm etc.)
try:
self._connectionMonitor.stopMonitoring()
- sp.StoragePool.cleanupMasterMount()
self.__releaseLocks()
for spUUID in self.pools:
@@ -3454,6 +3453,7 @@
exc_info=True)
self.taskMng.prepareForShutdown()
+ sp.StoragePool.cleanupMasterMount()
except:
pass
--
To view, visit http://gerrit.ovirt.org/36162
To unsubscribe, visit http://gerrit.ovirt.org/settings
Gerrit-MessageType: newchange
Gerrit-Change-Id: I9edd84317b08a17db80e265053edaf69582c2a51
Gerrit-PatchSet: 1
Gerrit-Project: vdsm
Gerrit-Branch: master
Gerrit-Owner: Liron Aravot <laravot(a)redhat.com>
7 years, 10 months
Change in vdsm[master]: vdsm: added functionality to teardownImage when disk has bee...
by cshereme@redhat.com
Candace Sheremeta has uploaded a new change for review.
Change subject: vdsm: added functionality to teardownImage when disk has been deleted
......................................................................
vdsm: added functionality to teardownImage when disk has been deleted
added code to teardownImage in hsm.py so that teardownImage reports
"Volume does not exist" for a previously deleted disk, where it
previously simply reported "OK" - teardownImage now checks to see
if volume exists before attempting to delete it
Bug-Url: https://bugzilla.redhat.com/show_bug.cgi?id=1184718
Change-Id: Ia929dfdc78ccaa736033e41a77bce861d5a27769
Signed-off-by: Candace Sheremeta <cshereme(a)redhat.com>
---
M vdsm/storage/hsm.py
1 file changed, 7 insertions(+), 0 deletions(-)
git pull ssh://gerrit.ovirt.org:29418/vdsm refs/changes/41/38241/1
diff --git a/vdsm/storage/hsm.py b/vdsm/storage/hsm.py
index 8c75277..45e6634 100644
--- a/vdsm/storage/hsm.py
+++ b/vdsm/storage/hsm.py
@@ -3271,6 +3271,13 @@
vars.task.getSharedLock(STORAGE, sdUUID)
dom = sdCache.produce(sdUUID)
+ allVols = dom.getAllVolumes()
+ # Filter volumes related to this image
+ imgVolumes = sd.getVolsOfImage(allVols, imgUUID).keys()
+
+ if volUUID not in imgVolumes:
+ raise se.VolumeDoesNotExist(volUUID)
+
dom.deactivateImage(imgUUID)
@public
--
To view, visit https://gerrit.ovirt.org/38241
To unsubscribe, visit https://gerrit.ovirt.org/settings
Gerrit-MessageType: newchange
Gerrit-Change-Id: Ia929dfdc78ccaa736033e41a77bce861d5a27769
Gerrit-PatchSet: 1
Gerrit-Project: vdsm
Gerrit-Branch: master
Gerrit-Owner: Candace Sheremeta <cshereme(a)redhat.com>
7 years, 10 months