Nir Soffer has uploaded a new change for review.
Change subject: lvm: Exclude faulty devices from lvm long filter ......................................................................
lvm: Exclude faulty devices from lvm long filter
lvm commands use a filter to limit access to the relevant devices. When the filter includes a faulty device, lvm commands may block for several minutes (stuck in D state, i.e. uninterruptible sleep). We have seen the getDevicesList command stuck for up to 10 minutes because of faulty devices in the long filter.
We used to build the filter from all multipath devices. Now we build the filter only from devices that have at least one active path.
For example:

# multipath -ll
360060160f4a0300038ed7058b5e9e311 dm-0 DGC ,VRAID
size=15G features='0' hwhandler='1 emc' wp=rw
|-+- policy='service-time 0' prio=0 status=enabled
| `- 4:0:3:0 sdd 8:48 failed faulty running
`-+- policy='service-time 0' prio=0 status=enabled
  `- 4:0:2:0 sdb 8:16 failed faulty running
360060160f4a030003268ab211002e411 dm-1 DGC ,VRAID
size=30G features='1 queue_if_no_path' hwhandler='1 emc' wp=rw
|-+- policy='service-time 0' prio=4 status=active
| `- 4:0:3:1 sde 8:64 active ready running
`-+- policy='service-time 0' prio=1 status=enabled
  `- 4:0:2:1 sdc 8:32 active ready running
Previously, both devices were included in the filter. Now only 360060160f4a030003268ab211002e411 (dm-1, with active paths) is included in the lvm filter. 360060160f4a0300038ed7058b5e9e311 (dm-0, all paths failed) is excluded.
A faulty device that becomes active again will be included in the lvm filter after the next refresh (every 5 minutes), or after trying to edit or create a new storage domain.
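The refresh behaviour above amounts to a cached filter that is rebuilt when it is marked stale or when it gets too old. The sketch below models that. The class name `FilterCache`, the `max_age` parameter and the injectable `clock` are hypothetical. In the patch, the staleness flag lives in `lvm.py` as `self._filterStale`, and how vdsm schedules the periodic refresh is not shown in this change.

```python
import time


class FilterCache:
    """Cache an lvm filter; rebuild it when invalidated or too old.

    Hypothetical sketch: build_filter is any callable that returns a
    filter built from the currently active devices.
    """

    def __init__(self, build_filter, max_age=300, clock=time.monotonic):
        self._build = build_filter
        self._max_age = max_age  # seconds; 300 mirrors "every 5 minutes"
        self._clock = clock
        self._value = None
        self._built_at = None
        self._stale = True  # force a build on first use

    def invalidate(self):
        # E.g. before editing or creating a storage domain, so a device
        # that recovered is picked up without waiting for the timeout.
        self._stale = True

    def get(self):
        now = self._clock()
        # When stale, the age check is skipped (_built_at may still be None).
        if self._stale or now - self._built_at >= self._max_age:
            self._value = self._build()
            self._built_at = now
            self._stale = False
        return self._value
```

A cache shared between threads would also need a lock around the rebuild. The sketch leaves that out to keep the refresh logic visible.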
lvm also uses a short filter, including only the devices used by a certain vg or lv. We may also have to exclude faulty devices from the short filter. This will be handled later if needed.
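In lvm.conf syntax, a filter of either kind is a list of accept rules (`a|regex|`) followed by a reject-all rule (`r|.*|`). lvm applies the first pattern that matches, so any device not listed, including a faulty one that was skipped, is rejected and never scanned. The helper `build_lvm_filter` below is a hypothetical sketch of such a setting. It is not vdsm's `_buildConfig`, whose exact output is not shown in this change.

```python
def build_lvm_filter(devices):
    """Return an lvm.conf filter that accepts only the given device paths.

    Hypothetical sketch; device paths are assumed to contain no regex
    metacharacters (true for /dev/mapper/<wwid> names made of hex digits).
    """
    # One anchored accept rule per device; duplicates removed, stable order.
    accept = ['"a|^%s$|"' % dev for dev in sorted(set(devices))]
    # lvm stops at the first matching pattern; the final rule rejects
    # every device that was not accepted above.
    rules = accept + ['"r|.*|"']
    return "filter = [ %s ]" % ", ".join(rules)
```

In this sketch, passing all active multipath devices gives a long filter, and passing only the devices of one vg gives a short one.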
Change-Id: I6d7a973bcefa95813fdc289847760c0955aca30c
Bug-Url: https://bugzilla.redhat.com/880738
Signed-off-by: Nir Soffer <nsoffer@redhat.com>
---
M vdsm/storage/lvm.py
M vdsm/storage/multipath.py
2 files changed, 13 insertions(+), 1 deletion(-)
git pull ssh://gerrit.ovirt.org:29418/vdsm refs/changes/75/31875/1
diff --git a/vdsm/storage/lvm.py b/vdsm/storage/lvm.py
index 86edf55..9cfad01 100644
--- a/vdsm/storage/lvm.py
+++ b/vdsm/storage/lvm.py
@@ -244,7 +244,7 @@
             if not self._filterStale:
                 return self._extraCfg
 
-            self._extraCfg = _buildConfig(multipath.getMPDevNamesIter())
+            self._extraCfg = _buildConfig(multipath.getActiveMPDevNamesIter())
             _updateLvmConf(self._extraCfg)
             self._filterStale = False
 
diff --git a/vdsm/storage/multipath.py b/vdsm/storage/multipath.py
index ba98866..2b30995 100644
--- a/vdsm/storage/multipath.py
+++ b/vdsm/storage/multipath.py
@@ -382,6 +382,18 @@
         yield os.path.join(devicemapper.DMPATH_PREFIX, name)
 
 
+def getActiveMPDevNamesIter():
+    status = devicemapper.getPathsStatus()
+    for dmId, guid in getMPDevsIter():
+        active = [slave for slave in devicemapper.getSlaves(dmId)
+                  if status.get(slave) == "active"]
+        if not active:
+            log.warning("Skipping device %s - no active slave", guid)
+            continue
+        log.debug("Found device %s %s", guid, active)
+        yield os.path.join(devicemapper.DMPATH_PREFIX, guid)
+
+
 def getMPDevsIter():
     """
     Collect the list of all the multipath block devices.
oVirt Jenkins CI Server has posted comments on this change.
Patch Set 1:
Build Failed
http://jenkins.ovirt.org/job/vdsm_master_unit_tests_gerrit_el/11102/ : FAILURE
http://jenkins.ovirt.org/job/vdsm_master_unit-tests_created/12044/ : SUCCESS
http://jenkins.ovirt.org/job/vdsm_master_pep8_gerrit/11891/ : SUCCESS
Nir Soffer has posted comments on this change.
Patch Set 1:
(1 comment)
http://gerrit.ovirt.org/#/c/31875/1//COMMIT_MSG Commit Message:
Line 34: A faulty device which became active again will be included in lvm filter
Line 35: after the next refresh (every 5 minutes), or after trying edit or create
Line 36: a new storage domain.
Line 37:
Line 38: lvm uses also short filter, including devices used by the certain vg or
the -> a
Line 39: lv. It is possible that we also have to exclude such devices from the
Line 40: short filter. This will be handled later if needed.
Line 41:
Line 42: Change-Id: I6d7a973bcefa95813fdc289847760c0955aca30c
Allon Mureinik has posted comments on this change.
Patch Set 1:
(2 comments)
http://gerrit.ovirt.org/#/c/31875/1//COMMIT_MSG Commit Message:
Line 11: minutes (stuck in D state). We have seen getDevicesList command stuck
Line 12: for up to 10 minutes because of faulty devices in the long filter.
Line 13:
Line 14: We used to build the filter from all multipath devices. Now we build the
Line 15: filter only from devices which have at least one active paths.
s/paths/path/
Line 16:
Line 17: # multipath -ll
Line 18: 360060160f4a0300038ed7058b5e9e311 dm-0 DGC ,VRAID
Line 19: size=15G features='0' hwhandler='1 emc' wp=rw

Line 12: for up to 10 minutes because of faulty devices in the long filter.
Line 13:
Line 14: We used to build the filter from all multipath devices. Now we build the
Line 15: filter only from devices which have at least one active paths.
Line 16:
Consider adding a header here, like "For Example:"
Line 17: # multipath -ll
Line 18: 360060160f4a0300038ed7058b5e9e311 dm-0 DGC ,VRAID
Line 19: size=15G features='0' hwhandler='1 emc' wp=rw
Line 20: |-+- policy='service-time 0' prio=0 status=enabled
Nir Soffer has posted comments on this change.
Patch Set 1:
(2 comments)
http://gerrit.ovirt.org/#/c/31875/1//COMMIT_MSG Commit Message:
Line 11: minutes (stuck in D state). We have seen getDevicesList command stuck
Line 12: for up to 10 minutes because of faulty devices in the long filter.
Line 13:
Line 14: We used to build the filter from all multipath devices. Now we build the
Line 15: filter only from devices which have at least one active paths.
s/paths/path/
Done
Line 16:
Line 17: # multipath -ll
Line 18: 360060160f4a0300038ed7058b5e9e311 dm-0 DGC ,VRAID
Line 19: size=15G features='0' hwhandler='1 emc' wp=rw

Line 12: for up to 10 minutes because of faulty devices in the long filter.
Line 13:
Line 14: We used to build the filter from all multipath devices. Now we build the
Line 15: filter only from devices which have at least one active paths.
Line 16:
Consider adding a header here, like "For Example:"
Done
Line 17: # multipath -ll
Line 18: 360060160f4a0300038ed7058b5e9e311 dm-0 DGC ,VRAID
Line 19: size=15G features='0' hwhandler='1 emc' wp=rw
Line 20: |-+- policy='service-time 0' prio=0 status=enabled
Nir Soffer has posted comments on this change.
Patch Set 2:
Address Allon's comments.
oVirt Jenkins CI Server has posted comments on this change.
Patch Set 2:
Build Failed
http://jenkins.ovirt.org/job/vdsm_master_unit_tests_gerrit_el/11123/ : FAILURE
http://jenkins.ovirt.org/job/vdsm_master_unit-tests_created/12065/ : SUCCESS
http://jenkins.ovirt.org/job/vdsm_master_pep8_gerrit/11912/ : SUCCESS
Federico Simoncelli has posted comments on this change.
Patch Set 2:
It seems pretty much ok. I'll have to go over this once again tomorrow.
Federico Simoncelli has posted comments on this change.
Patch Set 2:
(1 comment)
http://gerrit.ovirt.org/#/c/31875/2/vdsm/storage/multipath.py File vdsm/storage/multipath.py:
Line 389:                   if status.get(slave) == "active"]
Line 390:         if not active:
Line 391:             log.warning("Skipping device %s - no active slave", guid)
Line 392:             continue
Line 393:         log.debug("Found device %s %s", guid, active)
Isn't this too verbose?
Line 394:         yield os.path.join(devicemapper.DMPATH_PREFIX, guid)
Line 395:
Line 396:
Line 397: def getMPDevsIter():
Nir Soffer has posted comments on this change.
Patch Set 2:
(1 comment)
http://gerrit.ovirt.org/#/c/31875/2/vdsm/storage/multipath.py File vdsm/storage/multipath.py:
Line 389:                   if status.get(slave) == "active"]
Line 390:         if not active:
Line 391:             log.warning("Skipping device %s - no active slave", guid)
Line 392:             continue
Line 393:         log.debug("Found device %s %s", guid, active)
Isn't this too verbose?
Yeah, we can probably disable this log.
Line 394:         yield os.path.join(devicemapper.DMPATH_PREFIX, guid)
Line 395:
Line 396:
Line 397: def getMPDevsIter():
Nir Soffer has posted comments on this change.
Patch Set 2:
Should be replaced by http://gerrit.ovirt.org/32277. Keeping this in case the new approach fails.
Nir Soffer has posted comments on this change.
Patch Set 2:
This can still be useful even after bug 880738 is fixed, to avoid the much shorter delay (5-10 seconds) when accessing a failed device.
Nir Soffer has posted comments on this change.
Patch Set 3:
Remove noisy log message and update the commit message.
oVirt Jenkins CI Server has posted comments on this change.
Patch Set 3:
Build Failed
http://jenkins.ovirt.org/job/vdsm_master_unit_tests_gerrit_el/11458/ : FAILURE
http://jenkins.ovirt.org/job/vdsm_master_unit-tests_created/12402/ : SUCCESS
http://jenkins.ovirt.org/job/vdsm_master_pep8_gerrit/12247/ : SUCCESS
Nir Soffer has posted comments on this change.
Patch Set 3:
Needs testing.
Jenkins CI RO has posted comments on this change.
Patch Set 3:
Abandoned due to no activity - please restore if still relevant
Jenkins CI RO has abandoned this change.
Abandoned
Abandoned due to no activity - please restore if still relevant
gerrit-hooks has posted comments on this change.
Patch Set 3:
* update_tracker: OK
Nir Soffer has restored this change.
Restored
Not fixed yet. Until we have a better solution, this patch should stay here.
vdsm-patches@lists.fedorahosted.org