Change in vdsm[master]: Live Merge: add reconcileVolumeChain verb
by alitke@redhat.com
Adam Litke has uploaded a new change for review.
Change subject: Live Merge: add reconcileVolumeChain verb
......................................................................
Live Merge: add reconcileVolumeChain verb
If a live merge operation is catastrophically interrupted (we completely
lose the host on which the VM was running during the merge), engine must
have a way to discover what happened by examining storage only.
When the active layer was being merged, it is critical to know whether
the pivot completed or not. Choosing the wrong leaf volume could
result in data corruption. When an internal volume is being merged, the
situation is less dire, but giving engine a way to resolve the final
status of the merge still yields a substantially better user experience,
since the snapshot can be unlocked to allow a cold merge to be attempted.
This new verb must run on SPM and is only valid for images which are not
in use by running VMs. It will use qemu-img to examine the actual
volume chain and will correct the vdsm metadata if needed. Finally, the
current volume chain is returned. This chain can be used by engine to
understand what happened with the interrupted merge command.
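For illustration, the core of the verb is a walk over qemu-img backing-file
pointers from the leaf back to the base. Below is a rough standalone sketch
of that walk; it is not the vdsm implementation (which goes through the
internal qemuimg wrapper and volume objects), and the 'backing-filename'
JSON key plus the volume-UUID-as-filename convention are assumptions made
here for brevity:

import json
import os
import subprocess

def walk_backing_chain(leaf_path):
    """Return volume names from base to leaf by following the
    backing-file pointers reported by qemu-img."""
    chain = []
    path = leaf_path
    while path is not None:
        chain.insert(0, os.path.basename(path))
        out = subprocess.check_output(
            ['qemu-img', 'info', '--output=json', path])
        backing = json.loads(out.decode('utf-8')).get('backing-filename')
        if backing is None:
            path = None
        else:
            # Backing files may be recorded relative to the image directory
            path = os.path.join(os.path.dirname(path), backing)
    return chain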
Change-Id: Ia8927a42d2bc9adcf7c6b26babb327a00e91e49e
Signed-off-by: Adam Litke <alitke(a)redhat.com>
---
M client/vdsClient.py
M vdsm/API.py
M vdsm/rpc/BindingXMLRPC.py
M vdsm/rpc/Bridge.py
M vdsm/rpc/vdsmapi-schema.json
M vdsm/storage/hsm.py
6 files changed, 93 insertions(+), 0 deletions(-)
git pull ssh://gerrit.ovirt.org:29418/vdsm refs/changes/88/31788/1
diff --git a/client/vdsClient.py b/client/vdsClient.py
index 35f4ca5..783afa6 100644
--- a/client/vdsClient.py
+++ b/client/vdsClient.py
@@ -1432,6 +1432,17 @@
return ret['status']['code'], ret['status']['message']
+ def reconcileVolumeChain(self, args):
+ if len(args) != 4:
+ raise ValueError('Wrong number of parameters')
+
+ ret = self.s.reconcileVolumeChain(*args)
+ if 'volumes' in ret:
+ for v in ret['volumes']:
+ print v
+ return 0, ''
+ return ret['status']['code'], ret['status']['message']
+
def moveMultiImage(self, args):
spUUID = args[0]
srcDomUUID = args[1]
@@ -2544,6 +2555,10 @@
'<spUUID> <sdUUID> <imgUUID> [<volUUID>]',
'Teardown an image, releasing the prepared volumes.'
)),
+ 'reconcileVolumeChain': (serv.reconcileVolumeChain, (
+ '<spUUID> <sdUUID> <imgUUID> <leafVolUUID>',
+ 'Reconcile an image volume chain and return the current chain.'
+ )),
'moveMultiImage': (serv.moveMultiImage,
('<spUUID> <srcDomUUID> <dstDomUUID> '
'<imgList>({imgUUID=postzero,'
diff --git a/vdsm/API.py b/vdsm/API.py
index dceec2d..310ec83 100644
--- a/vdsm/API.py
+++ b/vdsm/API.py
@@ -906,6 +906,10 @@
methodArgs, callback, self._spUUID, self._sdUUID, self._UUID,
volUUID)
+ def reconcileVolumeChain(self, leafVolUUID):
+ return self._irs.reconcileVolumeChain(self._spUUID, self._sdUUID,
+ self._UUID, leafVolUUID)
+
class LVMVolumeGroup(APIBase):
ctorArgs = ['lvmvolumegroupID']
diff --git a/vdsm/rpc/BindingXMLRPC.py b/vdsm/rpc/BindingXMLRPC.py
index b6ddd1d..a5286ed 100644
--- a/vdsm/rpc/BindingXMLRPC.py
+++ b/vdsm/rpc/BindingXMLRPC.py
@@ -710,6 +710,10 @@
image = API.Image(imgUUID, spUUID, sdUUID)
return image.teardown(volUUID)
+ def imageReconcileVolumeChain(self, spUUID, sdUUID, imgUUID, leafUUID):
+ image = API.Image(imgUUID, spUUID, sdUUID)
+ return image.reconcileVolumeChain(leafUUID)
+
def poolConnect(self, spUUID, hostID, scsiKey, msdUUID, masterVersion,
domainsMap=None, options=None):
pool = API.StoragePool(spUUID)
@@ -1055,6 +1059,7 @@
(self.imageDownload, 'downloadImage'),
(self.imagePrepare, 'prepareImage'),
(self.imageTeardown, 'teardownImage'),
+ (self.imageReconcileVolumeChain, 'reconcileVolumeChain'),
(self.poolConnect, 'connectStoragePool'),
(self.poolConnectStorageServer, 'connectStorageServer'),
(self.poolCreate, 'createStoragePool'),
diff --git a/vdsm/rpc/Bridge.py b/vdsm/rpc/Bridge.py
index 25c49f9..0a7a214 100644
--- a/vdsm/rpc/Bridge.py
+++ b/vdsm/rpc/Bridge.py
@@ -397,6 +397,7 @@
'Image_download': {'ret': 'uuid'},
'Image_mergeSnapshots': {'ret': 'uuid'},
'Image_move': {'ret': 'uuid'},
+ 'Image_reconcileVolumeChain': {'ret': 'volumes'},
'ISCSIConnection_discoverSendTargets': {'ret': 'fullTargets'},
'LVMVolumeGroup_create': {'ret': 'uuid'},
'LVMVolumeGroup_getInfo': {'ret': 'info'},
diff --git a/vdsm/rpc/vdsmapi-schema.json b/vdsm/rpc/vdsmapi-schema.json
index 2ff8e0b..60b4282 100644
--- a/vdsm/rpc/vdsmapi-schema.json
+++ b/vdsm/rpc/vdsmapi-schema.json
@@ -4351,6 +4351,29 @@
'data': {'storagepoolID': 'UUID', 'storagedomainID': 'UUID',
'imageID': 'UUID', '*volumeID': 'UUID'}}
+##
+# @Image.reconcileVolumeChain:
+#
+# Reconcile an image volume chain and return the current chain.
+#
+# @storagepoolID: The UUID of the Storage Pool associated with the Image
+#
+# @storagedomainID: The UUID of the Storage Domain associated with the Image
+#
+# @imageID: The UUID of the Image
+#
+# @leafVolID: The UUID of the original leaf volume
+#
+# Returns:
+# A list of volume UUIDs representing the current volume chain
+#
+# Since: 4.16.0
+##
+{'command': {'class': 'Image', 'name': 'reconcileVolumeChain'},
+ 'data': {'storagepoolID': 'UUID', 'storagedomainID': 'UUID',
+ 'imageID': 'UUID', 'leafVolID': 'UUID'},
+ 'returns': ['UUID']}
+
## Category: @LVMVolumeGroup ###################################################
##
# @LVMVolumeGroup:
diff --git a/vdsm/storage/hsm.py b/vdsm/storage/hsm.py
index fc7681c..c2ccc4d 100644
--- a/vdsm/storage/hsm.py
+++ b/vdsm/storage/hsm.py
@@ -1860,6 +1860,51 @@
subChainTailVol.setLegality(volume.ILLEGAL_VOL)
@public
+ def reconcileVolumeChain(self, spUUID, sdUUID, imgUUID, leafVolUUID):
+ """
+ In some situations (such as when a live merge is interrupted), the
+ vdsm volume chain could become out of sync with the actual chain as
+ understood by qemu. This API uses qemu-img to determine the correct
+ chain and synchronizes vdsm metadata accordingly. Returns the correct
+ volume chain. NOT for use on images of running VMs.
+ """
+ argsStr = ("spUUID=%s, sdUUID=%s, imgUUID=%s, leafVolUUID=%s" %
+ (spUUID, sdUUID, imgUUID, leafVolUUID))
+ vars.task.setDefaultException(se.StorageException("%s" % argsStr))
+ dom = sdCache.produce(sdUUID=sdUUID)
+ self.prepareImage(sdUUID, spUUID, imgUUID, leafVolUUID)
+
+ # Walk the volume chain using qemu-img. Not safe for running VMs
+ retVolumes = []
+ volUUID = leafVolUUID
+ while volUUID is not None:
+ retVolumes.insert(0, volUUID)
+ vol = dom.produceVolume(imgUUID, volUUID)
+ qemuImgFormat = volume.fmt2str(vol.getFormat())
+ imgInfo = qemuimg.info(vol.volumePath, qemuImgFormat)
+ backingFile = imgInfo.get('backingfile')
+ if backingFile is not None:
+ volUUID = os.path.basename(backingFile)
+ else:
+ volUUID = None
+
+ # A merge of the active layer has copy and pivot phases.
+ # During copy, data is copied from the leaf into its parent. Writes
+ # are mirrored to both volumes. So even after copying is complete the
+ # volumes will remain consistent. Finally, the VM is pivoted from the
+ # old leaf to the new leaf and mirroring to the old leaf ceases. During
+ # mirroring and before pivoting, we mark the old leaf ILLEGAL so we
+ # know it's safe to delete in case the operation is interrupted.
+ vol = dom.produceVolume(imgUUID, leafVolUUID)
+ if vol.getLegality() == volume.ILLEGAL_VOL:
+ retVolumes.remove(leafVolUUID)
+
+ # Now that we know the correct volume chain, sync the storage metadata
+ self.imageSyncVolumeChain(sdUUID, imgUUID, retVolumes[0], retVolumes)
+ dom.deactivateImage(imgUUID)
+ return dict(volumes=retVolumes)
+
+ @public
def mergeSnapshots(self, sdUUID, spUUID, vmUUID, imgUUID, ancestor,
successor, postZero=False):
"""
--
To view, visit http://gerrit.ovirt.org/31788
To unsubscribe, visit http://gerrit.ovirt.org/settings
Gerrit-MessageType: newchange
Gerrit-Change-Id: Ia8927a42d2bc9adcf7c6b26babb327a00e91e49e
Gerrit-PatchSet: 1
Gerrit-Project: vdsm
Gerrit-Branch: master
Gerrit-Owner: Adam Litke <alitke(a)redhat.com>
Change in vdsm[master]: virt: Resolve snapshot type after prepareVolumePath
by alitke@redhat.com
Adam Litke has uploaded a new change for review.
Change subject: virt: Resolve snapshot type after prepareVolumePath
......................................................................
virt: Resolve snapshot type after prepareVolumePath
The Drive.blockDev property depends on the volume path being present
locally. Therefore, it is not safe to evaluate until after the new
snapshot volume path has been prepared. Use a lambda to delay the
evaluation until the information is needed.
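A minimal illustration of the deferral pattern (toy class and names, not
the actual vm.py objects): store a callable instead of the value, and only
invoke it once the path exists.

class Drive(object):
    # Toy stand-in: blockDev is only meaningful once the path is prepared
    def __init__(self):
        self.path_prepared = False

    @property
    def blockDev(self):
        if not self.path_prepared:
            raise RuntimeError("volume path not prepared yet")
        return True

drive = Drive()
snap_types = {}
# Defer evaluation by storing a lambda rather than the computed value
snap_types['vda'] = lambda: 'block' if drive.blockDev else 'file'

drive.path_prepared = True        # stands in for prepareVolumePath()
print(snap_types['vda']())        # -> block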
Change-Id: Ia4726702eb329bf65f1f9c658067f7a01c2179e4
Signed-off-by: Adam Litke <alitke(a)redhat.com>
---
M vdsm/virt/vm.py
1 file changed, 5 insertions(+), 2 deletions(-)
git pull ssh://gerrit.ovirt.org:29418/vdsm refs/changes/66/32066/1
diff --git a/vdsm/virt/vm.py b/vdsm/virt/vm.py
index c157e4c..ff69488 100644
--- a/vdsm/virt/vm.py
+++ b/vdsm/virt/vm.py
@@ -4011,7 +4011,10 @@
newDrives[vmDevName]["poolID"] = vmDrive.poolID
newDrives[vmDevName]["name"] = vmDevName
newDrives[vmDevName]["format"] = "cow"
- snapTypes[vmDevName] = ('file', 'block')[vmDrive.blockDev]
+
+ # We cannot evaluate blockDev until after prepareVolumePath
+ snapTypes[vmDevName] = \
+ lambda: 'block' if vmDrive.blockDev else 'file'
# If all the drives are the current ones, return success
if len(newDrives) == 0:
@@ -4035,7 +4038,7 @@
return errCode['snapshotErr']
snapelem = _diskSnapshot(vmDevName, newDrives[vmDevName]["path"],
- snapTypes[vmDevName])
+ snapTypes[vmDevName]())
disks.appendChild(snapelem)
snap.appendChild(disks)
--
To view, visit http://gerrit.ovirt.org/32066
To unsubscribe, visit http://gerrit.ovirt.org/settings
Gerrit-MessageType: newchange
Gerrit-Change-Id: Ia4726702eb329bf65f1f9c658067f7a01c2179e4
Gerrit-PatchSet: 1
Gerrit-Project: vdsm
Gerrit-Branch: master
Gerrit-Owner: Adam Litke <alitke(a)redhat.com>
Change in vdsm[master]: Live Merge: Suspend disk stats collection during pivot
by alitke@redhat.com
Adam Litke has uploaded a new change for review.
Change subject: Live Merge: Suspend disk stats collection during pivot
......................................................................
Live Merge: Suspend disk stats collection during pivot
When pivoting to the new volume after an active layer merge, there is a
brief period where the vdsm metadata is out of sync. During this time,
the disk stats collection thread will raise an exception since the drive
cannot be referred to by its old path. Therefore we disable stats
collection during this short window until the metadata can be
synchronized.
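A toy sketch of the window being protected (hypothetical names, not the
real stats thread): the collector looks drives up by path, the path
changes at pivot time, and sampling is suspended until the metadata
catches up.

class StatsCollector(object):
    def __init__(self):
        self.enabled = True

    def stop(self):
        self.enabled = False

    def start(self):
        self.enabled = True

    def sample(self, known_paths, current_path):
        if not self.enabled:
            return None                     # suspended around the pivot
        if current_path not in known_paths:
            raise KeyError(current_path)    # what the real thread hit
        return 'ok'

stats = StatsCollector()
known = ['/rhev/old-leaf']

stats.stop()                  # before requesting the pivot
current = '/rhev/new-leaf'    # qemu has pivoted to the parent volume
print(stats.sample(known, current))   # -> None, no exception raised
known = [current]             # vdsm metadata synchronized
stats.start()
print(stats.sample(known, current))   # -> ok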
Change-Id: I08689aec4d61871a568a6f92661b560fbf4d0b57
Bug-Url: https://bugzilla.redhat.com/show_bug.cgi?id=1109918
Signed-off-by: Adam Litke <alitke(a)redhat.com>
---
M vdsm/virt/vm.py
1 file changed, 6 insertions(+), 0 deletions(-)
git pull ssh://gerrit.ovirt.org:29418/vdsm refs/changes/67/31367/1
diff --git a/vdsm/virt/vm.py b/vdsm/virt/vm.py
index 1347568..c2d8964 100644
--- a/vdsm/virt/vm.py
+++ b/vdsm/virt/vm.py
@@ -5545,6 +5545,7 @@
if mode == LiveMergeCleanupThread.MODE_CLEANUP:
self.log.info("Live merge job completed (job %s)", jobID)
self._syncVolumeChain(drive)
+ self.startDisksStatsCollection()
elif mode == LiveMergeCleanupThread.MODE_PIVOT:
# We call imageSyncVolumeChain which will mark the current leaf
# ILLEGAL. We do this before requesting a pivot so that we can
@@ -5556,6 +5557,11 @@
self.cif.irs.imageSyncVolumeChain(drive.domainID, drive.imageID,
drive['volumeID'], newVols)
+ # A pivot changes the top volume being used for the VM Disk. Until
+ # we can correct our metadata following the pivot we should not
+ # attempt to collect disk stats.
+ self.stopDisksStatsCollection()
+
self.log.info("Requesting pivot to complete active layer commit "
"(job %s)", jobID)
flags = libvirt.VIR_DOMAIN_BLOCK_JOB_ABORT_PIVOT
--
To view, visit http://gerrit.ovirt.org/31367
To unsubscribe, visit http://gerrit.ovirt.org/settings
Gerrit-MessageType: newchange
Gerrit-Change-Id: I08689aec4d61871a568a6f92661b560fbf4d0b57
Gerrit-PatchSet: 1
Gerrit-Project: vdsm
Gerrit-Branch: master
Gerrit-Owner: Adam Litke <alitke(a)redhat.com>
Change in vdsm[master]: Live Merge: Simplify pivot flow
by alitke@redhat.com
Adam Litke has uploaded a new change for review.
Change subject: Live Merge: Simplify pivot flow
......................................................................
Live Merge: Simplify pivot flow
The pivot part of a Live Merge can be made a synchronous operation, which
simplifies our flows. We are already doing cleanup in a thread, so just
perform the pivot, wait the (very short) time it takes to complete, and
then continue directly to the volume chain sync part of the cleanup.
This is much simpler than doing the pivot and waiting another sample
interval to try the sync portion.
Change-Id: I5e6a9ea7096b5d81f418754d411f135450bf572e
Signed-off-by: Adam Litke <alitke(a)redhat.com>
---
M vdsm/virt/vm.py
1 file changed, 44 insertions(+), 26 deletions(-)
git pull ssh://gerrit.ovirt.org:29418/vdsm refs/changes/49/31849/1
diff --git a/vdsm/virt/vm.py b/vdsm/virt/vm.py
index 4cbcd2c..5890d6a 100644
--- a/vdsm/virt/vm.py
+++ b/vdsm/virt/vm.py
@@ -5356,8 +5356,8 @@
return False
def queryBlockJobs(self):
- def startCleanup(job, drive):
- t = LiveMergeCleanupThread(self, job['jobID'], drive)
+ def startCleanup(job, drive, mode):
+ t = LiveMergeCleanupThread(self, job['jobID'], drive, mode)
t.start()
self._liveMergeCleanupThreads[job['jobID']] = t
@@ -5370,9 +5370,9 @@
jobID = storedJob['jobID']
cleanThread = self._liveMergeCleanupThreads.get(jobID)
if cleanThread and cleanThread.isSuccessful():
- # Handle successfully cleaned jobs early because the job
- # just needs to be untracked and the stored disk info might
- # be stale anyway (ie. after active layer commit).
+ # Handle successful jobs early because the job just needs
+ # to be untracked and the stored disk info might be stale
+ # anyway (ie. after active layer commit).
self.untrackBlockJob(jobID)
continue
@@ -5391,32 +5391,31 @@
jobsRet[jobID] = entry
continue
+ mode = None
if liveInfo:
entry['bandwidth'] = liveInfo['bandwidth']
entry['cur'] = str(liveInfo['cur'])
entry['end'] = str(liveInfo['end'])
if self._activeLayerCommitReady(liveInfo):
- try:
- self.handleBlockJobEvent(jobID, drive, 'pivot')
- except Exception:
- # Just log it. We will retry next time
- self.log.error("Pivot failed for job %s", jobID)
+ mode = LiveMergeCleanupThread.MODE_PIVOT
else:
# Libvirt has stopped reporting this job so we know it will
# never report it again.
+ mode = LiveMergeCleanupThread.MODE_CLEANUP
storedJob['gone'] = True
+ if mode:
if not cleanThread:
# There is no cleanup thread so the job must have just
# ended. Spawn an async cleanup.
- startCleanup(storedJob, drive)
+ startCleanup(storedJob, drive, mode)
elif cleanThread.isAlive():
# Let previously started cleanup thread continue
self.log.debug("Still waiting for block job %s to be "
- "cleaned up", jobID)
+ "synchronized", jobID)
elif not cleanThread.isSuccessful():
# At this point we know the thread is not alive and the
# cleanup failed. Retry it with a new thread.
- startCleanup(storedJob, drive)
+ startCleanup(storedJob, drive, mode)
jobsRet[jobID] = entry
return jobsRet
@@ -5618,14 +5617,26 @@
device['volumeChain'] = drive.volumeChain = newChain
def handleBlockJobEvent(self, jobID, drive, mode):
- if mode == 'finished':
- self.log.info("Live merge job completed (job %s)", jobID)
- self._syncVolumeChain(drive)
- elif mode == 'pivot':
+ if mode == LiveMergeCleanupThread.MODE_PIVOT:
+ # We call imageSyncVolumeChain which will mark the current leaf
+ # ILLEGAL. We do this before requesting a pivot so that we can
+ # properly recover the VM in case we crash. At this point the
+ # active layer contains the same data as its parent so the ILLEGAL
+ # flag indicates that the VM should be restarted using the parent.
+ newVols = [vol['volumeID'] for vol in drive.volumeChain
+ if vol['volumeID'] != drive.volumeID]
+ self.cif.irs.imageSyncVolumeChain(drive.domainID, drive.imageID,
+ drive['volumeID'], newVols)
+
self.log.info("Requesting pivot to complete active layer commit "
"(job %s)", jobID)
flags = libvirt.VIR_DOMAIN_BLOCK_JOB_ABORT_PIVOT
- self._dom.blockJobAbort(drive.name, flags)
+ if self._dom.blockJobAbort(drive.name, flags) != 0:
+ raise RuntimeError("pivot failed")
+ if mode in (LiveMergeCleanupThread.MODE_CLEANUP,
+ LiveMergeCleanupThread.MODE_PIVOT):
+ self.log.info("Live merge job completed (job %s)", jobID)
+ self._syncVolumeChain(drive)
else:
raise RuntimeError("Invalid mode: '%s'" % mode)
@@ -5666,30 +5677,37 @@
class LiveMergeCleanupThread(threading.Thread):
- def __init__(self, vm, jobId, drive):
+ MODE_CLEANUP = 'cleanup'
+ MODE_PIVOT = 'pivot'
+
+ def __init__(self, vm, jobId, drive, mode):
threading.Thread.__init__(self)
self.setDaemon(True)
self.vm = vm
self.jobId = jobId
self.drive = drive
+ self.mode = mode
self.success = False
@utils.traceback()
def run(self):
- self.vm.log.info("Starting live merge cleanup for job %s",
- self.jobId)
+ self.vm.log.info("Starting live merge %s for job %s",
+ self.mode, self.jobId)
try:
- self.vm.handleBlockJobEvent(self.jobId, self.drive, 'finished')
+ self.vm.handleBlockJobEvent(self.jobId, self.drive, self.mode)
except Exception:
- self.vm.log.warning("Cleanup failed for live merge job %s",
- self.jobId)
+ self.vm.log.warning("%s failed for live merge job %s",
+ self.mode, self.jobId)
raise
else:
self.success = True
- self.vm.log.info("Cleanup completed for live merge job %s",
- self.jobId)
+ self.vm.log.info("%s completed for live merge job %s",
+ self.mode, self.jobId)
def isSuccessful(self):
+ """
+ Returns True if this phase completed successfully.
+ """
return self.success
--
To view, visit http://gerrit.ovirt.org/31849
To unsubscribe, visit http://gerrit.ovirt.org/settings
Gerrit-MessageType: newchange
Gerrit-Change-Id: I5e6a9ea7096b5d81f418754d411f135450bf572e
Gerrit-PatchSet: 1
Gerrit-Project: vdsm
Gerrit-Branch: master
Gerrit-Owner: Adam Litke <alitke(a)redhat.com>
Change in vdsm[ovirt-3.5]: getVMFullList hangs in an infinite loop when handling many VMs
by Jenkins CI RO
oVirt Jenkins CI Server has posted comments on this change.
Change subject: getVMFullList hangs in an infinite loop when handling many VMs
......................................................................
Patch Set 5:
Build Successful
http://jenkins.ovirt.org/job/vdsm_3.5_create-rpms_merged/129/ : SUCCESS
--
To view, visit http://gerrit.ovirt.org/32290
To unsubscribe, visit http://gerrit.ovirt.org/settings
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia728691880771388b6179e85a97df3f303aeddf8
Gerrit-PatchSet: 5
Gerrit-Project: vdsm
Gerrit-Branch: ovirt-3.5
Gerrit-Owner: Piotr Kliczewski <piotr.kliczewski(a)gmail.com>
Gerrit-Reviewer: Barak Azulay <bazulay(a)redhat.com>
Gerrit-Reviewer: Dan Kenigsberg <danken(a)redhat.com>
Gerrit-Reviewer: Piotr Kliczewski <piotr.kliczewski(a)gmail.com>
Gerrit-Reviewer: Saggi Mizrahi <smizrahi(a)redhat.com>
Gerrit-Reviewer: Yaniv Bronhaim <ybronhei(a)redhat.com>
Gerrit-Reviewer: automation(a)ovirt.org
Gerrit-Reviewer: oVirt Jenkins CI Server
Gerrit-HasComments: No
Change in vdsm[master]: Live Merge: Mark active layer ILLEGAL before pivoting
by alitke@redhat.com
Adam Litke has uploaded a new change for review.
Change subject: Live Merge: Mark active layer ILLEGAL before pivoting
......................................................................
Live Merge: Mark active layer ILLEGAL before pivoting
A block commit of the active layer takes place in two phases. First,
data is copied from the current leaf to its parent. During this time,
all writes to the disk are mirrored to both volumes. When this process
converges, the two volumes are, and will remain, identical due to ongoing
drive mirroring by qemu. At this point we instruct libvirt to perform a
pivot from the current leaf to the new leaf. Once this is complete, we
synchronize our metadata and end the job.
If we are catastrophically interrupted (lose contact with the host)
between the pivot request and the final metadata synchronization, there is
currently no way to know whether the pivot actually happened. This
could be dangerous because restarting the VM on another host with the
wrong leaf volume could lead to data corruption.
The solution is to mark the current leaf ILLEGAL before requesting a
pivot. Since the volumes are identical at this point it will always be
safe to rerun the VM using the new leaf. A subsequent patch will take
advantage of this ILLEGAL marker to provide engine with a reliable and
correct volume chain for this recovery scenario.
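A sketch of the recovery rule this enables (hypothetical helper, not
engine or vdsm code): if the old leaf was marked ILLEGAL before the pivot
was requested, the parent already holds all the data and can safely become
the leaf; if it was never marked, the merge never reached the pivot phase.

ILLEGAL_VOL = 'ILLEGAL'
LEGAL_VOL = 'LEGAL'

def choose_leaf_after_interruption(old_leaf, parent, old_leaf_legality):
    if old_leaf_legality == ILLEGAL_VOL:
        # The volumes were identical when the leaf was marked ILLEGAL, so
        # the parent is safe to run from and the old leaf can be discarded.
        return parent
    # The leaf was never marked, so the pivot was never requested; keep
    # running from the old leaf.
    return old_leaf

print(choose_leaf_after_interruption('leaf-vol', 'parent-vol', ILLEGAL_VOL))
# -> parent-vol
print(choose_leaf_after_interruption('leaf-vol', 'parent-vol', LEGAL_VOL))
# -> leaf-vol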
Change-Id: I97ce8fbdfad7c81181a206f160ffdd647c552d4d
Signed-off-by: Adam Litke <alitke(a)redhat.com>
---
M vdsm/storage/hsm.py
M vdsm/virt/vm.py
2 files changed, 56 insertions(+), 27 deletions(-)
git pull ssh://gerrit.ovirt.org:29418/vdsm refs/changes/87/31787/1
diff --git a/vdsm/storage/hsm.py b/vdsm/storage/hsm.py
index 6a53ca7..fc7681c 100644
--- a/vdsm/storage/hsm.py
+++ b/vdsm/storage/hsm.py
@@ -1850,9 +1850,14 @@
return
dstParent = sdDom.produceVolume(imgUUID, subChain[0]).getParent()
- children = sdDom.produceVolume(imgUUID, subChain[-1]).getChildren()
+ subChainTailVol = sdDom.produceVolume(imgUUID, subChain[-1])
+ children = subChainTailVol.getChildren()
for childID in children:
sdDom.produceVolume(imgUUID, childID).setParentMeta(dstParent)
+ if not children:
+ self.log.debug("Leaf volume is being removed from the chain. "
+ "Marking it ILLEGAL to prevent data corruption")
+ subChainTailVol.setLegality(volume.ILLEGAL_VOL)
@public
def mergeSnapshots(self, sdUUID, spUUID, vmUUID, imgUUID, ancestor,
diff --git a/vdsm/virt/vm.py b/vdsm/virt/vm.py
index a3abde8..dacdbf6 100644
--- a/vdsm/virt/vm.py
+++ b/vdsm/virt/vm.py
@@ -5351,8 +5351,8 @@
return False
def queryBlockJobs(self):
- def startCleanup(job, drive):
- t = LiveMergeCleanupThread(self, job['jobID'], drive)
+ def startCleanup(job, drive, mode):
+ t = LiveMergeCleanupThread(self, job['jobID'], drive, mode)
t.start()
self._liveMergeCleanupThreads[job['jobID']] = t
@@ -5364,10 +5364,10 @@
for storedJob in self.conf['_blockJobs'].values():
jobID = storedJob['jobID']
cleanThread = self._liveMergeCleanupThreads.get(jobID)
- if cleanThread and cleanThread.isSuccessful():
- # Handle successfully cleaned jobs early because the job
- # just needs to be untracked and the stored disk info might
- # be stale anyway (ie. after active layer commit).
+ if cleanThread and cleanThread.isFinalized():
+ # Handle finalized jobs early because the job just needs
+ # to be untracked and the stored disk info might be stale
+ # anyway (ie. after active layer commit).
self.untrackBlockJob(jobID)
continue
@@ -5386,32 +5386,32 @@
jobsRet[jobID] = entry
continue
+ mode = None
if liveInfo:
entry['bandwidth'] = liveInfo['bandwidth']
entry['cur'] = str(liveInfo['cur'])
entry['end'] = str(liveInfo['end'])
if self._activeLayerCommitReady(liveInfo):
- try:
- self.handleBlockJobEvent(jobID, drive, 'pivot')
- except Exception:
- # Just log it. We will retry next time
- self.log.error("Pivot failed for job %s", jobID)
+ mode = LiveMergeCleanupThread.MODE_PIVOT
else:
# Libvirt has stopped reporting this job so we know it will
# never report it again.
+ mode = LiveMergeCleanupThread.MODE_CLEANUP
storedJob['gone'] = True
+ if mode:
if not cleanThread:
# There is no cleanup thread so the job must have just
# ended. Spawn an async cleanup.
- startCleanup(storedJob, drive)
+ startCleanup(storedJob, drive, mode)
elif cleanThread.isAlive():
# Let previously started cleanup thread continue
self.log.debug("Still waiting for block job %s to be "
- "cleaned up", jobID)
- elif not cleanThread.isSuccessful():
+ "synchronized", jobID)
+ elif not cleanThread.isFinalized():
# At this point we know the thread is not alive and the
- # cleanup failed. Retry it with a new thread.
- startCleanup(storedJob, drive)
+ # cleanup failed or we have another step to complete.
+ # Continue or retry it with a new thread.
+ startCleanup(storedJob, drive, mode)
jobsRet[jobID] = entry
return jobsRet
@@ -5616,11 +5616,21 @@
device['volumeChain'] = drive.volumeChain = newChain
def handleBlockJobEvent(self, jobID, drive, mode):
- if mode == 'finished':
+ if mode == LiveMergeCleanupThread.MODE_CLEANUP:
self.log.info("Live merge job completed (job %s)", jobID)
self._syncVolumeChain(drive)
self.startDisksStatsCollection()
- elif mode == 'pivot':
+ elif mode == LiveMergeCleanupThread.MODE_PIVOT:
+ # We call imageSyncVolumeChain which will mark the current leaf
+ # ILLEGAL. We do this before requesting a pivot so that we can
+ # properly recover the VM in case we crash. At this point the
+ # active layer contains the same data as its parent so the ILLEGAL
+ # flag indicates that the VM should be restarted using the parent.
+ newVols = [vol['volumeID'] for vol in drive.volumeChain
+ if vol['volumeID'] != drive.volumeID]
+ self.cif.irs.imageSyncVolumeChain(drive.domainID, drive.imageID,
+ drive['volumeID'], newVols)
+
# A pivot changes the top volume being used for the VM Disk. Until
# we can correct our metadata following the pivot we should not
# attempt to collect disk stats.
@@ -5669,32 +5679,46 @@
class LiveMergeCleanupThread(threading.Thread):
- def __init__(self, vm, jobId, drive):
+ MODE_CLEANUP = 'cleanup'
+ MODE_PIVOT = 'pivot'
+
+ def __init__(self, vm, jobId, drive, mode):
threading.Thread.__init__(self)
self.setDaemon(True)
self.vm = vm
self.jobId = jobId
self.drive = drive
+ self.mode = mode
self.success = False
@utils.traceback()
def run(self):
- self.vm.log.info("Starting live merge cleanup for job %s",
- self.jobId)
+ self.vm.log.info("Starting live merge %s for job %s",
+ self.mode, self.jobId)
try:
- self.vm.handleBlockJobEvent(self.jobId, self.drive, 'finished')
+ self.vm.handleBlockJobEvent(self.jobId, self.drive, self.mode)
except Exception:
- self.vm.log.warning("Cleanup failed for live merge job %s",
- self.jobId)
+ self.vm.log.warning("%s failed for live merge job %s",
+ self.mode, self.jobId)
raise
else:
self.success = True
- self.vm.log.info("Cleanup completed for live merge job %s",
- self.jobId)
+ self.vm.log.info("%s completed for live merge job %s",
+ self.mode, self.jobId)
def isSuccessful(self):
+ """
+ Returns True if this phase completed successfully.
+ """
return self.success
+ def isFinalized(self):
+ """
+ Returns True if the last cleanup phase succeeded and the job can be
+ removed.
+ """
+ return self.success and self.mode == self.MODE_CLEANUP
+
def _devicesWithAlias(domXML):
return vmxml.filter_devices_with_alias(vmxml.all_devices(domXML))
--
To view, visit http://gerrit.ovirt.org/31787
To unsubscribe, visit http://gerrit.ovirt.org/settings
Gerrit-MessageType: newchange
Gerrit-Change-Id: I97ce8fbdfad7c81181a206f160ffdd647c552d4d
Gerrit-PatchSet: 1
Gerrit-Project: vdsm
Gerrit-Branch: master
Gerrit-Owner: Adam Litke <alitke(a)redhat.com>
Change in vdsm[master]: Live Merge: Extend internal block volumes during merge
by alitke@redhat.com
Adam Litke has uploaded a new change for review.
Change subject: Live Merge: Extend internal block volumes during merge
......................................................................
Live Merge: Extend internal block volumes during merge
Libvirt will be exposing the high write watermark for internal volumes
directly in the domain XML. An <allocation/> element will be added to
each member of the image chain. This patch implements the missing logic
for _getMergeWriteWatermarks using this new API.
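A self-contained sketch of the parsing this patch performs. The sample XML
layout below is an assumption derived from the patch's parsing logic, not a
confirmed libvirt schema: collect the <allocation/> values from leaf to
base, base first.

from xml.dom import minidom

SAMPLE = """
<disk>
  <allocation>1048576</allocation>
  <backingStore>
    <allocation>2097152</allocation>
    <backingStore>
      <allocation>524288</allocation>
    </backingStore>
  </backingStore>
</disk>
"""

def find_element(node, name):
    for child in node.childNodes:
        if child.nodeName == name:
            return child
    return None

def read_watermarks(disk_xml):
    # Walk nested <backingStore/> elements, collecting allocation base-first
    stats = []
    node = disk_xml
    while node is not None:
        alloc = find_element(node, 'allocation')
        if alloc is not None:
            stats.insert(0, int(alloc.firstChild.nodeValue))
        node = find_element(node, 'backingStore')
    return stats

print(read_watermarks(minidom.parseString(SAMPLE).documentElement))
# -> [524288, 2097152, 1048576]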
Change-Id: I3a9e0ebdb9c42df713c40e0fc5782945eb7228a8
Signed-off-by: Adam Litke <alitke(a)redhat.com>
Bug-Url: https://bugzilla.redhat.com/show_bug.cgi?id=1109920
---
M vdsm/virt/vm.py
1 file changed, 30 insertions(+), 3 deletions(-)
git pull ssh://gerrit.ovirt.org:29418/vdsm refs/changes/68/31268/1
diff --git a/vdsm/virt/vm.py b/vdsm/virt/vm.py
index 25d75eb..19d45e6 100644
--- a/vdsm/virt/vm.py
+++ b/vdsm/virt/vm.py
@@ -2063,9 +2063,36 @@
self.conf['timeOffset'] = newTimeOffset
def _getMergeWriteWatermarks(self):
- # TODO: Adopt the future libvirt API when the following RFE is done
- # https://bugzilla.redhat.com/show_bug.cgi?id=1041569
- return {}
+ def findElement(doc, name):
+ for child in doc.childNodes:
+ if child.nodeName == name:
+ return child
+ return None
+
+ ret = {}
+ domXML = self._getUnderlyingVmInfo()
+ for deviceXML, alias in _devicesWithAlias(domXML):
+ try:
+ drive = self._lookupDeviceByAlias(DISK_DEVICES, alias)
+ job = self.getBlockJob(drive)
+ except LookupError:
+ continue
+
+ volChain = job['chain']
+ stats = []
+ volXML = deviceXML
+ while volXML:
+ allocXML = findElement(volXML, 'allocation')
+ if allocXML:
+ stats.insert(0, int(allocXML.firstChild.nodeValue))
+ volXML = findElement(volXML, 'backingStore')
+ if len(stats) != len(volChain):
+ self.log.debug("Volume allocation information not provided "
+ "for drive %s, skipping", alias)
+ continue
+ for vol, stat in zip(volChain, stats):
+ ret[vol] = stat
+ return ret
def _getLiveMergeExtendCandidates(self):
ret = {}
--
To view, visit http://gerrit.ovirt.org/31268
To unsubscribe, visit http://gerrit.ovirt.org/settings
Gerrit-MessageType: newchange
Gerrit-Change-Id: I3a9e0ebdb9c42df713c40e0fc5782945eb7228a8
Gerrit-PatchSet: 1
Gerrit-Project: vdsm
Gerrit-Branch: master
Gerrit-Owner: Adam Litke <alitke(a)redhat.com>
Change in vdsm[master]: Live Merge: Get volume chain for multiple drives
by alitke@redhat.com
Adam Litke has uploaded a new change for review.
Change subject: Live Merge: Get volume chain for multiple drives
......................................................................
Live Merge: Get volume chain for multiple drives
A future patch wants to get volume chain information for all drives in
one call. Rather than getting the domXML once per drive, allow
_driveGetActualVolumeChain to accept a list of drives and parse the
information all at once.
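A toy sketch of the batching shape (hypothetical names, not the vm.py
code): fetch the domain XML once, then build a dict keyed by drive, so
callers interested in a single drive simply index the result and treat a
missing key as "libvirt did not report a chain", mirroring the try/except
KeyError in the patch.

def get_chains(fetch_xml, parse_chain, drives):
    # One XML fetch shared by any number of drives
    dom_xml = fetch_xml()
    ret = {}
    for drive in drives:
        chain = parse_chain(dom_xml, drive)
        if chain:
            ret[drive] = chain
    return ret

chains = get_chains(lambda: '<domain/>',            # dummy XML source
                    lambda xml, d: ['base', 'leaf'],  # dummy parser
                    ['vda'])
print(chains['vda'])   # -> ['base', 'leaf']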
Change-Id: I93d5a641b0814e3764c70bea8a6c1910a821adcc
Bug-Url: https://bugzilla.redhat.com/show_bug.cgi?id=1109920
Signed-off-by: Adam Litke <alitke(a)redhat.com>
---
M vdsm/virt/vm.py
1 file changed, 37 insertions(+), 28 deletions(-)
git pull ssh://gerrit.ovirt.org:29418/vdsm refs/changes/66/31366/1
diff --git a/vdsm/virt/vm.py b/vdsm/virt/vm.py
index 3bb9ce0..1347568 100644
--- a/vdsm/virt/vm.py
+++ b/vdsm/virt/vm.py
@@ -5352,8 +5352,9 @@
return errCode['imageErr']
# Check that libvirt exposes full volume chain information
- ret = self._driveGetActualVolumeChain(drive)
- if ret is None:
+ try:
+ ret = self._driveGetActualVolumeChain([drive])[drive['alias']]
+ except KeyError:
self.log.error("merge: libvirt does not support volume chain "
"monitoring. Unable to perform live merge.")
return errCode['mergeErr']
@@ -5444,14 +5445,13 @@
return {'status': doneCode}
- def _lookupDeviceXMLByAlias(self, targetAlias):
- domXML = self._getUnderlyingVmInfo()
- for deviceXML, alias in _devicesWithAlias(domXML):
- if alias == targetAlias:
- return deviceXML
- return None
+ def _driveGetActualVolumeChain(self, drives):
+ def lookupDeviceXMLByAlias(domXML, targetAlias):
+ for deviceXML, alias in _devicesWithAlias(domXML):
+ if alias == targetAlias:
+ return deviceXML
+ return None
- def _driveGetActualVolumeChain(self, drive):
def pathToVolID(drive, path):
for vol in drive.volumeChain:
if os.path.realpath(vol['path']) == os.path.realpath(path):
@@ -5464,23 +5464,31 @@
return child
return None
- diskXML = self._lookupDeviceXMLByAlias(drive['alias'])
- if not diskXML:
- return None
- volChain = []
- while True:
- sourceXML = findElement(diskXML, 'source')
- if not sourceXML:
- break
- sourceAttr = ('file', 'dev')[drive.blockDev]
- path = sourceXML.getAttribute(sourceAttr)
- info = {'id': pathToVolID(drive, path), 'path': path}
- volChain.insert(0, info)
- bsXML = findElement(diskXML, 'backingStore')
- if not bsXML:
- break
- diskXML = bsXML
- return volChain
+ ret = {}
+ domXML = self._getUnderlyingVmInfo()
+ for drive in drives:
+ alias = drive['alias']
+ diskXML = lookupDeviceXMLByAlias(domXML, alias)
+ if not diskXML:
+ continue
+ volChain = []
+ while True:
+ sourceXML = findElement(diskXML, 'source')
+ if not sourceXML:
+ break
+ sourceAttr = ('file', 'dev')[drive.blockDev]
+ path = sourceXML.getAttribute(sourceAttr)
+ info = {'id': pathToVolID(drive, path), 'path': path}
+ bsXML = findElement(diskXML, 'backingStore')
+ if not bsXML:
+ self.log.debug("<backingStore/> missing from backing "
+ "chain for block device %s", drive.name)
+ break
+ diskXML = bsXML
+ volChain.insert(0, info)
+ if volChain:
+ ret[alias] = volChain
+ return ret
def _syncVolumeChain(self, drive):
def getVolumeInfo(device, volumeID):
@@ -5494,8 +5502,9 @@
return
curVols = [x['volumeID'] for x in drive.volumeChain]
- ret = self._driveGetActualVolumeChain(drive)
- if ret is None:
+ try:
+ ret = self._driveGetActualVolumeChain([drive])[drive['alias']]
+ except KeyError:
self.log.debug("Unable to determine volume chain. Skipping volume "
"chain synchronization for drive %s", drive.name)
return
--
To view, visit http://gerrit.ovirt.org/31366
To unsubscribe, visit http://gerrit.ovirt.org/settings
Gerrit-MessageType: newchange
Gerrit-Change-Id: I93d5a641b0814e3764c70bea8a6c1910a821adcc
Gerrit-PatchSet: 1
Gerrit-Project: vdsm
Gerrit-Branch: master
Gerrit-Owner: Adam Litke <alitke(a)redhat.com>
Change in vdsm[ovirt-3.5]: getVMFullList hangs in an infinite loop when handling many VMs
by ybronhei@redhat.com
Yaniv Bronhaim has submitted this change and it was merged.
Change subject: getVMFullList hangs in an infinite loop when handling many VMs
......................................................................
getVMFullList hangs in an infinite loop when handling many VMs
When we send a 22k response we get -1 during send. After investigation
we found that m2crypto ignores error codes and returns -1. We are not
sure which error is returned; it can be one of SSL_ERROR_WANT_WRITE,
SSL_ERROR_WANT_READ, SSL_ERROR_NONE, or SSL_ERROR_ZERO_RETURN.
We use numSent in AsyncDispatcher#handle_write to decide whether we need
to keep processing the current buffer or we are done. When numSent is -1
we get into an infinite loop due to:
self._outbuf = data[numSent:]
We split the data and attempt to send it one more time, but we still get
-1. We need to get back to the poll loop to continue.
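A minimal sketch of the guard described above (hypothetical, since the
merged diff to betterAsyncore.py is not shown here; the point is simply
that a negative numSent must not be used to slice the buffer):

def handle_write_chunk(outbuf, send):
    """Send what fits and return the remaining buffer.

    send() mimics M2Crypto returning -1 for any SSL_ERROR_* condition
    instead of raising an exception.
    """
    num_sent = send(outbuf)
    if num_sent < 0:
        # Treat -1 as "nothing was written": keep the buffer intact and
        # return to the poll loop, which retries when the socket is
        # writable again.
        return outbuf
    return outbuf[num_sent:]

print(handle_write_chunk(b'x' * 10, lambda data: -1))   # full buffer kept
print(handle_write_chunk(b'x' * 10, lambda data: 4))    # 6 bytes remain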
Reviewed-On: http://gerrit.ovirt.org/#/c/32263/
Bug-Url: https://bugzilla.redhat.com/1135959
Change-Id: Ia728691880771388b6179e85a97df3f303aeddf8
Signed-off-by: pkliczewski <piotr.kliczewski(a)gmail.com>
Reviewed-on: http://gerrit.ovirt.org/32290
Reviewed-by: Dan Kenigsberg <danken(a)redhat.com>
Reviewed-by: Yaniv Bronhaim <ybronhei(a)redhat.com>
---
M lib/yajsonrpc/betterAsyncore.py
1 file changed, 16 insertions(+), 0 deletions(-)
Approvals:
Piotr Kliczewski: Verified
Yaniv Bronhaim: Looks good to me, approved
Dan Kenigsberg: Looks good to me, but someone else must approve
--
To view, visit http://gerrit.ovirt.org/32290
To unsubscribe, visit http://gerrit.ovirt.org/settings
Gerrit-MessageType: merged
Gerrit-Change-Id: Ia728691880771388b6179e85a97df3f303aeddf8
Gerrit-PatchSet: 5
Gerrit-Project: vdsm
Gerrit-Branch: ovirt-3.5
Gerrit-Owner: Piotr Kliczewski <piotr.kliczewski(a)gmail.com>
Gerrit-Reviewer: Barak Azulay <bazulay(a)redhat.com>
Gerrit-Reviewer: Dan Kenigsberg <danken(a)redhat.com>
Gerrit-Reviewer: Piotr Kliczewski <piotr.kliczewski(a)gmail.com>
Gerrit-Reviewer: Saggi Mizrahi <smizrahi(a)redhat.com>
Gerrit-Reviewer: Yaniv Bronhaim <ybronhei(a)redhat.com>
Gerrit-Reviewer: automation(a)ovirt.org
Change in vdsm[ovirt-3.5]: getVMFullList hangs in an infinite loop when handling many VMs
by ybronhei@redhat.com
Yaniv Bronhaim has posted comments on this change.
Change subject: getVMFullList hangs in an infinite loop when handling many VMs
......................................................................
Patch Set 4: Code-Review+2
--
To view, visit http://gerrit.ovirt.org/32290
To unsubscribe, visit http://gerrit.ovirt.org/settings
Gerrit-MessageType: comment
Gerrit-Change-Id: Ia728691880771388b6179e85a97df3f303aeddf8
Gerrit-PatchSet: 4
Gerrit-Project: vdsm
Gerrit-Branch: ovirt-3.5
Gerrit-Owner: Piotr Kliczewski <piotr.kliczewski(a)gmail.com>
Gerrit-Reviewer: Barak Azulay <bazulay(a)redhat.com>
Gerrit-Reviewer: Dan Kenigsberg <danken(a)redhat.com>
Gerrit-Reviewer: Piotr Kliczewski <piotr.kliczewski(a)gmail.com>
Gerrit-Reviewer: Saggi Mizrahi <smizrahi(a)redhat.com>
Gerrit-Reviewer: Yaniv Bronhaim <ybronhei(a)redhat.com>
Gerrit-Reviewer: automation(a)ovirt.org
Gerrit-HasComments: No