Milan Zamazal has uploaded a new change for review.
Change subject: virt: Destroy VM after post-copy migration failure
......................................................................
virt: Destroy VM after post-copy migration failure
As explained in the source code comment, we currently don't have a
better option than to destroy the VM remnants after a failed post-copy
migration. This may change in the future if recovery from a failed
post-copy migration becomes available in libvirt/QEMU.
Change-Id: I1918e9afce189c8b3f617766e55afa13f1e153f1
Signed-off-by: Milan Zamazal <mzamazal(a)redhat.com>
Bug-Url: https://bugzilla.redhat.com/1354343
---
M lib/vdsm/virt/vmexitreason.py
M vdsm/virt/vm.py
2 files changed, 24 insertions(+), 1 deletion(-)
git pull ssh://gerrit.ovirt.org:29418/vdsm refs/changes/42/64142/7
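The asymmetry between the two ends of the migration described in this change can be illustrated with a minimal, standalone sketch. It is not the Vdsm code: FakeVm, its methods, and the constant values are hypothetical stand-ins chosen for illustration, and the assumption that setting the down status overwrites the last status is modelled explicitly, which is why the source check is evaluated first.

```python
# Hypothetical stand-ins for illustration only; the real logic is in
# vdsm/virt/vm.py and uses vmstatus / vmexitreason constants.
MIGRATION_SOURCE = 'Migration Source'
MIGRATION_DESTINATION = 'Migration Destination'
DOWN = 'Down'
POSTCOPY_MIGRATION_FAILED = 11


class FakeVm(object):
    """Tiny model of a VM object reacting to a failed post-copy."""

    def __init__(self, last_status):
        self.last_status = last_status
        self.exit_reason = None
        self.destroyed = False

    def set_down_status(self, exit_reason):
        # Assumption: reporting the VM as down replaces the last status.
        self.last_status = DOWN
        self.exit_reason = exit_reason

    def destroy(self):
        self.destroyed = True

    def on_postcopy_failed(self):
        # Decide first, because set_down_status() changes last_status.
        destroy = self.last_status == MIGRATION_SOURCE
        # Both ends report the VM as down with the new exit reason.
        self.set_down_status(POSTCOPY_MIGRATION_FAILED)
        # Only the source destroys itself right away; the destination
        # stays around until a destroy request arrives from outside.
        if destroy:
            self.destroy()


source = FakeVm(MIGRATION_SOURCE)
source.on_postcopy_failed()

destination = FakeVm(MIGRATION_DESTINATION)
destination.on_postcopy_failed()
```

In this sketch both objects end up reported as down with the post-copy exit reason, but only the source instance is destroyed immediately.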
diff --git a/lib/vdsm/virt/vmexitreason.py b/lib/vdsm/virt/vmexitreason.py
index 46c092b..494cd28 100644
--- a/lib/vdsm/virt/vmexitreason.py
+++ b/lib/vdsm/virt/vmexitreason.py
@@ -30,6 +30,7 @@
MIGRATION_FAILED = 8
LIBVIRT_DOMAIN_MISSING = 9
DESTROYED_ON_STARTUP = 10
+POSTCOPY_MIGRATION_FAILED = 11
exitReasons = {
@@ -44,4 +45,5 @@
MIGRATION_FAILED: 'VM failed to migrate',
LIBVIRT_DOMAIN_MISSING: 'Failed to find the libvirt domain',
DESTROYED_ON_STARTUP: 'VM destroyed during the startup',
+ POSTCOPY_MIGRATION_FAILED: 'Migration failed in post-copy',
}
diff --git a/vdsm/virt/vm.py b/vdsm/virt/vm.py
index ed60354..e274388 100644
--- a/vdsm/virt/vm.py
+++ b/vdsm/virt/vm.py
@@ -4163,7 +4163,28 @@
else:
hooks.after_vm_pause(domxml, self.conf)
elif detail == libvirt.VIR_DOMAIN_EVENT_SUSPENDED_POSTCOPY_FAILED:
- pass # will be handled in a followup patch
+ # This can happen on both ends of the migration.
+ # After a failed post-copy migration, the VM remains in a
+ # paused state on both ends of the migration. There is
+ # currently no way to recover it, since the VM is missing some
+ # memory pages on the destination and the old snapshot at the
+ # source doesn't know about the changes made to the external
+ # world (network, storage, ...) during the post-copy phase.
+ # The best we can do in such a situation is to destroy the
+ # paused VM instances on both ends before someone tries to
+ # resume any of them, causing confusion at best or more damage
+ # in the worst case. We must also inform Engine about the
+ # fatal state of the failed migration, so we can't destroy the
+ # VM immediately on the destination (but we can do it on the
+ # source). We report the VM as down on the destination to
+ # Engine and wait for a destroy request from it.
+ self.log.warning("Migration failed in post-copy, "
+ "destroying VM: %s", self.id)
+ destroy = self.lastStatus == vmstatus.MIGRATION_SOURCE
+ self.setDownStatus(ERROR,
+ vmexitreason.POSTCOPY_MIGRATION_FAILED)
+ if destroy:
+ self.destroy()
elif event == libvirt.VIR_DOMAIN_EVENT_RESUMED:
self._setGuestCpuRunning(True)
--
To view, visit https://gerrit.ovirt.org/64142
To unsubscribe, visit https://gerrit.ovirt.org/settings
Gerrit-MessageType: newchange
Gerrit-Change-Id: I1918e9afce189c8b3f617766e55afa13f1e153f1
Gerrit-PatchSet: 7
Gerrit-Project: vdsm
Gerrit-Branch: master
Gerrit-Owner: Milan Zamazal <mzamazal(a)redhat.com>
Gerrit-Reviewer: Arik Hadas <ahadas(a)redhat.com>
Gerrit-Reviewer: Francesco Romani <fromani(a)redhat.com>
Gerrit-Reviewer: gerrit-hooks <automation(a)ovirt.org>