From Dan Kenigsberg danken@redhat.com:
Dan Kenigsberg has submitted this change and it was merged.
Change subject: migration: make progress reporting reliable
migration: make progress reporting reliable
rhbz#1414626 demonstrates some issues in our current progress handling. Since we also use the progress value for logic inside the SourceThread class, this is a more serious issue than just showing a wrong value.
The first issue is that we update the progress percentage in two places: getStat(), just before reporting it, and finishSuccessfully, to make sure the progress is 100% after the migration ended (rounding errors or missing updates may cause it to stay at 99%). Unfortunately, getStat() can run concurrently with the migration code, so nothing prevents it from overwriting the value that finishSuccessfully set.
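To illustrate the race described above, here is a minimal sketch of one way to keep a late update from the reporting path from clobbering the final value. The ProgressTracker class and its method names are hypothetical and chosen for illustration only; this is not the code from lib/vdsm/virt/migration.py.

```python
import threading


class ProgressTracker:
    """Hypothetical helper, not the vdsm implementation."""

    def __init__(self):
        self._lock = threading.Lock()
        self._percentage = 0
        self._finished = False

    def update(self, percentage):
        # Stands in for the reporting path (the role getStat() plays
        # in the text). Updates are ignored once the migration finished.
        with self._lock:
            if not self._finished:
                self._percentage = percentage

    def finish(self):
        # Stands in for the success path: pin the value at 100.
        with self._lock:
            self._finished = True
            self._percentage = 100

    @property
    def percentage(self):
        with self._lock:
            return self._percentage


tracker = ProgressTracker()
tracker.update(99)
tracker.finish()
tracker.update(99)  # late update arriving after the migration ended
print(tracker.percentage)
```

Both writers take the same lock, and the finished flag makes the final value sticky, so the order in which the two paths run no longer matters.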
The second issue is more complex. First, let's point out that there could be legitimate cases for progress going backwards. If a migration is stalling because the guest is making pages dirty too fast, the amount of data_remaining could increase.
However, because of both the lack of guarantees about monotonic increase and the race on write, we could end up with progress = 99 after the migration ended, and this could mistakenly trigger another migration attempt, which is indeed what happened in rhbz#1414626.
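As a rough illustration of why progress can legitimately decrease, the sketch below derives a percentage from the total and remaining data. The progress_percentage function and the sample numbers are assumptions made for this example; the formula actually used by vdsm may differ.

```python
def progress_percentage(data_total, data_remaining):
    """Hypothetical formula: share of data already transferred."""
    if data_total <= 0:
        return 0
    done = data_total - data_remaining
    # Clamp to the 0..100 range to absorb inconsistent samples.
    return max(0, min(100, int(100 * done / data_total)))


# (data_total, data_remaining) samples; in the last one the guest
# dirtied pages faster than they were sent, so data_remaining grew.
samples = [(1000, 400), (1000, 200), (1000, 350)]
values = [progress_percentage(total, remaining)
          for total, remaining in samples]
print(values)  # [60, 80, 65]
```

The third sample is lower than the second even though nothing went wrong, which is why the progress value alone cannot be assumed to increase monotonically.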
The definitive fix for this harmful retry is provided in change Ie45553bf3ec3db76e520d11a68c67b5b9664dc32
Change-Id: I2663382b6b1b2b58f8e4980a23ace36f4736930d
Bug-Url: https://bugzilla.redhat.com/1414626
Backport-To: 4.1
Backport-To: 4.0
Signed-off-by: Francesco Romani fromani@redhat.com
---
M lib/vdsm/virt/migration.py
M tests/vmmigration_test.py
2 files changed, 88 insertions(+), 7 deletions(-)
Approvals:
  Dan Kenigsberg: Looks good to me, approved
  Francesco Romani: Verified; Passed CI tests
  Martin Polednik: Looks good to me, but someone else must approve
  Milan Zamazal: Looks good to me, but someone else must approve
vdsm-patches@lists.fedorahosted.org