Blame SOURCES/0042-v2v-rhv-upload-plugin-Fix-waiting-for-finalize.patch

62f9b7
From 419aa23f0c6338b01b644094c3af2c024470e9b2 Mon Sep 17 00:00:00 2001
62f9b7
From: Nir Soffer <nsoffer@redhat.com>
62f9b7
Date: Sat, 10 Jul 2021 02:35:27 +0300
62f9b7
Subject: [PATCH] v2v: rhv-upload-plugin: Fix waiting for finalize
62f9b7
62f9b7
Waiting for image transfer finalize is complex. In the past we tried to
62f9b7
simplify the process by waiting on the disk status, but turns out that
62f9b7
due to the way oVirt lock the disk, this is not reliable.
62f9b7
62f9b7
These is finalize success flow:
62f9b7
62f9b7
1. User asks to finalize the transfer
62f9b7
2. oVirt sets transfer phase to FINALIZING_SUCCESS
62f9b7
3. oVirt finalizes the transfer
62f9b7
4. oVirt sets disk status to OK
62f9b7
5. oVirt unlocks the disk and changes transfer phase to FINISHED_SUCCESS
62f9b7
6. oVirt removes the transfer
62f9b7
62f9b7
In oVirt logs we can see that disk status changes to OK about 3 seconds
62f9b7
before the disk is actually unlocked. This is a very old problem that is
62f9b7
unlikely to be fixed.
62f9b7
62f9b7
The only way to wait for transfer completion is to poll the transfer
62f9b7
phase, but oVirt makes this hard by removing the transfer shortly after
62f9b7
it completes, so we may not be able to get the FINISHED_SUCCESS phase.
62f9b7
If the transfer was removed before we got one of the final phases, we
62f9b7
need to check the disk status to understand the result of transfer.
62f9b7
62f9b7
oVirt 4.4.7 made polling transfer phase easier by keeping the transfer
62f9b7
after completion, but we need to support older versions so we must have
62f9b7
generic code that work with any version.
62f9b7
62f9b7
To make debugging easier, we log the transfer phase during polling. Here
62f9b7
is a typical transfer log when finalizing transfer:
62f9b7
62f9b7
    finalizing transfer 59e545f3-db1f-4a6b-90b1-80ac66572faa
62f9b7
    transfer 59e545f3-db1f-4a6b-90b1-80ac66572faa is finalizing_success
62f9b7
    transfer 59e545f3-db1f-4a6b-90b1-80ac66572faa is finalizing_success
62f9b7
    transfer 59e545f3-db1f-4a6b-90b1-80ac66572faa is finalizing_success
62f9b7
    transfer 59e545f3-db1f-4a6b-90b1-80ac66572faa is finalizing_success
62f9b7
    transfer 59e545f3-db1f-4a6b-90b1-80ac66572faa is finished_success
62f9b7
    transfer 59e545f3-db1f-4a6b-90b1-80ac66572faa finalized in 5.153 seconds
62f9b7
62f9b7
Signed-off-by: Nir Soffer <nsoffer@redhat.com>
62f9b7
Fixes: https://bugzilla.redhat.com/1976024
62f9b7
(cherry picked from commit 79702b28329d15a7485801ed7e915d486fcc0cf4)
62f9b7
---
62f9b7
 v2v/rhv-upload-plugin.py | 98 ++++++++++++++++++++++++++++------------
62f9b7
 1 file changed, 69 insertions(+), 29 deletions(-)
62f9b7
62f9b7
diff --git a/v2v/rhv-upload-plugin.py b/v2v/rhv-upload-plugin.py
62f9b7
index 7cd6dea6..61ade1a8 100644
62f9b7
--- a/v2v/rhv-upload-plugin.py
62f9b7
+++ b/v2v/rhv-upload-plugin.py
62f9b7
@@ -601,17 +601,29 @@ def finalize_transfer(connection, transfer, disk_id):
62f9b7
     """
62f9b7
     Finalize a transfer, making the transfer disk available.
62f9b7
 
62f9b7
-    If finalizing succeeds, transfer's phase will change to FINISHED_SUCCESS
62f9b7
-    and the transer's disk status will change to OK.  On errors, the transfer's
62f9b7
-    phase will change to FINISHED_FAILURE and the disk status will change to
62f9b7
-    ILLEGAL and it will be removed. In both cases the transfer entity will be
62f9b7
-    removed shortly after.
62f9b7
+    If finalizing succeeds, the transfer's disk status will change to OK
62f9b7
+    and transfer's phase will change to FINISHED_SUCCESS. Unfortunately,
62f9b7
+    the disk status is modified before the transfer finishes, and oVirt
62f9b7
+    may still hold a lock on the disk at this point.
62f9b7
 
62f9b7
-    If oVirt fails to finalize the transfer, transfer's phase will change to
62f9b7
-    PAUSED_SYSTEM. In this case the disk's status will change to ILLEGAL and it
62f9b7
-    will not be removed.
62f9b7
+    The only way to make sure that the disk is unlocked, is to wait
62f9b7
+    until the transfer phase switches FINISHED_SUCCESS. Unfortunately
62f9b7
+    oVirt makes this hard to use because the transfer is removed shortly
62f9b7
+    after switching the phase to the final phase. However if the
62f9b7
+    transfer was removed, we can be sure that the disk is not locked,
62f9b7
+    since oVirt releases the locks before removing the transfer.
62f9b7
 
62f9b7
-    For simplicity, we track only disk's status changes.
62f9b7
+    On errors, the transfer's phase will change to FINISHED_FAILURE and
62f9b7
+    the disk status will change to ILLEGAL and it will be removed. Again
62f9b7
+    the transfer will be removed shortly after that.
62f9b7
+
62f9b7
+    If oVirt fails to finalize the transfer, transfer's phase will
62f9b7
+    change to PAUSED_SYSTEM. In this case the disk's status will change
62f9b7
+    to ILLEGAL and it will not be removed.
62f9b7
+
62f9b7
+    oVirt 4.4.7 made waiting for transfer easier by keeping transfers
62f9b7
+    after they complete, but we must support older versions so we have
62f9b7
+    generic code that work with any version.
62f9b7
 
62f9b7
     For more info see:
62f9b7
     - http://ovirt.github.io/ovirt-engine-api-model/4.4/#services/image_transfer
62f9b7
@@ -626,34 +638,62 @@ def finalize_transfer(connection, transfer, disk_id):
62f9b7
 
62f9b7
     transfer_service.finalize()
62f9b7
 
62f9b7
-    disk_service = (connection.system_service()
62f9b7
-                    .disks_service()
62f9b7
-                    .disk_service(disk_id))
62f9b7
-
62f9b7
     while True:
62f9b7
         time.sleep(1)
62f9b7
         try:
62f9b7
-            disk = disk_service.get()
62f9b7
+            transfer = transfer_service.get()
62f9b7
         except sdk.NotFoundError:
62f9b7
-            # Disk verification failed and the system removed the disk.
62f9b7
-            raise RuntimeError(
62f9b7
-                "transfer %s failed: disk %s was removed"
62f9b7
-                % (transfer.id, disk_id))
62f9b7
+            # Transfer was removed (ovirt < 4.4.7). We need to check the
62f9b7
+            # disk status to understand if the transfer was successful.
62f9b7
+            # Due to the way oVirt does locking, we know that the disk
62f9b7
+            # is unlocked at this point so we can check only once.
62f9b7
 
62f9b7
-        if disk.status == types.DiskStatus.ILLEGAL:
62f9b7
-            # Disk verification failed or transfer was paused by the system.
62f9b7
-            raise RuntimeError(
62f9b7
-                "transfer %s failed: disk is ILLEGAL" % transfer.id)
62f9b7
+            debug("transfer %s was removed, checking disk %s status"
62f9b7
+                  % (transfer.id, disk_id))
62f9b7
+
62f9b7
+            disk_service = (connection.system_service()
62f9b7
+                            .disks_service()
62f9b7
+                            .disk_service(disk_id))
62f9b7
 
62f9b7
-        if disk.status == types.DiskStatus.OK:
62f9b7
-            debug("transfer %s finalized in %.3f seconds"
62f9b7
-                  % (transfer.id, time.time() - start))
62f9b7
-            break
62f9b7
+            try:
62f9b7
+                disk = disk_service.get()
62f9b7
+            except sdk.NotFoundError:
62f9b7
+                raise RuntimeError(
62f9b7
+                    "transfer %s failed: disk %s was removed"
62f9b7
+                    % (transfer.id, disk_id))
62f9b7
+
62f9b7
+            debug("disk %s is %s" % (disk_id, disk.status))
62f9b7
+
62f9b7
+            if disk.status == types.DiskStatus.OK:
62f9b7
+                break
62f9b7
 
62f9b7
-        if time.time() > start + timeout:
62f9b7
             raise RuntimeError(
62f9b7
-                "timed out waiting for transfer %s to finalize"
62f9b7
-                % transfer.id)
62f9b7
+                "transfer %s failed: disk is %s" % (transfer.id, disk.status))
62f9b7
+        else:
62f9b7
+            # Transfer exists, check if it reached one of the final
62f9b7
+            # phases, or we timed out.
62f9b7
+
62f9b7
+            debug("transfer %s is %s" % (transfer.id, transfer.phase))
62f9b7
+
62f9b7
+            if transfer.phase == types.ImageTransferPhase.FINISHED_SUCCESS:
62f9b7
+                break
62f9b7
+
62f9b7
+            if transfer.phase == types.ImageTransferPhase.FINISHED_FAILURE:
62f9b7
+                raise RuntimeError(
62f9b7
+                    "transfer %s has failed" % (transfer.id,))
62f9b7
+
62f9b7
+            if transfer.phase == types.ImageTransferPhase.PAUSED_SYSTEM:
62f9b7
+                raise RuntimeError(
62f9b7
+                    "transfer %s was paused by system" % (transfer.id,))
62f9b7
+
62f9b7
+            if time.time() > start + timeout:
62f9b7
+                raise RuntimeError(
62f9b7
+                    "timed out waiting for transfer %s to finalize, "
62f9b7
+                    "transfer is %s"
62f9b7
+                    % (transfer.id, transfer.phase))
62f9b7
+
62f9b7
+    debug("transfer %s finalized in %.3f seconds"
62f9b7
+          % (transfer.id, time.time() - start))
62f9b7
 
62f9b7
 
62f9b7
 def transfer_supports_format():