Blame SOURCES/0042-v2v-rhv-upload-plugin-Fix-waiting-for-finalize.patch

61e9b3
From 419aa23f0c6338b01b644094c3af2c024470e9b2 Mon Sep 17 00:00:00 2001
61e9b3
From: Nir Soffer <nsoffer@redhat.com>
61e9b3
Date: Sat, 10 Jul 2021 02:35:27 +0300
61e9b3
Subject: [PATCH] v2v: rhv-upload-plugin: Fix waiting for finalize
61e9b3
61e9b3
Waiting for image transfer finalize is complex. In the past we tried to
61e9b3
simplify the process by waiting on the disk status, but turns out that
61e9b3
due to the way oVirt lock the disk, this is not reliable.
61e9b3
61e9b3
These is finalize success flow:
61e9b3
61e9b3
1. User asks to finalize the transfer
61e9b3
2. oVirt sets transfer phase to FINALIZING_SUCCESS
61e9b3
3. oVirt finalizes the transfer
61e9b3
4. oVirt sets disk status to OK
61e9b3
5. oVirt unlocks the disk and changes transfer phase to FINISHED_SUCCESS
61e9b3
6. oVirt removes the transfer
61e9b3
61e9b3
In oVirt logs we can see that disk status changes to OK about 3 seconds
61e9b3
before the disk is actually unlocked. This is a very old problem that is
61e9b3
unlikely to be fixed.
61e9b3
61e9b3
The only way to wait for transfer completion is to poll the transfer
61e9b3
phase, but oVirt makes this hard by removing the transfer shortly after
61e9b3
it completes, so we may not be able to get the FINISHED_SUCCESS phase.
61e9b3
If the transfer was removed before we got one of the final phases, we
61e9b3
need to check the disk status to understand the result of transfer.
61e9b3
61e9b3
oVirt 4.4.7 made polling transfer phase easier by keeping the transfer
61e9b3
after completion, but we need to support older versions so we must have
61e9b3
generic code that work with any version.
61e9b3
61e9b3
To make debugging easier, we log the transfer phase during polling. Here
61e9b3
is a typical transfer log when finalizing transfer:
61e9b3
61e9b3
    finalizing transfer 59e545f3-db1f-4a6b-90b1-80ac66572faa
61e9b3
    transfer 59e545f3-db1f-4a6b-90b1-80ac66572faa is finalizing_success
61e9b3
    transfer 59e545f3-db1f-4a6b-90b1-80ac66572faa is finalizing_success
61e9b3
    transfer 59e545f3-db1f-4a6b-90b1-80ac66572faa is finalizing_success
61e9b3
    transfer 59e545f3-db1f-4a6b-90b1-80ac66572faa is finalizing_success
61e9b3
    transfer 59e545f3-db1f-4a6b-90b1-80ac66572faa is finished_success
61e9b3
    transfer 59e545f3-db1f-4a6b-90b1-80ac66572faa finalized in 5.153 seconds
61e9b3
61e9b3
Signed-off-by: Nir Soffer <nsoffer@redhat.com>
61e9b3
Fixes: https://bugzilla.redhat.com/1976024
61e9b3
(cherry picked from commit 79702b28329d15a7485801ed7e915d486fcc0cf4)
61e9b3
---
61e9b3
 v2v/rhv-upload-plugin.py | 98 ++++++++++++++++++++++++++++------------
61e9b3
 1 file changed, 69 insertions(+), 29 deletions(-)
61e9b3
61e9b3
diff --git a/v2v/rhv-upload-plugin.py b/v2v/rhv-upload-plugin.py
61e9b3
index 7cd6dea6..61ade1a8 100644
61e9b3
--- a/v2v/rhv-upload-plugin.py
61e9b3
+++ b/v2v/rhv-upload-plugin.py
61e9b3
@@ -601,17 +601,29 @@ def finalize_transfer(connection, transfer, disk_id):
61e9b3
     """
61e9b3
     Finalize a transfer, making the transfer disk available.
61e9b3
 
61e9b3
-    If finalizing succeeds, transfer's phase will change to FINISHED_SUCCESS
61e9b3
-    and the transer's disk status will change to OK.  On errors, the transfer's
61e9b3
-    phase will change to FINISHED_FAILURE and the disk status will change to
61e9b3
-    ILLEGAL and it will be removed. In both cases the transfer entity will be
61e9b3
-    removed shortly after.
61e9b3
+    If finalizing succeeds, the transfer's disk status will change to OK
61e9b3
+    and transfer's phase will change to FINISHED_SUCCESS. Unfortunately,
61e9b3
+    the disk status is modified before the transfer finishes, and oVirt
61e9b3
+    may still hold a lock on the disk at this point.
61e9b3
 
61e9b3
-    If oVirt fails to finalize the transfer, transfer's phase will change to
61e9b3
-    PAUSED_SYSTEM. In this case the disk's status will change to ILLEGAL and it
61e9b3
-    will not be removed.
61e9b3
+    The only way to make sure that the disk is unlocked, is to wait
61e9b3
+    until the transfer phase switches FINISHED_SUCCESS. Unfortunately
61e9b3
+    oVirt makes this hard to use because the transfer is removed shortly
61e9b3
+    after switching the phase to the final phase. However if the
61e9b3
+    transfer was removed, we can be sure that the disk is not locked,
61e9b3
+    since oVirt releases the locks before removing the transfer.
61e9b3
 
61e9b3
-    For simplicity, we track only disk's status changes.
61e9b3
+    On errors, the transfer's phase will change to FINISHED_FAILURE and
61e9b3
+    the disk status will change to ILLEGAL and it will be removed. Again
61e9b3
+    the transfer will be removed shortly after that.
61e9b3
+
61e9b3
+    If oVirt fails to finalize the transfer, transfer's phase will
61e9b3
+    change to PAUSED_SYSTEM. In this case the disk's status will change
61e9b3
+    to ILLEGAL and it will not be removed.
61e9b3
+
61e9b3
+    oVirt 4.4.7 made waiting for transfer easier by keeping transfers
61e9b3
+    after they complete, but we must support older versions so we have
61e9b3
+    generic code that work with any version.
61e9b3
 
61e9b3
     For more info see:
61e9b3
     - http://ovirt.github.io/ovirt-engine-api-model/4.4/#services/image_transfer
61e9b3
@@ -626,34 +638,62 @@ def finalize_transfer(connection, transfer, disk_id):
61e9b3
 
61e9b3
     transfer_service.finalize()
61e9b3
 
61e9b3
-    disk_service = (connection.system_service()
61e9b3
-                    .disks_service()
61e9b3
-                    .disk_service(disk_id))
61e9b3
-
61e9b3
     while True:
61e9b3
         time.sleep(1)
61e9b3
         try:
61e9b3
-            disk = disk_service.get()
61e9b3
+            transfer = transfer_service.get()
61e9b3
         except sdk.NotFoundError:
61e9b3
-            # Disk verification failed and the system removed the disk.
61e9b3
-            raise RuntimeError(
61e9b3
-                "transfer %s failed: disk %s was removed"
61e9b3
-                % (transfer.id, disk_id))
61e9b3
+            # Transfer was removed (ovirt < 4.4.7). We need to check the
61e9b3
+            # disk status to understand if the transfer was successful.
61e9b3
+            # Due to the way oVirt does locking, we know that the disk
61e9b3
+            # is unlocked at this point so we can check only once.
61e9b3
 
61e9b3
-        if disk.status == types.DiskStatus.ILLEGAL:
61e9b3
-            # Disk verification failed or transfer was paused by the system.
61e9b3
-            raise RuntimeError(
61e9b3
-                "transfer %s failed: disk is ILLEGAL" % transfer.id)
61e9b3
+            debug("transfer %s was removed, checking disk %s status"
61e9b3
+                  % (transfer.id, disk_id))
61e9b3
+
61e9b3
+            disk_service = (connection.system_service()
61e9b3
+                            .disks_service()
61e9b3
+                            .disk_service(disk_id))
61e9b3
 
61e9b3
-        if disk.status == types.DiskStatus.OK:
61e9b3
-            debug("transfer %s finalized in %.3f seconds"
61e9b3
-                  % (transfer.id, time.time() - start))
61e9b3
-            break
61e9b3
+            try:
61e9b3
+                disk = disk_service.get()
61e9b3
+            except sdk.NotFoundError:
61e9b3
+                raise RuntimeError(
61e9b3
+                    "transfer %s failed: disk %s was removed"
61e9b3
+                    % (transfer.id, disk_id))
61e9b3
+
61e9b3
+            debug("disk %s is %s" % (disk_id, disk.status))
61e9b3
+
61e9b3
+            if disk.status == types.DiskStatus.OK:
61e9b3
+                break
61e9b3
 
61e9b3
-        if time.time() > start + timeout:
61e9b3
             raise RuntimeError(
61e9b3
-                "timed out waiting for transfer %s to finalize"
61e9b3
-                % transfer.id)
61e9b3
+                "transfer %s failed: disk is %s" % (transfer.id, disk.status))
61e9b3
+        else:
61e9b3
+            # Transfer exists, check if it reached one of the final
61e9b3
+            # phases, or we timed out.
61e9b3
+
61e9b3
+            debug("transfer %s is %s" % (transfer.id, transfer.phase))
61e9b3
+
61e9b3
+            if transfer.phase == types.ImageTransferPhase.FINISHED_SUCCESS:
61e9b3
+                break
61e9b3
+
61e9b3
+            if transfer.phase == types.ImageTransferPhase.FINISHED_FAILURE:
61e9b3
+                raise RuntimeError(
61e9b3
+                    "transfer %s has failed" % (transfer.id,))
61e9b3
+
61e9b3
+            if transfer.phase == types.ImageTransferPhase.PAUSED_SYSTEM:
61e9b3
+                raise RuntimeError(
61e9b3
+                    "transfer %s was paused by system" % (transfer.id,))
61e9b3
+
61e9b3
+            if time.time() > start + timeout:
61e9b3
+                raise RuntimeError(
61e9b3
+                    "timed out waiting for transfer %s to finalize, "
61e9b3
+                    "transfer is %s"
61e9b3
+                    % (transfer.id, transfer.phase))
61e9b3
+
61e9b3
+    debug("transfer %s finalized in %.3f seconds"
61e9b3
+          % (transfer.id, time.time() - start))
61e9b3
 
61e9b3
 
61e9b3
 def transfer_supports_format():
61e9b3
-- 
61e9b3
2.27.0
61e9b3