From 987ddcd2ad7546212d3afed52b56f27a664624d6 Mon Sep 17 00:00:00 2001
From: Nir Soffer <nsoffer@redhat.com>
Date: Thu, 21 Jan 2021 03:40:00 +0200
Subject: [PATCH] v2v: rhv-upload-plugin: Defer imageio connection

When using vddk input with certain VMware versions, qemu-img may spend a
lot of time getting source image extents. If getting image extents takes
more than 60 seconds, the imageio server closes the idle connection, and
the transfer fails on the first write with:

nbdkit: python[1]: error: /var/tmp/rhvupload.0OKqWA/rhv-upload-plugin.py: pwrite: error:
Traceback (most recent call last):
   File "/var/tmp/rhvupload.0OKqWA/rhv-upload-plugin.py", line 94, in wrapper
    return func(h, *args)
   File "/var/tmp/rhvupload.0OKqWA/rhv-upload-plugin.py", line 230, in pwrite
    r = http.getresponse()
   File "/usr/lib64/python3.6/http/client.py", line 1361, in getresponse
    response.begin()
   File "/usr/lib64/python3.6/http/client.py", line 311, in begin
    version, status, reason = self._read_status()
   File "/usr/lib64/python3.6/http/client.py", line 280, in _read_status
    raise RemoteDisconnected("Remote end closed connection without"
 http.client.RemoteDisconnected: Remote end closed connection without response

This happens only when not using a unix socket, for example when running
on a non-oVirt host, on an oVirt host from another data center, or when
using -oo rhv_direct=false.

When using a unix socket, we close the initial HTTPSConnection and
create a new UnixHTTPConnection. This connection is not connected to
the server yet; when qemu-img first accesses the server, the connection
is established automatically.
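A minimal sketch of such a lazily connected unix socket connection, for
illustration only. The UnixHTTPConnection class below is a hypothetical
stand-in, not the plugin's own class, and the toy server just returns a
fixed HTTP reply; it is not imageio. It assumes a platform with AF_UNIX
(Linux or macOS). Constructing the object opens no socket; the first
request calls connect() through HTTPConnection's automatic connect.

```python
import os
import socket
import tempfile
import threading
from http.client import HTTPConnection


class UnixHTTPConnection(HTTPConnection):
    """Hypothetical HTTPConnection that speaks HTTP over a unix socket.

    Only connect() is overridden, so the connection stays lazy: no socket
    is opened until connect() is called or the first request needs one.
    """

    def __init__(self, socket_path, timeout=None):
        self.socket_path = socket_path
        # "localhost" is only used for the Host header.
        super().__init__("localhost", timeout=timeout)

    def connect(self):
        self.sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        if self.timeout is not None:
            self.sock.settimeout(self.timeout)
        self.sock.connect(self.socket_path)


def serve_once(listener):
    """Toy server: read one request's headers, then send a fixed reply."""
    conn, _ = listener.accept()
    with conn:
        data = b""
        while b"\r\n\r\n" not in data:
            chunk = conn.recv(4096)
            if not chunk:
                return
            data += chunk
        conn.sendall(b"HTTP/1.1 200 OK\r\nContent-Length: 2\r\n\r\nok")


path = os.path.join(tempfile.mkdtemp(), "server.sock")
listener = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
listener.bind(path)
listener.listen(1)
server = threading.Thread(target=serve_once, args=(listener,))
server.start()

http = UnixHTTPConnection(path)
connected_before = http.sock is not None  # no socket yet
http.request("OPTIONS", "/")              # send() calls connect() for us
connected_after = http.sock is not None
resp = http.getresponse()
status, body = resp.status, resp.read()

http.close()
server.join()
listener.close()
os.unlink(path)
```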

Fix the issue by closing the initial connection, which is used to get
the server options and to initialize the handle, and by storing the
closed connection in the handle.

Here is the flow with this change:

1. Create an HTTPSConnection for getting server options.
2. Close the connection[1].
3. If using a unix socket, create a UnixHTTPConnection.
4. Store the connection in the handle.
5. When qemu-img tries to write or zero, the connection reconnects
   automatically to the imageio server[2].
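The steps above can be sketched with the standard library alone. This
is an illustration under assumptions: it uses a plain HTTPConnection
against a toy http.server handler instead of HTTPSConnection against
imageio (HTTPSConnection subclasses HTTPConnection and keeps the same
reconnect behavior), skips the unix socket swap, and uses a plain dict
as a hypothetical stand-in for the plugin's handle.

```python
import threading
from http.client import HTTPConnection
from http.server import BaseHTTPRequestHandler, HTTPServer


class ToyHandler(BaseHTTPRequestHandler):
    """Toy stand-in for the imageio server; not its real API."""

    protocol_version = "HTTP/1.1"  # keep connections open between requests

    def do_OPTIONS(self):
        self._reply_ok()

    def do_PUT(self):
        self.rfile.read(int(self.headers["Content-Length"]))
        self._reply_ok()

    def _reply_ok(self):
        self.send_response(200)
        self.send_header("Content-Length", "0")
        self.end_headers()

    def log_message(self, format, *args):
        pass  # keep the example quiet


server = HTTPServer(("127.0.0.1", 0), ToyHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
host, port = server.server_address

# 1. Create a connection and use it to get the server options.
http = HTTPConnection(host, port)
http.request("OPTIONS", "/")
options_resp = http.getresponse()
options_resp.read()

# 2. Close the connection; the object itself remains usable.
http.close()
sock_after_close = http.sock  # None: nothing is left to go idle

# 3. (The unix socket swap is omitted in this sketch.)
# 4. Store the closed connection in a hypothetical handle.
handle = {"http": http}

# 5. The first write reconnects automatically (HTTPConnection.auto_open).
handle["http"].request("PUT", "/", body=b"data")
put_resp = handle["http"].getresponse()
put_resp.read()

handle["http"].close()
server.shutdown()
server.server_close()
```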

Tested by adding a 300 millisecond delay in the nbdkit file plugin. Due
to the way qemu-img works, this causes a delay of more than 2 minutes
after open() and before the first pwrite(). Without this change, the
import fails consistently when using rhv_direct=false.
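The test above delayed the nbdkit file plugin. As a separate,
self-contained simulation (not the test used for this change), the
sketch below uses a toy http.server handler, not imageio, that drops
connections idle for more than 0.2 seconds, standing in for the 60
second imageio timeout; a 1 second sleep stands in for qemu-img's slow
extent scan. IDLE_TIMEOUT, SLOW_EXTENTS and first_write() are
hypothetical names. Keeping the idle connection fails the first write
with a ConnectionError subclass, while closing it first succeeds.

```python
import threading
import time
from http.client import HTTPConnection
from http.server import BaseHTTPRequestHandler, HTTPServer

IDLE_TIMEOUT = 0.2  # hypothetical: stands in for imageio's 60 s idle timeout
SLOW_EXTENTS = 1.0  # hypothetical: stands in for qemu-img's slow extent scan


class IdleClosingHandler(BaseHTTPRequestHandler):
    """Toy server that closes connections idle longer than IDLE_TIMEOUT."""

    protocol_version = "HTTP/1.1"
    timeout = IDLE_TIMEOUT  # socket timeout; an idle read ends the connection

    def do_OPTIONS(self):
        self._reply_ok()

    def do_PUT(self):
        self.rfile.read(int(self.headers["Content-Length"]))
        self._reply_ok()

    def _reply_ok(self):
        self.send_response(200)
        self.send_header("Content-Length", "0")
        self.end_headers()

    def log_message(self, format, *args):
        pass  # silence request and timeout logging


def first_write(host, port, close_after_options):
    """Get options, stay idle, then try the first write."""
    http = HTTPConnection(host, port)
    http.request("OPTIONS", "/")
    http.getresponse().read()
    if close_after_options:
        http.close()  # the fix: do not keep an idle connection around
    time.sleep(SLOW_EXTENTS)  # no traffic while "qemu-img" is busy
    try:
        http.request("PUT", "/", body=b"data")
        return http.getresponse().status
    except ConnectionError as e:  # RemoteDisconnected, reset or broken pipe
        return type(e).__name__
    finally:
        http.close()


server = HTTPServer(("127.0.0.1", 0), IdleClosingHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
host, port = server.server_address

without_fix = first_write(host, port, close_after_options=False)
with_fix = first_write(host, port, close_after_options=True)

server.shutdown()
server.server_close()
print("without fix:", without_fix, "| with fix:", with_fix)
```

Which ConnectionError subclass appears without the fix depends on
timing; RemoteDisconnected, as in the traceback above, is one possible
outcome.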

[1] https://github.com/python/cpython/blob/34df10a9a16b38d54421eeeaf73ec89828563be7/Lib/http/client.py#L958
[2] https://github.com/python/cpython/blob/34df10a9a16b38d54421eeeaf73ec89828563be7/Lib/http/client.py#L972

Signed-off-by: Nir Soffer <nsoffer@redhat.com>
(cherry picked from commit 1d5fc257765c444644e5bfc6525e86ff201755f0)
---
 v2v/rhv-upload-plugin.py | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/v2v/rhv-upload-plugin.py b/v2v/rhv-upload-plugin.py
index 471102da..7cd6dea6 100644
--- a/v2v/rhv-upload-plugin.py
+++ b/v2v/rhv-upload-plugin.py
@@ -117,6 +117,15 @@ def open(readonly):
         destination_url = parse_transfer_url(transfer)
         http = create_http(destination_url)
         options = get_options(http, destination_url)
+
+        # Close the initial connection to the imageio server. When qemu-img
+        # tries to access the server, HTTPConnection will reconnect
+        # automatically. If we keep this connection idle and qemu-img is too
+        # slow getting image extents, imageio server may close the connection,
+        # and the import will fail on the first write.
+        # See https://bugzilla.redhat.com/1916176.
+        http.close()
+
         http = optimize_http(http, host, options)
     except:
         cancel_transfer(connection, transfer)
-- 
2.27.0