From 48136328150bc587178091b5766bda382158cb6c Mon Sep 17 00:00:00 2001
From: Nir Soffer <nsoffer@redhat.com>
Date: Sat, 23 Oct 2021 00:08:31 +0300
Subject: [PATCH] lib/poll.c: Retry poll after EINTR
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

I see a rare, random failure when calling BlockStatus via the Go binding:

    block_status: nbd_block_status: poll: Interrupted system call

I could not reproduce this with "nbdinfo --map", even after modifying it
to call nbd_block_status() for every 128 MiB.

Fixing this in nbd_unlocked_poll() avoids the issue throughout the
library, wherever we wait for command completion. This seems more useful
than fixing it in every libnbd client.
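The retry-on-EINTR pattern can be tried outside libnbd with a minimal standalone sketch, assuming a POSIX system. The names poll_retry_eintr() and demo_interrupted_poll() are hypothetical, for illustration only, and are not libnbd functions. The demo uses alarm() to deliver SIGALRM while poll() is blocked. A single poll() then fails with EINTR, while the retry loop absorbs the interruption and returns once the pipe becomes readable. The remark that SA_RESTART has no effect describes Linux behaviour (see signal(7)).

```c
#define _POSIX_C_SOURCE 200809L

#include <assert.h>
#include <errno.h>
#include <poll.h>
#include <signal.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper, not part of libnbd: call poll() and retry while it
 * fails with EINTR, the same loop this patch adds to nbd_unlocked_poll(). */
static int
poll_retry_eintr (struct pollfd *fds, nfds_t nfds, int timeout)
{
  int r;

  do {
    r = poll (fds, nfds, timeout);
  } while (r == -1 && errno == EINTR);

  return r;
}

/* Demo plumbing (illustrative only): SIGALRM interrupts a blocking poll(),
 * and the handler also makes the pipe readable so a retry can succeed. */
static int demo_pipe[2];

static void
on_alarm (int sig)
{
  int saved_errno = errno;      /* a handler must not clobber errno */
  ssize_t n;

  (void) sig;
  n = write (demo_pipe[1], "x", 1);     /* write() is async-signal-safe */
  (void) n;
  errno = saved_errno;
}

/* Block in poll() until SIGALRM arrives about one second later.
 * With retry == 0 this returns -1 (errno EINTR); with retry != 0 the
 * EINTR is absorbed and this returns 1 because the pipe is readable.
 * Returns -2 if the demo setup itself fails. */
static int
demo_interrupted_poll (int retry)
{
  struct sigaction sa;
  struct pollfd fds[1];
  int r;

  if (pipe (demo_pipe) == -1)
    return -2;

  memset (&sa, 0, sizeof sa);
  sa.sa_handler = on_alarm;
  sigemptyset (&sa.sa_mask);
  sa.sa_flags = 0;   /* on Linux poll() is never restarted, even with SA_RESTART */
  if (sigaction (SIGALRM, &sa, NULL) == -1) {
    close (demo_pipe[0]);
    close (demo_pipe[1]);
    return -2;
  }

  fds[0].fd = demo_pipe[0];
  fds[0].events = POLLIN;
  fds[0].revents = 0;

  alarm (1);                    /* interrupt the blocking poll() after ~1s */
  r = retry ? poll_retry_eintr (fds, 1, -1) : poll (fds, 1, -1);
  if (r == 1)
    assert (fds[0].revents & POLLIN);   /* the handler's byte is readable */

  close (demo_pipe[0]);
  close (demo_pipe[1]);
  return r;
}
```

Like the patch, the loop passes the same timeout to every retry, so each interruption can extend a finite timeout. A caller that needs a strict deadline would have to recompute the remaining time before each retry.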

Tested using a Go client that lists all extents in a Fedora 34 qcow2
image, calling BlockStatus for every 128 MiB. Without this fix, this
test always failed.

$ hyperfine -r1000 --show-output "./client nbd+unix://?socket=/tmp/nbd.sock > /dev/null"
Benchmark 1: ./client nbd+unix://?socket=/tmp/nbd.sock > /dev/null
  Time (mean ± σ):      31.6 ms ±   3.1 ms    [User: 8.8 ms, System: 7.2 ms]
  Range (min … max):    26.1 ms …  52.3 ms    1000 runs

Signed-off-by: Nir Soffer <nsoffer@redhat.com>
(cherry picked from commit b3440853cdeca0e44ad9c526e71faaa6cf344bfc)
---
 lib/poll.c | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/lib/poll.c b/lib/poll.c
index edfcc59..df01d94 100644
--- a/lib/poll.c
+++ b/lib/poll.c
@@ -57,8 +57,11 @@ nbd_unlocked_poll (struct nbd_handle *h, int timeout)
    * would allow other threads to close file descriptors which we have
    * passed to poll.
    */
-  r = poll (fds, 1, timeout);
-  debug (h, "poll end: r=%d revents=%x", r, fds[0].revents);
+  do {
+    r = poll (fds, 1, timeout);
+    debug (h, "poll end: r=%d revents=%x", r, fds[0].revents);
+  } while (r == -1 && errno == EINTR);
+
   if (r == -1) {
     set_error (errno, "poll");
     return -1;
-- 
2.31.1