Blob Blame History Raw
From 550b4bcc2758e8df86570bc1afb29de34a1694fe Mon Sep 17 00:00:00 2001
From: Stefan Hajnoczi <stefanha@redhat.com>
Date: Mon, 13 Nov 2017 16:29:39 +0100
Subject: [PATCH 03/30] util/async: use atomic_mb_set in qemu_bh_cancel

RH-Author: Stefan Hajnoczi <stefanha@redhat.com>
Message-id: <20171113162939.5486-2-stefanha@redhat.com>
Patchwork-id: 77665
O-Subject: [RHV7.5 qemu-kvm-rhev PATCH 1/1] util/async: use atomic_mb_set in qemu_bh_cancel
Bugzilla: 1508886
RH-Acked-by: Paolo Bonzini <pbonzini@redhat.com>
RH-Acked-by: John Snow <jsnow@redhat.com>
RH-Acked-by: Laurent Vivier <lvivier@redhat.com>

From: Sergio Lopez <slp@redhat.com>

Commit b7a745d added a qemu_bh_cancel call to the completion function
as an optimization to prevent it from unnecessarily rescheduling itself.

This completion function is scheduled from worker_thread, after setting
the state of a ThreadPoolElement to THREAD_DONE.

This was considered to be safe, as the completion function restarts the
loop just after the call to qemu_bh_cancel. But, as this loop lacks a HW
memory barrier, the read of req->state may actually happen _before_ the
call, seeing it still as THREAD_QUEUED, and ending the completion
function without having processed a pending TPE linked at pool->head:

         worker thread             |            I/O thread
------------------------------------------------------------------------
                                   | speculatively read req->state
req->state = THREAD_DONE;          |
qemu_bh_schedule(p->completion_bh) |
  bh->scheduled = 1;               |
                                   | qemu_bh_cancel(p->completion_bh)
                                   |   bh->scheduled = 0;
                                   | if (req->state == THREAD_DONE)
                                   |   // sees THREAD_QUEUED

The source of the misunderstanding was that qemu_bh_cancel is now being
used by the _consumer_ rather than the producer, and therefore now needs
to have acquire semantics just like e.g. aio_bh_poll.

In some situations, if there are no other independent requests in the
same aio context that could eventually trigger the scheduling of the
completion function, the omitted TPE and all operations pending on it
will get stuck forever.

[Added Sergio's updated wording about the HW memory barrier.
--Stefan]

Signed-off-by: Sergio Lopez <slp@redhat.com>
Message-id: 20171108063447.2842-1-slp@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>

(cherry picked from commit ef6dada8b44e1e7c4bec5c1115903af9af415b50)

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Miroslav Rezanina <mrezanin@redhat.com>
---
 util/async.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/util/async.c b/util/async.c
index 355af73..0e1bd87 100644
--- a/util/async.c
+++ b/util/async.c
@@ -174,7 +174,7 @@ void qemu_bh_schedule(QEMUBH *bh)
  */
 void qemu_bh_cancel(QEMUBH *bh)
 {
-    bh->scheduled = 0;
+    atomic_mb_set(&bh->scheduled, 0);
 }
 
 /* This func is async.The bottom half will do the delete action at the finial
-- 
1.8.3.1