Blame SOURCES/kvm-util-async-use-atomic_mb_set-in-qemu_bh_cancel.patch

9bac43
From 550b4bcc2758e8df86570bc1afb29de34a1694fe Mon Sep 17 00:00:00 2001
9bac43
From: Stefan Hajnoczi <stefanha@redhat.com>
9bac43
Date: Mon, 13 Nov 2017 16:29:39 +0100
9bac43
Subject: [PATCH 03/30] util/async: use atomic_mb_set in qemu_bh_cancel
9bac43
9bac43
RH-Author: Stefan Hajnoczi <stefanha@redhat.com>
9bac43
Message-id: <20171113162939.5486-2-stefanha@redhat.com>
9bac43
Patchwork-id: 77665
9bac43
O-Subject: [RHV7.5 qemu-kvm-rhev PATCH 1/1] util/async: use atomic_mb_set in qemu_bh_cancel
9bac43
Bugzilla: 1508886
9bac43
RH-Acked-by: Paolo Bonzini <pbonzini@redhat.com>
9bac43
RH-Acked-by: John Snow <jsnow@redhat.com>
9bac43
RH-Acked-by: Laurent Vivier <lvivier@redhat.com>
9bac43
9bac43
From: Sergio Lopez <slp@redhat.com>
9bac43
9bac43
Commit b7a745d added a qemu_bh_cancel call to the completion function
9bac43
as an optimization to prevent it from unnecessarily rescheduling itself.
9bac43
9bac43
This completion function is scheduled from worker_thread, after setting
9bac43
the state of a ThreadPoolElement to THREAD_DONE.
9bac43
9bac43
This was considered to be safe, as the completion function restarts the
9bac43
loop just after the call to qemu_bh_cancel. But, as this loop lacks a HW
9bac43
memory barrier, the read of req->state may actually happen _before_ the
9bac43
call, seeing it still as THREAD_QUEUED, and ending the completion
9bac43
function without having processed a pending TPE linked at pool->head:
9bac43
9bac43
         worker thread             |            I/O thread
9bac43
------------------------------------------------------------------------
9bac43
                                   | speculatively read req->state
9bac43
req->state = THREAD_DONE;          |
9bac43
qemu_bh_schedule(p->completion_bh) |
9bac43
  bh->scheduled = 1;               |
9bac43
                                   | qemu_bh_cancel(p->completion_bh)
9bac43
                                   |   bh->scheduled = 0;
9bac43
                                   | if (req->state == THREAD_DONE)
9bac43
                                   |   // sees THREAD_QUEUED
9bac43
9bac43
The source of the misunderstanding was that qemu_bh_cancel is now being
9bac43
used by the _consumer_ rather than the producer, and therefore now needs
9bac43
to have acquire semantics just like e.g. aio_bh_poll.
9bac43
9bac43
In some situations, if there are no other independent requests in the
9bac43
same aio context that could eventually trigger the scheduling of the
9bac43
completion function, the omitted TPE and all operations pending on it
9bac43
will get stuck forever.
9bac43
9bac43
[Added Sergio's updated wording about the HW memory barrier.
9bac43
--Stefan]
9bac43
9bac43
Signed-off-by: Sergio Lopez <slp@redhat.com>
9bac43
Message-id: 20171108063447.2842-1-slp@redhat.com
9bac43
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
9bac43
9bac43
(cherry picked from commit ef6dada8b44e1e7c4bec5c1115903af9af415b50)
9bac43
9bac43
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
9bac43
Signed-off-by: Miroslav Rezanina <mrezanin@redhat.com>
9bac43
---
9bac43
 util/async.c | 2 +-
9bac43
 1 file changed, 1 insertion(+), 1 deletion(-)
9bac43
9bac43
diff --git a/util/async.c b/util/async.c
9bac43
index 355af73..0e1bd87 100644
9bac43
--- a/util/async.c
9bac43
+++ b/util/async.c
9bac43
@@ -174,7 +174,7 @@ void qemu_bh_schedule(QEMUBH *bh)
9bac43
  */
9bac43
 void qemu_bh_cancel(QEMUBH *bh)
9bac43
 {
9bac43
-    bh->scheduled = 0;
9bac43
+    atomic_mb_set(&bh->scheduled, 0);
9bac43
 }
9bac43
 
9bac43
 /* This func is async.The bottom half will do the delete action at the finial
9bac43
-- 
9bac43
1.8.3.1
9bac43