Blame SOURCES/kvm-util-async-use-qemu_aio_coroutine_enter-in-co_schedu.patch

From b700e58ee749512368c40a5f84b01c11d24903b9 Mon Sep 17 00:00:00 2001
From: Kevin Wolf <kwolf@redhat.com>
Date: Fri, 14 Sep 2018 10:55:22 +0200
Subject: [PATCH 31/49] util/async: use qemu_aio_coroutine_enter in
 co_schedule_bh_cb

RH-Author: Kevin Wolf <kwolf@redhat.com>
Message-id: <20180914105540.18077-25-kwolf@redhat.com>
Patchwork-id: 82176
O-Subject: [RHV-7.6 qemu-kvm-rhev PATCH 24/42] util/async: use qemu_aio_coroutine_enter in co_schedule_bh_cb
Bugzilla: 1601212
RH-Acked-by: John Snow <jsnow@redhat.com>
RH-Acked-by: Max Reitz <mreitz@redhat.com>
RH-Acked-by: Fam Zheng <famz@redhat.com>

From: Sergio Lopez <slp@redhat.com>

AIO Coroutines shouldn't be managed by an AioContext different than the
one assigned when they are created. aio_co_enter avoids entering a
coroutine from a different AioContext, calling aio_co_schedule instead.

Scheduled coroutines are then entered by co_schedule_bh_cb using
qemu_coroutine_enter, which just calls qemu_aio_coroutine_enter with the
current AioContext obtained with qemu_get_current_aio_context.
Eventually, co->ctx will be set to the AioContext passed as an argument
to qemu_aio_coroutine_enter.

This means that, if an IO Thread's AioContext is being processed by the
Main Thread (due to aio_poll being called with a BDS AioContext, as it
happens in AIO_WAIT_WHILE among other places), the AioContext from some
coroutines may be wrongly replaced with the one from the Main Thread.

This is the root cause behind some crashes, mainly triggered by the
drain code in block/io.c. The most common are the following abort and
failed assertion:

util/async.c:aio_co_schedule
456     if (scheduled) {
457         fprintf(stderr,
458                 "%s: Co-routine was already scheduled in '%s'\n",
459                 __func__, scheduled);
460         abort();
461     }

util/qemu-coroutine-lock.c:
286     assert(mutex->holder == self);

But it's also known to cause random errors at different locations, and
even SIGSEGV with broken coroutine backtraces.

By using qemu_aio_coroutine_enter directly in co_schedule_bh_cb, we can
pass the correct AioContext as an argument, making sure co->ctx is not
wrongly altered.
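
As a rough illustration of the mechanism described above, the toy model
below may help. It is not QEMU code: every toy_* name is hypothetical,
and the only behaviour it assumes is what this message states, namely
that the enter function stores the AioContext it receives in co->ctx and
that the plain enter variant picks the running thread's context.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-ins for illustration only; these are NOT the QEMU types. */
typedef struct ToyAioContext { const char *name; } ToyAioContext;
typedef struct ToyCoroutine { ToyAioContext *ctx; } ToyCoroutine;

/* Hypothetical global: the context of the thread that is running now. */
static ToyAioContext *toy_current_ctx;

/* Models the explicit-context enter: records the context it was given. */
static void toy_aio_coroutine_enter(ToyAioContext *ctx, ToyCoroutine *co)
{
    co->ctx = ctx;
}

/* Models the plain enter: falls back to the running thread's context. */
static void toy_coroutine_enter(ToyCoroutine *co)
{
    toy_aio_coroutine_enter(toy_current_ctx, co);
}

/* Models the scheduled-coroutine callback before the change. */
static void toy_bh_cb_old(ToyAioContext *ctx, ToyCoroutine *co)
{
    (void)ctx;                  /* the owning context is ignored here */
    toy_coroutine_enter(co);
}

/* Models the callback after the change: the owning context is passed. */
static void toy_bh_cb_new(ToyAioContext *ctx, ToyCoroutine *co)
{
    toy_aio_coroutine_enter(ctx, co);
}

/*
 * Simulates the main thread processing an iothread's context.
 * Returns 1 if the coroutine still points at the iothread context
 * afterwards, 0 if it was replaced by the main thread's context.
 */
static int toy_ctx_preserved(int use_patched)
{
    ToyAioContext main_ctx = { "main" };
    ToyAioContext iothread_ctx = { "iothread" };
    ToyCoroutine co = { &iothread_ctx };

    toy_current_ctx = &main_ctx;    /* main thread is the one running */
    if (use_patched) {
        toy_bh_cb_new(&iothread_ctx, &co);
    } else {
        toy_bh_cb_old(&iothread_ctx, &co);
    }
    return co.ctx == &iothread_ctx;
}
```

In this model, toy_ctx_preserved(0) takes the old path and the coroutine
ends up owned by the main context, while toy_ctx_preserved(1) takes the
new path and the coroutine keeps its original context.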

Signed-off-by: Sergio Lopez <slp@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Miroslav Rezanina <mrezanin@redhat.com>
---
 util/async.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/util/async.c b/util/async.c
index 4dd9d95..5693191 100644
--- a/util/async.c
+++ b/util/async.c
@@ -391,7 +391,7 @@ static void co_schedule_bh_cb(void *opaque)
 
         /* Protected by write barrier in qemu_aio_coroutine_enter */
         atomic_set(&co->scheduled, NULL);
-        qemu_coroutine_enter(co);
+        qemu_aio_coroutine_enter(ctx, co);
         aio_context_release(ctx);
     }
 }
-- 
1.8.3.1