SOURCES/kvm-util-async-use-qemu_aio_coroutine_enter-in-co_schedu.patch

From f29b1e17713739baf416b64eeee9549f07717ea8 Mon Sep 17 00:00:00 2001
From: Kevin Wolf <kwolf@redhat.com>
Date: Wed, 10 Oct 2018 20:21:53 +0100
Subject: [PATCH 27/49] util/async: use qemu_aio_coroutine_enter in
 co_schedule_bh_cb

RH-Author: Kevin Wolf <kwolf@redhat.com>
Message-id: <20181010202213.7372-15-kwolf@redhat.com>
Patchwork-id: 82604
O-Subject: [RHEL-8 qemu-kvm PATCH 24/44] util/async: use qemu_aio_coroutine_enter in co_schedule_bh_cb
Bugzilla: 1637976
RH-Acked-by: Max Reitz <mreitz@redhat.com>
RH-Acked-by: John Snow <jsnow@redhat.com>
RH-Acked-by: Thomas Huth <thuth@redhat.com>

From: Sergio Lopez <slp@redhat.com>

AIO Coroutines shouldn't be managed by an AioContext different from the
one assigned when they are created. aio_co_enter avoids entering a
coroutine from a different AioContext, calling aio_co_schedule instead.

Scheduled coroutines are then entered by co_schedule_bh_cb using
qemu_coroutine_enter, which just calls qemu_aio_coroutine_enter with the
current AioContext obtained with qemu_get_current_aio_context.
Eventually, co->ctx will be set to the AioContext passed as an argument
to qemu_aio_coroutine_enter.

This means that, if an IO Thread's AioContext is being processed by the
Main Thread (due to aio_poll being called with a BDS AioContext, as
happens in AIO_WAIT_WHILE among other places), the AioContext from some
coroutines may be wrongly replaced with the one from the Main Thread.

This is the root cause behind some crashes, mainly triggered by the
drain code in block/io.c. The most common are this abort and this failed
assertion:

util/async.c:aio_co_schedule
456     if (scheduled) {
457         fprintf(stderr,
458                 "%s: Co-routine was already scheduled in '%s'\n",
459                 __func__, scheduled);
460         abort();
461     }

util/qemu-coroutine-lock.c:
286     assert(mutex->holder == self);

But it's also known to cause random errors at different locations, and
even SIGSEGV with broken coroutine backtraces.

By using qemu_aio_coroutine_enter directly in co_schedule_bh_cb, we can
pass the correct AioContext as an argument, making sure co->ctx is not
wrongly altered.

Signed-off-by: Sergio Lopez <slp@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
(cherry picked from commit 6808ae0417131f8dbe7b051256dff7a16634dc1d)
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Danilo C. L. de Paula <ddepaula@redhat.com>
---
 util/async.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/util/async.c b/util/async.c
index 4dd9d95..5693191 100644
--- a/util/async.c
+++ b/util/async.c
@@ -391,7 +391,7 @@ static void co_schedule_bh_cb(void *opaque)
 
         /* Protected by write barrier in qemu_aio_coroutine_enter */
         atomic_set(&co->scheduled, NULL);
-        qemu_coroutine_enter(co);
+        qemu_aio_coroutine_enter(ctx, co);
         aio_context_release(ctx);
     }
 }
-- 
1.8.3.1
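
Illustration (not part of the upstream change): the context mix-up
described in the commit message can be sketched with a small toy model.
Everything below is hypothetical and heavily simplified. The Model* types,
the model_* functions and the model_current_ctx global are stand-ins
invented for this sketch, not QEMU's real definitions. The sketch only
assumes what the commit message states: entering a coroutine sets co->ctx
to the AioContext passed in, and the plain enter function passes the
calling thread's current context.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for an AioContext; only a name is needed here. */
typedef struct ModelAioContext {
    const char *name;
} ModelAioContext;

/* Toy stand-in for a coroutine; ctx is the context it belongs to. */
typedef struct ModelCoroutine {
    ModelAioContext *ctx;
} ModelCoroutine;

/* Hypothetical global: the context of the thread that is running now. */
static ModelAioContext *model_current_ctx;

/* Models qemu_aio_coroutine_enter: the coroutine adopts the given context. */
static void model_aio_coroutine_enter(ModelAioContext *ctx, ModelCoroutine *co)
{
    assert(ctx != NULL && co != NULL);
    co->ctx = ctx;
}

/* Models qemu_coroutine_enter: uses the calling thread's current context. */
static void model_coroutine_enter(ModelCoroutine *co)
{
    model_aio_coroutine_enter(model_current_ctx, co);
}

/* Models the loop body of co_schedule_bh_cb before the patch. */
static void model_bh_cb_before(ModelAioContext *ctx, ModelCoroutine *co)
{
    (void)ctx;                  /* the owning context is ignored */
    model_coroutine_enter(co);
}

/* Models the loop body of co_schedule_bh_cb after the patch. */
static void model_bh_cb_after(ModelAioContext *ctx, ModelCoroutine *co)
{
    model_aio_coroutine_enter(ctx, co);
}

/*
 * The main thread processes the IOThread's context, as in the
 * AIO_WAIT_WHILE case. Returns 1 if the coroutine still belongs to the
 * IOThread's context afterwards, 0 if it was switched to the main one.
 */
int model_run(int patched)
{
    static ModelAioContext main_ctx = { "main" };
    static ModelAioContext iothread_ctx = { "iothread" };
    ModelCoroutine co = { &iothread_ctx };

    model_current_ctx = &main_ctx;      /* code runs in the main thread */
    if (patched) {
        model_bh_cb_after(&iothread_ctx, &co);
    } else {
        model_bh_cb_before(&iothread_ctx, &co);
    }
    return co.ctx == &iothread_ctx;
}
```

In this model, model_run(0) reports that the coroutine lost its original
context, while model_run(1) reports that it kept it, which mirrors the
one-line change in the diff above.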