From 03560a7d6e57ca2ba3198d5051acfdd1e345f9a4 Mon Sep 17 00:00:00 2001
From: Jeffrey Cody <jcody@redhat.com>
Date: Tue, 5 Dec 2017 16:03:17 +0100
Subject: [PATCH 12/21] blockjob: Make block_job_pause_all() keep a reference
 to the jobs

RH-Author: Jeffrey Cody <jcody@redhat.com>
Message-id: <e6cd1cf608e4720141f9e3b0d62a5a9721203325.1511985875.git.jcody@redhat.com>
Patchwork-id: 78161
O-Subject: [RHV7.5 qemu-kvm-rhev PATCH 12/11] blockjob: Make block_job_pause_all() keep a reference to the jobs
Bugzilla: 1506531
RH-Acked-by: John Snow <jsnow@redhat.com>
RH-Acked-by: Paolo Bonzini <pbonzini@redhat.com>
RH-Acked-by: Stefan Hajnoczi <stefanha@redhat.com>

From: Alberto Garcia <berto@igalia.com>

Starting from commit 40840e419be31e6a32e6ea24511c74b389d5e0e4 we are
pausing all block jobs during bdrv_reopen_multiple() to prevent any of
them from finishing and removing nodes from the graph while they are
being reopened.

It turns out that pausing a block job doesn't necessarily prevent it
from finishing: a paused block job can still run its exit function
from the main loop and call block_job_completed(). The mirror block
job in particular always goes to the main loop while it is paused (by
virtue of the bdrv_drained_begin() call in mirror_run()).

Destroying a paused block job during bdrv_reopen_multiple() has two
consequences:

   1) The references to the nodes involved in the job are released,
      possibly destroying some of them. If those nodes were in the
      reopen queue this would trigger the problem originally described
      in commit 40840e419be, crashing QEMU.

   2) At the end of bdrv_reopen_multiple(), bdrv_drain_all_end() would
      not be doing all necessary bdrv_parent_drained_end() calls.

I can reproduce problem 1) easily with iotest 030 by increasing
STREAM_BUFFER_SIZE from 512KB to 8MB in block/stream.c, or by tweaking
the iotest like in this example:

   https://lists.gnu.org/archive/html/qemu-block/2017-11/msg00934.html

This patch keeps an additional reference to all block jobs between
block_job_pause_all() and block_job_resume_all(), guaranteeing that
they are kept alive.

Signed-off-by: Alberto Garcia <berto@igalia.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
(cherry picked from commit 3d5d319e1221082974711af1d09d82f0755c1698)
Signed-off-by: Jeff Cody <jcody@redhat.com>
Signed-off-by: Miroslav Rezanina <mrezanin@redhat.com>
---
 blockjob.c | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/blockjob.c b/blockjob.c
index 84f526a..63aecce 100644
--- a/blockjob.c
+++ b/blockjob.c
@@ -730,6 +730,7 @@ void block_job_pause_all(void)
         AioContext *aio_context = blk_get_aio_context(job->blk);
 
         aio_context_acquire(aio_context);
+        block_job_ref(job);
         block_job_pause(job);
         aio_context_release(aio_context);
     }
@@ -808,12 +809,14 @@ void coroutine_fn block_job_pause_point(BlockJob *job)
 
 void block_job_resume_all(void)
 {
-    BlockJob *job = NULL;
-    while ((job = block_job_next(job))) {
+    BlockJob *job, *next;
+
+    QLIST_FOREACH_SAFE(job, &block_jobs, job_list, next) {
         AioContext *aio_context = blk_get_aio_context(job->blk);
 
         aio_context_acquire(aio_context);
         block_job_resume(job);
+        block_job_unref(job);
         aio_context_release(aio_context);
     }
 }
-- 
1.8.3.1