Blame SOURCES/kvm-blockjob-Make-block_job_pause_all-keep-a-reference-t.patch

9bac43
From 03560a7d6e57ca2ba3198d5051acfdd1e345f9a4 Mon Sep 17 00:00:00 2001
9bac43
From: Jeffrey Cody <jcody@redhat.com>
9bac43
Date: Tue, 5 Dec 2017 16:03:17 +0100
9bac43
Subject: [PATCH 12/21] blockjob: Make block_job_pause_all() keep a reference
9bac43
 to the jobs
9bac43
9bac43
RH-Author: Jeffrey Cody <jcody@redhat.com>
9bac43
Message-id: <e6cd1cf608e4720141f9e3b0d62a5a9721203325.1511985875.git.jcody@redhat.com>
9bac43
Patchwork-id: 78161
9bac43
O-Subject: [RHV7.5 qemu-kvm-rhev PATCH 12/11] blockjob: Make block_job_pause_all() keep a reference to the jobs
9bac43
Bugzilla: 1506531
9bac43
RH-Acked-by: John Snow <jsnow@redhat.com>
9bac43
RH-Acked-by: Paolo Bonzini <pbonzini@redhat.com>
9bac43
RH-Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
9bac43
9bac43
From: Alberto Garcia <berto@igalia.com>
9bac43
9bac43
Starting from commit 40840e419be31e6a32e6ea24511c74b389d5e0e4 we are
9bac43
pausing all block jobs during bdrv_reopen_multiple() to prevent any of
9bac43
them from finishing and removing nodes from the graph while they are
9bac43
being reopened.
9bac43
9bac43
It turns out that pausing a block job doesn't necessarily prevent it
9bac43
from finishing: a paused block job can still run its exit function
9bac43
from the main loop and call block_job_completed(). The mirror block
9bac43
job in particular always goes to the main loop while it is paused (by
9bac43
virtue of the bdrv_drained_begin() call in mirror_run()).
9bac43
9bac43
Destroying a paused block job during bdrv_reopen_multiple() has two
9bac43
consequences:
9bac43
9bac43
   1) The references to the nodes involved in the job are released,
9bac43
      possibly destroying some of them. If those nodes were in the
9bac43
      reopen queue this would trigger the problem originally described
9bac43
      in commit 40840e419be, crashing QEMU.
9bac43
9bac43
   2) At the end of bdrv_reopen_multiple(), bdrv_drain_all_end() would
9bac43
      not be doing all necessary bdrv_parent_drained_end() calls.
9bac43
9bac43
I can reproduce problem 1) easily with iotest 030 by increasing
9bac43
STREAM_BUFFER_SIZE from 512KB to 8MB in block/stream.c, or by tweaking
9bac43
the iotest like in this example:
9bac43
9bac43
   https://lists.gnu.org/archive/html/qemu-block/2017-11/msg00934.html
9bac43
9bac43
This patch keeps an additional reference to all block jobs between
9bac43
block_job_pause_all() and block_job_resume_all(), guaranteeing that
9bac43
they are kept alive.
9bac43
9bac43
Signed-off-by: Alberto Garcia <berto@igalia.com>
9bac43
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
9bac43
(cherry picked from commit 3d5d319e1221082974711af1d09d82f0755c1698)
9bac43
Signed-off-by: Jeff Cody <jcody@redhat.com>
9bac43
Signed-off-by: Miroslav Rezanina <mrezanin@redhat.com>
9bac43
---
9bac43
 blockjob.c | 7 +++++--
9bac43
 1 file changed, 5 insertions(+), 2 deletions(-)
9bac43
9bac43
diff --git a/blockjob.c b/blockjob.c
9bac43
index 84f526a..63aecce 100644
9bac43
--- a/blockjob.c
9bac43
+++ b/blockjob.c
9bac43
@@ -730,6 +730,7 @@ void block_job_pause_all(void)
9bac43
         AioContext *aio_context = blk_get_aio_context(job->blk);
9bac43
 
9bac43
         aio_context_acquire(aio_context);
9bac43
+        block_job_ref(job);
9bac43
         block_job_pause(job);
9bac43
         aio_context_release(aio_context);
9bac43
     }
9bac43
@@ -808,12 +809,14 @@ void coroutine_fn block_job_pause_point(BlockJob *job)
9bac43
 
9bac43
 void block_job_resume_all(void)
9bac43
 {
9bac43
-    BlockJob *job = NULL;
9bac43
-    while ((job = block_job_next(job))) {
9bac43
+    BlockJob *job, *next;
9bac43
+
9bac43
+    QLIST_FOREACH_SAFE(job, &block_jobs, job_list, next) {
9bac43
         AioContext *aio_context = blk_get_aio_context(job->blk);
9bac43
 
9bac43
         aio_context_acquire(aio_context);
9bac43
         block_job_resume(job);
9bac43
+        block_job_unref(job);
9bac43
         aio_context_release(aio_context);
9bac43
     }
9bac43
 }
9bac43
-- 
9bac43
1.8.3.1
9bac43