Blame SOURCES/kvm-block-backup-fix-fleecing-scheme-use-serialized-writ.patch

ae23c9
From d9a55a5815a040032f85c20020b118dda54bba43 Mon Sep 17 00:00:00 2001
ae23c9
From: John Snow <jsnow@redhat.com>
ae23c9
Date: Wed, 18 Jul 2018 22:54:58 +0200
ae23c9
Subject: [PATCH 240/268] block/backup: fix fleecing scheme: use serialized
ae23c9
 writes
ae23c9
ae23c9
RH-Author: John Snow <jsnow@redhat.com>
ae23c9
Message-id: <20180718225511.14878-23-jsnow@redhat.com>
ae23c9
Patchwork-id: 81396
ae23c9
O-Subject: [RHEL-7.6 qemu-kvm-rhev PATCH 22/35] block/backup: fix fleecing scheme: use serialized writes
ae23c9
Bugzilla: 1207657
ae23c9
RH-Acked-by: Eric Blake <eblake@redhat.com>
ae23c9
RH-Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
ae23c9
RH-Acked-by: Fam Zheng <famz@redhat.com>
ae23c9
ae23c9
From: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
ae23c9
ae23c9
Fleecing scheme works as follows: we want a kind of temporary snapshot
ae23c9
of active drive A. We create temporary image B, with B->backing = A.
ae23c9
Then we start backup(sync=none) from A to B. From this point, B reads
ae23c9
as point-in-time snapshot of A (A continues to be active drive,
ae23c9
accepting guest IO).
ae23c9
ae23c9
This scheme needs some additional synchronization between reads from B
ae23c9
and backup COW operations, otherwise, the following situation is
ae23c9
theoretically possible:
ae23c9
ae23c9
(assume B is qcow2, client is NBD client, reading from B)
ae23c9
ae23c9
1. client starts reading and takes the qcow2 mutex in qcow2_co_preadv, and
ae23c9
   goes up to l2 table loading (assume cache miss)
ae23c9
ae23c9
2. guest write => backup COW => qcow2 write =>
ae23c9
   try to take qcow2 mutex => waiting
ae23c9
ae23c9
3. l2 table loaded, we see that cluster is UNALLOCATED, go to
ae23c9
   "case QCOW2_CLUSTER_UNALLOCATED" and unlock mutex before
ae23c9
   bdrv_co_preadv(bs->backing, ...)
ae23c9
ae23c9
4. aha, mutex unlocked, backup COW continues, and we finally finish
ae23c9
   guest write and change cluster in our active disk A
ae23c9
ae23c9
5. actually, do bdrv_co_preadv(bs->backing, ...) and read
ae23c9
   _new updated_ data.
ae23c9
ae23c9
To avoid this, let's make backup writes serialized, so they do not intersect
ae23c9
with reads from B.
ae23c9
ae23c9
Note: we expand range of handled cases from (sync=none and
ae23c9
B->backing = A) to just (A is in the backing chain of B), to finally allow
ae23c9
safe reading from B during backup for all cases when A is in the backing chain
ae23c9
of B, i.e. B formally looks like point-in-time snapshot of A.
ae23c9
ae23c9
Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
ae23c9
Reviewed-by: Fam Zheng <famz@redhat.com>
ae23c9
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
ae23c9
(cherry picked from commit f8d59dfb40bbc6f5aeea57c8aac1e68c1d2454ee)
ae23c9
Signed-off-by: John Snow <jsnow@redhat.com>
ae23c9
Signed-off-by: Miroslav Rezanina <mrezanin@redhat.com>
ae23c9
---
ae23c9
 block/backup.c | 20 ++++++++++++++------
ae23c9
 1 file changed, 14 insertions(+), 6 deletions(-)
ae23c9
ae23c9
diff --git a/block/backup.c b/block/backup.c
ae23c9
index 369155a..4ba1a6a 100644
ae23c9
--- a/block/backup.c
ae23c9
+++ b/block/backup.c
ae23c9
@@ -47,6 +47,8 @@ typedef struct BackupBlockJob {
ae23c9
     HBitmap *copy_bitmap;
ae23c9
     bool use_copy_range;
ae23c9
     int64_t copy_range_size;
ae23c9
+
ae23c9
+    bool serialize_target_writes;
ae23c9
 } BackupBlockJob;
ae23c9
 
ae23c9
 static const BlockJobDriver backup_job_driver;
ae23c9
@@ -102,6 +104,8 @@ static int coroutine_fn backup_cow_with_bounce_buffer(BackupBlockJob *job,
ae23c9
     QEMUIOVector qiov;
ae23c9
     BlockBackend *blk = job->common.blk;
ae23c9
     int nbytes;
ae23c9
+    int read_flags = is_write_notifier ? BDRV_REQ_NO_SERIALISING : 0;
ae23c9
+    int write_flags = job->serialize_target_writes ? BDRV_REQ_SERIALISING : 0;
ae23c9
 
ae23c9
     hbitmap_reset(job->copy_bitmap, start / job->cluster_size, 1);
ae23c9
     nbytes = MIN(job->cluster_size, job->len - start);
ae23c9
@@ -112,8 +116,7 @@ static int coroutine_fn backup_cow_with_bounce_buffer(BackupBlockJob *job,
ae23c9
     iov.iov_len = nbytes;
ae23c9
     qemu_iovec_init_external(&qiov, &iov, 1);
ae23c9
 
ae23c9
-    ret = blk_co_preadv(blk, start, qiov.size, &qiov,
ae23c9
-                        is_write_notifier ? BDRV_REQ_NO_SERIALISING : 0);
ae23c9
+    ret = blk_co_preadv(blk, start, qiov.size, &qiov, read_flags);
ae23c9
     if (ret < 0) {
ae23c9
         trace_backup_do_cow_read_fail(job, start, ret);
ae23c9
         if (error_is_read) {
ae23c9
@@ -124,11 +127,11 @@ static int coroutine_fn backup_cow_with_bounce_buffer(BackupBlockJob *job,
ae23c9
 
ae23c9
     if (qemu_iovec_is_zero(&qiov)) {
ae23c9
         ret = blk_co_pwrite_zeroes(job->target, start,
ae23c9
-                                   qiov.size, BDRV_REQ_MAY_UNMAP);
ae23c9
+                                   qiov.size, write_flags | BDRV_REQ_MAY_UNMAP);
ae23c9
     } else {
ae23c9
         ret = blk_co_pwritev(job->target, start,
ae23c9
-                             qiov.size, &qiov,
ae23c9
-                             job->compress ? BDRV_REQ_WRITE_COMPRESSED : 0);
ae23c9
+                             qiov.size, &qiov, write_flags |
ae23c9
+                             (job->compress ? BDRV_REQ_WRITE_COMPRESSED : 0));
ae23c9
     }
ae23c9
     if (ret < 0) {
ae23c9
         trace_backup_do_cow_write_fail(job, start, ret);
ae23c9
@@ -156,6 +159,8 @@ static int coroutine_fn backup_cow_with_offload(BackupBlockJob *job,
ae23c9
     int nr_clusters;
ae23c9
     BlockBackend *blk = job->common.blk;
ae23c9
     int nbytes;
ae23c9
+    int read_flags = is_write_notifier ? BDRV_REQ_NO_SERIALISING : 0;
ae23c9
+    int write_flags = job->serialize_target_writes ? BDRV_REQ_SERIALISING : 0;
ae23c9
 
ae23c9
     assert(QEMU_IS_ALIGNED(job->copy_range_size, job->cluster_size));
ae23c9
     nbytes = MIN(job->copy_range_size, end - start);
ae23c9
@@ -163,7 +168,7 @@ static int coroutine_fn backup_cow_with_offload(BackupBlockJob *job,
ae23c9
     hbitmap_reset(job->copy_bitmap, start / job->cluster_size,
ae23c9
                   nr_clusters);
ae23c9
     ret = blk_co_copy_range(blk, start, job->target, start, nbytes,
ae23c9
-                            is_write_notifier ? BDRV_REQ_NO_SERIALISING : 0, 0);
ae23c9
+                            read_flags, write_flags);
ae23c9
     if (ret < 0) {
ae23c9
         trace_backup_do_cow_copy_range_fail(job, start, ret);
ae23c9
         hbitmap_set(job->copy_bitmap, start / job->cluster_size,
ae23c9
@@ -701,6 +706,9 @@ BlockJob *backup_job_create(const char *job_id, BlockDriverState *bs,
ae23c9
                        sync_bitmap : NULL;
ae23c9
     job->compress = compress;
ae23c9
 
ae23c9
+    /* Detect image-fleecing (and similar) schemes */
ae23c9
+    job->serialize_target_writes = bdrv_chain_contains(target, bs);
ae23c9
+
ae23c9
     /* If there is no backing file on the target, we cannot rely on COW if our
ae23c9
      * backup cluster size is smaller than the target cluster size. Even for
ae23c9
      * targets with a backing file, try to avoid COW if possible. */
ae23c9
-- 
ae23c9
1.8.3.1
ae23c9