Tree - rpms/qemu-kvm - CentOS Git server

Forked from rpms/qemu-kvm 2 years ago

Blame SOURCES/kvm-block-backup-fix-fleecing-scheme-use-serialized-writ.patch

		26ba25	`From d9a55a5815a040032f85c20020b118dda54bba43 Mon Sep 17 00:00:00 2001`
		26ba25	`From: John Snow <jsnow@redhat.com>`
		26ba25	`Date: Wed, 18 Jul 2018 22:54:58 +0200`
		26ba25	`Subject: [PATCH 240/268] block/backup: fix fleecing scheme: use serialized`
		26ba25	`writes`
		26ba25
		26ba25	`RH-Author: John Snow <jsnow@redhat.com>`
		26ba25	`Message-id: <20180718225511.14878-23-jsnow@redhat.com>`
		26ba25	`Patchwork-id: 81396`
		26ba25	`O-Subject: [RHEL-7.6 qemu-kvm-rhev PATCH 22/35] block/backup: fix fleecing scheme: use serialized writes`
		26ba25	`Bugzilla: 1207657`
		26ba25	`RH-Acked-by: Eric Blake <eblake@redhat.com>`
		26ba25	`RH-Acked-by: Stefan Hajnoczi <stefanha@redhat.com>`
		26ba25	`RH-Acked-by: Fam Zheng <famz@redhat.com>`
		26ba25
		26ba25	`From: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>`
		26ba25
		26ba25	`Fleecing scheme works as follows: we want a kind of temporary snapshot`
		26ba25	`of active drive A. We create temporary image B, with B->backing = A.`
		26ba25	`Then we start backup(sync=none) from A to B. From this point, B reads`
		26ba25	`as point-in-time snapshot of A (A continues to be active drive,`
		26ba25	`accepting guest IO).`
		26ba25
		26ba25	`This scheme needs some additional synchronization between reads from B`
		26ba25	`and backup COW operations, otherwise, the following situation is`
		26ba25	`theoretically possible:`
		26ba25
		26ba25	`(assume B is qcow2, client is NBD client, reading from B)`
		26ba25
		26ba25	`1. client starts reading and takes qcow2 mutex in qcow2_co_preadv, and`
		26ba25	`goes up to l2 table loading (assume cache miss)`
		26ba25
		26ba25	`2) guest write => backup COW => qcow2 write =>`
		26ba25	`try to take qcow2 mutex => waiting`
		26ba25
		26ba25	`3. l2 table loaded, we see that cluster is UNALLOCATED, go to`
		26ba25	`"case QCOW2_CLUSTER_UNALLOCATED" and unlock mutex before`
		26ba25	`bdrv_co_preadv(bs->backing, ...)`
		26ba25
		26ba25	`4) aha, mutex unlocked, backup COW continues, and we finally finish`
		26ba25	`guest write and change cluster in our active disk A`
		26ba25
		26ba25	`5. actually, do bdrv_co_preadv(bs->backing, ...) and read`
		26ba25	`_new updated_ data.`
		26ba25
		26ba25	`To avoid this, let's make backup writes serializing, to not intersect`
		26ba25	`with reads from B.`
		26ba25
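The five steps above can be replayed in a deterministic toy model. This is only a sketch under loud assumptions: the one-cluster `struct model`, the `OLD_DATA`/`NEW_DATA` constants, and `read_b_during_guest_write()` are hypothetical names invented for illustration, not QEMU APIs, and the "serialized" branch simply orders the copy-on-write after the in-flight read to mimic what a serialising write is meant to guarantee.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical one-cluster model; none of these names exist in QEMU. */
enum { UNALLOCATED = 0, OLD_DATA = 1, NEW_DATA = 2 };

struct model {
    int a; /* cluster content of active disk A */
    int b; /* cluster content of target B; UNALLOCATED means "read A" */
};

/* Returns what a client reading B observes while a guest write hits A. */
static int read_b_during_guest_write(bool serialize_target_writes)
{
    struct model m = { .a = OLD_DATA, .b = UNALLOCATED };
    int seen;

    /* Steps 1 and 3: the reader finds B's cluster unallocated and
     * decides to fall through to the backing file A. */
    bool read_from_backing = (m.b == UNALLOCATED);

    if (!serialize_target_writes) {
        /* Steps 2 and 4: backup COW and the guest write slip in
         * between the lookup and the backing read. */
        m.b = m.a;      /* COW saves the old data into B */
        m.a = NEW_DATA; /* guest write lands in A */
        /* Step 5: the stale decision makes the reader fetch from A. */
        seen = read_from_backing ? m.a : m.b;
    } else {
        /* The serialising COW write waits for the overlapping read. */
        seen = read_from_backing ? m.a : m.b;
        m.b = m.a;
        m.a = NEW_DATA;
    }

    /* B's own copy is correct either way; only the fall-through is racy. */
    assert(m.b == OLD_DATA);
    return seen;
}
```

Calling `read_b_during_guest_write(false)` yields `NEW_DATA` (the snapshot reader sees post-snapshot data), while `read_b_during_guest_write(true)` yields `OLD_DATA`.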
		26ba25	`Note: we expand range of handled cases from (sync=none and`
		26ba25	`B->backing = A) to just (A in backing chain of B), to finally allow`
		26ba25	`safe reading from B during backup for all cases when A in backing chain`
		26ba25	`of B, i.e. B formally looks like point-in-time snapshot of A.`
		26ba25
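The widened condition "A in backing chain of B" amounts to a walk down the backing links starting at B. The sketch below illustrates that check with a hypothetical `struct node` that models only a backing pointer; `chain_contains()` and `should_serialize_target_writes()` are illustrative stand-ins and are not claimed to match the body of QEMU's `bdrv_chain_contains()`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a block node: only the backing link is modeled. */
struct node {
    const char *name;
    struct node *backing;
};

/* Follow backing links from top; true if base is top itself or below it. */
static bool chain_contains(const struct node *top, const struct node *base)
{
    while (top != NULL && top != base) {
        top = top->backing;
    }
    return top != NULL;
}

/* Mirrors the patch's decision: serialize target writes when the backup
 * source is reachable from the target through backing links. */
static bool should_serialize_target_writes(const struct node *target,
                                           const struct node *source)
{
    assert(target != NULL && source != NULL);
    return chain_contains(target, source);
}
```

With `B->backing = A`, the check is true for `(B, A)` and false for `(A, B)`, so an ordinary backup to an unrelated target keeps its non-serialized writes.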
		26ba25	`Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>`
		26ba25	`Reviewed-by: Fam Zheng <famz@redhat.com>`
		26ba25	`Signed-off-by: Kevin Wolf <kwolf@redhat.com>`
		26ba25	`(cherry picked from commit f8d59dfb40bbc6f5aeea57c8aac1e68c1d2454ee)`
		26ba25	`Signed-off-by: John Snow <jsnow@redhat.com>`
		26ba25	`Signed-off-by: Miroslav Rezanina <mrezanin@redhat.com>`
		26ba25	`---`
		26ba25	`block/backup.c | 20 ++++++++++++++------`
		26ba25	`1 file changed, 14 insertions(+), 6 deletions(-)`
		26ba25
		26ba25	`diff --git a/block/backup.c b/block/backup.c`
		26ba25	`index 369155a..4ba1a6a 100644`
		26ba25	`--- a/block/backup.c`
		26ba25	`+++ b/block/backup.c`
		26ba25	`@@ -47,6 +47,8 @@ typedef struct BackupBlockJob {`
		26ba25	`HBitmap *copy_bitmap;`
		26ba25	`bool use_copy_range;`
		26ba25	`int64_t copy_range_size;`
		26ba25	`+`
		26ba25	`+ bool serialize_target_writes;`
		26ba25	`} BackupBlockJob;`
		26ba25
		26ba25	`static const BlockJobDriver backup_job_driver;`
		26ba25	`@@ -102,6 +104,8 @@ static int coroutine_fn backup_cow_with_bounce_buffer(BackupBlockJob *job,`
		26ba25	`QEMUIOVector qiov;`
		26ba25	`BlockBackend *blk = job->common.blk;`
		26ba25	`int nbytes;`
		26ba25	`+ int read_flags = is_write_notifier ? BDRV_REQ_NO_SERIALISING : 0;`
		26ba25	`+ int write_flags = job->serialize_target_writes ? BDRV_REQ_SERIALISING : 0;`
		26ba25
		26ba25	`hbitmap_reset(job->copy_bitmap, start / job->cluster_size, 1);`
		26ba25	`nbytes = MIN(job->cluster_size, job->len - start);`
		26ba25	`@@ -112,8 +116,7 @@ static int coroutine_fn backup_cow_with_bounce_buffer(BackupBlockJob *job,`
		26ba25	`iov.iov_len = nbytes;`
		26ba25	`qemu_iovec_init_external(&qiov, &iov, 1);`
		26ba25
		26ba25	`- ret = blk_co_preadv(blk, start, qiov.size, &qiov,`
		26ba25	`- is_write_notifier ? BDRV_REQ_NO_SERIALISING : 0);`
		26ba25	`+ ret = blk_co_preadv(blk, start, qiov.size, &qiov, read_flags);`
		26ba25	`if (ret < 0) {`
		26ba25	`trace_backup_do_cow_read_fail(job, start, ret);`
		26ba25	`if (error_is_read) {`
		26ba25	`@@ -124,11 +127,11 @@ static int coroutine_fn backup_cow_with_bounce_buffer(BackupBlockJob *job,`
		26ba25
		26ba25	`if (qemu_iovec_is_zero(&qiov)) {`
		26ba25	`ret = blk_co_pwrite_zeroes(job->target, start,`
		26ba25	`- qiov.size, BDRV_REQ_MAY_UNMAP);`
		26ba25	`+ qiov.size, write_flags | BDRV_REQ_MAY_UNMAP);`
		26ba25	`} else {`
		26ba25	`ret = blk_co_pwritev(job->target, start,`
		26ba25	`- qiov.size, &qiov,`
		26ba25	`- job->compress ? BDRV_REQ_WRITE_COMPRESSED : 0);`
		26ba25	`+ qiov.size, &qiov, write_flags |`
		26ba25	`+ (job->compress ? BDRV_REQ_WRITE_COMPRESSED : 0));`
		26ba25	`}`
		26ba25	`if (ret < 0) {`
		26ba25	`trace_backup_do_cow_write_fail(job, start, ret);`
		26ba25	`@@ -156,6 +159,8 @@ static int coroutine_fn backup_cow_with_offload(BackupBlockJob *job,`
		26ba25	`int nr_clusters;`
		26ba25	`BlockBackend *blk = job->common.blk;`
		26ba25	`int nbytes;`
		26ba25	`+ int read_flags = is_write_notifier ? BDRV_REQ_NO_SERIALISING : 0;`
		26ba25	`+ int write_flags = job->serialize_target_writes ? BDRV_REQ_SERIALISING : 0;`
		26ba25
		26ba25	`assert(QEMU_IS_ALIGNED(job->copy_range_size, job->cluster_size));`
		26ba25	`nbytes = MIN(job->copy_range_size, end - start);`
		26ba25	`@@ -163,7 +168,7 @@ static int coroutine_fn backup_cow_with_offload(BackupBlockJob *job,`
		26ba25	`hbitmap_reset(job->copy_bitmap, start / job->cluster_size,`
		26ba25	`nr_clusters);`
		26ba25	`ret = blk_co_copy_range(blk, start, job->target, start, nbytes,`
		26ba25	`- is_write_notifier ? BDRV_REQ_NO_SERIALISING : 0, 0);`
		26ba25	`+ read_flags, write_flags);`
		26ba25	`if (ret < 0) {`
		26ba25	`trace_backup_do_cow_copy_range_fail(job, start, ret);`
		26ba25	`hbitmap_set(job->copy_bitmap, start / job->cluster_size,`
		26ba25	`@@ -701,6 +706,9 @@ BlockJob *backup_job_create(const char *job_id, BlockDriverState *bs,`
		26ba25	`sync_bitmap : NULL;`
		26ba25	`job->compress = compress;`
		26ba25
		26ba25	`+ /* Detect image-fleecing (and similar) schemes */`
		26ba25	`+ job->serialize_target_writes = bdrv_chain_contains(target, bs);`
		26ba25	`+`
		26ba25	`/* If there is no backing file on the target, we cannot rely on COW if our`
		26ba25	`* backup cluster size is smaller than the target cluster size. Even for`
		26ba25	`* targets with a backing file, try to avoid COW if possible. */`
		26ba25	`--`
		26ba25	`1.8.3.1`
		26ba25
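The `blk_co_pwritev` hunk above wraps the compression conditional in parentheses before OR-ing it with `write_flags`. In C, `|` binds tighter than `?:`, so the parentheses matter. The following sketch shows the difference, assuming hypothetical bit values (`REQ_WRITE_COMPRESSED`, `REQ_SERIALISING`) that are not the real `BDRV_REQ_*` constants.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flag bits, chosen only for this illustration. */
enum {
    REQ_WRITE_COMPRESSED = 1 << 0,
    REQ_SERIALISING      = 1 << 1,
};

/* Composition as written in the patch: the conditional is parenthesized. */
static int target_write_flags(bool serialize, bool compress)
{
    int write_flags = serialize ? REQ_SERIALISING : 0;
    return write_flags | (compress ? REQ_WRITE_COMPRESSED : 0);
}

/* How "write_flags | compress ? X : 0" would parse without parentheses:
 * the OR becomes the condition, and the serialising bit is dropped. */
static int target_write_flags_misparsed(bool serialize, bool compress)
{
    int write_flags = serialize ? REQ_SERIALISING : 0;
    return (write_flags | compress) ? REQ_WRITE_COMPRESSED : 0;
}
```

For `serialize = true, compress = false`, the first function returns only the serialising bit, while the misparsed form returns only the compression bit.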