SOURCES/0007-copy-Use-preferred-block-size-for-copying.patch

From c8626acc63c4ae1c6cf5d1505e0209ac10f44e81 Mon Sep 17 00:00:00 2001
From: "Richard W.M. Jones" <rjones@redhat.com>
Date: Tue, 28 Jun 2022 21:58:55 +0100
Subject: [PATCH] copy: Use preferred block size for copying

You're not supposed to read or write NBD servers at a granularity less
than the advertised minimum block size.  nbdcopy has ignored this
requirement, and this is usually fine because the NBD servers we care
about support 512-byte sector granularity, and never advertise sizes /
extents less granular than sectors (even if it's a bit suboptimal in a
few cases).

However there is one new case where we do care: When writing to a
compressed qcow2 file, qemu advertises a minimum and preferred block
size of 64K, and it really means it.  You cannot write blocks smaller
than this because of the way qcow2 compression is implemented.

This commit attempts to do the least work possible to fix this.

The previous multi-thread-copying loop was driven by the extent map
received from the source.  I have modified the loop so that it
iterates over request_size blocks.  request_size is set from the
command line (--request-size) but will be adjusted upwards if either
the source or destination preferred block size is larger.  So this
will always copy blocks which are at least the preferred block size
(except for the very last block of the disk).

While copying these blocks we consult the source extent map.  If it
contains only zero regions covering the whole block (only_zeroes
function) then we can skip straight to zeroing the target
(fill_dst_range_with_zeroes), else we do read + write as before.

I only modified the multi-thread-copying loop, not the synchronous
loop.  That should be updated in the same way later.

One side effect of this change is that it always makes larger requests,
even for regions we know are sparse.  This is clear in the
copy-sparse.sh and copy-sparse-allocated.sh tests which were
previously driven by the 32K sparse map granularity of the source.
Without changing these tests, they would make 256K reads & writes
(and also read from areas of the disk even though we know they are
sparse).  I adjusted these tests to use --request-size=32768 to force
the existing behaviour.

Note this doesn't attempt to limit the maximum block size when reading
or writing.  That is for future work.

This is a partial fix for https://bugzilla.redhat.com/2047660.
Further changes will be required in virt-v2v.

Link: https://lists.gnu.org/archive/html/qemu-block/2022-01/threads.html#00729
Link: https://bugzilla.redhat.com/show_bug.cgi?id=2047660
(cherry picked from commit 4058fe1ff03fb41156b67302ba1006b9d06b0218)
---
 TODO                                  |   4 +-
 copy/Makefile.am                      |   6 +-
 copy/copy-file-to-qcow2-compressed.sh |  64 +++++++++++
 copy/copy-sparse-allocated.sh         |   4 +-
 copy/copy-sparse.sh                   |   7 +-
 copy/main.c                           |  13 +++
 copy/multi-thread-copying.c           | 149 +++++++++++++++++++-------
 copy/nbdcopy.pod                      |   5 +-
 8 files changed, 202 insertions(+), 50 deletions(-)
 create mode 100755 copy/copy-file-to-qcow2-compressed.sh

diff --git a/TODO b/TODO
index 7c9c15e..bc38d70 100644
--- a/TODO
+++ b/TODO
@@ -28,7 +28,9 @@ Performance: Chart it over various buffer sizes and threads, as that
 Examine other fuzzers: https://gitlab.com/akihe/radamsa
 
 nbdcopy:
- - Minimum/preferred/maximum block size.
+ - Enforce maximum block size.
+ - Synchronous loop should be adjusted to take into account
+   the NBD preferred block size, as was done for multi-thread loop.
  - Benchmark.
  - Better page cache usage, see nbdkit-file-plugin options
    fadvise=sequential cache=none.
diff --git a/copy/Makefile.am b/copy/Makefile.am
index e729f86..25f75c5 100644
--- a/copy/Makefile.am
+++ b/copy/Makefile.am
@@ -23,6 +23,7 @@ EXTRA_DIST = \
 	copy-file-to-nbd.sh \
 	copy-file-to-null.sh \
 	copy-file-to-qcow2.sh \
+	copy-file-to-qcow2-compressed.sh \
 	copy-nbd-to-block.sh \
 	copy-nbd-to-file.sh \
 	copy-nbd-to-hexdump.sh \
@@ -142,7 +143,10 @@ TESTS += \
 	$(NULL)
 
 if HAVE_QEMU_NBD
-TESTS += copy-file-to-qcow2.sh
+TESTS += \
+	copy-file-to-qcow2.sh \
+	copy-file-to-qcow2-compressed.sh \
+	$(NULL)
 endif
 
 if HAVE_GNUTLS
diff --git a/copy/copy-file-to-qcow2-compressed.sh b/copy/copy-file-to-qcow2-compressed.sh
new file mode 100755
index 0000000..dfe4fa5
--- /dev/null
+++ b/copy/copy-file-to-qcow2-compressed.sh
@@ -0,0 +1,64 @@
+#!/usr/bin/env bash
+# nbd client library in userspace
+# Copyright (C) 2020-2022 Red Hat Inc.
+#
+# This library is free software; you can redistribute it and/or
+# modify it under the terms of the GNU Lesser General Public
+# License as published by the Free Software Foundation; either
+# version 2 of the License, or (at your option) any later version.
+#
+# This library is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+# Lesser General Public License for more details.
+#
+# You should have received a copy of the GNU Lesser General Public
+# License along with this library; if not, write to the Free Software
+# Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
+
+. ../tests/functions.sh
+
+set -e
+set -x
+
+requires $QEMU_NBD --version
+requires nbdkit --exit-with-parent --version
+requires nbdkit sparse-random --dump-plugin
+requires qemu-img --version
+requires stat --version
+
+file1=copy-file-to-qcow2-compressed.file1
+file2=copy-file-to-qcow2-compressed.file2
+rm -f $file1 $file2
+cleanup_fn rm -f $file1 $file2
+
+size=1G
+seed=$RANDOM
+
+# Create a compressed qcow2 file1.
+#
+# sparse-random files should compress easily because by default each
+# block uses repeated bytes.
+qemu-img create -f qcow2 $file1 $size
+nbdcopy -- [ nbdkit --exit-with-parent sparse-random $size seed=$seed ] \
+        [ $QEMU_NBD --image-opts driver=compress,file.driver=qcow2,file.file.driver=file,file.file.filename=$file1 ]
+
+ls -l $file1
+
+# Create an uncompressed qcow2 file2 with the same data.
+qemu-img create -f qcow2 $file2 $size
+nbdcopy -- [ nbdkit --exit-with-parent sparse-random $size seed=$seed ] \
+        [ $QEMU_NBD --image-opts driver=qcow2,file.driver=file,file.filename=$file2 ]
+
+ls -l $file2
+
+# file1 < file2 (shows the compression is having some effect).
+size1="$( stat -c %s $file1 )"
+size2="$( stat -c %s $file2 )"
+if [ $size1 -ge $size2 ]; then
+    echo "$0: qcow2 compression did not make the file smaller"
+    exit 1
+fi
+
+# Logical content of the files should be identical.
+qemu-img compare -f qcow2 $file1 -F qcow2 $file2
diff --git a/copy/copy-sparse-allocated.sh b/copy/copy-sparse-allocated.sh
index 203c3b9..465e347 100755
--- a/copy/copy-sparse-allocated.sh
+++ b/copy/copy-sparse-allocated.sh
@@ -17,8 +17,6 @@
 # Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
 
 # Adapted from copy-sparse.sh.
-#
-# This test depends on the nbdkit default sparse block size (32K).
 
 . ../tests/functions.sh
 
@@ -33,7 +31,7 @@ requires nbdkit eval --version
 out=copy-sparse-allocated.out
 cleanup_fn rm -f $out
 
-$VG nbdcopy --allocated -- \
+$VG nbdcopy --allocated --request-size=32768 -- \
     [ nbdkit --exit-with-parent data data='
              1
              @1073741823 1
diff --git a/copy/copy-sparse.sh b/copy/copy-sparse.sh
index 1a6da86..7912a21 100755
--- a/copy/copy-sparse.sh
+++ b/copy/copy-sparse.sh
@@ -16,8 +16,6 @@
 # License along with this library; if not, write to the Free Software
 # Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
 
-# This test depends on the nbdkit default sparse block size (32K).
-
 . ../tests/functions.sh
 
 set -e
@@ -34,8 +32,9 @@ cleanup_fn rm -f $out
 # Copy from a sparse data disk to an nbdkit-eval-plugin instance which
 # is logging everything.  This allows us to see exactly what nbdcopy
 # is writing, to ensure it is writing and zeroing the target as
-# expected.
-$VG nbdcopy -S 0 -- \
+# expected.  Force request size to match nbdkit default sparse
+# allocator block size (32K).
+$VG nbdcopy -S 0 --request-size=32768 -- \
     [ nbdkit --exit-with-parent data data='
              1
              @1073741823 1
diff --git a/copy/main.c b/copy/main.c
index 19ec384..0e27db8 100644
--- a/copy/main.c
+++ b/copy/main.c
@@ -40,6 +40,7 @@
 
 #include "ispowerof2.h"
 #include "human-size.h"
+#include "minmax.h"
 #include "version.h"
 #include "nbdcopy.h"
 
@@ -379,10 +380,22 @@ main (int argc, char *argv[])
   if (threads < connections)
     connections = threads;
 
+  /* request_size must always be at least as large as the preferred
+   * size of source & destination.
+   */
+  request_size = MAX (request_size, src->preferred);
+  request_size = MAX (request_size, dst->preferred);
+
   /* Adapt queue to size to request size if needed. */
   if (request_size > queue_size)
     queue_size = request_size;
 
+  /* Sparse size (if using) must not be smaller than the destination
+   * preferred size, otherwise we end up creating too small requests.
+   */
+  if (sparse_size > 0 && sparse_size < dst->preferred)
+    sparse_size = dst->preferred;
+
   /* Truncate the destination to the same size as the source.  Only
    * has an effect on regular files.
    */
diff --git a/copy/multi-thread-copying.c b/copy/multi-thread-copying.c
index 06cdb8e..9267545 100644
--- a/copy/multi-thread-copying.c
+++ b/copy/multi-thread-copying.c
@@ -166,6 +166,62 @@ decrease_queue_size (struct worker *worker, size_t len)
   worker->queue_size -= len;
 }
 
+/* Using the extents map 'exts', check if the region
+ * [offset..offset+len-1] intersects only with zero extents.
+ *
+ * The invariant for '*i' is always an extent which starts before or
+ * equal to the current offset.
+ */
+static bool
+only_zeroes (const extent_list exts, size_t *i,
+             uint64_t offset, unsigned len)
+{
+  size_t j;
+
+  /* Invariant. */
+  assert (*i < exts.len);
+  assert (exts.ptr[*i].offset <= offset);
+
+  /* Update the invariant.  Search for the last possible extent in the
+   * list which is <= offset.
+   */
+  for (j = *i + 1; j < exts.len; ++j) {
+    if (exts.ptr[j].offset <= offset)
+      *i = j;
+    else
+      break;
+  }
+
+  /* Check invariant again. */
+  assert (*i < exts.len);
+  assert (exts.ptr[*i].offset <= offset);
+
+  /* If *i is not the last extent, then the next extent starts
+   * strictly beyond our current offset.
+   */
+  assert (*i == exts.len - 1 || exts.ptr[*i + 1].offset > offset);
+
+  /* Search forward, look for any non-zero extents overlapping the region. */
+  for (j = *i; j < exts.len; ++j) {
+    uint64_t start, end;
+
+    /* [start..end-1] is the current extent. */
+    start = exts.ptr[j].offset;
+    end = exts.ptr[j].offset + exts.ptr[j].length;
+
+    assert (end > offset);
+
+    if (start >= offset + len)
+      break;
+
+    /* Non-zero extent covering this region => test failed. */
+    if (!exts.ptr[j].zero)
+      return false;
+  }
+
+  return true;
+}
+
 /* There are 'threads' worker threads, each copying work ranges from
  * src to dst until there are no more work ranges.
  */
@@ -177,7 +233,10 @@ worker_thread (void *wp)
   extent_list exts = empty_vector;
 
   while (get_next_offset (&offset, &count)) {
-    size_t i;
+    struct command *command;
+    size_t extent_index;
+    bool is_zeroing = false;
+    uint64_t zeroing_start = 0; /* initialized to avoid bogus GCC warning */
 
     assert (0 < count && count <= THREAD_WORK_SIZE);
     if (extents)
@@ -185,52 +244,64 @@ worker_thread (void *wp)
     else
       default_get_extents (src, w->index, offset, count, &exts);
 
-    for (i = 0; i < exts.len; ++i) {
-      struct command *command;
-      size_t len;
+    extent_index = 0; // index into extents array used to optimize only_zeroes
+    while (count) {
+      const size_t len = MIN (count, request_size);
 
-      if (exts.ptr[i].zero) {
+      if (only_zeroes (exts, &extent_index, offset, len)) {
         /* The source is zero so we can proceed directly to skipping,
-         * fast zeroing, or writing zeroes at the destination.
+         * fast zeroing, or writing zeroes at the destination.  Defer
+         * zeroing so we can send it as a single large command.
          */
-        command = create_command (exts.ptr[i].offset, exts.ptr[i].length,
-                                  true, w);
-        fill_dst_range_with_zeroes (command);
+        if (!is_zeroing) {
+          is_zeroing = true;
+          zeroing_start = offset;
+        }
       }
-
       else /* data */ {
-        /* As the extent might be larger than permitted for a single
-         * command, we may have to split this into multiple read
-         * requests.
-         */
-        while (exts.ptr[i].length > 0) {
-          len = exts.ptr[i].length;
-          if (len > request_size)
-            len = request_size;
-
-          command = create_command (exts.ptr[i].offset, len,
-                                    false, w);
-
-          wait_for_request_slots (w);
-
-          /* NOTE: Must increase the queue size after waiting. */
-          increase_queue_size (w, len);
-
-          /* Begin the asynch read operation. */
-          src->ops->asynch_read (src, command,
-                                 (nbd_completion_callback) {
-                                   .callback = finished_read,
-                                   .user_data = command,
-                                 });
-
-          exts.ptr[i].offset += len;
-          exts.ptr[i].length -= len;
+        /* If we were in the middle of deferred zeroing, do it now. */
+        if (is_zeroing) {
+          /* Note that offset-zeroing_start can never exceed
+           * THREAD_WORK_SIZE, so there is no danger of overflowing
+           * size_t.
+           */
+          command = create_command (zeroing_start, offset-zeroing_start,
+                                    true, w);
+          fill_dst_range_with_zeroes (command);
+          is_zeroing = false;
         }
+
+        /* Issue the asynchronous read command. */
+        command = create_command (offset, len, false, w);
+
+        wait_for_request_slots (w);
+
+        /* NOTE: Must increase the queue size after waiting. */
+        increase_queue_size (w, len);
+
+        /* Begin the asynch read operation. */
+        src->ops->asynch_read (src, command,
+                               (nbd_completion_callback) {
+                                 .callback = finished_read,
+                                 .user_data = command,
+                               });
       }
 
-      offset += count;
-      count = 0;
-    } /* for extents */
+      offset += len;
+      count -= len;
+    } /* while (count) */
+
+    /* If we were in the middle of deferred zeroing, do it now. */
+    if (is_zeroing) {
+      /* Note that offset-zeroing_start can never exceed
+       * THREAD_WORK_SIZE, so there is no danger of overflowing
+       * size_t.
+       */
+      command = create_command (zeroing_start, offset - zeroing_start,
+                                true, w);
+      fill_dst_range_with_zeroes (command);
+      is_zeroing = false;
+    }
   }
 
   /* Wait for in flight NBD requests to finish. */
diff --git a/copy/nbdcopy.pod b/copy/nbdcopy.pod
index fd10f7c..f06d112 100644
--- a/copy/nbdcopy.pod
+++ b/copy/nbdcopy.pod
@@ -182,8 +182,9 @@ Set the maximum number of requests in flight per NBD connection.
 =item B<--sparse=>N
 
 Detect all zero blocks of size N (bytes) and make them sparse on the
-output.  You can also turn off sparse detection using S<I<-S 0>>.
-The default is 4096 bytes.
+output.  You can also turn off sparse detection using S<I<-S 0>>.  The
+default is 4096 bytes, or the destination preferred block size,
+whichever is larger.
 
 =item B<--synchronous>
 
-- 
2.31.1