From 02714666a525ea4dd8756f66fae28163fb685d05 Mon Sep 17 00:00:00 2001
Message-Id: <02714666a525ea4dd8756f66fae28163fb685d05@dist-git>
From: Peter Krempa <pkrempa@redhat.com>
Date: Tue, 23 Jun 2020 12:24:06 +0200
Subject: [PATCH] kbase: Add document outlining internals of incremental backup
 in qemu
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Outline the basics and how to integrate with externally created
overlays. Other topics will continue later.

Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
(cherry picked from commit da5e5a1e154836abe83077cf477c104b8f98b1d2)
https://bugzilla.redhat.com/show_bug.cgi?id=1804593

Conflicts: docs/kbase.html.in: real time kvm article not backported
Message-Id: <e0fea1e270856b642c34827eb3af1c0b01afd510.1592906423.git.pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
---
 docs/kbase.html.in                        |   3 +
 docs/kbase/incrementalbackupinternals.rst | 217 ++++++++++++++++++++++
 2 files changed, 220 insertions(+)
 create mode 100644 docs/kbase/incrementalbackupinternals.rst

diff --git a/docs/kbase.html.in b/docs/kbase.html.in
index 7d6caf3cb1..f2975960f6 100644
--- a/docs/kbase.html.in
+++ b/docs/kbase.html.in
@@ -32,6 +32,9 @@
 
         <dt><a href="kbase/virtiofs.html">Virtio-FS</a></dt>
         <dd>Share a filesystem between the guest and the host</dd>
+
+        <dt><a href="kbase/incrementalbackupinternals.html">Incremental backup internals</a></dt>
+        <dd>Incremental backup implementation details relevant for users</dd>
       </dl>
     </div>
 
diff --git a/docs/kbase/incrementalbackupinternals.rst b/docs/kbase/incrementalbackupinternals.rst
new file mode 100644
index 0000000000..0c4b4f7486
--- /dev/null
+++ b/docs/kbase/incrementalbackupinternals.rst
@@ -0,0 +1,217 @@
+================================================
+Internals of incremental backup handling in qemu
+================================================
+
+.. contents::
+
+Libvirt's implementation of incremental backups in the ``qemu`` driver uses
+qemu's ``block-dirty-bitmaps`` under the hood to track the guest-visible disk
+state changes corresponding to the points in time described by a libvirt
+checkpoint.
+
+There are some semantic implications to how libvirt creates and manages the
+bitmaps, which de facto become API as they are written into the disk images;
+this document tries to summarize them.
+
+Glossary
+========
+
+See the knowledge base article on
+`domain state capture <https://libvirt.org/kbase/domainstatecapture.html>`_ for
+a deeper explanation of some of the concepts.
+
+Checkpoint
+
+    A libvirt object which represents a named point in time of the life of the
+    VM where libvirt tracks writes the VM has done, thereby allowing a backup
+    of only the blocks which changed. Note that the state of the VM memory is
+    _not_ captured.
+
+    A checkpoint can be created either explicitly via the corresponding API
+    (although this isn't very useful on its own), or simultaneously with an
+    incremental or full backup of the VM using the ``virDomainBackupBegin``
+    API, which allows a subsequent backup to copy only the differences.
+
+Backup
+
+    A copy of either all blocks of the selected disks (full backup) or the
+    blocks changed since a checkpoint (incremental backup) at the time the
+    backup job was started. (Blocks modified while the backup job is running
+    are not part of the backup!)
+
+Snapshot
+
+    Similar to a checkpoint, it's a point in time in the lifecycle of the VM,
+    but the state of the VM including memory is captured at that point,
+    allowing a later return to that state.
+
+Blockjob
+
+    A long-running job which modifies the shape and/or location of the disk
+    backing chain (the images storing the disk contents). Libvirt supports
+    ``block pull``, where data is moved up the chain towards the active layer,
+    and ``block commit``, where data is moved down the chain towards the
+    base/oldest image; these blockjobs always remove images from the backing
+    chain. Lastly there is ``block copy``, where the image is moved to a
+    different location (and possibly collapsed, moving all of the data into
+    the one new image).
+
+block-dirty-bitmap (bitmap)
+
+    A data structure in qemu tracking which blocks were written by the guest
+    OS since the bitmap was created.
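The checkpoint and backup objects above are driven by two XML documents passed to ``virDomainBackupBegin``. The following is only a rough sketch: the element names follow libvirt's checkpoint and backup XML formats, but the checkpoint name, disk name, and target path are hypothetical placeholders.

```xml
<!-- Checkpoint XML (hypothetical names): each listed disk gets a
     block-dirty-bitmap named after the checkpoint. -->
<domaincheckpoint>
  <name>chk-a</name>
  <disks>
    <disk name='vda' checkpoint='bitmap'/>
  </disks>
</domaincheckpoint>

<!-- Backup XML (sketch, hypothetical path): a push-mode incremental backup
     copying only blocks changed since checkpoint chk-a. -->
<domainbackup mode='push'>
  <incremental>chk-a</incremental>
  <disks>
    <disk name='vda' backup='yes' type='file'>
      <target file='/backup/vda.backup.qcow2'/>
    </disk>
  </disks>
</domainbackup>
```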
+
+Relationships of bitmaps, checkpoints and VM disks
+==================================================
+
+When a checkpoint is created libvirt creates a block-dirty-bitmap for every
+configured VM disk named the same way as the checkpoint. The bitmap is
+actively recording which blocks were changed by the guest OS from that point
+on. Other bitmaps are not impacted in any way, as they are self-contained:
+
+::
+
+ +----------------+       +----------------+
+ | disk: vda      |       | disk: vdb      |
+ +--------+-------+       +--------+-------+
+          |                        |
+ +--------v-------+       +--------v-------+
+ | vda-1.qcow2    |       | vdb-1.qcow2    |
+ |                |       |                |
+ | bitmaps: chk-a |       | bitmaps: chk-a |
+ |          chk-b |       |          chk-b |
+ |                |       |                |
+ +----------------+       +----------------+
+
+Bitmaps for all disks are created at the same time so that they track changes
+in sync, and they are active and persisted in the QCOW2 image. Other formats
+currently don't support this feature.
+
+Modification of bitmaps outside of libvirt is not recommended, but when
+adhering to the same semantics described in this document it should be safe
+to do so, even though we obviously can't guarantee that.
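The bitmap layout shown above can be inspected on an offline image. As a sketch, the snippet below extracts the bitmap names with ``jq``; the JSON is a hand-made, trimmed stand-in for real ``qemu-img info --output=json`` output, not captured from an actual image.

```shell
# Hypothetical, trimmed stand-in for `qemu-img info --output=json vda-1.qcow2`
IMG_INFO='{"format-specific":{"type":"qcow2","data":{"bitmaps":[
  {"flags":["auto"],"name":"chk-a","granularity":65536},
  {"flags":["auto"],"name":"chk-b","granularity":65536}]}}}'

# List the names of all bitmaps recorded in the image
jq -r '."format-specific".data.bitmaps[].name' <<< "$IMG_INFO"
# prints:
# chk-a
# chk-b
```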
+
+
+Integration with external snapshots
+===================================
+
+Handling of bitmaps
+-------------------
+
+Creating an external snapshot involves adding a new layer to the backing chain
+on top of the previous chain. In this step no new bitmaps are created by
+default, which would mean that backups become impossible after this step.
+
+To prevent this from happening we need to re-create the active bitmaps in the
+new top/active layer of the backing chain, which allows us to continue
+tracking the changes with the same granularity as before and also allows
+libvirt to stitch together all the corresponding bitmaps to do a backup
+across snapshots.
+
+After taking a snapshot of the ``vda`` disk from the example above placed into
+``vda-2.qcow2`` the following topology will be created:
+
+::
+
+   +----------------+
+   | disk: vda      |
+   +-------+--------+
+           |
+   +-------v--------+    +----------------+
+   | vda-2.qcow2    |    | vda-1.qcow2    |
+   |                |    |                |
+   | bitmaps: chk-a +----> bitmaps: chk-a |
+   |          chk-b |    |          chk-b |
+   |                |    |                |
+   +----------------+    +----------------+
+
+Checking bitmap health
+----------------------
+
+QEMU optimizes disk writes by updating the bitmaps only in certain cases,
+which can also cause problems, e.g. when QEMU crashes.
+
+For a chain of corresponding bitmaps in a backing chain to be considered valid
+and eligible for use with ``virDomainBackupBegin`` it must conform to the
+following rules:
+
+1) The top image must contain the bitmap
+2) If any of the backing images in the chain contain the bitmap too, all
+   contiguous images must have the bitmap (no gaps)
+3) All of the above bitmaps must be marked as active
+   (``auto`` flag in ``qemu-img`` output, ``recording`` in qemu)
+4) None of the above bitmaps can be inconsistent
+   (``in-use`` flag in ``qemu-img`` output, provided that the image is not
+   currently in use by a qemu instance, or ``inconsistent`` in qemu)
+
+::
+
+  # check that the image has bitmaps
+  $ qemu-img info vda-1.qcow2
+    image: vda-1.qcow2
+    file format: qcow2
+    virtual size: 100 MiB (104857600 bytes)
+    disk size: 220 KiB
+    cluster_size: 65536
+    Format specific information:
+        compat: 1.1
+        compression type: zlib
+        lazy refcounts: false
+        bitmaps:
+            [0]:
+                flags:
+                    [0]: in-use
+                    [1]: auto
+                name: chk-a
+                granularity: 65536
+            [1]:
+                flags:
+                    [0]: auto
+                name: chk-b
+                granularity: 65536
+        refcount bits: 16
+        corrupt: false
+
+(See also the ``qemuBlockBitmapChainIsValid`` helper method in
+``src/qemu/qemu_block.c``)
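The flag-related rules can also be checked by hand. The following is a rough sketch (not libvirt's actual implementation) of applying rules 3 and 4 to a single image; ``BITMAPS`` is a hand-made stand-in for the ``bitmaps`` array of ``qemu-img info --output=json``.

```shell
# Hypothetical stand-in for the "bitmaps" array reported by qemu-img
BITMAPS='[{"flags":["auto"],"name":"chk-a"},{"flags":["in-use","auto"],"name":"chk-b"}]'

for BITMAP in $(jq -c '.[]' <<< "$BITMAPS"); do
    NAME=$(jq -r '.name' <<< "$BITMAP")
    FLAGS=$(jq -r '.flags[]' <<< "$BITMAP")

    if ! grep -q 'auto' <<< "$FLAGS"; then
        # rule 3 violated: bitmap is not recording changes
        echo "$NAME: not active"
    elif grep -q 'in-use' <<< "$FLAGS"; then
        # rule 4: suspect unless the image is currently in use by qemu
        echo "$NAME: possibly inconsistent"
    else
        echo "$NAME: ok"
    fi
done
```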
+
+Creating external snapshots manually
+------------------------------------
+
+To create the same topology outside of libvirt (e.g. when creating snapshots
+offline) a ``qemu-img`` version which supports the ``bitmap`` subcommand is
+needed. The following algorithm then ensures that the new image after the
+snapshot will work with backups (note that ``jq`` is a JSON processor):
+
+::
+
+  #!/bin/bash
+
+  # arguments
+  SNAP_IMG="vda-2.qcow2"
+  BACKING_IMG="vda-1.qcow2"
+
+  # constants - snapshots and bitmaps work only with qcow2
+  SNAP_FMT="qcow2"
+  BACKING_IMG_FMT="qcow2"
+
+  # create the snapshot overlay
+  qemu-img create -f "$SNAP_FMT" -F "$BACKING_IMG_FMT" -b "$BACKING_IMG" "$SNAP_IMG"
+
+  BACKING_IMG_INFO=$(qemu-img info --output=json -f "$BACKING_IMG_FMT" "$BACKING_IMG")
+  BACKING_BITMAPS=$(jq '."format-specific".data.bitmaps' <<< "$BACKING_IMG_INFO")
+
+  # nothing to do if the backing image has no bitmaps
+  if [ "x$BACKING_BITMAPS" = "xnull" ]; then
+      exit 0
+  fi
+
+  for BACKING_BITMAP_ in $(jq -c '.[]' <<< "$BACKING_BITMAPS"); do
+      BITMAP_FLAGS=$(jq -c -r '.flags[]' <<< "$BACKING_BITMAP_")
+      BITMAP_NAME=$(jq -r '.name' <<< "$BACKING_BITMAP_")
+
+      # skip bitmaps which are inconsistent ('in-use') or inactive (no 'auto')
+      if grep -q 'in-use' <<< "$BITMAP_FLAGS" ||
+         grep -q -v 'auto' <<< "$BITMAP_FLAGS"; then
+         continue
+      fi
+
+      # re-create the bitmap in the new overlay
+      qemu-img bitmap -f "$SNAP_FMT" "$SNAP_IMG" --add "$BITMAP_NAME"
+
+  done
-- 
2.27.0