From 02714666a525ea4dd8756f66fae28163fb685d05 Mon Sep 17 00:00:00 2001
Message-Id: <02714666a525ea4dd8756f66fae28163fb685d05@dist-git>
From: Peter Krempa <pkrempa@redhat.com>
Date: Tue, 23 Jun 2020 12:24:06 +0200
Subject: [PATCH] kbase: Add document outlining internals of incremental backup
 in qemu
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Outline the basics and how to integrate with externally created
overlays. Other topics will continue later.

Signed-off-by: Peter Krempa <pkrempa@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
(cherry picked from commit da5e5a1e154836abe83077cf477c104b8f98b1d2)
https://bugzilla.redhat.com/show_bug.cgi?id=1804593

Conflicts: docs/kbase.html.in: real time kvm article not backported
Message-Id: <e0fea1e270856b642c34827eb3af1c0b01afd510.1592906423.git.pkrempa@redhat.com>
Reviewed-by: Ján Tomko <jtomko@redhat.com>
---
 docs/kbase.html.in                        |   3 +
 docs/kbase/incrementalbackupinternals.rst | 217 ++++++++++++++++++++++
 2 files changed, 220 insertions(+)
 create mode 100644 docs/kbase/incrementalbackupinternals.rst

diff --git a/docs/kbase.html.in b/docs/kbase.html.in
index 7d6caf3cb1..f2975960f6 100644
--- a/docs/kbase.html.in
+++ b/docs/kbase.html.in
@@ -32,6 +32,9 @@
 
         Virtio-FS
         Share a filesystem between the guest and the host
+
+        Incremental backup internals
+        Incremental backup implementation details relevant for users
       
     
 
diff --git a/docs/kbase/incrementalbackupinternals.rst b/docs/kbase/incrementalbackupinternals.rst
new file mode 100644
index 0000000000..0c4b4f7486
--- /dev/null
+++ b/docs/kbase/incrementalbackupinternals.rst
@@ -0,0 +1,217 @@
+================================================
+Internals of incremental backup handling in qemu
+================================================
+
+.. contents::
+
+Libvirt's implementation of incremental backups in the ``qemu`` driver uses
+qemu's ``block-dirty-bitmaps`` under the hood to track the guest visible disk
+state changes corresponding to the points in time described by a libvirt
+checkpoint.
+
+There are some semantic implications with how libvirt creates and manages the
+bitmaps which de-facto become API as they are written into the disk images, and
+this document will try to summarize them.
+
+Glossary
+========
+
+See the knowledge base article on
+`domain state capture <https://libvirt.org/kbase/domainstatecapture.html>`_ for
+a deeper explanation of some of the concepts.
+
+Checkpoint
+
+    A libvirt object which represents a named point in time of the life of the
+    VM where libvirt tracks writes the VM has done, thereby allowing a backup of
+    only the blocks which changed. Note that the state of the VM memory is
+    *not* captured.
+
+    A checkpoint can be created either explicitly via the corresponding API
+    (although this isn't very useful on its own), or simultaneously with an
+    incremental or full backup of the VM using the ``virDomainBackupBegin`` API
+    which allows the next backup to copy only the differences.
+
+Backup
+
+    A copy of either all blocks of selected disks (full backup) or blocks changed
+    since a checkpoint (incremental backup) at the time the backup job was
+    started. (Blocks modified while the backup job is running are not part of the
+    backup!)
+
+Snapshot
+
+    Similarly to a checkpoint, it's a point in time in the lifecycle of the VM,
+    but the state of the VM including memory is captured at that point, allowing
+    a return to that state later.
+
+Blockjob
+
+    A long running job which modifies the shape and/or location of the disk
+    backing chain (images storing the disk contents). Libvirt supports
+    ``block pull`` where data is moved up the chain towards the active layer,
+    and ``block commit`` where data is moved down the chain towards the
+    base/oldest image. These blockjobs always remove images from the backing
+    chain. Lastly, ``block copy`` moves the image to a different location (and
+    possibly collapses it, moving all of the data into a single image).
+
+block-dirty-bitmap (bitmap)
+
+    A data structure in qemu tracking which blocks were written by the guest
+    OS since the bitmap was created.
+
+Relationships of bitmaps, checkpoints and VM disks
+==================================================
+
+When a checkpoint is created, libvirt creates a block-dirty-bitmap for every
+configured VM disk named the same way as the checkpoint. The bitmap is
+actively recording which blocks were changed by the guest OS from that point on.
+Other bitmaps are not impacted in any way as they are self-contained:
+
+::
+
+ +----------------+       +----------------+
+ | disk: vda      |       | disk: vdb      |
+ +--------+-------+       +--------+-------+
+          |                        |
+ +--------v-------+       +--------v-------+
+ | vda-1.qcow2    |       | vdb-1.qcow2    |
+ |                |       |                |
+ | bitmaps: chk-a |       | bitmaps: chk-a |
+ |          chk-b |       |          chk-b |
+ |                |       |                |
+ +----------------+       +----------------+
+
+Bitmaps are created at the same time to track changes to all disks in sync and
+are active and persisted in the QCOW2 image. Other formats currently don't
+support this feature.
+
+Modification of bitmaps outside of libvirt is not recommended, but when adhering
+to the same semantics which this document describes it should be safe to do
+so, even if we obviously can't guarantee that.
+
+
+Integration with external snapshots
+===================================
+
+Handling of bitmaps
+-------------------
+
+Creating an external snapshot involves adding a new layer to the backing chain
+on top of the previous chain. In this step there are no new bitmaps created by
+default, which would mean that backups become impossible after this step.
+
+To prevent this from happening we need to re-create the active bitmaps in the
+new top/active layer of the backing chain which allows us to continue tracking
+the changes with the same granularity as before and also allows libvirt to
+stitch together all the corresponding bitmaps to do a backup across snapshots.
+
+After taking a snapshot of the ``vda`` disk from the example above placed into
+``vda-2.qcow2`` the following topology will be created:
+
+::
+
+   +----------------+
+   | disk: vda      |
+   +-------+--------+
+           |
+   +-------v--------+    +----------------+
+   | vda-2.qcow2    |    | vda-1.qcow2    |
+   |                |    |                |
+   | bitmaps: chk-a +----> bitmaps: chk-a |
+   |          chk-b |    |          chk-b |
+   |                |    |                |
+   +----------------+    +----------------+
+
+Checking bitmap health
+----------------------
+
+QEMU optimizes disk writes by only updating the bitmaps in certain cases. This
+can also cause problems in cases when e.g. QEMU crashes.
+
+For a chain of corresponding bitmaps in a backing chain to be considered valid
+and eligible for use with ``virDomainBackupBegin`` it must conform to the
+following rules:
+
+1) The top image must contain the bitmap
+2) If any of the backing images in the chain contain the bitmap too, all
+   contiguous images must have the bitmap (no gaps)
+3) All of the above bitmaps must be marked as active
+   (``auto`` flag in ``qemu-img`` output, ``recording`` in qemu)
+4) None of the above bitmaps can be inconsistent
+   (``in-use`` flag in ``qemu-img`` provided that it's not used on an image
+   which is currently in use by a qemu instance, or ``inconsistent`` in qemu)
+
+::
+
+  # check that image has bitmaps
+  $ qemu-img info vda-1.qcow2
+   image: vda-1.qcow2
+   file format: qcow2
+   virtual size: 100 MiB (104857600 bytes)
+   disk size: 220 KiB
+   cluster_size: 65536
+   Format specific information:
+       compat: 1.1
+       compression type: zlib
+       lazy refcounts: false
+       bitmaps:
+           [0]:
+               flags:
+                   [0]: in-use
+                   [1]: auto
+               name: chk-a
+               granularity: 65536
+           [1]:
+               flags:
+                   [0]: auto
+               name: chk-b
+               granularity: 65536
+       refcount bits: 16
+       corrupt: false
+
+(See also the ``qemuBlockBitmapChainIsValid`` helper function in
+``src/qemu/qemu_block.c``)
+
+Creating external snapshots manually
+------------------------------------
+
+To create the same topology outside of libvirt (e.g. when doing snapshots
+offline) a recent ``qemu-img`` which supports the ``bitmap`` subcommand is
+recommended. The following algorithm then ensures that the new image after
+snapshot will work with backups (note that ``jq`` is a JSON processor):
+
+::
+
+  #!/bin/bash
+
+  # arguments
+  SNAP_IMG="vda-2.qcow2"
+  BACKING_IMG="vda-1.qcow2"
+
+  # constants - snapshots and bitmaps work only with qcow2
+  SNAP_FMT="qcow2"
+  BACKING_IMG_FMT="qcow2"
+
+  # create snapshot overlay
+  qemu-img create -f "$SNAP_FMT" -F "$BACKING_IMG_FMT" -b "$BACKING_IMG" "$SNAP_IMG"
+
+  BACKING_IMG_INFO=$(qemu-img info --output=json -f "$BACKING_IMG_FMT" "$BACKING_IMG")
+  BACKING_BITMAPS=$(jq '."format-specific".data.bitmaps' <<< "$BACKING_IMG_INFO")
+
+  if [ "x$BACKING_BITMAPS" = "xnull" ]; then
+      exit 0
+  fi
+
+  while IFS= read -r BACKING_BITMAP_; do
+      BITMAP_FLAGS=$(jq -c -r '.flags[]' <<< "$BACKING_BITMAP_")
+      BITMAP_NAME=$(jq -r '.name' <<< "$BACKING_BITMAP_")
+      # skip bitmaps which are inconsistent or not active
+      if grep -qx 'in-use' <<< "$BITMAP_FLAGS" ||
+         ! grep -qx 'auto' <<< "$BITMAP_FLAGS"; then
+         continue
+      fi
+      # re-create the bitmap under the same name in the overlay
+      qemu-img bitmap -f "$SNAP_FMT" "$SNAP_IMG" --add "$BITMAP_NAME"
+
+  done < <(jq -c '.[]' <<< "$BACKING_BITMAPS")
-- 
2.27.0
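The four validity rules listed under "Checking bitmap health" in the document added by this patch could be sketched as a small shell check. This is only an illustration, not libvirt's implementation (that lives in ``qemuBlockBitmapChainIsValid``). The helper names ``bitmap_flags_ok`` and ``bitmap_chain_ok`` are hypothetical, and so is the input format: one argument per image, ordered from the top image to the base, holding either the literal ``missing`` or the bitmap's comma-separated ``qemu-img`` flags. Collecting those flags from ``qemu-img info`` is left out, and the sketch assumes no running qemu has the images open, since otherwise ``in-use`` does not indicate an inconsistent bitmap.

```shell
#!/bin/bash

# Hypothetical helper: succeeds only if the comma-separated flag list of one
# bitmap contains "auto" (active) and lacks "in-use" (possibly inconsistent).
bitmap_flags_ok() {
    case ",$1," in
        *,in-use,*) return 1 ;;
        *,auto,*)   return 0 ;;
        *)          return 1 ;;
    esac
}

# Hypothetical helper: each argument describes the same bitmap name in one
# image, ordered from the top image down to the base. An argument is either
# the literal "missing" or that bitmap's comma-separated flags.
bitmap_chain_ok() {
    local seen_missing=0 state

    # rule 1: the top image must contain the bitmap
    if [ "$#" -eq 0 ] || [ "$1" = missing ]; then
        return 1
    fi

    for state in "$@"; do
        if [ "$state" = missing ]; then
            seen_missing=1
            continue
        fi
        # rule 2: a bitmap found below an image that lacks it is a gap
        if [ "$seen_missing" -eq 1 ]; then
            return 1
        fi
        # rules 3 and 4: the bitmap must be active and not inconsistent
        bitmap_flags_ok "$state" || return 1
    done
    return 0
}

# Usage: vda-2.qcow2 (top) and vda-1.qcow2 (base) both carry an active chk-b
if bitmap_chain_ok "auto" "auto"; then
    echo "chk-b: chain usable"
fi
```

With the ``qemu-img info`` output shown in the document, ``chk-a`` in ``vda-1.qcow2`` would be passed as ``in-use,auto`` and the check would reject any chain containing it, which matches the script's choice to skip such bitmaps when creating the overlay.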