From d171459ebce617bcd42e7c4a5932b3b0f3fa36d2 Mon Sep 17 00:00:00 2001
From: Alex Williamson <alex.williamson@redhat.com>
Date: Mon, 3 Dec 2018 21:53:07 +0100
Subject: [PATCH 20/34] vfio: Inhibit ballooning based on group attachment to a
 container
7711c0
RH-Author: Alex Williamson <alex.williamson@redhat.com>
Message-id: <154387398693.26945.4148583224290166798.stgit@gimli.home>
Patchwork-id: 83229
O-Subject: [RHEL-7.7 qemu-kvm-rhev PATCH 3/7] vfio: Inhibit ballooning based on group attachment to a container
Bugzilla: 1619778
RH-Acked-by: Peter Xu <peterx@redhat.com>
RH-Acked-by: Cornelia Huck <cohuck@redhat.com>
RH-Acked-by: Auger Eric <eric.auger@redhat.com>
RH-Acked-by: David Hildenbrand <david@redhat.com>
7711c0
Bugzilla: 1619778
7711c0
We use a VFIOContainer to associate an AddressSpace with one or more
VFIOGroups.  The VFIOContainer represents the DMA context for that
AddressSpace for those VFIOGroups and is kept synchronized with changes
in that AddressSpace via a MemoryListener.  For IOMMU-backed devices,
maintaining the DMA context for a VFIOGroup generally involves
pinning a host virtual address in order to create a stable host
physical address, and then mapping a translation from the associated
guest physical address to that host physical address into the IOMMU.
7711c0
While the above keeps the VFIOContainer synchronized with the QEMU
memory API of the VM, memory ballooning occurs outside of that API.
Inflating the memory balloon (i.e. cooperatively capturing pages from
the guest for use by the host) simply uses MADV_DONTNEED to "zap"
pages from QEMU's host virtual address space.  The page pinning and
IOMMU mapping above remain in place, negating the host's ability to
reuse the page, but the host virtual to host physical mapping of the
page is invalidated outside of QEMU's memory API.
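
The "zap" can be observed from userspace.  What follows is a minimal,
Linux-only sketch, not QEMU code, and the helper name
zap_and_read_back() is hypothetical.  After madvise(MADV_DONTNEED) on
a private anonymous mapping, the next access to the same virtual
address faults in a fresh zero-filled page.  The sketch shows only the
host virtual side.  It cannot show the original physical page, which
in the VFIO case would stay pinned and mapped in the IOMMU.

```c
#define _DEFAULT_SOURCE  /* expose MAP_ANONYMOUS/MADV_DONTNEED under strict -std */
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical helper: fill a private anonymous page with a pattern,
 * zap it with MADV_DONTNEED (as balloon inflation does), then read it
 * back.  Returns the first byte seen after the zap, or -1 on error.
 */
int zap_and_read_back(void)
{
    size_t pagesz = (size_t)sysconf(_SC_PAGESIZE);
    unsigned char *p = mmap(NULL, pagesz, PROT_READ | PROT_WRITE,
                            MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return -1;

    memset(p, 0xAB, pagesz);              /* "guest" data in the page */

    if (madvise(p, pagesz, MADV_DONTNEED) != 0) {
        munmap(p, pagesz);
        return -1;
    }

    int first = p[0];                     /* faults in a new zero page */
    munmap(p, pagesz);
    return first;
}
```

On Linux the helper is expected to return 0 rather than 0xAB.  The old
contents are gone from the process's view of that virtual address.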
7711c0
When the balloon is later deflated to cooperatively return pages to
the guest, the guest balloon driver simply frees the page.  The page
can then be used in the guest again, which incurs a host page fault on
first access.  The page fault maps a new host physical page to back
the existing host virtual address, while the VFIOContainer still
maintains the translation to the original host physical address.  At
this point the guest vCPU and any assigned devices map the same guest
physical address to different host physical addresses.  Badness.
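
To make the divergence concrete, here is a toy model of a single guest
page.  All names (struct toy_vm, toy_vfio_map() and so on) are
hypothetical.  Host physical addresses are just values handed out by a
trivial allocator.  The model shows only that the CPU view and the
IOMMU view of one guest physical address stop agreeing after balloon
inflation and reuse.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical toy model of one guest page; not QEMU data structures. */
struct toy_vm {
    uint64_t next_free_hpa;    /* trivial host physical page allocator */
    bool     hva_present;      /* is the HVA backing the GPA populated? */
    uint64_t hva_to_hpa;       /* CPU view: GPA -> HVA -> HPA */
    uint64_t iommu_gpa_to_hpa; /* device view, fixed at pin time */
};

static uint64_t toy_alloc_hpa(struct toy_vm *vm)
{
    uint64_t hpa = vm->next_free_hpa;
    vm->next_free_hpa += 0x1000;
    return hpa;
}

/* VFIO pins the page (faulting it in if needed) and maps its HPA. */
void toy_vfio_map(struct toy_vm *vm)
{
    if (!vm->hva_present) {
        vm->hva_to_hpa = toy_alloc_hpa(vm);
        vm->hva_present = true;
    }
    vm->iommu_gpa_to_hpa = vm->hva_to_hpa;
}

/* Inflation: MADV_DONTNEED drops only the HVA->HPA mapping; the pinned
 * HPA stays allocated and stays in the IOMMU. */
void toy_balloon_inflate(struct toy_vm *vm)
{
    vm->hva_present = false;
}

/* Guest reuse after deflation: a CPU access faults in a NEW page. */
uint64_t toy_cpu_access(struct toy_vm *vm)
{
    if (!vm->hva_present) {
        vm->hva_to_hpa = toy_alloc_hpa(vm);
        vm->hva_present = true;
    }
    return vm->hva_to_hpa;
}

/* Device DMA always goes through the stale IOMMU translation. */
uint64_t toy_device_access(const struct toy_vm *vm)
{
    return vm->iommu_gpa_to_hpa;
}
```

Call toy_vfio_map(), then toy_balloon_inflate(), then toy_cpu_access().
The result differs from toy_device_access(): the vCPU uses the new page
while the device still performs DMA to the pinned one.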
7711c0
The IOMMU typically cannot track this mapping at page granularity
without also incurring the inefficiency of using page-sized mappings
throughout.  MMU notifiers in the host kernel also only provide
indicators for invalidating the mapping on balloon inflation, not for
updating the mapping when the balloon is deflated.  For these reasons
we assume by default that the mapping of each VFIOGroup into the
VFIOContainer is incompatible with memory ballooning, and increment
the balloon inhibitor once for each attached VFIOGroup.
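
Several groups may be attached at once, so the inhibitor has to count
rather than act as a single flag.  The sketch below models that with
C11 atomics.  toy_balloon_inhibit() and toy_balloon_is_inhibited() are
hypothetical stand-ins in the spirit of qemu_balloon_inhibit(); they
are not QEMU's implementation.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical inhibitor: a counter, so each attached group can hold
 * its own reference independently of the others. */
static atomic_int toy_inhibit_count;

void toy_balloon_inhibit(bool state)
{
    if (state)
        atomic_fetch_add(&toy_inhibit_count, 1);
    else
        atomic_fetch_sub(&toy_inhibit_count, 1);

    /* A negative count would mean an unbalanced release. */
    assert(atomic_load(&toy_inhibit_count) >= 0);
}

/* Ballooning may discard pages only when nobody holds an inhibit. */
bool toy_balloon_is_inhibited(void)
{
    return atomic_load(&toy_inhibit_count) > 0;
}
```

Two attaches followed by one release still leave ballooning inhibited.
Only the final release allows it again.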
7711c0
Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
(cherry picked from commit c65ee433153b5925e183a00ebf568e160077c694)
Signed-off-by: Miroslav Rezanina <mrezanin@redhat.com>
---
 hw/vfio/common.c | 30 ++++++++++++++++++++++++++++++
 1 file changed, 30 insertions(+)
7711c0
diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index 07ffa0b..7e8f289 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -32,6 +32,7 @@
 #include "hw/hw.h"
 #include "qemu/error-report.h"
 #include "qemu/range.h"
+#include "sysemu/balloon.h"
 #include "sysemu/kvm.h"
 #include "trace.h"
 #include "qapi/error.h"
@@ -1039,6 +1040,33 @@ static int vfio_connect_container(VFIOGroup *group, AddressSpace *as,
 
     space = vfio_get_address_space(as);
 
+    /*
+     * VFIO is currently incompatible with memory ballooning insofar as the
+     * madvise to purge (zap) the page from QEMU's address space does not
+     * interact with the memory API and therefore leaves stale virtual to
+     * physical mappings in the IOMMU if the page was previously pinned.  We
+     * therefore add a balloon inhibit for each group added to a container,
+     * whether the container is used individually or shared.  This provides
+     * us with options to allow devices within a group to opt-in and allow
+     * ballooning, so long as it is done consistently for a group (for instance
+     * if the device is an mdev device where it is known that the host vendor
+     * driver will never pin pages outside of the working set of the guest
+     * driver, which would thus not be ballooning candidates).
+     *
+     * The first opportunity to induce pinning occurs here where we attempt to
+     * attach the group to existing containers within the AddressSpace.  If any
+     * pages are already zapped from the virtual address space, such as from a
+     * previous ballooning opt-in, new pinning will cause valid mappings to be
+     * re-established.  Likewise, when the overall MemoryListener for a new
+     * container is registered, a replay of mappings within the AddressSpace
+     * will occur, re-establishing any previously zapped pages as well.
+     *
+     * NB. Balloon inhibiting does not currently block operation of the
+     * balloon driver or revoke previously pinned pages, it only prevents
+     * calling madvise to modify the virtual mapping of ballooned pages.
+     */
+    qemu_balloon_inhibit(true);
+
     QLIST_FOREACH(container, &space->containers, next) {
         if (!ioctl(group->fd, VFIO_GROUP_SET_CONTAINER, &container->fd)) {
             group->container = container;
@@ -1227,6 +1255,7 @@ close_fd_exit:
     close(fd);
 
 put_space_exit:
+    qemu_balloon_inhibit(false);
     vfio_put_address_space(space);
 
     return ret;
@@ -1347,6 +1376,7 @@ void vfio_put_group(VFIOGroup *group)
         return;
     }
 
+    qemu_balloon_inhibit(false);
     vfio_kvm_device_del_group(group);
     vfio_disconnect_container(group);
     QLIST_REMOVE(group, next);
-- 
1.8.3.1
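
The patch above keeps the inhibit count balanced.  The inhibit is taken
in vfio_connect_container() before any pinning can be induced.  It is
dropped again on the put_space_exit error path, and dropped once more
in vfio_put_group().  A toy model of that pairing follows.  Every name
in it is hypothetical and the control flow is simplified.  In
particular, the guard in toy_put_group() is an illustrative stand-in
for the early return in vfio_put_group().

```c
#include <stdbool.h>

/* Hypothetical counter standing in for qemu_balloon_inhibit(). */
static int toy_inhibits;

static void toy_inhibit(bool state)
{
    toy_inhibits += state ? 1 : -1;
}

struct toy_group {
    bool attached;
};

/* Shape of vfio_connect_container(): inhibit first, undo on failure. */
int toy_connect(struct toy_group *g, bool attach_ok)
{
    toy_inhibit(true);          /* before any pinning can happen */
    if (!attach_ok)
        goto put_space_exit;
    g->attached = true;
    return 0;                   /* success keeps the inhibit held */

put_space_exit:
    toy_inhibit(false);         /* error path drops it again */
    return -1;
}

/* Shape of vfio_put_group(): one release per attached group. */
void toy_put_group(struct toy_group *g)
{
    if (!g->attached)           /* hypothetical simplified guard */
        return;
    toy_inhibit(false);
    g->attached = false;
}

int toy_inhibit_count(void)
{
    return toy_inhibits;
}
```

After one successful attach and one failed attach, exactly one inhibit
remains.  Putting the attached group brings the count back to zero.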