Blame SOURCES/kvm-vfio-Inhibit-ballooning-based-on-group-attachment-to.patch

From d171459ebce617bcd42e7c4a5932b3b0f3fa36d2 Mon Sep 17 00:00:00 2001
From: Alex Williamson <alex.williamson@redhat.com>
Date: Mon, 3 Dec 2018 21:53:07 +0100
Subject: [PATCH 20/34] vfio: Inhibit ballooning based on group attachment to a
 container

RH-Author: Alex Williamson <alex.williamson@redhat.com>
Message-id: <154387398693.26945.4148583224290166798.stgit@gimli.home>
Patchwork-id: 83229
O-Subject: [RHEL-7.7 qemu-kvm-rhev PATCH 3/7] vfio: Inhibit ballooning based on group attachment to a container
Bugzilla: 1619778
RH-Acked-by: Peter Xu <peterx@redhat.com>
RH-Acked-by: Cornelia Huck <cohuck@redhat.com>
RH-Acked-by: Auger Eric <eric.auger@redhat.com>
RH-Acked-by: David Hildenbrand <david@redhat.com>

Bugzilla: 1619778

We use a VFIOContainer to associate an AddressSpace to one or more
VFIOGroups.  The VFIOContainer represents the DMA context for that
AddressSpace for those VFIOGroups and is synchronized to changes in
that AddressSpace via a MemoryListener.  For IOMMU backed devices,
maintaining the DMA context for a VFIOGroup generally involves
pinning a host virtual address in order to create a stable host
physical address and then mapping a translation from the associated
guest physical address to that host physical address into the IOMMU.

While the above maintains the VFIOContainer synchronized to the QEMU
memory API of the VM, memory ballooning occurs outside of that API.
Inflating the memory balloon (i.e. cooperatively capturing pages from
the guest for use by the host) simply uses MADV_DONTNEED to "zap"
pages from QEMU's host virtual address space.  The page pinning and
IOMMU mapping above remains in place, negating the host's ability to
reuse the page, but the host virtual to host physical mapping of the
page is invalidated outside of QEMU's memory API.
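
To make the "zap" concrete, here is a minimal Linux-only sketch that is
not part of this patch.  It assumes a private anonymous mapping, and the
function name page_is_zero_after_zap() is hypothetical.  It dirties one
page, applies MADV_DONTNEED and reads the page back.  The old contents
are gone because the next access faults in a fresh zero-filled page,
which is the same effect that leaves a pinned IOMMU translation
pointing at a page the virtual address no longer maps.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical helper, for illustration only.
 * Returns 1 if the page reads back as zero after MADV_DONTNEED,
 * 0 if the old data survived, -1 if mmap() or madvise() failed.
 */
int page_is_zero_after_zap(void)
{
    long page = sysconf(_SC_PAGESIZE);
    assert(page > 0);

    unsigned char *buf = mmap(NULL, (size_t)page, PROT_READ | PROT_WRITE,
                              MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (buf == MAP_FAILED) {
        return -1;
    }

    /* Touch the page so a host physical page is actually backing it. */
    memset(buf, 0xAB, (size_t)page);

    /* Drop the backing page; the virtual address range stays mapped. */
    if (madvise(buf, (size_t)page, MADV_DONTNEED) != 0) {
        munmap(buf, (size_t)page);
        return -1;
    }

    /* Reading faults in a new zero-filled page at the same address. */
    volatile unsigned char *p = buf;
    int zero = (p[0] == 0 && p[page - 1] == 0);

    munmap(buf, (size_t)page);
    return zero;
}
```

The sketch only shows the host virtual side.  It does not pin the page
or program an IOMMU, so it cannot reproduce the stale translation
itself.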

When the balloon is later deflated, attempting to cooperatively
return pages to the guest, the page is simply freed by the guest
balloon driver, allowing it to be used in the guest and incurring a
page fault when that occurs.  The page fault maps a new host physical
page backing the existing host virtual address, meanwhile the
VFIOContainer still maintains the translation to the original host
physical address.  At this point the guest vCPU and any assigned
devices will map different host physical addresses to the same guest
physical address.  Badness.

The IOMMU typically does not have page level granularity with which
it can track this mapping without also incurring inefficiencies in
using page size mappings throughout.  MMU notifiers in the host
kernel also provide indicators for invalidating the mapping on
balloon inflation, not for updating the mapping when the balloon is
deflated.  For these reasons we assume a default behavior that the
mapping of each VFIOGroup into the VFIOContainer is incompatible
with memory ballooning and increment the balloon inhibitor to match
the attached VFIOGroups.
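
The counting behavior relied on above can be sketched as follows.  This
is an illustration under the assumption that the inhibitor is a plain
reference count, with one increment per attached group and one decrement
per detach.  The sketch_* names are hypothetical and this is not the
code behind qemu_balloon_inhibit().

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical counter: number of outstanding inhibit requests. */
static atomic_int sketch_inhibit_count;

/* Ballooning stays inhibited while at least one request is held. */
bool sketch_balloon_is_inhibited(void)
{
    return atomic_load(&sketch_inhibit_count) > 0;
}

/* true takes one inhibit reference, false drops one. */
void sketch_balloon_inhibit(bool state)
{
    int prev = atomic_fetch_add(&sketch_inhibit_count, state ? 1 : -1);

    /* Dropping a reference that was never taken is a caller bug. */
    assert(state || prev > 0);
}
```

With two groups attached, detaching one leaves ballooning inhibited.
Only the final detach clears it, which is why every error path and
vfio_put_group() must pair with the increment taken at attach time.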

Reviewed-by: Peter Xu <peterx@redhat.com>
Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
(cherry picked from commit c65ee433153b5925e183a00ebf568e160077c694)
Signed-off-by: Miroslav Rezanina <mrezanin@redhat.com>
---
 hw/vfio/common.c | 30 ++++++++++++++++++++++++++++++
 1 file changed, 30 insertions(+)

diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index 07ffa0b..7e8f289 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -32,6 +32,7 @@
 #include "hw/hw.h"
 #include "qemu/error-report.h"
 #include "qemu/range.h"
+#include "sysemu/balloon.h"
 #include "sysemu/kvm.h"
 #include "trace.h"
 #include "qapi/error.h"
@@ -1039,6 +1040,33 @@ static int vfio_connect_container(VFIOGroup *group, AddressSpace *as,
 
     space = vfio_get_address_space(as);
 
+    /*
+     * VFIO is currently incompatible with memory ballooning insofar as the
+     * madvise to purge (zap) the page from QEMU's address space does not
+     * interact with the memory API and therefore leaves stale virtual to
+     * physical mappings in the IOMMU if the page was previously pinned.  We
+     * therefore add a balloon inhibit for each group added to a container,
+     * whether the container is used individually or shared.  This provides
+     * us with options to allow devices within a group to opt-in and allow
+     * ballooning, so long as it is done consistently for a group (for instance
+     * if the device is an mdev device where it is known that the host vendor
+     * driver will never pin pages outside of the working set of the guest
+     * driver, which would thus not be ballooning candidates).
+     *
+     * The first opportunity to induce pinning occurs here where we attempt to
+     * attach the group to existing containers within the AddressSpace.  If any
+     * pages are already zapped from the virtual address space, such as from a
+     * previous ballooning opt-in, new pinning will cause valid mappings to be
+     * re-established.  Likewise, when the overall MemoryListener for a new
+     * container is registered, a replay of mappings within the AddressSpace
+     * will occur, re-establishing any previously zapped pages as well.
+     *
+     * NB. Balloon inhibiting does not currently block operation of the
+     * balloon driver or revoke previously pinned pages, it only prevents
+     * calling madvise to modify the virtual mapping of ballooned pages.
+     */
+    qemu_balloon_inhibit(true);
+
     QLIST_FOREACH(container, &space->containers, next) {
         if (!ioctl(group->fd, VFIO_GROUP_SET_CONTAINER, &container->fd)) {
             group->container = container;
@@ -1227,6 +1255,7 @@ close_fd_exit:
     close(fd);
 
 put_space_exit:
+    qemu_balloon_inhibit(false);
     vfio_put_address_space(space);
 
     return ret;
@@ -1347,6 +1376,7 @@ void vfio_put_group(VFIOGroup *group)
         return;
     }
 
+    qemu_balloon_inhibit(false);
     vfio_kvm_device_del_group(group);
     vfio_disconnect_container(group);
     QLIST_REMOVE(group, next);
-- 
1.8.3.1