Blame SOURCES/kvm-target-i386-kvm-Demand-nested-migration-kernel-capab.patch

Pablo Greco e6a3ae
From 2427e21de274cf7b56ef79e4a7ba78a08def7a58 Mon Sep 17 00:00:00 2001
From: Paolo Bonzini <pbonzini@redhat.com>
Date: Mon, 22 Jul 2019 18:22:18 +0100
Subject: [PATCH 37/39] target/i386: kvm: Demand nested migration kernel
 capabilities only when vCPU may have enabled VMX

RH-Author: Paolo Bonzini <pbonzini@redhat.com>
Message-id: <20190722182220.19374-17-pbonzini@redhat.com>
Patchwork-id: 89634
O-Subject: [RHEL-8.1.0 PATCH qemu-kvm v3 16/18] target/i386: kvm: Demand nested migration kernel capabilities only when vCPU may have enabled VMX
Bugzilla: 1689269
RH-Acked-by: Peter Xu <zhexu@redhat.com>
RH-Acked-by: Laurent Vivier <lvivier@redhat.com>
RH-Acked-by: Dr. David Alan Gilbert <dgilbert@redhat.com>

From: Liran Alon <liran.alon@oracle.com>

Previous to this change, a vCPU exposed with VMX running on a kernel
without KVM_CAP_NESTED_STATE or KVM_CAP_EXCEPTION_PAYLOAD resulted in
adding a migration blocker. This was because when the code was written
it was thought there was no way to reliably know if a vCPU is utilising
VMX or not at runtime. However, it turns out that this can be known to
some extent:

In order for a vCPU to enter VMX operation it must have CR4.VMXE set.
Once set, CR4.VMXE must remain set as long as the vCPU is in
VMX operation. This is because CR4.VMXE is one of the bits set
in MSR_IA32_VMX_CR4_FIXED1.
There is one exception to the above statement when vCPU enters SMM mode.
When a vCPU enters SMM mode, it temporarily exits VMX operation and
may also reset CR4.VMXE during execution in SMM mode.
When the vCPU exits SMM mode, vCPU state is restored to be in VMX operation
and CR4.VMXE is restored to its original state of being set.
Therefore, when the vCPU is not in SMM mode, we can infer whether
VMX is being used by examining CR4.VMXE. Otherwise, we cannot
know for certain, so we assume the worst: that the vCPU may utilise VMX.

Summarizing all the above, a vCPU may have enabled VMX in case
CR4.VMXE is set or vCPU is in SMM mode.

Therefore, remove migration blocker and check before migration
(cpu_pre_save()) if the vCPU may have enabled VMX. If true, only then
require relevant kernel capabilities.

While at it, demand KVM_CAP_EXCEPTION_PAYLOAD only when the vCPU is in
guest-mode and there is a pending/injected exception. Otherwise, this
kernel capability is not required for proper migration.

Reviewed-by: Joao Martins <joao.m.martins@oracle.com>
Signed-off-by: Liran Alon <liran.alon@oracle.com>
Reviewed-by: Maran Wilson <maran.wilson@oracle.com>
Tested-by: Maran Wilson <maran.wilson@oracle.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
(cherry picked from commit 79a197ab180e75838523c58973b1221ad7bf51eb)
Signed-off-by: Danilo C. L. de Paula <ddepaula@redhat.com>
---
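Note (illustrative only, not part of the change): the save-time decision
described in the commit message can be sketched as a standalone C fragment.
The struct, field, macro and function names below are hypothetical stand-ins,
not QEMU's real definitions. Only CR4.VMXE being bit 13 comes from the
architecture; the two hflags bit positions are arbitrary placeholders.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for this sketch, not QEMU's definitions. */
#define SKETCH_CR4_VMXE (UINT64_C(1) << 13) /* CR4.VMXE is architecturally bit 13 */
#define SKETCH_HF_SMM   (UINT32_C(1) << 0)  /* placeholder: vCPU is in SMM */
#define SKETCH_HF_GUEST (UINT32_C(1) << 1)  /* placeholder: vCPU is in guest-mode */

struct sketch_vcpu {
    bool     cpuid_vmx;              /* VMX is exposed to the guest via CPUID */
    uint64_t cr4;
    uint32_t hflags;
    bool     have_nested_state;      /* kernel provided nested state */
    bool     have_exception_payload; /* kernel can tell pending from injected */
    bool     exception_injected;     /* an exception is recorded as injected */
};

/* A vCPU may have enabled VMX if CR4.VMXE is set or it is currently in SMM. */
static bool sketch_vmx_maybe_enabled(const struct sketch_vcpu *v)
{
    return v->cpuid_vmx &&
           ((v->cr4 & SKETCH_CR4_VMXE) != 0 ||
            (v->hflags & SKETCH_HF_SMM) != 0);
}

/* Returns 0 if saving may proceed, -1 if a required capability is missing. */
static int sketch_pre_save_check(const struct sketch_vcpu *v)
{
    if (!sketch_vmx_maybe_enabled(v)) {
        return 0; /* VMX cannot be in use: no nested capability is needed */
    }
    if (!v->have_nested_state) {
        return -1; /* nested state cannot be extracted from the kernel */
    }
    if (!v->have_exception_payload &&
        (v->hflags & SKETCH_HF_GUEST) != 0 && v->exception_injected) {
        return -1; /* cannot tell a pending exception from an injected one */
    }
    return 0;
}
```

Under these stand-ins, a vCPU that has never set CR4.VMXE and is not in SMM
passes the check even on a kernel lacking both capabilities, which is the
case the old unconditional migration blocker refused.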
 target/i386/cpu.h      | 22 ++++++++++++++++++++++
 target/i386/kvm.c      | 26 ++++++--------------------
 target/i386/kvm_i386.h |  1 +
 target/i386/machine.c  | 24 ++++++++++++++++++++----
 4 files changed, 49 insertions(+), 24 deletions(-)

diff --git a/target/i386/cpu.h b/target/i386/cpu.h
index d120f62..273c90b 100644
--- a/target/i386/cpu.h
+++ b/target/i386/cpu.h
@@ -1848,6 +1848,28 @@ static inline bool cpu_has_vmx(CPUX86State *env)
     return env->features[FEAT_1_ECX] & CPUID_EXT_VMX;
 }
 
+/*
+ * In order for a vCPU to enter VMX operation it must have CR4.VMXE set.
+ * Once set, CR4.VMXE must remain set as long as the vCPU is in
+ * VMX operation. This is because CR4.VMXE is one of the bits set
+ * in MSR_IA32_VMX_CR4_FIXED1.
+ *
+ * There is one exception to the above statement when vCPU enters SMM mode.
+ * When a vCPU enters SMM mode, it temporarily exits VMX operation and
+ * may also reset CR4.VMXE during execution in SMM mode.
+ * When vCPU exits SMM mode, vCPU state is restored to be in VMX operation
+ * and CR4.VMXE is restored to its original value of being set.
+ *
+ * Therefore, when vCPU is not in SMM mode, we can infer whether
+ * VMX is being used by examining CR4.VMXE. Otherwise, we cannot
+ * know for certain.
+ */
+static inline bool cpu_vmx_maybe_enabled(CPUX86State *env)
+{
+    return cpu_has_vmx(env) &&
+           ((env->cr[4] & CR4_VMXE_MASK) || (env->hflags & HF_SMM_MASK));
+}
+
 /* fpu_helper.c */
 void update_fp_status(CPUX86State *env);
 void update_mxcsr_status(CPUX86State *env);
diff --git a/target/i386/kvm.c b/target/i386/kvm.c
index 0619aba..0bd286e 100644
--- a/target/i386/kvm.c
+++ b/target/i386/kvm.c
@@ -127,6 +127,11 @@ bool kvm_has_adjust_clock_stable(void)
     return (ret == KVM_CLOCK_TSC_STABLE);
 }
 
+bool kvm_has_exception_payload(void)
+{
+    return has_exception_payload;
+}
+
 bool kvm_allows_irq0_override(void)
 {
     return !kvm_irqchip_in_kernel() || kvm_has_gsi_routing();
@@ -814,7 +819,6 @@ static int hyperv_handle_properties(CPUState *cs)
 }
 
 static Error *invtsc_mig_blocker;
-static Error *nested_virt_mig_blocker;
 
 #define KVM_MAX_CPUID_ENTRIES  100
 
@@ -1159,22 +1163,6 @@ int kvm_arch_init_vcpu(CPUState *cs)
                                   !!(c->ecx & CPUID_EXT_SMX);
     }
 
-    if (cpu_has_vmx(env) && !nested_virt_mig_blocker &&
-        ((kvm_max_nested_state_length() <= 0) || !has_exception_payload)) {
-        error_setg(&nested_virt_mig_blocker,
-                   "Kernel do not provide required capabilities for "
-                   "nested virtualization migration. "
-                   "(CAP_NESTED_STATE=%d, CAP_EXCEPTION_PAYLOAD=%d)",
-                   kvm_max_nested_state_length() > 0,
-                   has_exception_payload);
-        r = migrate_add_blocker(nested_virt_mig_blocker, &local_err);
-        if (local_err) {
-            error_report_err(local_err);
-            error_free(nested_virt_mig_blocker);
-            return r;
-        }
-    }
-
     if (env->mcg_cap & MCG_LMCE_P) {
         has_msr_mcg_ext_ctl = has_msr_feature_control = true;
     }
@@ -1190,7 +1178,7 @@ int kvm_arch_init_vcpu(CPUState *cs)
             if (local_err) {
                 error_report_err(local_err);
                 error_free(invtsc_mig_blocker);
-                goto fail2;
+                return r;
             }
             /* for savevm */
             vmstate_x86_cpu.unmigratable = 1;
@@ -1256,8 +1244,6 @@ int kvm_arch_init_vcpu(CPUState *cs)
 
  fail:
     migrate_del_blocker(invtsc_mig_blocker);
- fail2:
-    migrate_del_blocker(nested_virt_mig_blocker);
 
     return r;
 }
diff --git a/target/i386/kvm_i386.h b/target/i386/kvm_i386.h
index 1de9876..df9bbf3 100644
--- a/target/i386/kvm_i386.h
+++ b/target/i386/kvm_i386.h
@@ -41,6 +41,7 @@
 bool kvm_allows_irq0_override(void);
 bool kvm_has_smm(void);
 bool kvm_has_adjust_clock_stable(void);
+bool kvm_has_exception_payload(void);
 void kvm_synchronize_all_tsc(void);
 void kvm_arch_reset_vcpu(X86CPU *cs);
 void kvm_arch_do_init_vcpu(X86CPU *cs);
diff --git a/target/i386/machine.c b/target/i386/machine.c
index 5ffee8f..8d90d98 100644
--- a/target/i386/machine.c
+++ b/target/i386/machine.c
@@ -7,6 +7,7 @@
 #include "hw/i386/pc.h"
 #include "hw/isa/isa.h"
 #include "migration/cpu.h"
+#include "kvm_i386.h"
 
 #include "sysemu/kvm.h"
 
@@ -231,10 +232,25 @@ static int cpu_pre_save(void *opaque)
     }
 
 #ifdef CONFIG_KVM
-    /* Verify we have nested virtualization state from kernel if required */
-    if (kvm_enabled() && cpu_has_vmx(env) && !env->nested_state) {
-        error_report("Guest enabled nested virtualization but kernel "
-                "does not support saving of nested state");
+    /*
+     * In case vCPU may have enabled VMX, we need to make sure kernel has
+     * required capabilities in order to perform migration correctly:
+     *
+     * 1) We must be able to extract vCPU nested-state from KVM.
+     *
+     * 2) In case vCPU is running in guest-mode and it has a pending exception,
+     * we must be able to determine if it's in a pending or injected state.
+     * Note that in case KVM doesn't have required capability to do so,
+     * a pending/injected exception will always appear as an
+     * injected exception.
+     */
+    if (kvm_enabled() && cpu_vmx_maybe_enabled(env) &&
+        (!env->nested_state ||
+         (!kvm_has_exception_payload() && (env->hflags & HF_GUEST_MASK) &&
+          env->exception_injected))) {
+        error_report("Guest may have enabled nested virtualization but kernel "
+                "does not support required capabilities to save vCPU "
+                "nested state");
         return -EINVAL;
     }
 #endif
-- 
1.8.3.1