Blame SOURCES/edk2-UefiCpuPkg-PiSmmCpuDxeSmm-pause-in-WaitForSemaphore-.patch

2fc9c1
From 70c9d989107c6ac964bb437c5a4ea6ffe3214e45 Mon Sep 17 00:00:00 2001
2fc9c1
From: Miroslav Rezanina <mrezanin@redhat.com>
2fc9c1
Date: Mon, 10 Aug 2020 07:52:28 +0200
2fc9c1
Subject: [PATCH] UefiCpuPkg/PiSmmCpuDxeSmm: pause in WaitForSemaphore() before
2fc9c1
 re-fetch
2fc9c1
MIME-Version: 1.0
2fc9c1
Content-Type: text/plain; charset=UTF-8
2fc9c1
Content-Transfer-Encoding: 8bit
2fc9c1
2fc9c1
RH-Author: Laszlo Ersek <lersek@redhat.com>
2fc9c1
Message-id: <20200731141037.1941-2-lersek@redhat.com>
2fc9c1
Patchwork-id: 98121
2fc9c1
O-Subject: [RHEL-8.3.0 edk2 PATCH 1/1] UefiCpuPkg/PiSmmCpuDxeSmm: pause in WaitForSemaphore() before re-fetch
2fc9c1
Bugzilla: 1861718
2fc9c1
RH-Acked-by: Vitaly Kuznetsov <vkuznets@redhat.com>
2fc9c1
RH-Acked-by: Eduardo Habkost <ehabkost@redhat.com>
2fc9c1
2fc9c1
Most busy waits (spinlocks) in "UefiCpuPkg/PiSmmCpuDxeSmm/MpService.c"
2fc9c1
already call CpuPause() in their loop bodies; see SmmWaitForApArrival(),
2fc9c1
APHandler(), and SmiRendezvous(). However, the "main wait" within
2fc9c1
APHandler():
2fc9c1
2fc9c1
>     //
2fc9c1
>     // Wait for something to happen
2fc9c1
>     //
2fc9c1
>     WaitForSemaphore (mSmmMpSyncData->CpuData[CpuIndex].Run);
2fc9c1
2fc9c1
doesn't do so, as WaitForSemaphore() keeps trying to acquire the semaphore
2fc9c1
without pausing.
2fc9c1
2fc9c1
The performance impact is especially notable in QEMU/KVM + OVMF
2fc9c1
virtualization with CPU overcommit (that is, when the guest has
2fc9c1
significantly more VCPUs than the host has physical CPUs). The guest BSP
2fc9c1
is working heavily in:
2fc9c1
2fc9c1
  BSPHandler()                  [MpService.c]
2fc9c1
    PerformRemainingTasks()     [PiSmmCpuDxeSmm.c]
2fc9c1
      SetUefiMemMapAttributes() [SmmCpuMemoryManagement.c]
2fc9c1
2fc9c1
while the many guest APs are spinning in the "Wait for something to
2fc9c1
happen" semaphore acquisition, in APHandler(). The guest APs are
2fc9c1
generating useless memory traffic and saturating host CPUs, hindering the
2fc9c1
guest BSP's progress in SetUefiMemMapAttributes().
2fc9c1
2fc9c1
Rework the loop in WaitForSemaphore(): call CpuPause() in every iteration
2fc9c1
after the first check fails. Due to Pause Loop Exiting (known as Pause
2fc9c1
Filter on AMD), the host scheduler can favor the guest BSP over the guest
2fc9c1
APs.
2fc9c1
2fc9c1
Running a 16 GB RAM + 512 VCPU guest on a 448 PCPU host, this patch
2fc9c1
reduces OVMF boot time (counted until reaching grub) from 20-30 minutes to
2fc9c1
less than 4 minutes.
2fc9c1
2fc9c1
The patch should benefit physical machines as well -- according to the
2fc9c1
Intel SDM, PAUSE "Improves the performance of spin-wait loops". Adding
2fc9c1
PAUSE to the generic WaitForSemaphore() function is considered a general
2fc9c1
improvement.
2fc9c1
2fc9c1
Cc: Eric Dong <eric.dong@intel.com>
2fc9c1
Cc: Philippe Mathieu-Daudé <philmd@redhat.com>
2fc9c1
Cc: Rahul Kumar <rahul1.kumar@intel.com>
2fc9c1
Cc: Ray Ni <ray.ni@intel.com>
2fc9c1
Ref: https://bugzilla.redhat.com/show_bug.cgi?id=1861718
2fc9c1
Signed-off-by: Laszlo Ersek <lersek@redhat.com>
2fc9c1
Message-Id: <20200729185217.10084-1-lersek@redhat.com>
2fc9c1
Reviewed-by: Eric Dong <eric.dong@intel.com>
2fc9c1
(cherry picked from commit 9001b750df64b25b14ec45a2efa1361a7b96c00a)
2fc9c1
Signed-off-by: Miroslav Rezanina <mrezanin@redhat.com>
2fc9c1
---
2fc9c1
 UefiCpuPkg/PiSmmCpuDxeSmm/MpService.c | 18 +++++++++++-------
2fc9c1
 1 file changed, 11 insertions(+), 7 deletions(-)
2fc9c1
2fc9c1
diff --git a/UefiCpuPkg/PiSmmCpuDxeSmm/MpService.c b/UefiCpuPkg/PiSmmCpuDxeSmm/MpService.c
2fc9c1
index 57e788c..4bcd217 100644
2fc9c1
--- a/UefiCpuPkg/PiSmmCpuDxeSmm/MpService.c
2fc9c1
+++ b/UefiCpuPkg/PiSmmCpuDxeSmm/MpService.c
2fc9c1
@@ -40,14 +40,18 @@ WaitForSemaphore (
2fc9c1
 {
2fc9c1
   UINT32                            Value;
2fc9c1
 
2fc9c1
-  do {
2fc9c1
+  for (;;) {
2fc9c1
     Value = *Sem;
2fc9c1
-  } while (Value == 0 ||
2fc9c1
-           InterlockedCompareExchange32 (
2fc9c1
-             (UINT32*)Sem,
2fc9c1
-             Value,
2fc9c1
-             Value - 1
2fc9c1
-             ) != Value);
2fc9c1
+    if (Value != 0 &&
2fc9c1
+        InterlockedCompareExchange32 (
2fc9c1
+          (UINT32*)Sem,
2fc9c1
+          Value,
2fc9c1
+          Value - 1
2fc9c1
+          ) == Value) {
2fc9c1
+      break;
2fc9c1
+    }
2fc9c1
+    CpuPause ();
2fc9c1
+  }
2fc9c1
   return Value - 1;
2fc9c1
 }
2fc9c1
 
2fc9c1
-- 
2fc9c1
1.8.3.1
2fc9c1