Blame SOURCES/edk2-UefiCpuPkg-PiSmmCpuDxeSmm-pause-in-WaitForSemaphore-.patch

From 70c9d989107c6ac964bb437c5a4ea6ffe3214e45 Mon Sep 17 00:00:00 2001
From: Miroslav Rezanina <mrezanin@redhat.com>
Date: Mon, 10 Aug 2020 07:52:28 +0200
Subject: [PATCH] UefiCpuPkg/PiSmmCpuDxeSmm: pause in WaitForSemaphore() before
 re-fetch
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

RH-Author: Laszlo Ersek <lersek@redhat.com>
Message-id: <20200731141037.1941-2-lersek@redhat.com>
Patchwork-id: 98121
O-Subject: [RHEL-8.3.0 edk2 PATCH 1/1] UefiCpuPkg/PiSmmCpuDxeSmm: pause in WaitForSemaphore() before re-fetch
Bugzilla: 1861718
RH-Acked-by: Vitaly Kuznetsov <vkuznets@redhat.com>
RH-Acked-by: Eduardo Habkost <ehabkost@redhat.com>

Most busy waits (spinlocks) in "UefiCpuPkg/PiSmmCpuDxeSmm/MpService.c"
already call CpuPause() in their loop bodies; see SmmWaitForApArrival(),
APHandler(), and SmiRendezvous(). However, the "main wait" within
APHandler():

>     //
>     // Wait for something to happen
>     //
>     WaitForSemaphore (mSmmMpSyncData->CpuData[CpuIndex].Run);

doesn't do so, as WaitForSemaphore() keeps trying to acquire the semaphore
without pausing.

The performance impact is especially notable in QEMU/KVM + OVMF
virtualization with CPU overcommit (that is, when the guest has
significantly more VCPUs than the host has physical CPUs). The guest BSP
is working heavily in:

  BSPHandler()                  [MpService.c]
    PerformRemainingTasks()     [PiSmmCpuDxeSmm.c]
      SetUefiMemMapAttributes() [SmmCpuMemoryManagement.c]

while the many guest APs are spinning in the "Wait for something to
happen" semaphore acquisition, in APHandler(). The guest APs are
generating useless memory traffic and saturating host CPUs, hindering the
guest BSP's progress in SetUefiMemMapAttributes().

Rework the loop in WaitForSemaphore(): call CpuPause() in every iteration
after the first check fails. Due to Pause Loop Exiting (known as Pause
Filter on AMD), the host scheduler can favor the guest BSP over the guest
APs.
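
For illustration only (not part of this patch): the reworked loop can be
sketched outside of edk2 with C11 atomics. The names wait_for_semaphore()
and cpu_pause() are hypothetical stand-ins for WaitForSemaphore() and
CpuPause(), atomic_compare_exchange_strong() takes the place of
InterlockedCompareExchange32(), and the x86 PAUSE builtin assumes GCC or
Clang.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical stand-in for edk2's CpuPause(). Assumes GCC or Clang;
   on non-x86 targets this sketch simply does nothing. */
static inline void cpu_pause(void)
{
#if defined(__x86_64__) || defined(__i386__)
    __builtin_ia32_pause();
#endif
}

/* Spin until *sem is nonzero and we manage to decrement it.
   Returns the decremented value, like the function in the patch. */
static uint32_t wait_for_semaphore(_Atomic uint32_t *sem)
{
    for (;;) {
        uint32_t value = atomic_load(sem);

        /* Only attempt the compare-and-swap when the semaphore looks free. */
        if (value != 0 &&
            atomic_compare_exchange_strong(sem, &value, value - 1)) {
            return value - 1;
        }

        /* The attempt failed: pause before fetching the semaphore again. */
        cpu_pause();
    }
}
```

The pause is only reached after a failed attempt, so an acquisition that
succeeds on the first check never executes it.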

Running a 16 GB RAM + 512 VCPU guest on a 448 PCPU host, this patch
reduces OVMF boot time (counted until reaching grub) from 20-30 minutes to
less than 4 minutes.

The patch should benefit physical machines as well -- according to the
Intel SDM, PAUSE "Improves the performance of spin-wait loops". Adding
PAUSE to the generic WaitForSemaphore() function is considered a general
improvement.

Cc: Eric Dong <eric.dong@intel.com>
Cc: Philippe Mathieu-Daudé <philmd@redhat.com>
Cc: Rahul Kumar <rahul1.kumar@intel.com>
Cc: Ray Ni <ray.ni@intel.com>
Ref: https://bugzilla.redhat.com/show_bug.cgi?id=1861718
Signed-off-by: Laszlo Ersek <lersek@redhat.com>
Message-Id: <20200729185217.10084-1-lersek@redhat.com>
Reviewed-by: Eric Dong <eric.dong@intel.com>
(cherry picked from commit 9001b750df64b25b14ec45a2efa1361a7b96c00a)
Signed-off-by: Miroslav Rezanina <mrezanin@redhat.com>
---
 UefiCpuPkg/PiSmmCpuDxeSmm/MpService.c | 18 +++++++++++-------
 1 file changed, 11 insertions(+), 7 deletions(-)

diff --git a/UefiCpuPkg/PiSmmCpuDxeSmm/MpService.c b/UefiCpuPkg/PiSmmCpuDxeSmm/MpService.c
index 57e788c..4bcd217 100644
--- a/UefiCpuPkg/PiSmmCpuDxeSmm/MpService.c
+++ b/UefiCpuPkg/PiSmmCpuDxeSmm/MpService.c
@@ -40,14 +40,18 @@ WaitForSemaphore (
 {
   UINT32                            Value;
 
-  do {
+  for (;;) {
     Value = *Sem;
-  } while (Value == 0 ||
-           InterlockedCompareExchange32 (
-             (UINT32*)Sem,
-             Value,
-             Value - 1
-             ) != Value);
+    if (Value != 0 &&
+        InterlockedCompareExchange32 (
+          (UINT32*)Sem,
+          Value,
+          Value - 1
+          ) == Value) {
+      break;
+    }
+    CpuPause ();
+  }
   return Value - 1;
 }
 
-- 
1.8.3.1