Blame SOURCES/0199-ieee1275-claim-more-memory.patch

From 0000000000000000000000000000000000000000 Mon Sep 17 00:00:00 2001
From: Daniel Axtens <dja@axtens.net>
Date: Wed, 15 Apr 2020 23:28:29 +1000
Subject: [PATCH] ieee1275: claim more memory

On powerpc-ieee1275, we are running out of memory trying to verify
anything. This is because:

 - we have to load an entire file into memory to verify it. This is
   extremely difficult to change with appended signatures.
 - We only have 32MB of heap.
 - Distro kernels are now often around 30MB.

So we want to claim more memory from OpenFirmware for our heap.

There are some complications:

 - The grub mm code isn't the only thing that will make claims on
   memory from OpenFirmware:

    * PFW/SLOF will have claimed some for their own use.

    * The ieee1275 loader will try to find other bits of memory that we
      haven't claimed to place the kernel and initrd when we go to boot.

    * Once we load Linux, it will also try to claim memory. It claims
      memory without any reference to /memory/available; it just starts
      at min(top of RMO, 768MB) and works down. So we need to avoid this
      area. See arch/powerpc/kernel/prom_init.c as of v5.11.

 - The smallest amount of memory a ppc64 KVM guest can have is 256MB.
   It doesn't work with distro kernels but can work with custom kernels.
   We should maintain support for that. (ppc32 can boot with even less,
   and we shouldn't break that either.)

 - Even if a VM has more memory, the memory OpenFirmware makes available
   as Real Memory Area can be restricted. A freshly created LPAR on a
   PowerVM machine is likely to have only 256MB available to OpenFirmware
   even if it has many gigabytes of memory allocated.

EFI systems will attempt to allocate 1/4 of the available memory,
clamped to between 1M and 1600M. That seems like a good approach;
we just need to figure out whether 1/4 is the right fraction for us.
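
As a rough illustration of the EFI-style policy described above, here is
a minimal sketch in C. The helper name quarter_clamped and the MiB macro
are hypothetical and are not taken from the GRUB EFI code; only the 1/4
fraction and the 1M/1600M bounds come from the text.

```c
#include <stdint.h>

#define MiB (1024ULL * 1024ULL)

/* Hypothetical helper: take a quarter of the available memory and clamp
   the result to [min, max].  Illustration only, not the GRUB EFI code.  */
static uint64_t
quarter_clamped (uint64_t available, uint64_t min, uint64_t max)
{
  uint64_t want = available / 4;

  if (want < min)
    want = min;
  if (want > max)
    want = max;
  return want;
}
```

For example, quarter_clamped (8192 * MiB, 1 * MiB, 1600 * MiB) hits the
upper bound and returns 1600 * MiB.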

We don't know in advance how big the kernel and initrd are going to be,
which makes figuring out how much memory we can take a bit tricky.

To figure out how much memory we should leave unused, I looked at:

 - an Ubuntu 20.04.1 ppc64le pseries KVM guest:
    vmlinux: ~30MB
    initrd:  ~50MB

 - a RHEL8.2 ppc64le pseries KVM guest:
    vmlinux: ~30MB
    initrd:  ~30MB

Ubuntu VMs struggle to boot with just 256MB under SLOF.
RHEL likewise has a higher minimum supported memory figure.
So let's first consider a distro kernel and 512MB of addressable memory.
(This is the default case for anything booting under PFW.) Say we lose
131MB to PFW (based on some tests). This leaves us 381MB. 1/4 of 381MB
is ~95MB. That should be enough to verify a 30MB vmlinux and should
leave plenty of space to load Linux and the initrd.

If we consider 256MB of RMA under PFW, we have just 125MB remaining. 1/4
of that is a smidge under 32MB, which gives us very poor odds of verifying
a distro-sized kernel. However, if we need 80MB just to put the kernel
and initrd in memory, we can't claim any more than 45MB anyway. So 1/4
will do. We'll come back to this later.
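
The arithmetic in the two cases above can be checked with a small C
sketch. The struct and helper names are hypothetical and exist only for
illustration; all sizes are in MB and use the rough figures quoted in
the text (131MB lost to PFW, 1/4 of the rest for the heap).

```c
#include <stdint.h>

/* All sizes in MB.  Hypothetical type, for illustration only.  */
struct heap_budget
{
  uint32_t heap;       /* 1/4 of what the firmware leaves us */
  uint32_t left_over;  /* what remains for the kernel and initrd */
};

/* Hypothetical helper: split the memory left after the firmware's own
   claims into a 1/4 heap and a remainder.  */
static struct heap_budget
budget_for (uint32_t rma_mb, uint32_t firmware_mb)
{
  struct heap_budget b;
  uint32_t avail = rma_mb - firmware_mb;

  b.heap = avail / 4;
  b.left_over = avail - b.heap;
  return b;
}
```

With budget_for (512, 131) the heap is 95MB and 286MB is left over. With
budget_for (256, 131) the heap is 31MB and 94MB is left over, which still
covers the roughly 80MB needed for a 30MB vmlinux and a 50MB initrd.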

grub is always built as a 32-bit binary, even if it's loading a ppc64
kernel. So we can't address memory beyond 4GB. Taking 1/4 of that gives
a natural cap of 1GB for powerpc-ieee1275.
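
To make the 4GB limit concrete, here is a minimal C sketch of clipping a
memory region to what a 32-bit binary can reach. It follows the same
comparison against 0xffffffff that heap_size uses in the patch below,
but the helper name usable_below_4g and the FOUR_GB_LIMIT macro are
hypothetical.

```c
#include <stdint.h>

#define FOUR_GB_LIMIT 0xffffffffULL

/* Hypothetical helper: how many bytes of [addr, addr + len) are counted
   as reachable by a 32-bit binary.  Illustration only.  */
static uint64_t
usable_below_4g (uint64_t addr, uint64_t len)
{
  /* Regions that start beyond the limit contribute nothing.  */
  if (addr > FOUR_GB_LIMIT)
    return 0;

  /* Regions that straddle the limit are truncated to it.  */
  if (addr + len > FOUR_GB_LIMIT)
    len = FOUR_GB_LIMIT - addr;

  return len;
}
```

Because the sum of such clipped regions cannot exceed 0xffffffff bytes,
a quarter of it always stays just under 1GB.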

Also apply this 1/4 approach to i386-ieee1275, but keep the 32MB cap.

make check still works for both i386 and powerpc and I've booted
powerpc grub with this change under SLOF and PFW.

Signed-off-by: Daniel Axtens <dja@axtens.net>
---
 grub-core/kern/ieee1275/init.c | 81 +++++++++++++++++++++++++++++++++---------
 docs/grub-dev.texi             |  6 ++--
 2 files changed, 69 insertions(+), 18 deletions(-)

diff --git a/grub-core/kern/ieee1275/init.c b/grub-core/kern/ieee1275/init.c
index 0dcd114ce54..c61d91a0285 100644
--- a/grub-core/kern/ieee1275/init.c
+++ b/grub-core/kern/ieee1275/init.c
@@ -46,11 +46,12 @@
 #endif
 #include <grub/lockdown.h>
 
-/* The maximum heap size we're going to claim */
+/* The maximum heap size we're going to claim. Not used by sparc.
+   We allocate 1/4 of the available memory under 4G, up to this limit. */
 #ifdef __i386__
 #define HEAP_MAX_SIZE		(unsigned long) (64 * 1024 * 1024)
-#else
-#define HEAP_MAX_SIZE		(unsigned long) (32 * 1024 * 1024)
+#else // __powerpc__
+#define HEAP_MAX_SIZE		(unsigned long) (1 * 1024 * 1024 * 1024)
 #endif
 
 extern char _end[];
@@ -147,16 +148,45 @@ grub_claim_heap (void)
 				 + GRUB_KERNEL_MACHINE_STACK_SIZE), 0x200000);
 }
 #else
-/* Helper for grub_claim_heap.  */
+/* Helper for grub_claim_heap on powerpc. */
+static int
+heap_size (grub_uint64_t addr, grub_uint64_t len, grub_memory_type_t type,
+	   void *data)
+{
+  grub_uint32_t total = *(grub_uint32_t *)data;
+
+  if (type != GRUB_MEMORY_AVAILABLE)
+    return 0;
+
+  /* Do not consider memory beyond 4GB */
+  if (addr > 0xffffffffUL)
+    return 0;
+
+  if (addr + len > 0xffffffffUL)
+    len = 0xffffffffUL - addr;
+
+  total += len;
+  *(grub_uint32_t *)data = total;
+
+  return 0;
+}
+
 static int
 heap_init (grub_uint64_t addr, grub_uint64_t len, grub_memory_type_t type,
 	   void *data)
 {
-  unsigned long *total = data;
+  grub_uint32_t total = *(grub_uint32_t *)data;
 
   if (type != GRUB_MEMORY_AVAILABLE)
     return 0;
 
+  /* Do not consider memory beyond 4GB */
+  if (addr > 0xffffffffUL)
+    return 0;
+
+  if (addr + len > 0xffffffffUL)
+    len = 0xffffffffUL - addr;
+
   if (grub_ieee1275_test_flag (GRUB_IEEE1275_FLAG_NO_PRE1_5M_CLAIM))
     {
       if (addr + len <= 0x180000)
@@ -170,10 +200,6 @@ heap_init (grub_uint64_t addr, grub_uint64_t len, grub_memory_type_t type,
     }
   len -= 1; /* Required for some firmware.  */
 
-  /* Never exceed HEAP_MAX_SIZE  */
-  if (*total + len > HEAP_MAX_SIZE)
-    len = HEAP_MAX_SIZE - *total;
-
   /* In theory, firmware should already prevent this from happening by not
      listing our own image in /memory/available.  The check below is intended
      as a safeguard in case that doesn't happen.  However, it doesn't protect
@@ -185,6 +211,18 @@ heap_init (grub_uint64_t addr, grub_uint64_t len, grub_memory_type_t type,
       len = 0;
     }
 
+  /* If this block contains 0x30000000 (768MB), do not claim below that.
+     Linux likes to claim memory at min(RMO top, 768MB) and works down
+     without reference to /memory/available. */
+  if ((addr < 0x30000000) && ((addr + len) > 0x30000000))
+    {
+      len = len - (0x30000000 - addr);
+      addr = 0x30000000;
+    }
+
+  if (len > total)
+    len = total;
+
   if (len)
     {
       grub_err_t err;
@@ -193,10 +231,12 @@ heap_init (grub_uint64_t addr, grub_uint64_t len, grub_memory_type_t type,
       if (err)
 	return err;
       grub_mm_init_region ((void *) (grub_addr_t) addr, len);
+      total -= len;
     }
 
-  *total += len;
-  if (*total >= HEAP_MAX_SIZE)
+  *(grub_uint32_t *)data = total;
+
+  if (total == 0)
     return 1;
 
   return 0;
@@ -205,13 +245,22 @@ heap_init (grub_uint64_t addr, grub_uint64_t len, grub_memory_type_t type,
 static void 
 grub_claim_heap (void)
 {
-  unsigned long total = 0;
+  grub_uint32_t total = 0;
 
   if (grub_ieee1275_test_flag (GRUB_IEEE1275_FLAG_FORCE_CLAIM))
-    heap_init (GRUB_IEEE1275_STATIC_HEAP_START, GRUB_IEEE1275_STATIC_HEAP_LEN,
-	       1, &total);
-  else
-    grub_machine_mmap_iterate (heap_init, &total);
+    {
+      heap_init (GRUB_IEEE1275_STATIC_HEAP_START, GRUB_IEEE1275_STATIC_HEAP_LEN,
+		 1, &total);
+      return;
+    }
+
+  grub_machine_mmap_iterate (heap_size, &total);
+
+  total = total / 4;
+  if (total > HEAP_MAX_SIZE)
+    total = HEAP_MAX_SIZE;
+
+  grub_machine_mmap_iterate (heap_init, &total);
 }
 #endif
 
diff --git a/docs/grub-dev.texi b/docs/grub-dev.texi
index 19f708ee662..90083772c8a 100644
--- a/docs/grub-dev.texi
+++ b/docs/grub-dev.texi
@@ -1047,7 +1047,9 @@ space is limited to 4GiB. GRUB allocates pages from EFI for its heap, at most
 1.6 GiB.
 
 On i386-ieee1275 and powerpc-ieee1275 GRUB uses same stack as IEEE1275.
-It allocates at most 32MiB for its heap.
+
+On i386-ieee1275, GRUB allocates at most 32MiB for its heap. On
+powerpc-ieee1275, GRUB allocates up to 1GiB.
 
 On sparc64-ieee1275 stack is 256KiB and heap is 2MiB.
 
@@ -1075,7 +1077,7 @@ In short:
 @item i386-qemu               @tab 60 KiB  @tab < 4 GiB
 @item *-efi                   @tab ?       @tab < 1.6 GiB
 @item i386-ieee1275           @tab ?       @tab < 32 MiB
-@item powerpc-ieee1275        @tab ?       @tab < 32 MiB
+@item powerpc-ieee1275        @tab ?       @tab < 1 GiB
 @item sparc64-ieee1275        @tab 256KiB  @tab 2 MiB
 @item arm-uboot               @tab 256KiB  @tab 2 MiB
 @item mips(el)-qemu_mips      @tab 2MiB    @tab 253 MiB