Blob Blame History Raw
From 0000000000000000000000000000000000000000 Mon Sep 17 00:00:00 2001
From: Jeremy Cline <jcline@redhat.com>
Date: Tue, 23 Jul 2019 15:24:30 +0000
Subject: [PATCH] kdump: add support for crashkernel=auto

Rebased for v5.3-rc1 because the documentation has moved.

    Message-id: <20180604013831.574215750@redhat.com>
    Patchwork-id: 8166
    O-Subject: [kernel team] [PATCH RHEL8.0 V2 2/2] kdump: add support for crashkernel=auto
    Bugzilla: 1507353
    RH-Acked-by: Don Zickus <dzickus@redhat.com>
    RH-Acked-by: Baoquan He <bhe@redhat.com>
    RH-Acked-by: Pingfan Liu <piliu@redhat.com>

    Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1507353
    Build: https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=16534135
    Tested: ppc64le, x86_64 with several memory sizes.
            kdump qe tested 160M on various x86 machines in lab.

    We continue to provide crashkernel=auto like we did in RHEL6
    and RHEL7,  this will simplify the kdump deployment for common
    use cases that kdump just works with the auto reserved values.
    But this is still a best effort estimation, we can not know the
    exact memory requirement because it depends on a lot of different
    factors.

    The implementation of crashkernel=auto is simplified as a wrapper
    to use below kernel cmdline:
    x86_64: crashkernel=1G-64G:160M,64G-1T:256M,1T-:512M
    s390x:  crashkernel=4G-64G:160M,64G-1T:256M,1T-:512M
    arm64:  crashkernel=2G-:512M
    ppc64:  crashkernel=2G-4G:384M,4G-16G:512M,16G-64G:1G,64G-128G:2G,128G-:4G

    The difference between this way and the old implementation in
    RHEL6/7 is we do not scale the crash reserved memory size according
    to system memory size anymore.

    Latest effort to move upstream is below thread:
    https://lkml.org/lkml/2018/5/20/262
    But unfortunately it is still unlikely to be accepted, thus we
    will still use a RHEL only patch in RHEL8.

    Copied old patch description about the history reason see below:
    '''
        Non-upstream explanations:
        Besides "crashkenrel=X@Y" format, upstream also has advanced
        "crashkernel=range1:size1[,range2:size2,...][@offset]", and
        "crashkernel=X,high{low}" formats, but they need more careful
        manual configuration, and have different values for different
        architectures.

        Most of the distributions use the standard "crashkernel=X@Y"
        upstream format, and use crashkernel range format for advanced
        scenarios, heavily relying on the user's involvement.

        While "crashkernel=auto" is redhat's special feature, it exists
        and has been used as the default boot cmdline since 2008 rhel6.
        It does not require users to figure out how many crash memory
        size for their systems, also has been proved to be able to work
        pretty well for common scenarios.

        "crashkernel=auto" was tested/based on rhel-related products, as
        we have stable kernel configurations which means more or less
        stable memory consumption. In 2014 we tried to post them again to
        upstream but NACKed by people because they think it's not general
        and unnecessary, users can specify their own values or do that by
        scripts. However our customers insist on having it added to rhel.

        Also see one previous discussion related to this backport to Pegas:
        On 10/17/2016 at 10:15 PM, Don Zickus wrote:
        > On Fri, Oct 14, 2016 at 10:57:41AM +0800, Dave Young wrote:
        >> Don, agree with you we should evaluate them instead of just inherit
        >> them blindly. Below is what I think about kdump auto memory:
        >> There are two issues for crashkernel=auto in upstream:
        >> 1) It will be seen as a policy which should not go to kernel
        >> 2) It is hard to get a good number for the crash reserved size,
        >> considering various different kernel config options one can setups.
        >> In RHEL we are easier because our supported Kconfig is limited.
        >> I digged the upstream mail archive, but I'm not sure I got all the
        >> information, at least Michael Ellerman was objecting the series for
        >> 1).
        > Yes, I know.  Vivek and I have argued about this for years.  :-)
        >
        > I had hoped all the changes internally to the makedumpfile would allow
        > the memory configuration to stabilize at a number like 192M or 128M and
        > only in the rare cases extend beyond that.
        >
        > So I always treated that as a temporary hack until things were better.
        > With the hope of every new RHEL release we get smarter and better. :-)
        > Ideally it would be great if we could get the number down to 64M for most
        > cases and just turn it on in Fedora.  Maybe someday.... ;-)
        >
        > We can have this conversation when the patch gets reposted/refreshed
        > for upstream on rhkl?
        >
        > Cheers,
        > Don

        We had proposed to drop the historic crashkernel=auto code and move
        to use crashkernel=range:size format and pass them in anaconda.

        The initial reason is crashkernel=range:size works just fine because
        we do not need complex algorithm to scale crashkernel reserved size
        any more.  The old linear scaling is mainly for old makedumpfile
        requirements, now it is not necessary.

        But With the new approach, backward compatibility is potentially at risk.
        For e.g. let's consider the following cases:
        1) When we upgrade from an older distribution like rhel-alt-7.4(which
        uses crashkernel=auto) to rhel-alt-7.5 (which uses the crashkernel=xY
        format)
        In this case we can use anaconda scripts for checking
        'crashkernel=auto' in kernel spec and update to the new
        'crashkernel=range:size' format.
        2) When we upgrade from rhel-alt-7.5(which uses crashkernel=xY format)
        to rhel-alt-7.6(which uses crashkernel=xY format), but the x and/or Y
        values are changed in rhel-alt-7.6.
        For example from crashkernel=2G-:160M to crashkernel=2G-:192M, then we have
        no way to determine if the X and/or Y values were distribution
        provided or user specified ones.
        Since it is recommended to give precedence to user-specified values,
        so we cannot do an upgrade in such a case."

        Thus turn back to resolve it in kernel, and add a simpler version
        which just hacks to use the range:size style in code, and make
        rhel-only code easily to maintain.
    '''

    Signed-off-by: Dave Young <dyoung@redhat.com>
    Signed-off-by: Herton R. Krzesinski <herton@redhat.com>

Upstream Status: RHEL only
Signed-off-by: Jeremy Cline <jcline@redhat.com>
---
 Documentation/admin-guide/kdump/kdump.rst | 11 +++++++++++
 kernel/crash_core.c                       | 14 ++++++++++++++
 2 files changed, 25 insertions(+)

diff --git a/Documentation/admin-guide/kdump/kdump.rst b/Documentation/admin-guide/kdump/kdump.rst
index 2da65fef2a1c..d53a524f80f0 100644
--- a/Documentation/admin-guide/kdump/kdump.rst
+++ b/Documentation/admin-guide/kdump/kdump.rst
@@ -285,6 +285,17 @@ This would mean:
     2) if the RAM size is between 512M and 2G (exclusive), then reserve 64M
     3) if the RAM size is larger than 2G, then reserve 128M

+Or you can use crashkernel=auto if you have enough memory.  The threshold
+is 2G on x86_64, arm64, ppc64 and ppc64le. The threshold is 4G for s390x.
+If your system memory is less than the threshold crashkernel=auto will not
+reserve memory.
+
+The automatically reserved memory size varies based on architecture.
+The size changes according to system memory size like below:
+    x86_64: 1G-64G:160M,64G-1T:256M,1T-:512M
+    s390x:  4G-64G:160M,64G-1T:256M,1T-:512M
+    arm64:  2G-:512M
+    ppc64:  2G-4G:384M,4G-16G:512M,16G-64G:1G,64G-128G:2G,128G-:4G


 Boot into System Kernel
diff --git a/kernel/crash_core.c b/kernel/crash_core.c
index e4dfe2a05a31..8c6f59932247 100644
--- a/kernel/crash_core.c
+++ b/kernel/crash_core.c
@@ -258,6 +258,20 @@ static int __init __parse_crashkernel(char *cmdline,
 	if (suffix)
 		return parse_crashkernel_suffix(ck_cmdline, crash_size,
 				suffix);
+
+	if (strncmp(ck_cmdline, "auto", 4) == 0) {
+#ifdef CONFIG_X86_64
+		ck_cmdline = "1G-64G:160M,64G-1T:256M,1T-:512M";
+#elif defined(CONFIG_S390)
+		ck_cmdline = "4G-64G:160M,64G-1T:256M,1T-:512M";
+#elif defined(CONFIG_ARM64)
+		ck_cmdline = "2G-:512M";
+#elif defined(CONFIG_PPC64)
+		ck_cmdline = "2G-4G:384M,4G-16G:512M,16G-64G:1G,64G-128G:2G,128G-:4G";
+#endif
+		pr_info("Using crashkernel=auto, the size choosed is a best effort estimation.\n");
+	}
+
 	/*
 	 * if the commandline contains a ':', then that's the extended
 	 * syntax -- if not, it must be the classic syntax
-- 
2.28.0