chengshan / rpms / kernel

Forked from rpms/kernel 2 years ago
Justin Vreeland 794d92
From 0000000000000000000000000000000000000000 Mon Sep 17 00:00:00 2001
From: Jeremy Cline <jcline@redhat.com>
Date: Tue, 23 Jul 2019 15:24:30 +0000
Subject: [PATCH] kdump: add support for crashkernel=auto

Rebased for v5.3-rc1 because the documentation has moved.

    Message-id: <20180604013831.574215750@redhat.com>
    Patchwork-id: 8166
    O-Subject: [kernel team] [PATCH RHEL8.0 V2 2/2] kdump: add support for crashkernel=auto
    Bugzilla: 1507353
    RH-Acked-by: Don Zickus <dzickus@redhat.com>
    RH-Acked-by: Baoquan He <bhe@redhat.com>
    RH-Acked-by: Pingfan Liu <piliu@redhat.com>

    Bugzilla: https://bugzilla.redhat.com/show_bug.cgi?id=1507353
    Build: https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=16534135
    Tested: ppc64le, x86_64 with several memory sizes.
            kdump qe tested 160M on various x86 machines in lab.

    We continue to provide crashkernel=auto as we did in RHEL6 and
    RHEL7. This simplifies kdump deployment for the common use cases
    in which kdump just works with the automatically reserved values.
    It is still a best-effort estimate: we cannot know the exact
    memory requirement because it depends on many different factors.

    crashkernel=auto is implemented as a simple wrapper that expands
    to the following kernel command line values:
    x86_64: crashkernel=1G-64G:160M,64G-1T:256M,1T-:512M
    s390x:  crashkernel=4G-64G:160M,64G-1T:256M,1T-:512M
    arm64:  crashkernel=2G-:512M
    ppc64:  crashkernel=2G-4G:384M,4G-16G:512M,16G-64G:1G,64G-128G:2G,128G-:4G
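The per-architecture lists above use the range:size syntax, where each entry reserves a fixed size when system RAM falls inside its range (start inclusive, end exclusive, an empty end meaning "no upper bound"). The following is a minimal user-space sketch of that lookup in C, not the kernel's own parser. The names `parse_size` and `pick_crash_size` are hypothetical, and the sketch ignores the optional @offset suffix and the kernel's error reporting.

```c
#include <assert.h>
#include <stdlib.h>

/*
 * Parse a size such as "160M", "64G" or "1T" into bytes and advance *s
 * past it. Hypothetical user-space stand-in for the kernel's memparse().
 */
static unsigned long long parse_size(const char **s)
{
	char *end;
	unsigned long long v = strtoull(*s, &end, 10);

	switch (*end) {
	case 'T': v <<= 10; /* fall through */
	case 'G': v <<= 10; /* fall through */
	case 'M': v <<= 10; /* fall through */
	case 'K': v <<= 10; end++; break;
	default: break;
	}
	*s = end;
	return v;
}

/*
 * Return the reservation in bytes for a machine with `ram` bytes of
 * memory under a "start-[end]:size[,...]" spec, or 0 when no range
 * matches or the spec is malformed.
 */
static unsigned long long pick_crash_size(const char *spec,
					  unsigned long long ram)
{
	assert(spec != NULL);

	while (*spec) {
		unsigned long long start, end = ~0ULL, size;

		start = parse_size(&spec);
		if (*spec != '-')
			return 0;	/* malformed: missing '-' */
		spec++;
		if (*spec != ':')	/* an empty end means unbounded */
			end = parse_size(&spec);
		if (*spec != ':')
			return 0;	/* malformed: missing ':' */
		spec++;
		size = parse_size(&spec);

		if (ram >= start && ram < end)
			return size;
		if (*spec != ',')
			break;
		spec++;
	}
	return 0;
}
```

With the x86_64 list, for instance, an 8G machine falls in the 1G-64G range and gets 160M, while a 512M machine matches no range and gets no reservation.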
    
    Unlike the old implementation in RHEL6/7, the crash reserved
    memory size is no longer scaled linearly with system memory
    size; each memory range maps to a fixed size.

    The latest effort to move this upstream is in the thread below:
    https://lkml.org/lkml/2018/5/20/262
    Unfortunately it is still unlikely to be accepted, so RHEL8
    continues to carry a RHEL-only patch.

    The old patch description, which explains the history, is copied
    below:
    '''
        Non-upstream explanations:
        Besides the "crashkernel=X@Y" format, upstream also has the advanced
        "crashkernel=range1:size1[,range2:size2,...][@offset]" and
        "crashkernel=X,high{low}" formats, but they need more careful
        manual configuration and have different values for different
        architectures.

        Most distributions use the standard "crashkernel=X@Y"
        upstream format and use the crashkernel range format for advanced
        scenarios, relying heavily on the user's involvement.

        "crashkernel=auto", by contrast, is Red Hat's special feature. It
        has been used as the default boot cmdline since 2008 (rhel6).
        It does not require users to work out how much crash memory
        their systems need, and it has proved to work pretty well for
        common scenarios.

        "crashkernel=auto" was tested on and based on rhel-related products,
        where stable kernel configurations mean more or less stable
        memory consumption. In 2014 we tried to post it upstream again,
        but it was NACKed because reviewers considered it not general
        and unnecessary: users can specify their own values or do that with
        scripts. However, our customers insist on having it in rhel.

        Also see one previous discussion related to this backport to Pegas:
        On 10/17/2016 at 10:15 PM, Don Zickus wrote:
        > On Fri, Oct 14, 2016 at 10:57:41AM +0800, Dave Young wrote:
        >> Don, agree with you we should evaluate them instead of just inherit
        >> them blindly. Below is what I think about kdump auto memory:
        >> There are two issues for crashkernel=auto in upstream:
        >> 1) It will be seen as a policy which should not go to kernel
        >> 2) It is hard to get a good number for the crash reserved size,
        >> considering various different kernel config options one can setups.
        >> In RHEL we are easier because our supported Kconfig is limited.
        >> I digged the upstream mail archive, but I'm not sure I got all the
        >> information, at least Michael Ellerman was objecting the series for
        >> 1).
        > Yes, I know.  Vivek and I have argued about this for years.  :-)
        >
        > I had hoped all the changes internally to the makedumpfile would allow
        > the memory configuration to stabilize at a number like 192M or 128M and
        > only in the rare cases extend beyond that.
        >
        > So I always treated that as a temporary hack until things were better.
        > With the hope of every new RHEL release we get smarter and better. :-)
        > Ideally it would be great if we could get the number down to 64M for most
        > cases and just turn it on in Fedora.  Maybe someday.... ;-)
        >
        > We can have this conversation when the patch gets reposted/refreshed
        > for upstream on rhkl?
        >
        > Cheers,
        > Don

        We had proposed to drop the historic crashkernel=auto code and move
        to use crashkernel=range:size format and pass them in anaconda.

        The initial reason is crashkernel=range:size works just fine because
        we do not need a complex algorithm to scale the crashkernel reserved
        size any more.  The old linear scaling was mainly for old makedumpfile
        requirements and is no longer necessary.

        But with the new approach, backward compatibility is potentially at risk.
        For example, consider the following cases:
        1) When we upgrade from an older distribution like rhel-alt-7.4 (which
        uses crashkernel=auto) to rhel-alt-7.5 (which uses the crashkernel=xY
        format).
        In this case we can use anaconda scripts to check for
        'crashkernel=auto' in the kernel spec and update it to the new
        'crashkernel=range:size' format.
        2) When we upgrade from rhel-alt-7.5 (which uses crashkernel=xY format)
        to rhel-alt-7.6 (which uses crashkernel=xY format), but the x and/or Y
        values are changed in rhel-alt-7.6.
        For example, from crashkernel=2G-:160M to crashkernel=2G-:192M. Then we
        have no way to determine whether the X and/or Y values were
        distribution-provided or user-specified.
        Since user-specified values should take precedence, we cannot do
        an upgrade in such a case.

        We therefore went back to resolving this in the kernel, with a simpler
        version that just maps to the range:size style in code, which keeps
        the rhel-only code easy to maintain.
    '''

    Signed-off-by: Dave Young <dyoung@redhat.com>
    Signed-off-by: Herton R. Krzesinski <herton@redhat.com>

Upstream Status: RHEL only
Signed-off-by: Jeremy Cline <jcline@redhat.com>
---
 Documentation/admin-guide/kdump/kdump.rst | 11 +++++++++++
 kernel/crash_core.c                       | 14 ++++++++++++++
 2 files changed, 25 insertions(+)

diff --git a/Documentation/admin-guide/kdump/kdump.rst b/Documentation/admin-guide/kdump/kdump.rst
index 2da65fef2a1c..d53a524f80f0 100644
--- a/Documentation/admin-guide/kdump/kdump.rst
+++ b/Documentation/admin-guide/kdump/kdump.rst
@@ -285,6 +285,17 @@ This would mean:
     2) if the RAM size is between 512M and 2G (exclusive), then reserve 64M
     3) if the RAM size is larger than 2G, then reserve 128M

+Or you can use crashkernel=auto if you have enough memory.  The threshold
+is 1G on x86_64, 2G on arm64, ppc64 and ppc64le, and 4G on s390x.
+If your system memory is less than the threshold, crashkernel=auto will not
+reserve memory.
+
+The automatically reserved memory size varies based on architecture.
+The size changes according to system memory size like below:
+    x86_64: 1G-64G:160M,64G-1T:256M,1T-:512M
+    s390x:  4G-64G:160M,64G-1T:256M,1T-:512M
+    arm64:  2G-:512M
+    ppc64:  2G-4G:384M,4G-16G:512M,16G-64G:1G,64G-128G:2G,128G-:4G


 Boot into System Kernel
diff --git a/kernel/crash_core.c b/kernel/crash_core.c
index e4dfe2a05a31..8c6f59932247 100644
--- a/kernel/crash_core.c
+++ b/kernel/crash_core.c
@@ -258,6 +258,20 @@ static int __init __parse_crashkernel(char *cmdline,
 	if (suffix)
 		return parse_crashkernel_suffix(ck_cmdline, crash_size,
 				suffix);
+
+	if (strncmp(ck_cmdline, "auto", 4) == 0) {
+#ifdef CONFIG_X86_64
+		ck_cmdline = "1G-64G:160M,64G-1T:256M,1T-:512M";
+#elif defined(CONFIG_S390)
+		ck_cmdline = "4G-64G:160M,64G-1T:256M,1T-:512M";
+#elif defined(CONFIG_ARM64)
+		ck_cmdline = "2G-:512M";
+#elif defined(CONFIG_PPC64)
+		ck_cmdline = "2G-4G:384M,4G-16G:512M,16G-64G:1G,64G-128G:2G,128G-:4G";
+#endif
+		pr_info("Using crashkernel=auto, the size chosen is a best effort estimation.\n");
+	}
+
 	/*
 	 * if the commandline contains a ':', then that's the extended
 	 * syntax -- if not, it must be the classic syntax
-- 
2.28.0