810b72 sysconfig: add pcie_ports compat to KDUMP_COMMANDLINE_APPEND on x86_64

Authored and Committed by liutgnu 4 months ago
    sysconfig: add pcie_ports compat to KDUMP_COMMANDLINE_APPEND on x86_64
    
    Upstream: fedora
    Resolves: RHEL-3929
    Conflict: Yes. Fedora has no kdump.sysconfig.x86_64; it generates the
              sysconfig with gen-kdump-sysconfig.sh instead. So for this
              backport, the modification is made in kdump.sysconfig.x86_64.
    
    commit ada6f5edf1ae06fc88759aa2f94d09e2a98d21ef
    Author: Tao Liu <ltao@redhat.com>
    Date:   Wed May 1 16:53:19 2024 +0800
    
        sysconfig: add pcie_ports compat to KDUMP_COMMANDLINE_APPEND on x86_64
    
        There have been some cases of kdump failing in the 2nd kernel, where
        usually only one cpu is enabled by "nr_cpus=1" but a large number of
        devices is present, which can easily exceed the maximum IRQ resources
        one cpu can handle. As a result, the 2nd kernel hangs and kdump
        fails. This issue is often observed on machines with many cpus and many
        devices.
    
        On those systems, pcieports consume a large proportion of the IRQ
        resources, and many messages like the following can be seen in the
        dmesg log:
    
           pcieport 0000:18:01.0: PME: Signaling with IRQ 109
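        To gauge how many IRQs the PME service of the PCIe ports takes on a
        given machine, the dmesg lines above can be collected per port. The
        helper below is a hypothetical sketch, not part of kdump; it assumes
        the message format shown above and only reports what dmesg printed.

```python
import re
from typing import Dict

# Matches lines such as: "pcieport 0000:18:01.0: PME: Signaling with IRQ 109"
# (an optional "[timestamp]" prefix before "pcieport" is simply skipped).
PME_IRQ_RE = re.compile(r"pcieport (\S+): PME: Signaling with IRQ (\d+)")


def pcieport_pme_irqs(dmesg_text: str) -> Dict[str, int]:
    """Map each PCIe port address to the IRQ its PME service signals with."""
    irqs: Dict[str, int] = {}
    for match in PME_IRQ_RE.finditer(dmesg_text):
        irqs[match.group(1)] = int(match.group(2))
    return irqs
```

        Feeding it the output of `dmesg` (for example via `sys.stdin.read()`)
        and taking `len()` of the result gives a rough count of ports using a
        PME IRQ; it does not cover the other native services (AER, DPC,
        hotplug).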
    
        According to the kernel doc[1], applying "pcie_ports=compat" disables
        native PCIe services (PME, AER, DPC, PCIe hotplug). These provide power
        management events, advanced error reporting, downstream port
        containment, and hotplug, none of which are must-have functions for
        kdump. In addition, testing has not revealed any side effects, such as
        failing to write the vmcore to sdX or NVMe devices.
    
        This patch disables native PCIe services in the 2nd kernel to save
        scarce IRQ resources and increase the kdump success rate.
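        The change itself is a one-line edit that appends "pcie_ports=compat"
        to KDUMP_COMMANDLINE_APPEND. For anyone applying the same tweak to a
        local sysconfig by script, a minimal sketch follows. The function name
        add_kdump_param is hypothetical, and it assumes the variable is set on
        a single line with a double-quoted value; it skips the edit if a
        pcie_ports= option is already present.

```python
import re

# Matches a single-line, double-quoted assignment such as:
#   KDUMP_COMMANDLINE_APPEND="irqpoll nr_cpus=1 reset_devices"
# group 1: prefix up to the opening quote, group 2: the value,
# group 3: whatever follows the closing quote (including the newline).
APPEND_RE = re.compile(r'^(KDUMP_COMMANDLINE_APPEND=")([^"]*)"(.*)', re.DOTALL)


def add_kdump_param(config_text: str, param: str = "pcie_ports=compat") -> str:
    """Return config_text with `param` appended to KDUMP_COMMANDLINE_APPEND.

    Hypothetical helper: if an option with the same key (e.g. "pcie_ports")
    is already present, the line is left untouched, so the call is idempotent.
    """
    key = param.split("=", 1)[0]
    out = []
    for line in config_text.splitlines(keepends=True):
        m = APPEND_RE.match(line)
        if m:
            args = m.group(2).split()
            if not any(arg.split("=", 1)[0] == key for arg in args):
                value = " ".join(args + [param])
                line = f'{m.group(1)}{value}"{m.group(3)}'
        out.append(line)
    return "".join(out)
```

        Matching on the option key rather than the full string avoids adding
        a second, conflicting pcie_ports= value if someone already chose a
        different mode.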
    
        Prarit's comments are attached below:
    
        This makes sense to me. The only concern anyone should have is that a PCIE
        error could have been responsible for taking down the kernel in the first
        place, and booting into the second kernel could then also have a fatal
        problem. I'm not sure we can ever fix that type of cascade of panics :)
        so it makes sense to disable these features.
    
        [1]: https://www.kernel.org/doc/html/v6.9-rc1/admin-guide/kernel-parameters.html
    
        Signed-off-by: Tao Liu <ltao@redhat.com>
        Acked-by: Prarit Bhargava <prarit@redhat.com>
        Acked-by: Dave Young <dyoung@redhat.com>
    
    Signed-off-by: Tao Liu <ltao@redhat.com>
    
file modified (+1 -1)