Firmware assisted dump (fadump) HOWTO

Introduction

Firmware assisted dump is a feature introduced in the 3.4 mainline kernel and
supported only on the powerpc architecture. The goal of firmware-assisted dump
is to enable the dump of a crashed system, to do so from a fully-reset system,
and to minimize the total elapsed time until the system is back in production
use. Complete documentation of the implementation can be found at
Documentation/powerpc/firmware-assisted-dump.txt in the upstream Linux kernel
tree, from version 3.4 onward.

Please note that the firmware-assisted dump feature is only available on Power6
and above systems with recent firmware versions.

Overview

Fadump

Fadump is a robust kernel crash dumping mechanism that obtains a reliable
kernel crash dump with assistance from firmware. This approach does not use
kexec; instead, firmware assists in booting the kdump kernel while preserving
memory contents. Unlike kdump, the system is fully reset and loaded with a
fresh copy of the kernel. In particular, PCI and I/O devices are reinitialized
and are in a clean, consistent state. This second kernel, often called a
capture kernel, boots with very little memory and captures the dump image.

The first kernel registers the sections of memory with the Power firmware for
dump preservation during OS initialization. These registered sections of memory
are reserved by the first kernel during early boot. When a system crashes, the
Power firmware fully resets the system, preserves all the system memory
contents, and saves the low memory (boot memory, sized at the larger of 5% of
system RAM or 256MB) to the previously registered region. It also saves
system registers and hardware PTEs.

Fadump is supported only on the ppc64 platform. The standard kernel and capture
kernel are one and the same on ppc64.

If you're reading this document, you should already have kexec-tools
installed. If not, install it via the following command:

    # yum install kexec-tools

Fadump Operational Flow:

Like kdump, fadump also exports the ELF formatted kernel crash dump through
/proc/vmcore. Hence the existing kdump infrastructure can be used to capture
the fadump vmcore. The idea is to keep the functionality transparent to the end
user. From the user's perspective there is no change in the way the kdump init
script works.

However, unlike kdump, fadump does not pre-load the kdump kernel and initrd
into reserved memory; instead it always uses the default OS initrd during the
second boot after a crash. Hence, for fadump, we rebuild the kdump initrd and
replace the default initrd with it. Before replacing the existing default
initrd, we take a backup of the original default initrd for the user's
reference. The dracut package has been enhanced to rebuild the default initrd
with vmcore capture steps. The initrd image is rebuilt as per the configuration
in the /etc/kdump.conf file.

The control flow of fadump works as follows:
01. System panics.
02. At the crash, the kernel informs the Power firmware that it has crashed.
03. Firmware takes control and reboots the entire system, preserving
    only the memory (all other devices are reset).
04. The reboot follows the normal booting process (non-kexec).
05. The boot loader loads the default kernel and initrd from /boot.
06. The default initrd loads and runs /init.
07. The dracut-kdump.sh script present in the fadump-aware default initrd
    checks whether the '/proc/device-tree/rtas/ibm,kernel-dump' file exists
    before executing the steps to capture the vmcore.
    (This check bypasses the vmcore capture steps during a normal boot.)
09. Captures the dump according to /etc/kdump.conf.
10. Is dump capture successful (yes goto 12, no goto 11)
11. Perform the failure action specified in /etc/kdump.conf
    (The default failure action is reboot, if unspecified)
12. Perform the final action specified in /etc/kdump.conf
    (The default final action is reboot, if unspecified)
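
The check in step 07 can be sketched as a small shell function. This is only
an illustration of the idea, not the actual dracut-kdump.sh code; the function
name and the optional device-tree root argument are assumptions added here so
the check can be tried against a scratch directory.

```shell
# Succeed (exit status 0) when the 'ibm,kernel-dump' node exists under the
# rtas directory of the device tree, i.e. when this boot follows a crash.
# $1 is an optional device-tree root (hypothetical parameter, used for
# testing); it defaults to /proc/device-tree.
is_fadump_capture_boot() {
    [ -e "${1:-/proc/device-tree}/rtas/ibm,kernel-dump" ]
}
```

A caller would run the vmcore capture steps only when the function succeeds:

    is_fadump_capture_boot && echo "capture boot: save /proc/vmcore"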


How to configure fadump:

Again, we assume that if you're reading this document, you already have
kexec-tools installed. If not, install it via the following command:

    # yum install kexec-tools

Make the kernel to be configured with FADump the default boot entry, if
it isn't already:

   # grubby --set-default=/boot/vmlinuz-<kver>

Boot into the kernel to be configured for FADump. To be able to do much of
anything interesting in the way of debug analysis, you'll also need to install
the kernel-debuginfo package, of the same arch as your running kernel, and the
crash utility:

    # yum --enablerepo=\*debuginfo install kernel-debuginfo.$(uname -m) crash

Next up, we need to modify some boot parameters to enable firmware assisted
dump. With the help of grubby, it's very easy to append "fadump=on" to the end
of your kernel boot parameters. To reserve the appropriate amount of memory
for boot memory preservation, pass the 'crashkernel=X' kernel cmdline
parameter. For the recommended value of X, see the 'FADump Memory
Requirements' section.

   # grubby --args="fadump=on crashkernel=6G" --update-kernel=/boot/vmlinuz-`uname -r`

By default, the FADump reserved memory is initialized as a CMA area to make the
memory available through the CMA allocator on the production kernel. We can opt
out of this, making the reserved memory unavailable to the production kernel,
by booting the Linux kernel with 'fadump=nocma' instead of 'fadump=on'.
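
After rebooting, it can be handy to confirm which fadump mode the running
kernel was booted with. The helper below is a minimal sketch; its name is
made up for this example, and it simply reports the last 'fadump=' value it
finds in a command line string, which is a simplification of how the kernel
itself processes repeated parameters.

```shell
# Print the value of the last fadump= parameter found in the kernel command
# line string passed as $1, or "unset" if there is none.
# The body runs in a subshell so that 'set -f' (no glob expansion while the
# string is split into words) and the variables do not leak to the caller.
fadump_mode() (
    set -f
    mode="unset"
    for arg in $1; do
        case "$arg" in
            fadump=*) mode="${arg#fadump=}" ;;
        esac
    done
    printf '%s\n' "$mode"
)
```

On a running system, the current mode would be printed with:

    # fadump_mode "$(cat /proc/cmdline)"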

The term 'boot memory' means the size of the low memory chunk that is required
for a kernel to boot successfully when booted with restricted memory. By
default, the boot memory size will be the larger of 5% of system RAM or 256MB.
Alternatively, the user can specify the boot memory size through the boot
parameter 'fadump_reserve_mem=', which overrides the default calculated size.
Use this option if the default boot memory size is not sufficient for the
second kernel to boot successfully.
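
The default rule above (the larger of 5% of system RAM or 256MB) works out as
in the following sketch. The function name is hypothetical, and the integer
arithmetic ignores any rounding or alignment the kernel may apply, so treat
the result as an estimate only.

```shell
# Print the estimated default boot memory size in MB for a system whose total
# RAM in MB is given as $1: the larger of 5% of RAM or 256 MB.
default_boot_mem_mb() {
    five_percent=$(( $1 * 5 / 100 ))
    if [ "$five_percent" -gt 256 ]; then
        echo "$five_percent"
    else
        echo 256
    fi
}
```

For example, a system with 4096 MB of RAM stays at the 256 MB floor, because
5% of 4096 MB is only about 204 MB, while a system with 16384 MB of RAM gets
roughly 819 MB.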

After making said changes, reboot your system, so that the specified memory is
reserved and left untouched by the normal system. Take note that the output of
'free -m' will show X MB less memory than without this parameter, which is
expected. If you see OOM (Out Of Memory) error messages while loading the
capture kernel, then you should bump up the memory reservation size.

Now that you've got that reserved memory region set up, you want to turn on
the kdump init script:

    # systemctl enable kdump.service

Then, start up kdump as well:

    # systemctl start kdump.service

This should turn on the firmware assisted functionality in the kernel by
echo'ing 1 to /sys/kernel/fadump_registered, leaving the system ready
to capture a vmcore upon crashing. For journaling filesystems like XFS, an
additional step is required to ensure the bootloader does not pick the
older initrd (without vmcore capture scripts):

  * If /boot is a separate partition, run the below commands as the root user,
    or as a user with CAP_SYS_ADMIN rights:

        # fsfreeze -f /boot
        # fsfreeze -u /boot

  * If /boot is not a separate partition, reboot the system.

After reboot, check if the kdump service is up and running with:

  # systemctl status kdump.service
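
Besides the service status, the fadump state exposed by the kernel can be
inspected directly. The sketch below reads /sys/kernel/fadump_registered
(mentioned above) and also /sys/kernel/fadump_enabled; the latter file, the
function name and the optional directory argument are assumptions of this
example, and a file that is missing is simply reported as unavailable.

```shell
# Print the contents of the fadump sysfs files, one "name=value" per line.
# $1 is an optional directory to look in (hypothetical parameter, used for
# testing); it defaults to /sys/kernel.
fadump_status() {
    sysfs_dir="${1:-/sys/kernel}"
    for f in fadump_enabled fadump_registered; do
        if [ -r "$sysfs_dir/$f" ]; then
            printf '%s=%s\n' "$f" "$(cat "$sysfs_dir/$f")"
        else
            printf '%s=unavailable\n' "$f"
        fi
    done
}
```

On a system that is ready to capture a dump, fadump_registered is expected to
read 1.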

To test out whether FADump is configured properly, you can force-crash your
system by echo'ing a 'c' into /proc/sysrq-trigger:

    # echo c > /proc/sysrq-trigger

You should see some panic output, followed by the system reset and booting into
a fresh copy of the kernel. When the default initrd loads and runs /init, the
vmcore should be copied out to disk (by default, in
/var/crash/<YYYY.MM.DD-HH:MM:SS>/vmcore), and then the system is rebooted back
into your normal kernel.
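
Once the system is back up, the newest dump can be located with a helper like
the one below. It is a rough sketch: the function name is made up, and it
assumes the per-dump directory names under the crash directory sort in
chronological order, which holds for timestamp-style names but is not
guaranteed for every dump target configuration.

```shell
# Print the path of the vmcore file that sorts last under the directory given
# as $1 (default: /var/crash). Prints nothing if no vmcore is found.
latest_vmcore() {
    find "${1:-/var/crash}" -type f -name vmcore 2>/dev/null \
        | LC_ALL=C sort | tail -n 1
}
```

The printed path can then be handed to the crash utility along with the
matching vmlinux from kernel-debuginfo.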

Once back to your normal kernel, you can use the previously installed crash
utility in conjunction with the previously installed kernel-debuginfo to
perform postmortem analysis:

    # crash /usr/lib/debug/lib/modules/2.6.17-1.2621.el5/vmlinux \
            /var/crash/2006-08-23-15:34/vmcore

    crash> bt

and so on...

Saving vmcore-dmesg.txt
-----------------------
Kernel log buffers are among the most important pieces of information
available in a vmcore. Before the vmcore is saved, the kernel log buffers are
extracted from /proc/vmcore and saved into a file named vmcore-dmesg.txt. The
vmcore is saved after vmcore-dmesg.txt. The destination disk and directory for
vmcore-dmesg.txt are the same as for the vmcore. Note that kernel log buffers
will not be available if the dump target is a raw device.

FADump Memory Requirements:

  System Memory          Recommended memory
--------------------- ----------------------
    4 GB - 16 GB     :        768 MB
   16 GB - 64 GB     :       1024 MB
   64 GB - 128 GB    :          2 GB
  128 GB - 1 TB      :          4 GB
    1 TB - 2 TB      :          6 GB
    2 TB - 4 TB      :         12 GB
    4 TB - 8 TB      :         20 GB
    8 TB - 16 TB     :         36 GB
   16 TB - 32 TB     :         64 GB
   32 TB - 64 TB     :        128 GB
   64 TB & above     :        180 GB
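
The table can be turned into a lookup that prints a value suitable for the
'crashkernel=X' parameter. This is a sketch only: the function name is
hypothetical, 1 TB is taken as 1024 GB, and each range is assumed to include
its lower bound and exclude its upper bound, since the table itself does not
say which row a boundary value such as 16 GB belongs to.

```shell
# Print the recommended reservation for a system whose memory in whole GB is
# given as $1. Fails (exit status 1) below 4 GB, which the table does not cover.
recommended_fadump_mem() {
    gb=$1
    if   [ "$gb" -lt 4 ];     then return 1
    elif [ "$gb" -lt 16 ];    then echo "768M"
    elif [ "$gb" -lt 64 ];    then echo "1024M"
    elif [ "$gb" -lt 128 ];   then echo "2G"
    elif [ "$gb" -lt 1024 ];  then echo "4G"
    elif [ "$gb" -lt 2048 ];  then echo "6G"
    elif [ "$gb" -lt 4096 ];  then echo "12G"
    elif [ "$gb" -lt 8192 ];  then echo "20G"
    elif [ "$gb" -lt 16384 ]; then echo "36G"
    elif [ "$gb" -lt 32768 ]; then echo "64G"
    elif [ "$gb" -lt 65536 ]; then echo "128G"
    else                           echo "180G"
    fi
}
```

For instance, a 1.5 TB system (1536 GB) maps to 6G, matching the
'crashkernel=6G' value used in the grubby example earlier.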

Things to remember:

1) The memory required to boot the capture kernel is a moving target that
   depends on many factors, like hardware attached to the system, kernel and
   modules in use, packages installed and services enabled; there is no
   one-size-fits-all value. The above recommendations are keyed on system
   memory alone, so they come with a few assumptions about the resources a
   system of that size could have. Please take the recommendations with a
   pinch of salt and remember to try capturing a dump a few times to confirm
   that the system is configured successfully with dump capturing support.

2) Though the memory requirements for FADump seem high, this memory is not
   completely set aside but made available for userspace applications to use,
   through the CMA allocator.

3) As the same initrd is used for booting the production kernel as well as the
   capture kernel, and with the dump being captured in a restricted memory
   environment, a few optimizations (like not including the network dracut
   module, disabling multipath and such) are applied while building the
   initrd. In case the production environment needs these optimizations to be
   avoided, the dracut_args option in the /etc/kdump.conf file can be used.
   For example, if a user wishes for the network module to be included in the
   initrd, adding the below entry to the /etc/kdump.conf file and restarting
   the kdump service would take care of it.

   dracut_args --add "network"

4) If FADump is configured to capture vmcore to a remote dump target using SSH
   or NFS protocol, the corresponding network interface '<interface-name>' is
   renamed to 'kdump-<interface-name>', if it is generic (like *eth# or net#).
   This happens because the vmcore capture scripts in the initial RAM disk
   (initrd) add the 'kdump-' prefix to the network interface name to secure
   persistent naming. As the capture kernel and production kernel use the same
   initrd in case of FADump, the interface name is changed for the production
   kernel too. This is likely to impact the network configuration setup for
   the production kernel. So, it is recommended to use a non-generic name for
   a network interface before setting up FADump to capture vmcore to a remote
   dump target based on that network interface, to avoid running into network
   configuration issues.

Dump Triggering methods:

This section talks about the various ways, other than a Kernel Panic, in which
fadump can be triggered. The following methods assume that fadump is configured
on your system, with the scripts enabled as described in the section above.

1) AltSysRq C

FADump can be triggered with the combination of the 'Alt', 'SysRq' and 'C'
keyboard keys. Please refer to the following link for more details:

https://fedoraproject.org/wiki/QA/Sysrq

In addition, on PowerPC boxes, fadump can also be triggered via the Hardware
Management Console (HMC) using the 'Ctrl', 'O' and 'C' keyboard keys.

2) Kernel OOPs

If we want to generate a dump every time the kernel OOPSes, we can achieve this
by setting the 'Panic On OOPs' option as follows:

    # echo 1 > /proc/sys/kernel/panic_on_oops
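
The echo above only lasts until the next reboot. One way to keep the setting
across reboots is a sysctl.d drop-in, sketched below. The function name and
the drop-in file name used in the usage line are illustrative assumptions,
not something this document or kexec-tools prescribes.

```shell
# Write a sysctl configuration fragment that enables panic_on_oops to the
# file path given as $1.
write_panic_on_oops_conf() {
    printf 'kernel.panic_on_oops = 1\n' > "$1"
}
```

As root, the fragment would be written and loaded with something like:

    # write_panic_on_oops_conf /etc/sysctl.d/99-panic-on-oops.conf
    # sysctl --system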

3) PowerPC specific methods:

On IBM PowerPC machines, issuing a soft reset invokes the XMON debugger (if
XMON is configured). To configure XMON, either compile the kernel with the
CONFIG_XMON and CONFIG_XMON_DEFAULT options, or compile it with CONFIG_XMON
and boot the kernel with the xmon=on option.

Following are the ways to remotely issue a soft reset on PowerPC boxes, which
would drop you to XMON. Pressing an 'X' (capital letter X) followed by
'Enter' here will trigger the dump.

3.1) HMC

The Hardware Management Console (HMC) available on Power4 and Power5 machines
allows partitions to be reset remotely. This is especially useful in hang
situations where the system is not accepting any keyboard inputs.

Once you have HMC configured, the following steps will enable you to trigger
fadump via a soft reset:

On Power4
  Using GUI

    * In the right pane, right click on the partition you wish to dump.
    * Select "Operating System->Reset".
    * Select "Soft Reset".
    * Select "Yes".

  Using HMC Commandline

    # reset_partition -m <machine> -p <partition> -t soft

On Power5
  Using GUI

    * In the right pane, right click on the partition you wish to dump.
    * Select "Restart Partition".
    * Select "Dump".
    * Select "OK".

  Using HMC Commandline

    # chsysstate -m <managed system name> -n <lpar name> -o dumprestart -r lpar

3.2) Blade Management Console for Blade Center

To initiate a dump operation, go to the Power/Restart option under "Blade
Tasks" in the Blade Management Console. Select the blade for which you want
to initiate the dump and then click "Restart blade with NMI". This issues a
system reset and invokes the xmon debugger.


Advanced Setups & Failure action:

Kdump and fadump exhibit similar behavior in terms of setup & failure action.
For fadump advanced setup related information, see the "Advanced Setups"
section in the "kexec-kdump-howto.txt" document. Refer to the "Failure action"
section in the "kexec-kdump-howto.txt" document for fadump failure action
related information.

Compression and filtering

Refer to the "Compression and filtering" section in the "kexec-kdump-howto.txt"
document. Compression and filtering are the same for kdump & fadump.


Notes on rootfs mount:
Dracut is designed to mount rootfs by default. If rootfs mounting fails it
will refuse to go on. So fadump leaves rootfs mounting to dracut currently.
We make the assumption that a proper root= cmdline is being passed to the
dracut initramfs for the time being. If you need to modify
"KDUMP_COMMANDLINE=" in /etc/sysconfig/kdump, you will need to make sure that
the appropriate root= options are copied from /proc/cmdline. In general it is
best to append command line options using "KDUMP_COMMANDLINE_APPEND=" instead
of replacing the original command line completely.
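
If "KDUMP_COMMANDLINE=" does have to be written by hand, the root-related
options can be pulled out of the running command line with a helper along
these lines. The function name is made up, and the set of options it keeps
(root=, rootflags= and rootfstype=) is an assumption of this sketch; a given
setup may depend on further options, such as LVM or RAID related ones, that
are not covered here.

```shell
# Print the root=, rootflags= and rootfstype= options found in the kernel
# command line string passed as $1, separated by single spaces.
# The body runs in a subshell so 'set -f' (no glob expansion while splitting
# the string into words) does not affect the caller.
root_args() (
    set -f
    out=""
    for arg in $1; do
        case "$arg" in
            root=*|rootflags=*|rootfstype=*) out="${out:+$out }$arg" ;;
        esac
    done
    printf '%s\n' "$out"
)
```

Running it against the live command line shows what would need to be carried
over:

    # root_args "$(cat /proc/cmdline)"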

How to disable FADump:

Remove "fadump=on"/"fadump=nocma" from the kernel cmdline parameters OR
replace it with the "fadump=off" kernel cmdline parameter:

   # grubby --update-kernel=/boot/vmlinuz-`uname -r` --remove-args="fadump=on"
or
   # grubby --update-kernel=/boot/vmlinuz-`uname -r` --remove-args="fadump=nocma"
OR
   # grubby --update-kernel=/boot/vmlinuz-`uname -r` --args="fadump=off"

If KDump is to be used as the dump capturing mechanism, update the crashkernel
parameter (otherwise, remove the "crashkernel=" parameter too, using grubby):

   # grubby --update-kernel=/boot/vmlinuz-$kver --args="crashkernel=auto"

Reboot the system for the settings to take effect.