Firmware assisted dump (fadump) HOWTO

Introduction

Firmware assisted dump (fadump) was introduced in the 3.4 mainline kernel and is
supported only on the powerpc architecture. Its goals are to capture the dump
of a crashed system from a fully reset system, and to minimize the total elapsed
time until the system is back in production use. Complete documentation of the
implementation can be found in
Documentation/powerpc/firmware-assisted-dump.txt in the upstream Linux kernel
tree, from version 3.4 onwards.

Please note that the firmware-assisted dump feature is only available on Power6
and above systems with recent firmware versions.

Overview

Fadump

Fadump is a robust kernel crash dumping mechanism that obtains a reliable kernel
crash dump with assistance from firmware. This approach does not use kexec;
instead, firmware assists in booting the kdump kernel while preserving memory
contents. Unlike kdump, the system is fully reset and loaded with a fresh copy
of the kernel. In particular, PCI and I/O devices are reinitialized and are in a
clean, consistent state. This second kernel, often called a capture kernel,
boots with very little memory and captures the dump image.

The first kernel registers the sections of memory with the Power firmware for
dump preservation during OS initialization. These registered sections of memory
are reserved by the first kernel during early boot. When a system crashes, the
Power firmware fully resets the system, preserves all the system memory
contents, and saves the low memory (boot memory, whose size is the larger of 5%
of system RAM or 256MB) to the previously registered region. It also saves
system registers and hardware PTEs.

Fadump is supported only on the ppc64 platform. The standard kernel and the
capture kernel are one and the same on ppc64.

If you're reading this document, you should already have kexec-tools
installed. If not, install it via the following command:

    # yum install kexec-tools

Fadump Operational Flow:

Like kdump, fadump also exports the ELF formatted kernel crash dump through
/proc/vmcore. Hence the existing kdump infrastructure can be used to capture
the fadump vmcore. The idea is to keep the functionality transparent to the end
user. From the user's perspective, there is no change in the way the kdump init
script works.

However, unlike kdump, fadump does not pre-load the kdump kernel and initrd into
reserved memory; instead, it always uses the default OS initrd during the second
boot after a crash. Hence, for fadump, we rebuild the kdump initrd and replace
the default initrd with it. Before replacing the existing default initrd, we
take a backup of the original default initrd for the user's reference. The
dracut package has been enhanced to rebuild the default initrd with vmcore
capture steps. The initrd image is rebuilt as per the configuration in the
/etc/kdump.conf file.

The control flow of fadump works as follows:
01. System panics.
02. At the crash, the kernel informs the Power firmware that the kernel has
    crashed.
03. Firmware takes control and reboots the entire system, preserving
    only the memory (resets all other devices).
04. The reboot follows the normal booting process (non-kexec).
05. The boot loader loads the default kernel and initrd from /boot.
06. The default initrd loads and runs /init.
07. The dracut-kdump.sh script present in the fadump-aware default initrd checks
    if the '/proc/device-tree/rtas/ibm,kernel-dump' file exists before executing
    the steps to capture the vmcore.
    (This check helps to bypass the vmcore capture steps during the normal boot
     process.)
08. Captures the dump according to /etc/kdump.conf.
09. Is the dump capture successful? (yes goto 11, no goto 10)
10. Perform the failure action specified in /etc/kdump.conf.
    (The default failure action is reboot, if unspecified.)
11. Perform the final action specified in /etc/kdump.conf.
    (The default final action is reboot, if unspecified.)

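The check in step 07 can be pictured with a small shell sketch. This is only an
illustration of the idea, not the actual dracut-kdump.sh code; the function name
is_fadump_capture_boot and its optional path argument are made up for this
example.

```shell
#!/bin/sh
# Sketch of the step 07 check: decide whether this boot follows a crash.
# The default path is the device-tree file named in step 07. The optional
# argument is hypothetical and exists only so the function can be tried on
# systems that do not have that file.
is_fadump_capture_boot() {
    node="${1:-/proc/device-tree/rtas/ibm,kernel-dump}"
    [ -e "$node" ]
}

if is_fadump_capture_boot; then
    echo "fadump capture boot: run the vmcore capture steps"
else
    echo "normal boot: skip the vmcore capture steps"
fi
```

On a normal boot the file is absent, so the capture steps are skipped and the
system continues to boot as usual.
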
How to configure fadump:

Again, we assume that if you're reading this document, you already have
kexec-tools installed. If not, install it via the following command:

    # yum install kexec-tools

Make the kernel to be configured with FADump the default boot entry, if
it isn't already:

    # grubby --set-default=/boot/vmlinuz-<kver>

Boot into the kernel to be configured for FADump. To be able to do much of
anything interesting in the way of debug analysis, you'll also need to install
the kernel-debuginfo package, of the same arch as your running kernel, and the
crash utility:

    # yum --enablerepo=\*debuginfo install kernel-debuginfo.$(uname -m) crash

Next up, we need to modify some boot parameters to enable firmware assisted
dump. With the help of grubby, it's very easy to append "fadump=on" to the end
of your kernel boot parameters. To reserve the appropriate amount of memory
for boot memory preservation, pass the 'crashkernel=X' kernel cmdline parameter.
For the recommended value of X, see the 'FADump Memory Requirements' section.

    # grubby --args="fadump=on crashkernel=6G" --update-kernel=/boot/vmlinuz-`uname -r`

By default, the FADump reserved memory is initialized as a CMA area to make the
memory available through the CMA allocator on the production kernel. We can opt
out of this, making the reserved memory unavailable to the production kernel, by
booting the Linux kernel with 'fadump=nocma' instead of 'fadump=on'.

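After rebooting, it can be handy to confirm which 'fadump=' and 'crashkernel='
values the running kernel was actually booted with. The following is a minimal
sketch; the helper name cmdline_value is made up for this example, and it does
not handle quoted parameters that contain spaces.

```shell
#!/bin/sh
# Print the value of a "name=value" kernel command line parameter.
# $1: parameter name; $2: command line text (defaults to /proc/cmdline).
# If the parameter appears more than once, the last value is printed.
# Returns non-zero when the parameter is not present at all.
cmdline_value() {
    name="$1"
    cmdline="${2:-$(cat /proc/cmdline)}"
    value=""
    found=1
    for word in $cmdline; do
        case "$word" in
            "$name"=*) value="${word#*=}"; found=0 ;;
        esac
    done
    [ "$found" -eq 0 ] && printf '%s\n' "$value"
    return "$found"
}

# Example:
#   cmdline_value fadump        # e.g. "on" or "nocma"
#   cmdline_value crashkernel   # e.g. "6G"
```
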
The term 'boot memory' means the size of the low memory chunk that is required
for a kernel to boot successfully when booted with restricted memory. By
default, the boot memory size is the larger of 5% of system RAM or 256MB.
Alternatively, the user can specify the boot memory size through the boot
parameter 'fadump_reserve_mem=', which overrides the default calculated size.
Use this option if the default boot memory size is not sufficient for the second
kernel to boot successfully.

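The default boot memory rule above is simple arithmetic, sketched below. The
function name default_boot_mem_mb is made up for this example, and the kernel
may round or align the real value differently, so treat the result as an
estimate only.

```shell
#!/bin/sh
# Default boot memory as described above: the larger of 5% of system RAM
# or 256MB. $1 is the system RAM in MB; integer arithmetic rounds down.
default_boot_mem_mb() {
    total_mb="$1"
    five_percent=$(( total_mb * 5 / 100 ))
    if [ "$five_percent" -gt 256 ]; then
        echo "$five_percent"
    else
        echo 256
    fi
}

default_boot_mem_mb 4096     # 5% is about 204MB, so the 256MB floor applies
default_boot_mem_mb 65536    # 5% of 64GB is above the floor
```
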
After making said changes, reboot your system, so that the specified memory is
reserved and left untouched by the normal system. Take note that the output of
'free -m' will show X MB less memory than without this parameter, which is
expected. If you see OOM (Out Of Memory) error messages while loading the
capture kernel, then you should bump up the memory reservation size.

Now that you've got that reserved memory region set up, you want to turn on
the kdump init script:

    # systemctl enable kdump.service

Then, start up kdump as well:

    # systemctl start kdump.service

This should turn on the firmware assisted functionality in the kernel by
echo'ing 1 to /sys/kernel/fadump_registered, leaving the system ready
to capture a vmcore upon crashing. For journaling filesystems like XFS, an
additional step is required to ensure the bootloader does not pick the
older initrd (without vmcore capture scripts):

  * If /boot is a separate partition, run the below commands as the root user,
    or as a user with CAP_SYS_ADMIN rights:

        # fsfreeze -f /boot
        # fsfreeze -u /boot

  * If /boot is not a separate partition, reboot the system.

After reboot, check if the kdump service is up and running with:

    # systemctl status kdump.service

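Besides the service status, the registration state mentioned earlier can be read
back from /sys/kernel/fadump_registered. A minimal sketch follows; the function
name fadump_is_registered and its optional path argument are made up for this
example.

```shell
#!/bin/sh
# Report whether fadump is registered with firmware.
# $1: sysfs file to read. It defaults to /sys/kernel/fadump_registered;
# the argument is hypothetical and only helps when trying the function
# on a system without fadump support.
fadump_is_registered() {
    f="${1:-/sys/kernel/fadump_registered}"
    [ -r "$f" ] && [ "$(cat "$f")" = "1" ]
}

if fadump_is_registered; then
    echo "fadump is registered"
else
    echo "fadump is not registered (or not supported on this system)"
fi
```
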
To test out whether FADump is configured properly, you can force-crash your
system by echo'ing a 'c' into /proc/sysrq-trigger:

    # echo c > /proc/sysrq-trigger

You should see some panic output, followed by the system resetting and booting
into a fresh copy of the kernel. When the default initrd loads and runs /init,
the vmcore should be copied out to disk (by default, in
/var/crash/<YYYY.MM.DD-HH:MM:SS>/vmcore), and then the system reboots back into
your normal kernel.

Once back to your normal kernel, you can use the previously installed crash
utility in conjunction with the previously installed kernel-debuginfo to
perform postmortem analysis:

    # crash /usr/lib/debug/lib/modules/2.6.17-1.2621.el5/vmlinux \
        /var/crash/2006-08-23-15:34/vmcore

    crash> bt

and so on...

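When several dumps have accumulated, a small helper can pick the most recent
vmcore to hand to crash. This is a sketch only: the function name latest_vmcore
is made up, and it assumes the dump directories carry timestamp-style names, so
that sorting by name also sorts by time.

```shell
#!/bin/sh
# Print the path of the vmcore in the newest dump directory.
# $1: crash directory (defaults to /var/crash). "Newest" is decided by
# the sorted order of directory names, which assumes timestamp-style names.
# Returns non-zero when no vmcore is found.
latest_vmcore() {
    base="${1:-/var/crash}"
    newest=""
    for core in "$base"/*/vmcore; do
        [ -f "$core" ] && newest="$core"
    done
    [ -n "$newest" ] && printf '%s\n' "$newest"
}

# Example (assumes the crashed kernel is the one currently running):
#   crash "/usr/lib/debug/lib/modules/$(uname -r)/vmlinux" "$(latest_vmcore)"
```
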
Saving vmcore-dmesg.txt
-----------------------
Kernel log buffers are among the most important pieces of information available
in a vmcore. Before the vmcore is saved, the kernel log buffers are extracted
from /proc/vmcore and saved into a file named vmcore-dmesg.txt. The vmcore is
saved after vmcore-dmesg.txt. The destination disk and directory for
vmcore-dmesg.txt are the same as for the vmcore. Note that kernel log buffers
will not be available if the dump target is a raw device.

FADump Memory Requirements:

  System Memory          Recommended memory
--------------------- ----------------------
    4 GB - 16 GB     :        768 MB
   16 GB - 64 GB     :       1024 MB
   64 GB - 128 GB    :          2 GB
  128 GB - 1 TB      :          4 GB
    1 TB - 2 TB      :          6 GB
    2 TB - 4 TB      :         12 GB
    4 TB - 8 TB      :         20 GB
    8 TB - 16 TB     :         36 GB
   16 TB - 32 TB     :         64 GB
   32 TB - 64 TB     :        128 GB
   64 TB & above     :        180 GB

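The table above can be turned into a lookup that prints a value in the form
accepted by 'crashkernel=X'. This is a sketch under one stated assumption: each
range is taken to include its lower bound and exclude its upper bound, since the
table does not say which row applies at a boundary. The function name
recommended_fadump_mem is made up for this example.

```shell
#!/bin/sh
# Look up the recommended reservation from the table above.
# $1: system memory in GB. Assumption: each range includes its lower bound
# and excludes its upper bound. Returns non-zero below 4 GB, where the
# table gives no recommendation.
recommended_fadump_mem() {
    gb="$1"
    if   [ "$gb" -lt 4 ];     then return 1
    elif [ "$gb" -lt 16 ];    then echo "768M"
    elif [ "$gb" -lt 64 ];    then echo "1024M"
    elif [ "$gb" -lt 128 ];   then echo "2G"
    elif [ "$gb" -lt 1024 ];  then echo "4G"
    elif [ "$gb" -lt 2048 ];  then echo "6G"
    elif [ "$gb" -lt 4096 ];  then echo "12G"
    elif [ "$gb" -lt 8192 ];  then echo "20G"
    elif [ "$gb" -lt 16384 ]; then echo "36G"
    elif [ "$gb" -lt 32768 ]; then echo "64G"
    elif [ "$gb" -lt 65536 ]; then echo "128G"
    else                           echo "180G"
    fi
}

# Example: a system with 1.5 TB (1536 GB) of memory
#   recommended_fadump_mem 1536
```
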
Things to remember:

1) The memory required to boot the capture kernel is a moving target that
   depends on many factors, like the hardware attached to the system, the kernel
   and modules in use, the packages installed and the services enabled. There is
   no one-size-fits-all value. The above recommendations are based on system
   memory alone, and so come with a few assumptions about the resources a system
   of that size could have. So, please take the recommendations with a pinch of
   salt and remember to try capturing a dump a few times to confirm that the
   system is configured successfully with dump capturing support.

2) Though the memory requirements for FADump seem high, this memory is not
   completely set aside but made available for userspace applications to use,
   through the CMA allocator.

3) As the same initrd is used for booting the production kernel as well as the
   capture kernel, and with the dump being captured in a restricted memory
   environment, a few optimizations (like not including the network dracut
   module, disabling multipath and such) are applied while building the initrd.
   In case the production environment needs these optimizations to be avoided,
   the dracut_args option in the /etc/kdump.conf file could be leveraged. For
   example, if a user wishes for the network module to be included in the
   initrd, adding the below entry in the /etc/kdump.conf file and restarting the
   kdump service would take care of it.

   dracut_args --add "network"

4) If FADump is configured to capture the vmcore to a remote dump target using
   the SSH or NFS protocol, the corresponding network interface
   '<interface-name>' is renamed to 'kdump-<interface-name>' if it is generic
   (like *eth# or net#). This happens because the vmcore capture scripts in the
   initial RAM disk (initrd) add the 'kdump-' prefix to the network interface
   name to secure persistent naming. And as the capture kernel and the
   production kernel use the same initrd in case of FADump, the interface name
   is changed for the production kernel too. This is likely to impact the
   network configuration setup for the production kernel. So, it is recommended
   to use a non-generic name for a network interface before setting up FADump
   to capture the vmcore to a remote dump target based on that network
   interface, to avoid running into network configuration issues.

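Before pointing FADump at a remote dump target, it may help to check whether any
interface on the system has a generic name as described in point 4. The sketch
below treats "eth" or "net" followed only by digits as generic. That pattern is
an assumption drawn from the "eth#" and "net#" examples, not the exact rule the
capture scripts apply, and the function name is_generic_ifname is made up.

```shell
#!/bin/sh
# Return success if an interface name looks generic, that is "eth" or "net"
# followed only by digits. The pattern is an assumption based on the
# examples in point 4, not the rule used by the capture scripts.
is_generic_ifname() {
    case "$1" in
        eth[0-9]*|net[0-9]*)
            # Reject names with anything other than digits after the prefix.
            case "${1#???}" in
                *[!0-9]*) return 1 ;;
                *)        return 0 ;;
            esac ;;
        *) return 1 ;;
    esac
}

# List the interfaces on this system that would be affected.
for path in /sys/class/net/*; do
    [ -e "$path" ] || continue
    ifname="${path##*/}"
    is_generic_ifname "$ifname" && echo "generic name: $ifname"
done
```
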
Dump Triggering methods:

This section talks about the various ways, other than a Kernel Panic, in which
fadump can be triggered. The following methods assume that fadump is configured
on your system, with the scripts enabled as described in the section above.

1) AltSysRq C

FAdump can be triggered with the combination of the 'Alt', 'SysRq' and 'C'
keyboard keys. Please refer to the following link for more details:

https://fedoraproject.org/wiki/QA/Sysrq

In addition, on PowerPC boxes, fadump can also be triggered via the Hardware
Management Console (HMC) using the 'Ctrl', 'O' and 'C' keyboard keys.

2) Kernel OOPs

If we want to generate a dump every time the Kernel OOPses, we can achieve this
by setting the 'Panic On OOPs' option as follows:

    # echo 1 > /proc/sys/kernel/panic_on_oops

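The echo above only lasts until the next reboot. One way to make the setting
persistent is a sysctl drop-in file, sketched below. The function name
persist_panic_on_oops and the file name 90-panic-on-oops.conf are made up for
this example, and the sketch assumes a system that reads /etc/sysctl.d at boot.

```shell
#!/bin/sh
# Write a sysctl drop-in that sets kernel.panic_on_oops at every boot.
# $1: target directory (defaults to /etc/sysctl.d). The file name
# 90-panic-on-oops.conf is a hypothetical choice for this sketch.
persist_panic_on_oops() {
    dir="${1:-/etc/sysctl.d}"
    printf 'kernel.panic_on_oops = 1\n' > "$dir/90-panic-on-oops.conf"
}

# Example, as root:
#   persist_panic_on_oops
#   sysctl -p /etc/sysctl.d/90-panic-on-oops.conf   # apply without rebooting
```
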
3) PowerPC specific methods:

On IBM PowerPC machines, issuing a soft reset invokes the XMON debugger (if
XMON is configured). To configure XMON, either compile the kernel with
the CONFIG_XMON and CONFIG_XMON_DEFAULT options, or compile it with
CONFIG_XMON and boot the kernel with the xmon=on option.

Following are the ways to remotely issue a soft reset on PowerPC boxes, which
would drop you to XMON. Pressing 'X' (capital letter X) followed by
'Enter' there will trigger the dump.

3.1) HMC

The Hardware Management Console (HMC) available on Power4 and Power5 machines
allows partitions to be reset remotely. This is especially useful in hang
situations where the system is not accepting any keyboard inputs.

Once you have HMC configured, the following steps will enable you to trigger
fadump via a soft reset:

On Power4
  Using GUI

    * In the right pane, right click on the partition you wish to dump.
    * Select "Operating System->Reset".
    * Select "Soft Reset".
    * Select "Yes".

  Using HMC Commandline

    # reset_partition -m <machine> -p <partition> -t soft

On Power5
  Using GUI

    * In the right pane, right click on the partition you wish to dump.
    * Select "Restart Partition".
    * Select "Dump".
    * Select "OK".

  Using HMC Commandline

    # chsysstate -m <managed system name> -n <lpar name> -o dumprestart -r lpar

3.2) Blade Management Console for Blade Center

To initiate a dump operation, go to the Power/Restart option under "Blade Tasks"
in the Blade Management Console. Select the corresponding blade for which you
want to initiate the dump and then click "Restart blade with NMI". This issues a
system reset and invokes the xmon debugger.


Advanced Setups & Failure action:

Kdump and fadump exhibit similar behavior in terms of setup & failure action.
For fadump advanced setup related information, see the "Advanced Setups" section
in the "kexec-kdump-howto.txt" document. Refer to the "Failure action" section
in the same document for fadump failure action related information.

Compression and filtering

Refer to the "Compression and filtering" section in the "kexec-kdump-howto.txt"
document. Compression and filtering are the same for kdump & fadump.


Notes on rootfs mount:
Dracut is designed to mount rootfs by default. If rootfs mounting fails, it
will refuse to go on. So fadump currently leaves rootfs mounting to dracut.
We make the assumption that a proper root= cmdline is being passed to the dracut
initramfs for the time being. If you need to modify "KDUMP_COMMANDLINE=" in
/etc/sysconfig/kdump, you will need to make sure that the appropriate root=
options are copied from /proc/cmdline. In general, it is best to append
command line options using "KDUMP_COMMANDLINE_APPEND=" instead of replacing
the original command line completely.

How to disable FADump:

Remove "fadump=on"/"fadump=nocma" from the kernel cmdline parameters OR replace
it with the "fadump=off" kernel cmdline parameter:

    # grubby --update-kernel=/boot/vmlinuz-`uname -r` --remove-args="fadump=on"
or
    # grubby --update-kernel=/boot/vmlinuz-`uname -r` --remove-args="fadump=nocma"
OR
    # grubby --update-kernel=/boot/vmlinuz-`uname -r` --args="fadump=off"

If KDump is to be used as the dump capturing mechanism, update the crashkernel
parameter (otherwise, remove the "crashkernel=" parameter too, using grubby):

    # grubby --update-kernel=/boot/vmlinuz-$kver --args="crashkernel=auto"

Reboot the system for the settings to take effect.