=================
Kexec/Kdump HOWTO
=================


Introduction
============

Kexec and kdump are features of the mainstream 2.6 kernel and are included
in Red Hat Enterprise Linux 5. Kexec provides a faster reboot path, and kdump
builds on it to create reliable kernel vmcores for diagnostic purposes.


Overview
========

Kexec
-----

Kexec is a fastboot mechanism that boots a Linux kernel from the context of
an already running kernel without going through the BIOS. BIOS initialization
can be very time consuming, especially on big servers with lots of
peripherals, so kexec can save a lot of time for developers who end up
booting a machine numerous times.

Kdump
-----

Kdump is a kernel crash dumping mechanism. It is very reliable because the
crash dump is captured from the context of a freshly booted kernel and not
from the context of the crashed kernel. Kdump uses kexec to boot into a
second kernel whenever the system crashes. This second kernel, often called
the capture kernel, boots with very little memory and captures the dump image.

The first kernel reserves a section of memory that the second kernel uses
to boot. Kexec boots the capture kernel without going through the BIOS, so
the contents of the first kernel's memory are preserved; that memory is
essentially the kernel crash dump.

Kdump is supported on the i686, x86_64, ia64 and ppc64 platforms. The
standard kernel and capture kernel are one and the same on all of these
platforms.

If you're reading this document, you should already have kexec-tools
installed. If not, install it with the following command:

    # yum install kexec-tools

Now load a kernel with kexec:

    # kver=`uname -r`
    # kexec -l /boot/vmlinuz-$kver --initrd=/boot/initrd-$kver.img \
        --command-line="`cat /proc/cmdline`"

NOTE: The above will boot you back into the kernel you're currently running.
To load a different kernel, substitute its version in place of `uname -r`.

Now reboot your system, taking note that it should bypass the BIOS:

    # reboot
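As a small illustration of the NOTE above, the following sketch prints (rather
than runs) the load command for an arbitrary kernel version. The helper name
kexec_load_cmd and the sample version and command line are placeholders, and
the /boot/vmlinuz-<version> and /boot/initrd-<version>.img naming is assumed
from the example above; adjust it if your system names these files differently.

```shell
# Print, without executing, the kexec load command for a given kernel version.
# kexec_load_cmd is a hypothetical helper; file naming follows the example above.
kexec_load_cmd() {
    kver="$1"      # kernel version, e.g. the output of `uname -r`
    cmdline="$2"   # kernel command line to hand to the new kernel
    printf 'kexec -l /boot/vmlinuz-%s --initrd=/boot/initrd-%s.img --command-line="%s"\n' \
        "$kver" "$kver" "$cmdline"
}

# Placeholder version and command line, for illustration only
kexec_load_cmd 2.6.18-8.el5 "ro root=LABEL=/"
```

Review the printed command before running it as root; nothing is loaded until
kexec itself is invoked.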

How to configure kdump
======================

Again, we assume that if you're reading this document, you already have
kexec-tools installed. If not, install it with the following command:

    # yum install kexec-tools

To be able to do much of anything interesting in the way of debug analysis,
you'll also need to install the crash utility and the kernel-debuginfo
package of the same arch as your running kernel:

    # yum --enablerepo=\*debuginfo install kernel-debuginfo.$(uname -m) crash

Next up, we need to modify some boot parameters to reserve a chunk of memory
for the capture kernel. With the help of grubby, it's very easy to append
"crashkernel=128M" to the end of your kernel boot parameters. Here 128M is
the amount of memory to reserve for the capture kernel, referred to as X
below. Depending on arch and system configuration, more than 128M may be
required for kdump. Experiment and test kdump; if 128M is not sufficient,
try reserving more memory.

    # grubby --args="crashkernel=128M" --update-kernel=/boot/vmlinuz-`uname -r`

Note that there is an alternative form in which to specify a crashkernel
memory reservation, in the event that more control is needed over the size and
placement of the reserved memory. The format is:

    crashkernel=range1:size1[,range2:size2,...][@offset]

Here range<n> specifies a range of values that is matched against the amount
of physical RAM present in the system, and the corresponding size<n> value
specifies the amount of memory to reserve. For example:

    crashkernel=512M-2G:64M,2G-:128M

This line tells the kernel to reserve 64M of RAM if the system contains
between 512M and 2G of physical memory. If the system contains 2G or more of
physical memory, 128M is reserved.
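To make the range matching concrete, here is a minimal sketch that evaluates
a range-style crashkernel value for a given amount of RAM. It is only an
illustration: the kernel does the real parsing at boot, the helper names to_mb
and crashkernel_size are made up for this example, only the M and G suffixes
are handled, and each range is assumed to include its start and exclude its
end, which matches the "2G or more" wording above.

```shell
# Convert a size such as 512M or 2G to megabytes (only M and G are handled).
to_mb() {
    case "$1" in
        *G) echo $(( ${1%G} * 1024 )) ;;
        *M) echo "${1%M}" ;;
    esac
}

# Print the size selected by a range-style crashkernel value for a machine
# with the given RAM in MB; prints nothing when no range matches.
crashkernel_size() {
    spec="${1%%@*}"          # drop the optional @offset
    ram_mb="$2"
    old_ifs="$IFS"
    IFS=','                  # entries are comma separated
    for entry in $spec; do
        range="${entry%%:*}" # e.g. 512M-2G
        size="${entry#*:}"   # e.g. 64M
        start=$(to_mb "${range%%-*}")
        end="${range#*-}"    # empty for an open-ended range such as 2G-
        if [ "$ram_mb" -ge "$start" ] &&
           { [ -z "$end" ] || [ "$ram_mb" -lt "$(to_mb "$end")" ]; }; then
            echo "$size"
            break
        fi
    done
    IFS="$old_ifs"
}

crashkernel_size "512M-2G:64M,2G-:128M" 1024   # prints 64M
crashkernel_size "512M-2G:64M,2G-:128M" 4096   # prints 128M
```

A machine with less than 512M of RAM matches neither range in this example,
so the sketch prints nothing for it.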

In addition, when KASLR is enabled, kdump needs to read /proc/kallsyms while
loading the capture kernel. Check /proc/sys/kernel/kptr_restrict to make sure
that the content of /proc/kallsyms is exposed correctly. We recommend setting
kptr_restrict to '1'; otherwise loading the capture kernel could fail.

After making these changes, reboot your system so that the X MB of memory is
left untouched by the normal system and reserved for the capture kernel. Take
note that the output of 'free -m' will show X MB less memory than without
this parameter, which is expected. You may be able to get by with less than
128M, but testing with only 64M has proven unreliable of late. On ia64, as
much as 512M may be required.

Now that you've got that reserved memory region set up, enable the kdump
service so that it starts at boot:

    # systemctl enable kdump.service

Then, start up kdump as well:

    # systemctl start kdump.service

This should load the capture kernel image via kexec, leaving the system ready
to capture a vmcore upon crashing. To test this out, you can force-crash
your system by echoing a c into /proc/sysrq-trigger:

    # echo c > /proc/sysrq-trigger

You should see some panic output, followed by the system restarting into
the kdump kernel. When the boot process gets to the point where it starts
the kdump service, your vmcore should be copied out to disk (by default,
in /var/crash/<YYYY-MM-DD-HH:MM>/vmcore), and then the system reboots back
into your normal kernel.

Once back in your normal kernel, you can use the previously installed crash
utility in conjunction with the previously installed kernel-debuginfo to
perform postmortem analysis:

    # crash /usr/lib/debug/lib/modules/2.6.17-1.2621.el5/vmlinux \
        /var/crash/2006-08-23-15:34/vmcore

    crash> bt

and so on...


Notes on kdump
==============

When kdump starts, the kdump kernel is loaded together with the kdump
initramfs. To save memory and disk space, the kdump initramfs is generated
strictly for the system it will run on, and contains the minimum set of
kernel modules and utilities needed to boot the machine to a stage where the
dump target can be mounted.

With the kdump service enabled, kdumpctl will try to detect possible system
changes and rebuild the kdump initramfs if needed, but it cannot guarantee
to cover every possible case. So after a hardware change, disk migration,
storage setup update or any similar system-level change, it's highly
recommended to rebuild the initramfs manually with the following command:

    # kdumpctl rebuild


Saving vmcore-dmesg.txt
=======================

The kernel log buffer is one of the most important pieces of information
available in a vmcore. Before the vmcore is saved, the kernel log buffer is
extracted from /proc/vmcore and saved into a file named vmcore-dmesg.txt;
the vmcore itself is saved afterwards. The destination disk and directory
for vmcore-dmesg.txt are the same as for the vmcore. Note that the kernel log
buffer will not be available if the dump target is a raw device.


Dump Triggering methods
=======================

This section describes the various ways, other than a kernel panic, in which
Kdump can be triggered. The following methods assume that Kdump is configured
on your system, with the service enabled as described in the section above.

1) AltSysRq C

Kdump can be triggered with the combination of the 'Alt', 'SysRq' and 'C'
keyboard keys. Please refer to the following link for more details:

https://access.redhat.com/solutions/2023

In addition, on PowerPC boxes, Kdump can also be triggered via the Hardware
Management Console (HMC) using the 'Ctrl', 'O' and 'C' keyboard keys.

2) NMI_WATCHDOG

If a machine has a hard hang, it is quite possible that it does not respond
to keyboard interrupts, so the 'Alt-SysRq' keys will not help trigger a dump.
In such scenarios the NMI watchdog feature can prove useful. The following
link has more details on configuring the NMI watchdog option:

https://access.redhat.com/solutions/125103

Once this feature has been enabled in the kernel, any lockup results in an
oops message being generated, followed by Kdump being triggered.

3) Kernel OOPs

To generate a dump every time the kernel oopses, set the 'Panic On OOPs'
option as follows:

    # echo 1 > /proc/sys/kernel/panic_on_oops

This is enabled by default on RHEL5.

4) NMI (non-maskable interrupt) button

In cases where the system is hung and is not accepting keyboard interrupts,
using the NMI button to trigger Kdump can be very useful. An NMI button is
present on most of the newer x86 and x86_64 machines. Please refer to the
user guides/manuals to locate the button, though on most occasions it is not
very well documented. In most cases it is hidden behind a small hole on the
front or back panel of the machine. You could use a toothpick or some other
non-conducting probe to press the button.

For example, on the IBM X series 366 machine, the NMI button is located behind
a small hole on the bottom center of the rear panel.

To enable this method of dump triggering using the NMI button, set the
'unknown_nmi_panic' option as follows:

    # echo 1 > /proc/sys/kernel/unknown_nmi_panic

5) PowerPC specific methods:

On IBM PowerPC machines, issuing a soft reset invokes the XMON debugger (if
XMON is configured). To configure XMON, either compile the kernel with the
CONFIG_XMON and CONFIG_XMON_DEFAULT options, or compile it with CONFIG_XMON
and boot the kernel with the xmon=on option.

The following are ways to remotely issue a soft reset on PowerPC boxes, which
drops you into XMON. Pressing 'X' (capital letter X) followed by 'Enter' at
the XMON prompt will trigger the dump.

5.1) HMC

The Hardware Management Console (HMC), available on Power4 and Power5
machines, allows partitions to be reset remotely. This is especially useful
in hang situations where the system is not accepting any keyboard input.

Once you have HMC configured, the following steps will enable you to trigger
Kdump via a soft reset:

On Power4
  Using GUI

    * In the right pane, right click on the partition you wish to dump.
    * Select "Operating System->Reset".
    * Select "Soft Reset".
    * Select "Yes".

  Using HMC Commandline

    # reset_partition -m <machine> -p <partition> -t soft

On Power5
  Using GUI

    * In the right pane, right click on the partition you wish to dump.
    * Select "Restart Partition".
    * Select "Dump".
    * Select "OK".

  Using HMC Commandline

    # chsysstate -m <managed system name> -n <lpar name> -o dumprestart -r lpar

5.2) Blade Management Console for Blade Center

To initiate a dump operation, go to the Power/Restart option under "Blade
Tasks" in the Blade Management Console. Select the blade for which you want
to initiate the dump and then click "Restart blade with NMI". This issues a
system reset and invokes the xmon debugger.


Dump targets
============

In addition to being able to capture a vmcore to your system's local file
system, kdump can be configured to capture a vmcore to a number of other
locations, including a raw disk partition, a dedicated file system, an NFS
mounted file system, or a remote system via ssh/scp. Additional options
exist for specifying the relative path under which the dump is captured,
what to do if the capture fails, and for compressing and filtering the dump
to produce smaller, more manageable vmcore files (see "Advanced Setups" for
more detail on these options).

In theory, dumping to a location other than the local file system should be
safer than kdump's default setup, as it's possible the default setup will try
dumping to a file system that has become corrupted. The raw disk partition and
dedicated file system options allow you to still dump to the local system,
but without having to remount your possibly corrupted file system(s),
thereby decreasing the chance a vmcore won't be captured. Dumping to an
NFS server or remote system via ssh/scp also has this advantage, and it
allows vmcore files to be centralized should you have several systems from
which you'd like to obtain them. Of course, note that these configurations
could present problems if your network is unreliable.

The kdump target and advanced setups are configured by editing
/etc/kdump.conf, which is fairly well documented itself out of the box.
Any alterations to /etc/kdump.conf should be followed by a restart of the
kdump service, so the changes can be incorporated in the kdump initrd.
Restarting the kdump service is as simple as
'/sbin/systemctl restart kdump.service'.

There are two ways to configure the dump target: using "path" only, or
specifying the dump target explicitly. The interpretation of "path" differs
between the two styles.

Config dump target only using "path"
------------------------------------

You can change the dump target by setting "path" to a mount point where the
dump target is mounted. When there is no explicitly configured dump target,
"path" in kdump.conf represents the current file system path in which the
vmcore will be saved. Kdump will automatically detect the underlying device
of "path" and use that as the dump target.

Upon dump, kdump creates a directory $hostip-$date within "path" and saves
the vmcore there, so in practice the dump is saved in $path/$hostip-$date/.

Kdump only checks the current mount status for the mount entry corresponding
to "path", so please ensure the dump target is mounted on "path" before the
kdump service starts.

NOTES:

- It's strongly recommended to put a mount entry for "path" in /etc/fstab
  and have it auto mounted on boot. This makes sure the dump target is
  reachable from the machine and kdump's configuration is stable.

EXAMPLES:

- path /var/crash/

  This is the default configuration. Assuming there is no disk mounted
  on /var/ or on /var/crash, the dump will be saved on the disk backing
  rootfs, in the directory /var/crash.

- path /var/crash/ (A separate disk mounted on /var)

  Say a disk /dev/sdb is mounted on /var. In this case the dump target will
  become /dev/sdb, path will become "/crash" and the dump will be saved
  in the "sdb:/crash/" directory.

- path /var/crash/ (NFS mounted on /var)

  Say foo.com:/export/tmp is mounted on /var. In this case the dump target is
  the nfs server, path will be adjusted to "/crash" and the dump will be
  saved to the foo.com:/export/tmp/crash/ directory.

Config dump target explicitly
------------------------------

You can set the dump target explicitly in kdump.conf, and "path" will be
the relative path in the specified dump target. For example, if the dump
target is "ext4 /dev/sda", then the dump will be saved in the "path"
directory on /dev/sda.

The same is true for an nfs dump. If the user specifies
"nfs foo.com:/export/tmp/" as the dump target, then with the default path the
dump will effectively be saved in the "foo.com:/export/tmp/var/crash/"
directory.

If the dump target is "raw", then "path" is ignored.

If it's a filesystem target, kdump needs to know the right mount options.
Kdump checks the current mount status, and then /etc/fstab, for mount options
corresponding to the specified dump target and uses them. If special mount
options are required for the dump target, they can be set by putting an
entry in fstab.

If there is no related mount entry, the mount option is set to "defaults".

NOTES:

- It's recommended to put an entry for the dump target in /etc/fstab
  and have it auto mounted on boot. This makes sure the dump target is
  reachable from the machine and kdump won't fail.

- Kdump ignores some mount options, including "noauto" and "ro". This
  makes it possible to keep the dump target unmounted or read-only
  when not in use.

EXAMPLES:

- ext4 /dev/sda (mounted)
  path /var/crash/

  In this case the dump target is set to /dev/sda, path is the absolute path
  "/var/crash" in /dev/sda, and the vmcore will be saved in the
  "sda:/var/crash" directory.

- nfs foo.com:/export/tmp (mounted)
  path /var/crash/

  In this case the dump target is the nfs server, path is the absolute path
  "/var/crash", and the vmcore will be saved in the
  "foo.com:/export/tmp/var/crash/" directory.

- nfs foo.com:/export/tmp (not mounted)
  path /var/crash/

  Same as the case above; kdump will use "defaults" as the mount option
  for the dump target.

- nfs foo.com:/export/tmp (not mounted, entry with option "noauto,nolock" exists in /etc/fstab)
  path /var/crash/

  In this case the dump target is the nfs server, the vmcore will be saved in
  the "foo.com:/export/tmp/var/crash/" directory, and kdump will inherit the
  "nolock" option.

Dump target and mkdumprd
------------------------

Mkdumprd is the tool used to create the kdump initramfs, and it may change
the mount status of the dump target under some conditions.

Usually the dump target should be used only for kdump. If you are worried
about someone using the filesystem for something other than dumping vmcores,
you can mount it read-only or make it a noauto mount. Mkdumprd will
mount/remount it read-write to create the dump directory and will move it
back to its original state afterwards.

Supported dump target types and requirements
--------------------------------------------

1) Raw partition

Raw partition dumping requires that a disk partition in the system, at least
as large as the amount of memory in the system, be left unformatted. Assuming
/dev/vg/lv_kdump is left unformatted, kdump.conf can be configured with
'raw /dev/vg/lv_kdump', and the vmcore file will be copied via dd directly
onto partition /dev/vg/lv_kdump. Restart the kdump service via
'/sbin/systemctl restart kdump.service' to commit this change to your kdump
initrd. The dump target should be a persistent device name, such as an lvm or
device mapper canonical name.

2) Dedicated file system

Similar to raw partition dumping, you can format a partition with the file
system of your choice. Again, it should be at least as large as the amount
of memory in the system. Assuming /dev/vg/lv_kdump has been formatted ext4,
specify 'ext4 /dev/vg/lv_kdump' in kdump.conf, and a vmcore file will be
copied onto the file system after it has been mounted. Dumping to a dedicated
partition has the advantage that you can dump multiple vmcores to the file
system, space permitting, without overwriting previous ones, as would be the
case in a raw partition setup. Restart the kdump service via
'/sbin/systemctl restart kdump.service' to commit this change to your kdump
initrd.

Note that for local file systems, ext4 and ext2 are supported as dumpable
targets. Kdump will not prevent you from specifying other filesystems, and
they will most likely work, but their operation cannot be guaranteed. For
instance, specifying a vfat or msdos filesystem will result in a successful
load of the kdump service, but during crash recovery the dump will fail if
the system has more than 2GB of memory (since vfat and msdos filesystems do
not support files larger than 2GB). Be careful of your filesystem selection
when using this target.

It is recommended to use persistent device names or UUID/LABEL for file system
dumps. One example of a persistent device name is /dev/vg/<devname>.

3) NFS mount

Dumping over NFS requires an NFS server configured to export a file system
with full read/write access for the root user. All operations within the
kdump initial ramdisk are done as root, and to write out a vmcore file, we
obviously must be able to write to the NFS mount. Configuring an NFS server
is outside the scope of this document, but either the no_root_squash or
anonuid option on the NFS server side is likely of interest to permit the
kdump initrd operations to write to the NFS mount as root.

Assuming you're exporting /dump on the machine nfs-server.example.com,
once the mount is properly configured, specify it in kdump.conf via
'nfs nfs-server.example.com:/dump'. The server portion can be specified either
by host name or IP address. Following a system crash, the kdump initrd will
mount the NFS mount and copy out the vmcore to your NFS server. Restart the
kdump service via '/sbin/systemctl restart kdump.service' to commit this change
to your kdump initrd.

4) Special mount via "dracut_args"

You can use "dracut_args" to pass "--mount" to kdump; see the dracut manpage
for details on the format of "--mount". If any "--mount" is specified via
"dracut_args", kdump will build it as the mount target without doing any
validation (mounting, or checking mount options, fs size, save path, etc.),
so you must test it yourself to ensure it is correct. You cannot use other
targets in /etc/kdump.conf if you use "--mount" in "dracut_args". You also
cannot specify multiple "--mount" targets via "dracut_args".

One use case for "--mount" in "dracut_args" is when you do not want to mount
the dump target before the kdump service starts, for example to reduce the
load on a shared nfs server, as in the example below:

    dracut_args --mount "192.168.1.1:/share /mnt/test nfs4 defaults"

NOTE:

- <mountpoint> must be specified as an absolute path.

5) Remote system via ssh/scp

Dumping over ssh/scp requires setting up passwordless ssh keys for every
machine you wish to have dump via this method. First, configure kdump.conf
for ssh/scp dumping by adding a config line of 'ssh user@server', where 'user'
can be any user on the target system you choose, and 'server' is the host
name or IP address of the target system. Using a dedicated, restricted user
account on the target system is recommended, as there will be keyless ssh
access to this account.

Once kdump.conf is appropriately configured, issue the command
'kdumpctl propagate' to automatically set up the ssh host keys and transmit
the necessary bits to the target server. You'll have to type in 'yes'
to accept the host key for your target server if this is the first time
you've connected to it, and then input the target system user's password
to send over the necessary ssh key file. Restart the kdump service via
'/sbin/systemctl restart kdump.service' to commit this change to your kdump
initrd.

Advanced Setups
===============

Kdump boot directory
--------------------

Usually the kdump kernel is the same as the 1st kernel, so kdump will try to
find the kdump kernel under /boot according to /proc/cmdline. For example,
suppose we run the command below and get the following output:

    # cat /proc/cmdline
    BOOT_IMAGE=/xxx/vmlinuz-3.yyy.zzz  root=xxxx .....

Then the kdump kernel will be /boot/xxx/vmlinuz-3.yyy.zzz. If the kdump
kernel is located in a different directory, set the KDUMP_BOOTDIR variable
in /etc/sysconfig/kdump accordingly.
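The lookup described above can be sketched as follows. This is a simplified
illustration, not the logic kdump itself uses: the function name
kdump_kernel_path is made up for this example, its optional second argument
merely stands in for a KDUMP_BOOTDIR-style override, and BOOT_IMAGE values
that carry a bootloader device prefix are not handled.

```shell
# Print the kernel image path derived from a kernel command line.
# $1: contents of /proc/cmdline, $2: boot directory (defaults to /boot).
kdump_kernel_path() {
    cmdline="$1"
    bootdir="${2:-/boot}"
    for word in $cmdline; do
        case "$word" in
            BOOT_IMAGE=*)
                # Append the BOOT_IMAGE value to the boot directory
                echo "${bootdir}${word#BOOT_IMAGE=}"
                return 0
                ;;
        esac
    done
    return 1   # no BOOT_IMAGE entry found
}

kdump_kernel_path "BOOT_IMAGE=/xxx/vmlinuz-3.yyy.zzz root=xxxx"
# prints /boot/xxx/vmlinuz-3.yyy.zzz
```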

Kdump Post-Capture Executable
-----------------------------

It is possible to specify a custom script or binary you wish to run following
an attempt to capture a vmcore. The executable is passed an exit code from
the capture process, which can be used to trigger different actions from
within your post-capture executable.

If the /etc/kdump/post.d directory exists, all files in the directory are
collectively sorted and executed in lexical order before the binary or script
specified by the kdump_post parameter is executed.
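A post-capture script might branch on that exit code roughly as shown below.
This is a minimal sketch under stated assumptions: the function name
report_capture and the messages are invented for illustration, and the exit
code is assumed to arrive as the first argument, with 0 meaning the capture
succeeded.

```shell
#!/bin/sh
# Hypothetical post-capture hook: report the result of the capture step.
report_capture() {
    rc="$1"   # exit code handed over by the capture process
    if [ "$rc" -eq 0 ]; then
        echo "kdump: vmcore captured successfully"
    else
        echo "kdump: vmcore capture failed with exit code $rc"
    fi
}

# Default to 0 when run by hand without an argument
report_capture "${1:-0}"
```

Such a script would be referenced from kdump.conf with the kdump_post
parameter, followed by a restart of the kdump service so that it is included
in the kdump initrd.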

Kdump Pre-Capture Executable
----------------------------

It is possible to specify a custom script or binary you wish to run before
capturing a vmcore. The exit status of this binary is interpreted as follows:

    0 - continue with the dump process as usual
    non 0 - run the final action (reboot/poweroff/halt)

If the /etc/kdump/pre.d directory exists, all files in the directory are
collectively sorted and executed in lexical order after the binary or script
specified by the kdump_pre parameter is executed. Even if a binary or script
in the /etc/kdump/pre.d directory returns a non-zero exit status, processing
continues.

Extra Binaries
--------------

If you have specific binaries or scripts you want to have made available
within your kdump initrd, you can specify them by their full path, and they
will be included in your kdump initrd, along with all dependent libraries.
This may be particularly useful for those running post-capture scripts that
rely on other binaries.

Extra Modules
-------------

By default, only the bare minimum of kernel modules will be included in your
kdump initrd. Should you wish to capture your vmcore files to a non-boot-path
storage device, such as an iscsi target disk or clustered file system, you may
need to manually specify additional kernel modules to load into your kdump
initrd.

Failure action
--------------

The failure action specifies what to do when dumping to the configured dump
target fails. By default, the failure action is "reboot"; that is, the system
reboots if the attempt to save the dump to the dump target fails.

Other failure actions are available:

- dump_to_rootfs
  This option tries to mount root and save the dump on the root filesystem
  in a path specified by "path". This option generally makes sense when the
  dump target is not the root filesystem. For example, if the dump is being
  saved over the network using "ssh", one can set the failure action to
  "dump_to_rootfs" to try saving the dump to the root filesystem if the dump
  over the network fails.

- shell
  Drop into a shell session inside the initramfs.

- halt
  Halt the system after failure.

- poweroff
  Power off the system after failure.

Compression and filtering
-------------------------

The 'core_collector' parameter in kdump.conf allows you to specify a custom
dump capture method. The most common alternate method is makedumpfile, which
is a dump filtering and compression utility provided with kexec-tools. On
some architectures, it can drastically reduce the size of your vmcore files,
which becomes very useful on systems with large amounts of memory.

A typical setup is 'core_collector makedumpfile -F -l --message-level 1 -d 31',
but check the output of '/sbin/makedumpfile --help' for a list of all available
options (-i and -g don't need to be specified; they're automatically taken care
of). Note that use of makedumpfile requires that the kernel-debuginfo package
corresponding to your running kernel be installed.
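The "-d 31" in the typical setup is a dump level. According to the
makedumpfile man page, the dump level is a bit mask in which each bit excludes
one type of page from the dump: 1 for zero pages, 2 for non-private cache
pages, 4 for private cache pages, 8 for user data pages and 16 for free pages.
The sketch below decodes a level into the page types it excludes; the function
name dump_level_excludes and the short labels are made up for this example.

```shell
# List the page types that a makedumpfile dump level (-d N) excludes.
dump_level_excludes() {
    level="$1"
    excluded=""
    bit=1
    # Labels are listed in bit order: 1, 2, 4, 8, 16
    for name in zero-pages cache-pages cache-private-pages user-data-pages free-pages; do
        if [ $(( level & bit )) -ne 0 ]; then
            excluded="$excluded $name"
        fi
        bit=$(( bit * 2 ))
    done
    echo "${excluded# }"
}

dump_level_excludes 31
# prints zero-pages cache-pages cache-private-pages user-data-pages free-pages
dump_level_excludes 17
# prints zero-pages free-pages
```

Level 31 therefore excludes every page type in the list, which is why it
produces the smallest vmcore files.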
b9e861
b9e861
Core collector command format depends on dump target type. Typically for
b9e861
filesystem (local/remote), core_collector should accept two arguments.
b9e861
First one is source file and second one is target file. For ex.
b9e861
399b37
- ex1.
399b37
399b37
  core_collector "cp --sparse=always"
b9e861
399b37
  Above will effectively be translated to:
b9e861
399b37
  cp --sparse=always /proc/vmcore <dest-path>/vmcore
b9e861
399b37
- ex2.
b9e861
399b37
  core_collector "makedumpfile -l --message-level 1 -d 31"
b9e861
399b37
  Above will effectively be translated to:
b9e861
399b37
  makedumpfile -l --message-level 1 -d 31 /proc/vmcore <dest-path>/vmcore
b9e861
b9e861
For dump targets like raw and ssh, in general, core collector should expect
b9e861
one argument (source file) and should output the processed core on standard
b9e861
output (There is one exception of "scp", discussed later). This standard
b9e861
output will be saved to destination using appropriate commands.
b9e861
b9e861
raw dumps core_collector examples:
b9e861
399b37
- ex3.
399b37
399b37
  core_collector "cat"
b9e861
399b37
  Above will effectively be translated to.
b9e861
399b37
  cat /proc/vmcore | dd of=<target-device>
b9e861
399b37
- ex4.
b9e861
399b37
  core_collector "makedumpfile -F -l --message-level 1 -d 31"
399b37
399b37
  Above will effectively be translated to.
399b37
399b37
  makedumpfile -F -l --message-level 1 -d 31 | dd of=<target-device>
b9e861
b9e861
ssh dumps core_collector examples:
b9e861
399b37
- ex5.
399b37
399b37
  core_collector "cat"
b9e861
399b37
  Above will effectively be translated to.
b9e861
399b37
  cat /proc/vmcore | ssh <options> <remote-location> "dd of=path/vmcore"
b9e861
399b37
- ex6.
b9e861
399b37
  core_collector "makedumpfile -F -l --message-level 1 -d 31"
399b37
399b37
  Above will effectively be translated to.
399b37
399b37
  makedumpfile -F -l --message-level 1 -d 31 | ssh <options> <remote-location> "dd of=path/vmcore"
b9e861
b9e861
There is one exception to the standard output rule for ssh dumps: scp.
Since scp can handle ssh destinations for file transfers, one can
specify "scp" as the core collector for ssh targets (no output on stdout).
b9e861
399b37
- ex7.
399b37
399b37
  core_collector "scp"
b9e861
399b37
  Above will effectively be translated to:
b9e861
399b37
  scp /proc/vmcore <user@host>:path/vmcore
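
The translation rules above can be summarized in a small shell sketch. It only
illustrates how the examples expand and is not the actual kdump implementation:
the function name effective_command and the target labels fs, raw and ssh are
hypothetical, the destinations are placeholders, and ssh options are omitted.

```shell
#!/bin/sh
# Illustrative only: print the command line that a core_collector value
# roughly expands to for each dump target type in the examples above.
# effective_command and the labels fs/raw/ssh are hypothetical names.
effective_command() {
    target=$1      # fs, raw or ssh
    collector=$2   # value of the core_collector option
    dest=$3        # directory, device, or user@host:path

    case "$target" in
    fs)
        # Filesystem targets: source file and target file are both passed.
        printf '%s /proc/vmcore %s/vmcore\n' "$collector" "$dest"
        ;;
    raw)
        # Raw targets: the collector writes to stdout, dd stores the data.
        printf '%s /proc/vmcore | dd of=%s\n' "$collector" "$dest"
        ;;
    ssh)
        case "$collector" in
        scp*)
            # scp is the exception: it copies to the remote path itself.
            printf '%s /proc/vmcore %s/vmcore\n' "$collector" "$dest"
            ;;
        *)
            # Split user@host:path at the first colon.
            printf '%s /proc/vmcore | ssh %s "dd of=%s/vmcore"\n' \
                "$collector" "${dest%%:*}" "${dest#*:}"
            ;;
        esac
        ;;
    *)
        echo "unknown target: $target" >&2
        return 1
        ;;
    esac
}

# Placeholder destinations, for illustration only.
effective_command fs  "cp --sparse=always" /var/crash
effective_command raw "cat" /dev/sdX
effective_command ssh "scp" user@host:/var/crash
```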
b9e861
b9e861
About default core collector
b9e861
----------------------------
399b37
b9e861
The default core_collector for ssh/raw dumps is:
b9e861
"makedumpfile -F -l --message-level 1 -d 31".
b9e861
The default core_collector for other targets is:
b9e861
"makedumpfile -l --message-level 1 -d 31".
b9e861
b9e861
Even if the core_collector option is commented out in kdump.conf, makedumpfile
is the default core collector and kdump uses it internally.

If one does not want makedumpfile as the default core_collector, one needs
to specify a different one using the core_collector option.
b9e861
b9e861
Note: If "makedumpfile -F" is used, you will get a vmcore in flattened
format (vmcore.flat). You will need to use "makedumpfile -R" to rearrange the
dump data from standard input into a normal dumpfile (readable with analysis
tools).
For example: "makedumpfile -R vmcore < vmcore.flat"
b9e861
399b37
399b37
Caveats
399b37
=======
b9e861
b9e861
Console frame-buffers and X are not properly supported. If you typically run
b9e861
with something along the lines of "vga=791" in your kernel config line or
b9e861
have X running, console video will be garbled when a kernel is booted via
b9e861
kexec. Note that the kdump kernel should still be able to create a dump,
b9e861
and when the system reboots, video should be restored to normal.
b9e861
b9e861
399b37
Notes
399b37
=====
399b37
b9e861
Notes on resetting video:
399b37
-------------------------
b9e861
b9e861
Video is a notoriously difficult issue with kexec.  Video cards contain ROM code
b9e861
that controls their initial configuration and setup.  This code is nominally
b9e861
accessed and executed from the BIOS, and otherwise not safely executable. Since
b9e861
the purpose of kexec is to reboot the system without re-executing the BIOS, it
b9e861
is rather difficult if not impossible to reset video cards with kexec.  The
b9e861
result is that if a system crashes while running in a graphical mode (e.g.
b9e861
running X), the screen may appear to become 'frozen' while the dump capture is
b9e861
taking place.  A serial console will of course reveal that the system is
b9e861
operating and capturing a vmcore image, but a casual observer will see the
b9e861
system as hung until the dump completes and a true reboot is executed.
b9e861
b9e861
There are two possibilities to work around this issue.  One is by adding
b9e861
--reset-vga to the kexec command line options in /etc/sysconfig/kdump.  This
b9e861
tells kdump to write some reasonable default values to the video card register
b9e861
file, in the hopes of returning it to a text mode such that boot messages are
b9e861
visible on the screen.  It does not work with all video cards however.
b9e861
Secondly, it may be worth trying to add vga16fb.ko to the extra_modules list in
b9e861
/etc/kdump.conf.  This will attempt to use the video card in framebuffer mode,
b9e861
which can blank the screen prior to the start of a dump capture.
b9e861
399b37
Notes on rootfs mount
399b37
---------------------
399b37
b9e861
Dracut is designed to mount rootfs by default. If rootfs mounting fails it
b9e861
will refuse to go on. So kdump leaves rootfs mounting to dracut currently.
b9e861
We make the assumption that a proper root= cmdline is being passed to the dracut
b9e861
initramfs for the time being. If you need to modify "KDUMP_COMMANDLINE=" in
b9e861
/etc/sysconfig/kdump, you will need to make sure that appropriate root=
b9e861
options are copied from /proc/cmdline. In general it is best to append
b9e861
command line options using "KDUMP_COMMANDLINE_APPEND=" instead of replacing
b9e861
the original command line completely.
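
As a rough sketch of the advice above, the following shell function picks the
root= argument out of a kernel command line so that it can be carried over when
KDUMP_COMMANDLINE is replaced. The function name root_arg is hypothetical, and
the sketch assumes that arguments contain no embedded spaces or glob characters.

```shell
#!/bin/sh
# Print every root= argument found in a kernel command line string.
# root_arg is a hypothetical helper, not part of kexec-tools.
root_arg() {
    # Intentionally unquoted: split the command line on whitespace.
    for word in $1; do
        case "$word" in
        root=*) printf '%s\n' "$word" ;;
        esac
    done
}

# Show what the running kernel was booted with, if available.
if [ -r /proc/cmdline ]; then
    root_arg "$(cat /proc/cmdline)"
fi
```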
b9e861
399b37
Notes on watchdog module handling
399b37
---------------------------------
b9e861
b9e861
If a watchdog is active in the first kernel, its module must be
loaded in the crash kernel, so that the watchdog is either deactivated or
starts being kicked in the second kernel. Otherwise, the watchdog might reboot
the system while the vmcore is being saved. When the dracut watchdog module is
enabled, it installs the kernel module of the active watchdog device in the
initrd. kexec-tools always adds "-a watchdog" to dracut_args if there is at
least one active watchdog and the user has not explicitly added "-o watchdog"
to dracut_args in kdump.conf. If a watchdog module (such as hp_wdt) has
not been written for the watchdog-core framework, this option will not have
any effect and the module will not be added. Please note that only the systemd
watchdog daemon is supported as the watchdog kick application.
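
To see whether the situation above applies to a machine, one can look for an
active watchdog in sysfs. The sketch below assumes that the kernel exposes a
"state" file under /sys/class/watchdog/<device>/ (this depends on the kernel
configuration). The function name has_active_watchdog and its optional
directory argument are hypothetical and only keep the example self-contained.

```shell
#!/bin/sh
# Return 0 if any watchdog device reports the state "active".
# has_active_watchdog is a hypothetical helper, not part of kexec-tools.
has_active_watchdog() {
    base=${1:-/sys/class/watchdog}
    for state in "$base"/*/state; do
        # Skip unmatched globs and unreadable files.
        [ -r "$state" ] || continue
        if [ "$(cat "$state")" = "active" ]; then
            return 0
        fi
    done
    return 1
}

if has_active_watchdog; then
    echo "active watchdog found"
else
    echo "no active watchdog found"
fi
```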
b9e861
399b37
Notes for disk images
399b37
---------------------
b9e861
b9e861
The kdump initramfs is a critical component for capturing the crash dump,
but it is generated strictly for the machine it will run on and is not
portable. If you install a new machine from an existing disk image
(e.g. VMs created from a disk image or snapshot), kdump can easily break
due to hardware changes or disk ID changes. It is therefore strongly
recommended not to include the kdump initramfs in the disk image in the
first place. This also saves space, and kdumpctl will build the
initramfs automatically if it is missing. If you have already installed
a machine from a disk image that has the kdump initramfs embedded, you
should rebuild the initramfs manually using the "kdumpctl rebuild" command,
or else kdump may not work as expected.
b9e861
399b37
Notes on encrypted dump target
399b37
------------------------------
b9e861
b9e861
Currently, kdump does not work well with an encrypted dump target.
First, the user has to enter the password manually in the capture kernel,
so a working interactive terminal is required in the capture kernel.
Another major issue is that an OOM problem will occur with certain
encryption setups. For example, the default setup for LUKS2 uses a
memory-hard key derivation function to mitigate brute force attacks, so
it is impossible to reduce the memory usage for mounting the encrypted
target. In such a case, you have to either reserve enough memory for the
crash kernel accordingly, or update your encryption setup.
It is recommended to use a non-encrypted target (e.g. a remote target)
instead.
b9e861
399b37
Notes on device dump
399b37
--------------------
b9e861
b9e861
Device dump allows drivers to append dump data to the vmcore, so you can
collect driver-specific debug info. Drivers can append data
without any limit, and the data is stored in memory, which may
cause significant memory pressure. Device dump is therefore disabled by default
by passing the "novmcoredd" command line option to the kdump capture kernel.
If you want to collect debug data with device dump, you need to modify the
"KDUMP_COMMANDLINE_APPEND=" value in /etc/sysconfig/kdump and remove the
"novmcoredd" option. You also need to increase the "crashkernel=" value
accordingly to avoid OOM issues.
In addition, the kdump initramfs will not automatically include device drivers
that support device dump; only device drivers that are required for
the dump target setup are included. To ensure that the device dump data
is included in the vmcore, you need to force the inclusion of the related
device drivers using the "extra_modules" option in /etc/kdump.conf.
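
The command line change described above can be sketched as a small shell
function that drops "novmcoredd" from an option string. The name
strip_novmcoredd and the sample option string are illustrative assumptions;
check the actual "KDUMP_COMMANDLINE_APPEND=" value in /etc/sysconfig/kdump on
your system.

```shell
#!/bin/sh
# Remove the novmcoredd option from a space-separated option string.
# strip_novmcoredd is a hypothetical helper, not part of kexec-tools.
strip_novmcoredd() {
    out=
    # Intentionally unquoted: split the string on whitespace.
    for word in $1; do
        if [ "$word" != "novmcoredd" ]; then
            out="${out:+$out }$word"
        fi
    done
    printf '%s\n' "$out"
}

# Sample value for illustration only; the real default may differ.
strip_novmcoredd "irqpoll nr_cpus=1 reset_devices novmcoredd"
```

The resulting string would then be written back to "KDUMP_COMMANDLINE_APPEND=",
and the drivers of interest listed under "extra_modules" in /etc/kdump.conf.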
b9e861
399b37
b9e861
Parallel Dumping Operation
b9e861
==========================
399b37
b9e861
Kexec allows kdump to use multiple cpus, so parallel dumping can speed up
the dump substantially, especially when compressing and filtering.
For example:
b9e861
b9e861
	1. "makedumpfile -c --num-threads [THREAD_NUM] /proc/vmcore dumpfile"
b9e861
	2. "makedumpfile -c /proc/vmcore dumpfile"
b9e861
b9e861
	Command 1 performs better than command 2 if THREAD_NUM is larger than two
	and the number of usable cpus is larger than THREAD_NUM.
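
The rule above can be sketched in shell as follows. The function name
makedumpfile_cmd and the choice of leaving one cpu free (THREAD_NUM = cpus - 1)
are assumptions made for illustration; the text only states that THREAD_NUM
should be larger than two and smaller than the number of usable cpus.

```shell
#!/bin/sh
# Print a makedumpfile command line suited to the given number of cpus.
# makedumpfile_cmd is a hypothetical helper, not part of kexec-tools.
makedumpfile_cmd() {
    cpus=$1
    # Assumption: keep one cpu free so that cpus > THREAD_NUM always holds.
    threads=$((cpus - 1))
    if [ "$threads" -gt 2 ]; then
        printf 'makedumpfile -c --num-threads %s /proc/vmcore dumpfile\n' "$threads"
    else
        # Too few cpus for the parallel mode to pay off.
        printf 'makedumpfile -c /proc/vmcore dumpfile\n'
    fi
}

# Use the number of online cpus, falling back to 1 if it cannot be read.
makedumpfile_cmd "$(getconf _NPROCESSORS_ONLN 2>/dev/null || echo 1)"
```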
b9e861
b9e861
Notes on how to use multiple cpus in a capture kernel on an x86 system:
b9e861
b9e861
Make sure that the capture kernel supports the disable_cpu_apicid
kernel option, which is needed to avoid an x86-specific
hardware issue (*). The disable_cpu_apicid kernel option is automatically
appended by the kdumpctl script and is ignored if the kernel doesn't support it.
b9e861
b9e861
Specify how many cpus the capture kernel should use with the nr_cpus
kernel option in /etc/sysconfig/kdump. nr_cpus is 1 by default.
b9e861
b9e861
Use only as many cpus in the capture kernel as are necessary and sufficient.
Warning: Don't use too many cpus in a capture kernel, or the capture kernel
may panic because it runs out of memory.
b9e861
b9e861
(*) Without the disable_cpu_apicid kernel option, the capture kernel may
hang, reset the system or power off at boot, depending on your system and its
runtime state at the time of the crash.
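
Assuming nr_cpus is set through the "KDUMP_COMMANDLINE_APPEND=" string in
/etc/sysconfig/kdump (an assumption; check where it is defined on your system),
changing the value could be sketched as below. The function name set_nr_cpus
and the sample option string are hypothetical.

```shell
#!/bin/sh
# Replace the nr_cpus=N value in a kernel option string.
# set_nr_cpus is a hypothetical helper, not part of kexec-tools.
set_nr_cpus() {
    opts=$1    # existing option string
    count=$2   # number of cpus for the capture kernel
    printf '%s\n' "$opts" | sed "s/nr_cpus=[0-9]*/nr_cpus=$count/"
}

# Sample option string for illustration only; the real default may differ.
set_nr_cpus "irqpoll nr_cpus=1 reset_devices" 4
```

Keep the memory warning above in mind when raising the value.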
b9e861
399b37
b9e861
Debugging Tips
399b37
==============
399b37
b9e861
- One can drop into a shell before/after saving the vmcore with the help of
  kdump_pre/kdump_post hooks. Use the following in one of the pre/post
  scripts to drop into a shell.
b9e861
b9e861
  #!/bin/bash
b9e861
  _ctty=/dev/ttyS0
b9e861
  setsid /bin/sh -i -l 0<>$_ctty 1<>$_ctty 2<>$_ctty
b9e861
b9e861
  One might have to change the terminal depending on what they are using.
b9e861
b9e861
- Serial console logging for virtual machines
b9e861
b9e861
  I generally use "virsh console <domain-name>" to get to the serial console.
  I noticed that after the dump is saved and the system reboots, some of the
  previously logged messages are gone once the grub menu shows up. That means
  any important debugging info at the end will be lost.
b9e861
b9e861
  One can log the serial console as follows to make sure messages are not lost.
b9e861
b9e861
  virsh ttyconsole <domain-name>
b9e861
  ln -s <name-of-tty> /dev/modem
b9e861
  minicom -C /tmp/console-logs
b9e861
b9e861
  Now minicom should be logging the serial console to the file /tmp/console-logs.