Kexec/Kdump HOWTO

Introduction

Kexec and kdump are new features in the 2.6 mainstream kernel. These features
are included in Red Hat Enterprise Linux 5. The purpose of these features
is to ensure faster boot up and creation of reliable kernel vmcores for
diagnostic purposes.

Overview

Kexec

Kexec is a fastboot mechanism which allows booting a Linux kernel from the
context of an already running kernel without going through the BIOS. The BIOS
can be very time consuming, especially on big servers with lots of
peripherals. This can save a lot of time for developers who end up booting a
machine numerous times.

Kdump

Kdump is a new kernel crash dumping mechanism and is very reliable because
the crash dump is captured from the context of a freshly booted kernel and
not from the context of the crashed kernel. Kdump uses kexec to boot into
a second kernel whenever the system crashes. This second kernel, often called
a capture kernel, boots with very little memory and captures the dump image.

The first kernel reserves a section of memory that the second kernel uses
to boot. Kexec enables booting the capture kernel without going through the
BIOS, so the contents of the first kernel's memory are preserved, which is
essentially the kernel crash dump.

Kdump is supported on the i686, x86_64, ia64 and ppc64 platforms. The
standard kernel and capture kernel are one and the same on i686, x86_64,
ia64 and ppc64.

If you're reading this document, you should already have kexec-tools
installed. If not, you can install it via the following command:

    # yum install kexec-tools

Now load a kernel with kexec:

    # kver=`uname -r`
    # kexec -l /boot/vmlinuz-$kver --initrd=/boot/initrd-$kver.img \
        --command-line="`cat /proc/cmdline`"

NOTE: The above will boot you back into the kernel you're currently running.
If you want to load a different kernel, substitute its version in place of
`uname -r`.
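
When loading a different kernel, it can help to confirm that both the kernel
and its initrd exist before calling kexec. The following is only a sketch: the
function name kexec_load_cmd and its optional boot directory argument are
made up for illustration, and the file naming simply follows the example
above. It prints the command rather than running it.

```shell
#!/bin/sh
# Hypothetical helper: check that the kernel and initrd for a given version
# exist, then print (not run) the matching kexec load command.
# The boot directory defaults to /boot; file naming follows the example above.
kexec_load_cmd() {
    kver=$1
    bootdir=${2:-/boot}
    kernel=$bootdir/vmlinuz-$kver
    initrd=$bootdir/initrd-$kver.img
    for f in "$kernel" "$initrd"; do
        if [ ! -f "$f" ]; then
            echo "missing: $f" >&2
            return 1
        fi
    done
    echo "kexec -l $kernel --initrd=$initrd --command-line=\"\$(cat /proc/cmdline)\""
}

# Usage: kexec_load_cmd <kernel version> [boot directory]
```

Review the printed command, then run it as root if it looks right.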

Now reboot your system, taking note that it should bypass the BIOS:

    # reboot


How to configure kdump:

Again, we assume that if you're reading this document, you already have
kexec-tools installed. If not, you can install it via the following command:

    # yum install kexec-tools

To be able to do much of anything interesting in the way of debug analysis,
you'll also need to install the kernel-debuginfo package, of the same arch
as your running kernel, and the crash utility:

    # yum --enablerepo=\*debuginfo install kernel-debuginfo.$(uname -m) crash

Next up, we need to modify some boot parameters to reserve a chunk of memory for
the capture kernel. With the help of grubby, it's very easy to append
"crashkernel=128M" to the end of your kernel boot parameters. The value given
to crashkernel= (referred to as X below) is the amount of memory to reserve for
the capture kernel. Depending on arch and system configuration, more than 128M
may be required for kdump. You need to experiment and test kdump; if 128M is
not sufficient, try reserving more memory.

    # grubby --args="crashkernel=128M" --update-kernel=/boot/vmlinuz-`uname -r`

Note that there is an alternative form in which to specify a crashkernel
memory reservation, in the event that more control is needed over the size and
placement of the reserved memory.  The format is:

crashkernel=range1:size1[,range2:size2,...][@offset]

Where range<n> specifies a range of values that are matched against the amount
of physical RAM present in the system, and the corresponding size<n> value
specifies the amount of kexec memory to reserve.  For example:

crashkernel=512M-2G:64M,2G-:128M

This line tells the kernel to reserve 64M of RAM if the system contains between
512M and 2G of physical memory.  If the system contains 2G or more of physical
memory, 128M should be reserved.
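
To make the range matching concrete, here is a small sketch that mimics it in
shell. It is an illustration only, not the kernel's parser: the function names
to_mb and pick_reservation are made up, only M and G suffixes are handled,
the optional @offset is ignored, and each range is assumed to include its lower
bound and exclude its upper bound.

```shell
#!/bin/sh
# Illustrative only: mimic how a crashkernel=range:size list is matched
# against the amount of RAM. Only M and G suffixes are handled here.

# to_mb SIZE -> print SIZE (e.g. 512M or 2G) in megabytes
to_mb() {
    case "$1" in
        *G) echo $(( ${1%G} * 1024 )) ;;
        *M) echo "${1%M}" ;;
        *)  echo "unsupported size: $1" >&2; return 1 ;;
    esac
}

# pick_reservation RAM_MB SPEC -> print the size of the first matching range,
# or 0 if no range matches. The body runs in a subshell so IFS stays local.
pick_reservation() (
    ram_mb=$1
    IFS=,
    for entry in $2; do
        range=${entry%%:*}      # e.g. 512M-2G or 2G-
        size=${entry#*:}        # e.g. 64M
        start=$(to_mb "${range%%-*}")
        end=${range#*-}         # empty means "no upper limit"
        if [ "$ram_mb" -ge "$start" ]; then
            if [ -z "$end" ] || [ "$ram_mb" -lt "$(to_mb "$end")" ]; then
                echo "$size"
                exit 0          # leaves the subshell, i.e. the function
            fi
        fi
    done
    echo 0
)

pick_reservation 1024 "512M-2G:64M,2G-:128M"   # a system with 1G of RAM
pick_reservation 4096 "512M-2G:64M,2G-:128M"   # a system with 4G of RAM
```

With the example line above, the two calls print 64M and 128M. A system with
less than 512M matches no range, so the sketch prints 0 for it.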

You can also use the default crashkernel=auto to let the kernel set the
crashkernel size.

crashkernel=auto indicates a best effort estimation for usual use cases,
however one still needs to test to ensure that the memory reserved by the
kernel is sufficient.

NOTE:
When a debug variant kernel is used as the capture kernel and the
primary kernel was booted with 'crashkernel=auto' set in the bootargs,
the capture kernel boot can fail.

A debug variant kernel is usually the same stable kernel with some
debug options enabled, which makes it use much more memory when running as
the kdump kernel. Thus when you use 'crashkernel=auto', the kdump kernel will
likely run out of memory.

So it is not advisable to use a debug variant kernel as the capture
kernel when the primary kernel is booted with 'crashkernel=auto' set in
the bootargs.

After making said changes, reboot your system, so that the X MB of memory is
left untouched by the normal system, reserved for the capture kernel. Take note
that the output of 'free -m' will show X MB less memory than without this
parameter, which is expected. You may be able to get by with less than 128M, but
testing with only 64M has proven unreliable of late. On ia64, as much as 512M
may be required.
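
After the reboot you may want to confirm how much memory was actually
reserved. The sketch below assumes a kernel that exposes the reservation in
bytes through /sys/kernel/kexec_crash_size; older kernels may not have this
file, in which case the boot messages (dmesg) are the place to look. The
function name reserved_mb is made up for illustration.

```shell
#!/bin/sh
# reserved_mb [FILE] -> print the crash kernel reservation in MB.
# FILE defaults to /sys/kernel/kexec_crash_size (reported in bytes); the
# argument exists only so the function can be tried against a test file.
reserved_mb() {
    file=${1:-/sys/kernel/kexec_crash_size}
    if [ ! -r "$file" ]; then
        echo "cannot read $file" >&2
        return 1
    fi
    bytes=$(cat "$file")
    echo $(( $bytes / 1024 / 1024 ))
}

# Usage: reserved_mb
```

A result of 0 suggests the crashkernel= parameter was not honored.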

Now that you've got that reserved memory region set up, you want to turn on
the kdump init script:

    # chkconfig kdump on

Then, start up kdump as well:

    # systemctl start kdump.service

This should load your kernel-kdump image via kexec, leaving the system ready
to capture a vmcore upon crashing. To test this out, you can force-crash
your system by echo'ing a c into /proc/sysrq-trigger:

    # echo c > /proc/sysrq-trigger

You should see some panic output, followed by the system restarting into
the kdump kernel. When the boot process gets to the point where it starts
the kdump service, your vmcore should be copied out to disk (by default,
in /var/crash/<YYYY-MM-DD-HH:MM>/vmcore), and then the system is rebooted back
into your normal kernel.
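
If the test crash does not produce a vmcore, two things worth checking are
whether the running kernel was booted with a crashkernel= option and whether
a capture kernel is loaded. The sketch below assumes the kernel reports the
latter as 1 in /sys/kernel/kexec_crash_loaded; the function name kdump_ready
and its two file arguments are made up for illustration.

```shell
#!/bin/sh
# kdump_ready [CMDLINE_FILE] [LOADED_FILE]
# Returns 0 if the kernel command line has a crashkernel= option and a crash
# kernel is loaded. The file arguments default to /proc/cmdline and
# /sys/kernel/kexec_crash_loaded; they are parameters only to ease testing.
kdump_ready() {
    cmdline_file=${1:-/proc/cmdline}
    loaded_file=${2:-/sys/kernel/kexec_crash_loaded}
    if ! grep -q 'crashkernel=' "$cmdline_file" 2>/dev/null; then
        echo "no crashkernel= option on the kernel command line" >&2
        return 1
    fi
    if [ "$(cat "$loaded_file" 2>/dev/null)" != "1" ]; then
        echo "no crash kernel loaded (is the kdump service running?)" >&2
        return 1
    fi
    echo "kdump looks ready"
}

# Usage: kdump_ready
```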

Once back to your normal kernel, you can use the previously installed crash
utility in conjunction with the previously installed kernel-debuginfo to
perform postmortem analysis:

    # crash /usr/lib/debug/lib/modules/2.6.17-1.2621.el5/vmlinux \
        /var/crash/2006-08-23-15:34/vmcore

    crash> bt

and so on...
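
When several dumps have accumulated, a small helper can pick the newest one
to hand to crash. This is only a sketch: the function name latest_vmcore is
made up, it relies on file modification times, and it assumes the directory
names contain no newlines.

```shell
#!/bin/sh
# latest_vmcore [DIR] -> print the most recently modified DIR/*/vmcore.
# DIR defaults to /var/crash, the default location named above.
latest_vmcore() {
    dir=${1:-/var/crash}
    # ls -t sorts newest first; the error for "no match" is silenced.
    newest=$(ls -t "$dir"/*/vmcore 2>/dev/null | head -n 1)
    if [ -z "$newest" ]; then
        echo "no vmcore found under $dir" >&2
        return 1
    fi
    echo "$newest"
}

# Usage: crash <path to vmlinux> "$(latest_vmcore)"
```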

Saving vmcore-dmesg.txt
-----------------------
The kernel log buffers are among the most important pieces of information
available in a vmcore. Before the vmcore is saved, the kernel log buffers are
extracted from /proc/vmcore and saved into a file named vmcore-dmesg.txt. The
vmcore itself is saved after vmcore-dmesg.txt. The destination disk and
directory for vmcore-dmesg.txt are the same as for the vmcore. Note that the
kernel log buffers will not be available if the dump target is a raw device.

Dump Triggering methods:

This section talks about the various ways, other than a Kernel Panic, in which
Kdump can be triggered. The following methods assume that Kdump is configured
on your system, with the scripts enabled as described in the section above.

1) AltSysRq C

Kdump can be triggered with the combination of the 'Alt', 'SysRq' and 'C'
keyboard keys. Please refer to the following link for more details:

http://kbase.redhat.com/faq/FAQ_43_5559.shtm

In addition, on PowerPC boxes, Kdump can also be triggered via the Hardware
Management Console (HMC) using the 'Ctrl', 'O' and 'C' keyboard keys.

2) NMI_WATCHDOG

In case a machine has a hard hang, it is quite possible that it does not
respond to keyboard interrupts. As a result, the 'Alt-SysRq' keys will not help
trigger a dump. In such scenarios the NMI watchdog feature can prove useful.
The following link has more details on configuring the NMI watchdog option.

http://kbase.redhat.com/faq/FAQ_85_9129.shtm

Once this feature has been enabled in the kernel, any lockup will result in an
OOPs message being generated, followed by Kdump being triggered.

3) Kernel OOPs

If we want to generate a dump every time the Kernel OOPses, we can achieve this
by setting the 'Panic On OOPs' option as follows:

    # echo 1 > /proc/sys/kernel/panic_on_oops

This is enabled by default on RHEL5.

4) NMI (Non-maskable interrupt) button

In cases where the system is in a hung state and is not accepting keyboard
interrupts, using the NMI button to trigger Kdump can be very useful. An NMI
button is present on most of the newer x86 and x86_64 machines. Please refer
to the user guides/manuals to locate the button, though in most cases it
is not very well documented. It is usually hidden behind a small hole
on the front or back panel of the machine. You could use a toothpick or some
other non-conducting probe to press the button.

For example, on the IBM X series 366 machine, the NMI button is located behind
a small hole on the bottom center of the rear panel.

To enable this method of dump triggering using the NMI button, you will need to
set the 'unknown_nmi_panic' option as follows:

    # echo 1 > /proc/sys/kernel/unknown_nmi_panic
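
Values written under /proc/sys do not survive a reboot. One way to make the
two settings above persistent, assuming your system reads /etc/sysctl.conf at
boot, is to add the corresponding keys there:

```
# /etc/sysctl.conf
kernel.panic_on_oops = 1
kernel.unknown_nmi_panic = 1
```

Running 'sysctl -p' as root applies the file without rebooting.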

5) PowerPC specific methods:

On IBM PowerPC machines, issuing a soft reset invokes the XMON debugger (if
XMON is configured). To configure XMON, either compile the kernel with
the CONFIG_XMON and CONFIG_XMON_DEFAULT options, or compile it with
CONFIG_XMON and boot the kernel with the xmon=on option.

Following are the ways to remotely issue a soft reset on PowerPC boxes, which
will drop you into XMON. Pressing 'X' (capital letter X) followed by
'Enter' there will trigger the dump.

5.1) HMC

The Hardware Management Console (HMC) available on Power4 and Power5 machines
allows partitions to be reset remotely. This is especially useful in hang
situations where the system is not accepting any keyboard input.

Once you have HMC configured, the following steps will enable you to trigger
Kdump via a soft reset:

On Power4
  Using GUI

    * In the right pane, right click on the partition you wish to dump.
    * Select "Operating System->Reset".
    * Select "Soft Reset".
    * Select "Yes".

  Using HMC Commandline

    # reset_partition -m <machine> -p <partition> -t soft

On Power5
  Using GUI

    * In the right pane, right click on the partition you wish to dump.
    * Select "Restart Partition".
    * Select "Dump".
    * Select "OK".

  Using HMC Commandline

    # chsysstate -m <managed system name> -n <lpar name> -o dumprestart -r lpar

5.2) Blade Management Console for Blade Center

To initiate a dump operation, go to the Power/Restart option under "Blade Tasks"
in the Blade Management Console. Select the blade for which you want
to initiate the dump and then click "Restart blade with NMI". This issues a
system reset and invokes the xmon debugger.


Advanced Setups:

In addition to being able to capture a vmcore to your system's local file
system, kdump can be configured to capture a vmcore to a number of other
locations, including a raw disk partition, a dedicated file system, an NFS
mounted file system, or a remote system via ssh/scp. Additional options
exist for specifying the relative path under which the dump is captured,
what to do if the capture fails, and for compressing and filtering the dump
(so as to produce smaller, more manageable, vmcore files).

In theory, dumping to a location other than the local file system should be
safer than kdump's default setup, as it's possible the default setup will try
dumping to a file system that has become corrupted. The raw disk partition and
dedicated file system options allow you to still dump to the local system,
but without having to remount your possibly corrupted file system(s),
thereby decreasing the chance a vmcore won't be captured. Dumping to an
NFS server or remote system via ssh/scp also has this advantage, as well
as allowing for the centralization of vmcore files, should you have several
systems from which you'd like to obtain vmcore files. Of course, note that
these configurations could present problems if your network is unreliable.

Advanced setups are configured via modifications to /etc/kdump.conf,
which, out of the box, is fairly well documented itself. Any alterations to
/etc/kdump.conf should be followed by a restart of the kdump service, so
the changes can be incorporated in the kdump initrd. Restarting the kdump
service is as simple as '/sbin/systemctl restart kdump.service'.

Note that kdump.conf is used as a configuration mechanism for capturing dump
files from the initramfs (in the interests of safety). The root file system is
mounted, and the init process started, only as a last resort if the
initramfs fails to capture the vmcore.  As such, configuration made in
/etc/kdump.conf only applies to captures performed from the initramfs.  If
for any reason the init process is started on the root file system, only a
simple copy of the vmcore from /proc/vmcore to /var/crash/$DATE/vmcore will
be performed.

For both local filesystem and nfs dumps, the dump target must be mounted before
building the kdump initramfs. That means one needs to put an entry for the dump
file system in /etc/fstab so that after reboot, when the kdump service starts,
it can find the dump target and build the initramfs instead of failing.
Usually the dump target should be used only for kdump. If you are worried about
someone using the filesystem for something other than dumping vmcores,
you can mount it read-only. Mkdumprd will still remount it read-write
to create the dump directory and will move it back to read-only afterwards.

Raw partition

Raw partition dumping requires that a disk partition in the system, at least
as large as the amount of memory in the system, be left unformatted. Assuming
/dev/vg/lv_kdump is left unformatted, kdump.conf can be configured with
'raw /dev/vg/lv_kdump', and the vmcore file will be copied via dd directly
onto partition /dev/vg/lv_kdump. Restart the kdump service via
'/sbin/systemctl restart kdump.service' to commit this change to your kdump
initrd. The dump target should be a persistent device name, such as an lvm or
device mapper canonical name.

Dedicated file system

Similar to raw partition dumping, you can format a partition with the file
system of your choice. Again, it should be at least as large as the amount
of memory in the system. Assuming /dev/vg/lv_kdump has been
formatted ext4, specify 'ext4 /dev/vg/lv_kdump' in kdump.conf, and a
vmcore file will be copied onto the file system after it has been mounted.
Dumping to a dedicated partition has the advantage that you can dump multiple
vmcores to the file system, space permitting, without overwriting previous ones,
as would be the case in a raw partition setup. Restart the kdump service via
'/sbin/systemctl restart kdump.service' to commit this change to
your kdump initrd.  Note that for local file systems ext4 and ext2 are
supported as dumpable targets.  Kdump will not prevent you from specifying
other filesystems, and they will most likely work, but their operation
cannot be guaranteed.  For instance, specifying a vfat filesystem or msdos
filesystem will result in a successful load of the kdump service, but during
crash recovery, the dump will fail if the system has more than 2GB of memory
(since vfat and msdos filesystems do not support more than 2GB files).
Be careful of your filesystem selection when using this target.

It is recommended to use persistent device names or UUID/LABEL for file system
dumps. One example of a persistent device name is /dev/vg/<devname>.

NFS mount

Dumping over NFS requires an NFS server configured to export a file system
with full read/write access for the root user. All operations done within
the kdump initial ramdisk are done as root, and to write out a vmcore file,
we obviously must be able to write to the NFS mount. Configuring an NFS
server is outside the scope of this document, but either the no_root_squash
or anonuid options on the NFS server side are likely of interest to permit
the kdump initrd operations to write to the NFS mount as root.

Assuming you're exporting /dump on the machine nfs-server.example.com,
once the mount is properly configured, specify it in kdump.conf, via
'nfs nfs-server.example.com:/dump'. The server portion can be specified either
by host name or IP address. Following a system crash, the kdump initrd will
mount the NFS mount and copy out the vmcore to your NFS server. Restart the
kdump service via '/sbin/systemctl restart kdump.service' to commit this change
to your kdump initrd.
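
Putting this together with the earlier requirement that the dump target be
mounted before the kdump initramfs is built, a setup for the export above
might look like the following. The mount point /mnt/dump and the 'defaults'
mount options are assumptions made for the example.

```
# /etc/fstab (the mount point /mnt/dump is a made-up example)
nfs-server.example.com:/dump  /mnt/dump  nfs  defaults  0 0

# /etc/kdump.conf
nfs nfs-server.example.com:/dump
```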

Special mount via "dracut_args"

You can utilize "dracut_args" to pass "--mount" to kdump; see the dracut manpage
for details on the format of "--mount". If any "--mount" is specified
via "dracut_args", kdump will build it as the mount target without doing any
validation (mounting or checking things like mount options, fs size, save path,
etc), so you must test it yourself to ensure it is correct. You cannot use other
targets in /etc/kdump.conf if you use "--mount" in "dracut_args". You also
cannot specify multiple "--mount" targets via "dracut_args".

One use case of "--mount" in "dracut_args" is when you do not want to mount the
dump target before the kdump service starts, for example to reduce the burden
on a shared nfs server, as in the example below:
dracut_args --mount "192.168.1.1:/share /mnt/test nfs4 defaults"

NOTE:
- <mountpoint> must be specified as an absolute path.

Remote system via ssh/scp

Dumping over ssh/scp requires setting up passwordless ssh keys for every
machine you wish to have dump via this method. First up, configure kdump.conf
for ssh/scp dumping, adding a config line of 'ssh user@server', where 'user'
can be any user on the target system you choose, and 'server' is the host
name or IP address of the target system. Using a dedicated, restricted user
account on the target system is recommended, as there will be keyless ssh
access to this account.

Once kdump.conf is appropriately configured, issue the command
'kdumpctl propagate' to automatically set up the ssh host keys and transmit
the necessary bits to the target server. You'll have to type in 'yes'
to accept the host key for your target server if this is the first time
you've connected to it, and then input the target system user's password
to send over the necessary ssh key file. Restart the kdump service via
'/sbin/systemctl restart kdump.service' to commit this change to your kdump
initrd.

Path
====
"path" represents the file system path in which the vmcore will be saved. In
fact kdump creates a directory $hostip-$date within "path" and saves the
vmcore there, so in practice the dump is saved in $path/$hostip-$date/. To
simplify the discussion below, when we say the dump will be saved in $path, it
is implied that kdump will create another directory inside path and
save the vmcore there.

If a dump target is specified in kdump.conf, then "path" is relative to the
specified dump target. For example, if the dump target is "ext4 /dev/sda", then
the dump will be saved in the "$path" directory on /dev/sda.

The same is the case for nfs dumps. If the user specified
"nfs foo.com:/export/tmp/" as the dump target, then with the default path of
/var/crash the dump will effectively be saved in the
"foo.com:/export/tmp/var/crash/" directory.

The interpretation of path changes a bit if the user has not specified a dump
target explicitly in kdump.conf. In this case, "path" represents the
absolute path from root, and the dump target and adjusted path are arrived
at automatically depending on what's mounted in the current system.

Following are a few examples.

path /var/crash/
----------------
Assuming there is no disk mounted on /var/ or on /var/crash, the dump will
be saved on the disk backing rootfs in the directory /var/crash.

path /var/crash/ (A separate disk mounted on /var)
--------------------------------------------------
Say a disk /dev/sdb is mounted on /var. In this case the dump target will
become /dev/sdb, path will become "/crash" and the dump will be saved
in the "sdb:/crash/" directory.

path /var/crash/ (NFS mounted on /var)
--------------------------------------
Say foo.com:/export/tmp is mounted on /var. In this case the dump target is
the nfs server, path will be adjusted to "/crash" and the dump will be saved to
the foo.com:/export/tmp/crash/ directory.
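
The path adjustment in the examples above amounts to stripping the mount
point from the front of "path". The sketch below shows only that string
manipulation; the function name adjusted_path is made up, and it does not
inspect what is actually mounted, which the real kdump scripts must do.

```shell
#!/bin/sh
# adjusted_path PATH MOUNTPOINT -> print PATH relative to the file system
# mounted on MOUNTPOINT, as in the examples above. Hypothetical helper that
# only does the string manipulation; it does not look at real mounts.
adjusted_path() {
    path=${1%/}     # drop one trailing slash: /var/crash/ -> /var/crash
    mnt=${2%/}      # "/" becomes "", so the root case needs no special code
    case "$path/" in
        "$mnt"/*)
            rel=${path#"$mnt"}
            echo "${rel:-/}"
            ;;
        *)
            echo "path $1 is not under $2" >&2
            return 1
            ;;
    esac
}

adjusted_path /var/crash/ /       # nothing mounted on /var: path is unchanged
adjusted_path /var/crash/ /var    # separate disk or NFS export mounted on /var
```

The two calls print /var/crash and /crash, matching the first example and the
two "mounted on /var" examples respectively.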

Kdump boot directory
====================
Usually the kdump kernel is the same as the 1st kernel, so kdump will try to
find the kdump kernel under /boot according to /proc/cmdline. E.g. if we
execute the command below and get this output:
	cat /proc/cmdline
	BOOT_IMAGE=/xxx/vmlinuz-3.yyy.zzz  root=xxxx .....
then the kdump kernel will be /boot/xxx/vmlinuz-3.yyy.zzz.
However, a variable KDUMP_BOOTDIR in /etc/sysconfig/kdump is provided for
users whose kdump kernel is in a different directory.
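
The lookup described above can be sketched as follows. This is an
illustration under assumptions, not the actual kdump script: the function name
kdump_kernel_path is made up, the second argument stands in for
KDUMP_BOOTDIR, and BOOT_IMAGE values carrying a boot loader device prefix are
not handled.

```shell
#!/bin/sh
# kdump_kernel_path CMDLINE [BOOTDIR] -> print BOOTDIR + the BOOT_IMAGE value.
# BOOTDIR defaults to /boot. Hypothetical helper; the real kdump scripts may
# resolve the kernel differently.
kdump_kernel_path() {
    bootdir=${2:-/boot}
    # Unquoted on purpose: split the command line into words.
    for word in $1; do
        case "$word" in
            BOOT_IMAGE=*)
                echo "$bootdir${word#BOOT_IMAGE=}"
                return 0
                ;;
        esac
    done
    echo "no BOOT_IMAGE= in command line" >&2
    return 1
}

kdump_kernel_path "BOOT_IMAGE=/xxx/vmlinuz-3.yyy.zzz  root=xxxx"
```

For the sample command line this prints /boot/xxx/vmlinuz-3.yyy.zzz.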

Kdump Post-Capture Executable

It is possible to specify a custom script or binary you wish to run following
an attempt to capture a vmcore. The executable is passed an exit code from
the capture process, which can be used to trigger different actions from
within your post-capture executable.
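
A post-capture script could branch on that exit code along these lines. This
is a minimal sketch: it assumes the exit code arrives as the first argument,
and the function name and messages are purely illustrative.

```shell
#!/bin/sh
# Sketch of a post-capture executable. The first argument is assumed to be
# the exit code of the capture process; the messages are illustrative only.
report_capture() {
    status=${1:-1}      # treat a missing argument as a failure
    if [ "$status" -eq 0 ]; then
        echo "vmcore capture succeeded"
    else
        echo "vmcore capture failed with exit code $status"
    fi
}

report_capture "$@"
```

Keep in mind that the script runs inside the kdump initrd, so anything it
calls must be available there.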

Kdump Pre-Capture Executable

It is possible to specify a custom script or binary you wish to run before
capturing a vmcore. The exit status of this binary is interpreted as follows:
0 - continue with dump process as usual
non 0 - reboot the system

Extra Binaries

If you have specific binaries or scripts you want to have made available
within your kdump initrd, you can specify them by their full path, and they
will be included in your kdump initrd, along with all dependent libraries.
This may be particularly useful for those running post-capture scripts that
rely on other binaries.

Extra Modules

By default, only the bare minimum of kernel modules will be included in your
kdump initrd. Should you wish to capture your vmcore files to a non-boot-path
storage device, such as an iscsi target disk or clustered file system, you may
need to manually specify additional kernel modules to load into your kdump
initrd.
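
As a rough illustration, the four options above might appear in kdump.conf as
shown below. The directive names kdump_pre, kdump_post and extra_bins are
given here as they are commonly documented in the comments of the stock
kdump.conf, so check your own copy; the script paths, the binary and the
module name are made-up examples.

```
# /etc/kdump.conf (paths, binary and module name are made-up examples)
kdump_pre /var/crash/scripts/kdump-pre.sh
kdump_post /var/crash/scripts/kdump-post.sh
extra_bins /usr/bin/logger
extra_modules iscsi_tcp
```

As with any other change to kdump.conf, restart the kdump service afterwards
so the kdump initrd is rebuilt.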

Default action
==============
Default action specifies what to do when dumping to the configured dump target
fails. By default, the default action is "reboot"; that is, the system reboots
if the attempt to save the dump to the dump target fails.

There are other default actions available though.

- dump_to_rootfs
	This option tries to mount root and save the dump on the root
	filesystem in the path specified by "path". This option will
	generally make sense when the dump target is not the root
	filesystem. For example, if the dump is being saved over the
	network using "ssh" then one can set default to "dump_to_rootfs"
	to try saving the dump to the root filesystem if the dump over
	the network fails.

- shell
	Drop into a shell session inside initramfs.
- halt
	Halt system after failure.
- poweroff
	Poweroff system after failure.
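
For instance, a configuration that dumps over ssh and falls back to the root
filesystem when the network dump fails might look like this. The user name
and host are placeholders.

```
# /etc/kdump.conf (user name and host are placeholders)
ssh kdump@dumpserver.example.com
path /var/crash
default dump_to_rootfs
```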

Compression and filtering

The 'core_collector' parameter in kdump.conf allows you to specify a custom
dump capture method. The most common alternate method is makedumpfile, which
is a dump filtering and compression utility provided with kexec-tools. On
some architectures, it can drastically reduce the size of your vmcore files,
which becomes very useful on systems with large amounts of memory.

A typical setup is 'core_collector makedumpfile -F -l --message-level 1 -d 31',
but check the output of '/sbin/makedumpfile --help' for a list of all available
options (-i and -g don't need to be specified, they're automatically taken care
of). Note that use of makedumpfile requires that the kernel-debuginfo package
corresponding with your running kernel be installed.

The core collector command format depends on the dump target type. Typically
for a filesystem (local/remote), core_collector should accept two arguments.
The first is the source file and the second is the target file. For example:

ex1.
---
core_collector "cp --sparse=always"

Above will effectively be translated to:

cp --sparse=always /proc/vmcore <dest-path>/vmcore

ex2.
---
core_collector "makedumpfile -l --message-level 1 -d 31"

Above will effectively be translated to:

makedumpfile -l --message-level 1 -d 31 /proc/vmcore <dest-path>/vmcore


For dump targets like raw and ssh, in general, the core collector should expect
one argument (the source file) and should write the processed core to standard
output (there is one exception, "scp", discussed later). This standard
output will be saved to the destination using appropriate commands.

raw dumps core_collector examples:
---------
ex3.
---
core_collector "cat"

Above will effectively be translated to:

cat /proc/vmcore | dd of=<target-device>

ex4.
---
core_collector "makedumpfile -F -l --message-level 1 -d 31"

Above will effectively be translated to:

makedumpfile -F -l --message-level 1 -d 31 /proc/vmcore | dd of=<target-device>

ssh dumps core_collector examples:
---------
ex5.
---
core_collector "cat"

Above will effectively be translated to:

cat /proc/vmcore | ssh <options> <remote-location> "dd of=path/vmcore"

ex6.
---
core_collector "makedumpfile -F -l --message-level 1 -d 31"

Above will effectively be translated to:

makedumpfile -F -l --message-level 1 -d 31 /proc/vmcore | ssh <options> <remote-location> "dd of=path/vmcore"

There is one exception to the standard output rule for ssh dumps, and that is
scp. As scp can handle ssh destinations for file transfers, one can
specify "scp" as the core collector for ssh targets (no output on stdout).

ex7.
----
core_collector "scp"

Above will effectively be translated to:

scp /proc/vmcore <user@host>:path/vmcore
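
The rules above can be summarized in a small sketch that prints the command
line for a given target type. It is illustrative only: the function name
effective_cmd and the type keywords fs, raw and ssh are made up, the source
file is always passed as the collector's argument, and the extra ssh options
the real scripts add are left out.

```shell
#!/bin/sh
# effective_cmd KIND COLLECTOR DEST -> print the command line that the
# examples above describe for a given target type. Hypothetical helper: it
# only formats a string, and the real kdump scripts add further options.
effective_cmd() {
    kind=$1
    collector=$2
    dest=$3
    case "$kind" in
        fs)
            # Two arguments: source file and target file.
            echo "$collector /proc/vmcore $dest/vmcore"
            ;;
        raw)
            # One argument; standard output goes to the device via dd.
            echo "$collector /proc/vmcore | dd of=$dest"
            ;;
        ssh)
            if [ "$collector" = "scp" ]; then
                # The scp exception: no output on stdout.
                echo "scp /proc/vmcore $dest/vmcore"
            else
                # DEST is user@host:path; split it at the first colon.
                echo "$collector /proc/vmcore | ssh ${dest%%:*} \"dd of=${dest#*:}/vmcore\""
            fi
            ;;
        *)
            echo "unknown target type: $kind" >&2
            return 1
            ;;
    esac
}

effective_cmd raw cat /dev/vg/lv_kdump
effective_cmd ssh scp user@host:path
```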

About default core collector
----------------------------
The default core_collector for ssh/raw dumps is:
"makedumpfile -F -l --message-level 1 -d 31".
The default core_collector for other targets is:
"makedumpfile -l --message-level 1 -d 31".

Even if the core_collector option is commented out in kdump.conf, makedumpfile
is the default core collector and kdump uses it internally.

If one does not want makedumpfile as the default core_collector, one needs
to specify a different one using the core_collector option.

Note: If "makedumpfile -F" is used then you will get a flattened format
vmcore.flat. You will need to use "makedumpfile -R" to rearrange the
dump data from standard input into a normal dumpfile (readable with analysis
tools).
For example: "makedumpfile -R vmcore < vmcore.flat"

Caveats:

Console frame-buffers and X are not properly supported. If you typically run
with something along the lines of "vga=791" in your kernel config line or
have X running, console video will be garbled when a kernel is booted via
kexec. Note that the kdump kernel should still be able to create a dump,
and when the system reboots, video should be restored to normal.


Notes on resetting video:

Video is a notoriously difficult issue with kexec.  Video cards contain ROM code
that controls their initial configuration and setup.  This code is nominally
accessed and executed from the BIOS, and otherwise not safely executable. Since
the purpose of kexec is to reboot the system without re-executing the BIOS, it
is rather difficult if not impossible to reset video cards with kexec.  The
result is that if a system crashes while running in a graphical mode (i.e.
running X), the screen may appear to become 'frozen' while the dump capture is
taking place.  A serial console will of course reveal that the system is
operating and capturing a vmcore image, but a casual observer will see the
system as hung until the dump completes and a true reboot is executed.

There are two possibilities to work around this issue.  One is adding
--reset-vga to the kexec command line options in /etc/sysconfig/kdump.  This
tells kdump to write some reasonable default values to the video card register
file, in the hopes of returning it to a text mode such that boot messages are
visible on the screen.  It does not work with all video cards however.
Secondly, it may be worth trying to add vga16fb.ko to the extra_modules list in
/etc/kdump.conf.  This will attempt to use the video card in framebuffer mode,
which can blank the screen prior to the start of a dump capture.
23ef29
23ef29
Notes on rootfs mount:
Dracut is designed to mount rootfs by default. If rootfs mounting fails, it
will refuse to go on. So kdump currently leaves rootfs mounting to dracut.
For the time being, we assume that a proper root= cmdline option is passed
to the dracut initramfs. If you need to modify "KDUMP_COMMANDLINE=" in
/etc/sysconfig/kdump, make sure that the appropriate root= options are
copied from /proc/cmdline. In general it is best to append command line
options using "KDUMP_COMMANDLINE_APPEND=" instead of replacing the original
command line completely.

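If KDUMP_COMMANDLINE does have to be written by hand, the root-related
options can be picked out of the running command line with a small helper
such as the sketch below. The function name root_args is hypothetical, and
the choice of root=, rootflags= and rootfstype= is an assumption; extend the
patterns if your setup depends on other options.

```shell
# Print the root-related arguments found in a kernel command line string.
# Arguments are split on whitespace, so quoted values containing spaces
# are not handled.
root_args() {
    local arg out=""
    for arg in $1; do
        case $arg in
            root=*|rootflags=*|rootfstype=*) out="${out:+$out }$arg" ;;
        esac
    done
    printf '%s\n' "$out"
}

# Typical use on a running system (not executed here):
#   root_args "$(cat /proc/cmdline)"
```

The printed options can then be pasted into KDUMP_COMMANDLINE next to the
other options you need.
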
Notes on watchdog module handling:

If a watchdog is active in the first kernel, its module must be loaded in
the crash kernel, so that the watchdog is either deactivated or kicked in
the second kernel. Otherwise, the watchdog might reboot the system while
the vmcore is being saved. When the dracut watchdog module is enabled, it
installs the kernel watchdog module of the active watchdog device in the
initrd. kexec-tools always adds "-a watchdog" to the dracut_args if there
is at least one active watchdog and the user has not specifically added
"-o watchdog" to the dracut_args in kdump.conf. If a watchdog module (such
as hp_wdt) has not been written in the watchdog-core framework, this option
will have no effect and the module will not be added. Please note that only
the systemd watchdog daemon is supported as the watchdog kick application.

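To see whether any watchdog is active in the first kernel, something like
the following sketch can be used. It assumes the kernel exposes the
/sys/class/watchdog/<device>/state attribute (this needs watchdog sysfs
support in the kernel); the function name active_watchdogs is hypothetical
and this is not necessarily how kexec-tools performs the check.

```shell
# List watchdog devices whose sysfs "state" attribute reads "active".
# The base directory can be overridden, which is mainly useful for testing.
active_watchdogs() {
    local base="${1:-/sys/class/watchdog}" dir
    for dir in "$base"/*; do
        # Skip entries without a readable state file (or an unmatched glob).
        [ -r "$dir/state" ] || continue
        if [ "$(cat "$dir/state")" = "active" ]; then
            basename "$dir"
        fi
    done
}
```

An empty result suggests that no watchdog needs to be handled in the second
kernel.
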
Parallel Dumping Operation
==========================
Kexec allows kdump to use multiple cpus, so the parallel feature can
accelerate dumping substantially, especially when executing compression and
filtering.
For example:

	1. "makedumpfile -c --num-threads [THREAD_NUM] /proc/vmcore dumpfile"
	2. "makedumpfile -c /proc/vmcore dumpfile"

	1 has better performance than 2, if THREAD_NUM is larger than two
	and the number of usable cpus is larger than THREAD_NUM.

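The rule of thumb above can be turned into a small helper. The sketch below
is only an illustration: the function names are hypothetical, and using
"cpus minus one" as THREAD_NUM is an assumption chosen so that the number of
usable cpus stays larger than THREAD_NUM. It prints the command line rather
than running it.

```shell
# Pick a --num-threads value for a given cpu count. Prints 0 when threading
# is not expected to help (THREAD_NUM would not be larger than two).
pick_num_threads() {
    local cpus="$1"
    local threads=$((cpus - 1))
    if [ "$threads" -gt 2 ]; then
        printf '%s\n' "$threads"
    else
        printf '0\n'
    fi
}

# Build (but do not run) the makedumpfile command line for a cpu count.
dump_command() {
    local threads
    threads=$(pick_num_threads "$1")
    if [ "$threads" -gt 0 ]; then
        printf 'makedumpfile -c --num-threads %s /proc/vmcore dumpfile\n' "$threads"
    else
        printf 'makedumpfile -c /proc/vmcore dumpfile\n'
    fi
}
```

For instance, "dump_command 8" selects the threaded form, while
"dump_command 2" falls back to the plain one.
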
Notes on how to use multiple cpus on a capture kernel on an x86 system:

Make sure that you are using a kernel that supports the disable_cpu_apicid
kernel option as a capture kernel, which is needed to avoid an x86-specific
hardware issue (*). The disable_cpu_apicid kernel option is automatically
appended by the kdumpctl script and is ignored if the kernel doesn't support it.

You need to specify how many cpus are to be used in a capture kernel by
setting the nr_cpus kernel option in /etc/sysconfig/kdump. nr_cpus is 1 by
default.

You should use a necessary and sufficient number of cpus on a capture kernel.
Warning: Don't use too many cpus on a capture kernel, or the capture kernel
may panic due to Out Of Memory.

(*) Without the disable_cpu_apicid kernel option, the capture kernel may
hang, reset the system or power off at boot, depending on your system and
the runtime situation at the time of the crash.

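Changing nr_cpus amounts to editing one token in the capture kernel command
line. The sketch below shows one way to do that on a string; it assumes the
option is kept in KDUMP_COMMANDLINE_APPEND in /etc/sysconfig/kdump, and the
function name set_nr_cpus is hypothetical.

```shell
# Replace the nr_cpus= value in a command line string, or append the option
# if it is not present. $1 is the command line, $2 the desired cpu count.
set_nr_cpus() {
    local cmdline="$1" count="$2"
    case " $cmdline " in
        *" nr_cpus="*)
            printf '%s\n' "$cmdline" |
                sed -E "s/(^| )nr_cpus=[0-9]+/\1nr_cpus=$count/"
            ;;
        *)
            printf '%s\n' "$cmdline nr_cpus=$count"
            ;;
    esac
}
```

The result would then be written back by hand as the new value of the
variable, keeping the memory warning above in mind.
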
Debugging Tips
--------------
- One can drop into a shell before/after saving the vmcore with the help of
  the kdump_pre/kdump_post hooks. Use the following in one of the pre/post
  scripts to drop into a shell.

  #!/bin/bash
  # Terminal to attach the shell to; adjust to the console in use.
  _ctty=/dev/ttyS0
  # Start an interactive login shell in a new session on that terminal.
  setsid /bin/sh -i -l 0<>"$_ctty" 1<>"$_ctty" 2<>"$_ctty"

  One might have to change the terminal depending on what they are using.

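Instead of hard-coding the terminal, a hook script could derive it from the
kernel command line. The helper below is a sketch under that assumption; the
name console_tty is hypothetical. It relies on the kernel using the last
console= argument as /dev/console.

```shell
# Print the device node for the last console= argument in a kernel command
# line string, or nothing if there is no such argument.
console_tty() {
    local arg name=""
    for arg in $1; do
        case $arg in
            console=*)
                name="${arg#console=}"   # drop the "console=" prefix
                name="${name%%,*}"       # drop options such as ",115200n8"
                ;;
        esac
    done
    if [ -n "$name" ]; then
        printf '/dev/%s\n' "$name"
    fi
}

# Possible use in a hook script (not executed here):
#   _ctty=$(console_tty "$(cat /proc/cmdline)")
```

If the function prints nothing, falling back to a fixed terminal such as
/dev/ttyS0 remains necessary.
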
- Serial console logging for virtual machines

  I generally use "virsh console <domain-name>" to get to the serial console.
  I noticed that after the dump is saved, the system reboots, and when the
  grub menu shows up some of the previously logged messages are no longer
  there. That means any important debugging info at the end will be lost.

  One can log the serial console as follows to make sure messages are not lost.

  virsh ttyconsole <domain-name>
  ln -s <name-of-tty> /dev/modem
  minicom -C /tmp/console-logs

  Now minicom should be logging the serial console to the file console-logs.