Tree - rpms/kexec-tools - CentOS Git server

Blame SOURCES/kexec-kdump-howto.txt

		b9e861	`Kexec/Kdump HOWTO`
		b9e861
		b9e861	`Introduction`
		b9e861
		b9e861	`Kexec and kdump are new features in the 2.6 mainstream kernel. These features`
		b9e861	`are included in Red Hat Enterprise Linux 5. The purpose of these features`
		b9e861	`is to ensure faster boot up and creation of reliable kernel vmcores for`
		b9e861	`diagnostic purposes.`
		b9e861
		b9e861	`Overview`
		b9e861
		b9e861	`Kexec`
		b9e861
		b9e861	`Kexec is a fastboot mechanism that allows booting a Linux kernel from the`
		b9e861	`context of an already running kernel without going through the BIOS. BIOS`
		b9e861	`initialization can be very time consuming, especially on big servers with many`
		b9e861	`peripherals. Skipping it can save a lot of time for developers who end up`
		b9e861	`booting a machine numerous times.`
		b9e861
		b9e861	`Kdump`
		b9e861
		b9e861	`Kdump is a new kernel crash dumping mechanism. It is very reliable because`
		b9e861	`the crash dump is captured from the context of a freshly booted kernel and`
		b9e861	`not from the context of the crashed kernel. Kdump uses kexec to boot into`
		b9e861	`a second kernel whenever the system crashes. This second kernel, often called`
		b9e861	`a capture kernel, boots with very little memory and captures the dump image.`
		b9e861
		b9e861	`The first kernel reserves a section of memory that the second kernel uses`
		b9e861	`to boot. Kexec boots the capture kernel without going through the BIOS, so`
		b9e861	`the contents of the first kernel's memory are preserved; that preserved`
		b9e861	`memory is essentially the kernel crash dump.`
		b9e861
		b9e861	`Kdump is supported on the i686, x86_64, ia64 and ppc64 platforms. The`
		b9e861	`standard kernel and capture kernel are one and the same on i686, x86_64,`
		b9e861	`ia64 and ppc64.`
		b9e861
		b9e861	`If you're reading this document, you should already have kexec-tools`
		b9e861	`installed. If not, you can install it with the following command:`
		b9e861
		b9e861	`# yum install kexec-tools`
		b9e861
		b9e861	`Now load a kernel with kexec:`
		b9e861
		b9e861	# kver=`uname -r`
		b9e861	# kexec -l /boot/vmlinuz-$kver --initrd=/boot/initrd-$kver.img \
		b9e861	--command-line="`cat /proc/cmdline`"
		b9e861
		b9e861	`NOTE: The above will boot you back into the kernel you're currently running.`
		b9e861	If you want to load a different kernel, substitute its version for `uname -r`.
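
As a rough sketch of the substitution described in the note above, the helper
below builds the kexec load command for an arbitrary kernel version and prints
it instead of running it. The function name kexec_load_cmd is hypothetical, and
the initrd file name simply follows the initrd-$kver.img pattern used above;
your distribution may name the file differently.

```shell
# Hypothetical helper: print (do not run) the kexec load command.
# $1 = kernel version (defaults to the running kernel)
# $2 = kernel command line (defaults to the current /proc/cmdline)
kexec_load_cmd() {
    kver="${1:-$(uname -r)}"
    printf 'kexec -l /boot/vmlinuz-%s --initrd=/boot/initrd-%s.img --command-line="%s"\n' \
        "$kver" "$kver" "${2:-$(cat /proc/cmdline)}"
}
```

Calling kexec_load_cmd with no arguments prints the command for the running
kernel, which you can review before executing it as root.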
		b9e861
		b9e861	`Now reboot your system, taking note that it should bypass the BIOS:`
		b9e861
		b9e861	`# reboot`
		b9e861
		b9e861
		b9e861	`How to configure kdump:`
		b9e861
		b9e861	`Again, we assume that if you're reading this document, you already have`
		b9e861	`kexec-tools installed. If not, you can install it with the following command:`
		b9e861
		b9e861	`# yum install kexec-tools`
		b9e861
		b9e861	`To be able to do much of anything interesting in the way of debug analysis,`
		b9e861	`you'll also need to install the kernel-debuginfo package, of the same arch`
		b9e861	`as your running kernel, and the crash utility:`
		b9e861
		b9e861	`# yum --enablerepo=\*debuginfo install kernel-debuginfo.$(uname -m) crash`
		b9e861
		b9e861	`Next up, we need to modify some boot parameters to reserve a chunk of memory for`
		b9e861	`the capture kernel. With the help of grubby, it's very easy to append`
		b9e861	`"crashkernel=128M" to the end of your kernel boot parameters. In the rest of`
		b9e861	`this document, X stands for the amount of memory reserved for the capture`
		b9e861	`kernel (128M in this example). Depending on the arch and system configuration,`
		b9e861	`more than 128M may be required for kdump. You need to experiment and test`
		b9e861	`kdump; if 128M is not sufficient, try reserving more memory.`
		b9e861
		b9e861	# grubby --args="crashkernel=128M" --update-kernel=/boot/vmlinuz-`uname -r`
		b9e861
		b9e861	`Note that there is an alternative form in which to specify a crashkernel`
		b9e861	`memory reservation, in the event that more control is needed over the size and`
		b9e861	`placement of the reserved memory. The format is:`
		b9e861
		b9e861	`crashkernel=range1:size1[,range2:size2,...][@offset]`
		b9e861
		b9e861	`Where range<n> specifies a range of values that are matched against the amount`
		b9e861	`of physical RAM present in the system, and the corresponding size<n> value`
		b9e861	`specifies the amount of kexec memory to reserve. For example:`
		b9e861
		b9e861	`crashkernel=512M-2G:64M,2G-:128M`
		b9e861
		b9e861	`This line tells the kernel to reserve 64M of RAM if the system contains between`
		b9e861	`512M and 2G of physical memory. If the system contains 2G or more of physical`
		b9e861	`memory, 128M is reserved. If the system contains less than 512M, no range`
		b9e861	`matches and nothing is reserved.`
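
To make the range matching concrete, here is a minimal sketch that picks the
reservation for a given amount of RAM. It only mirrors the rule described above
(the first range whose lower bound is at or below the RAM size and whose upper
bound, if any, is above it); it is not the kernel's parser. The function names
are hypothetical, only the M and G suffixes are handled, and any @offset is
ignored.

```shell
# Convert a size such as 512M or 2G to megabytes (M and G suffixes only).
to_mb() {
    case "$1" in
        *G) echo $(( ${1%G} * 1024 )) ;;
        *M) echo "${1%M}" ;;
        *)  return 1 ;;
    esac
}

# $1 = physical RAM in MB, $2 = value of crashkernel= in range:size form.
# Prints the matching size, or nothing if no range matches.
crashkernel_size() {
    ram_mb=$1
    spec=${2%%@*}                      # drop an optional @offset
    old_ifs=$IFS; IFS=,
    set -- $spec                       # split the entries on commas
    IFS=$old_ifs
    for entry in "$@"; do
        range=${entry%%:*}; size=${entry#*:}
        start=$(to_mb "${range%%-*}") || return 1
        end=${range#*-}                # empty means "no upper bound"
        [ "$ram_mb" -ge "$start" ] || continue
        if [ -z "$end" ] || [ "$ram_mb" -lt "$(to_mb "$end")" ]; then
            echo "$size"
            return 0
        fi
    done
}
```

For example, crashkernel_size 1024 '512M-2G:64M,2G-:128M' selects the first
range, while a machine with 4096 MB falls into the open-ended second range.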
		b9e861
		b9e861	`In addition, kdump needs to access /proc/kallsyms while loading the capture`
		b9e861	`kernel if KASLR is enabled, so check /proc/sys/kernel/kptr_restrict to`
		b9e861	`make sure that the content of /proc/kallsyms is exposed correctly.`
		b9e861	`We recommend setting the value of kptr_restrict to '1'. Otherwise,`
		b9e861	`loading the capture kernel could fail.`
		b9e861
		b9e861	`After making said changes, reboot your system, so that the X MB of memory is`
		b9e861	`left untouched by the normal system, reserved for the capture kernel. Take note`
		b9e861	`that the output of 'free -m' will show X MB less memory than without this`
		b9e861	`parameter, which is expected. You may be able to get by with less than 128M, but`
		b9e861	`testing with only 64M has proven unreliable of late. On ia64, as much as 512M`
		b9e861	`may be required.`
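
After the reboot, one simple way to confirm that the parameter took effect is
to look for it on the kernel command line. The sketch below does that; the
function name crashkernel_arg is hypothetical, and the file argument exists
only so the check can be tried against a sample file.

```shell
# Print the value of crashkernel= from a kernel command line file
# (default /proc/cmdline). Returns non-zero if the parameter is absent.
crashkernel_arg() {
    for arg in $(cat "${1:-/proc/cmdline}"); do
        case "$arg" in
            crashkernel=*) echo "${arg#crashkernel=}"; return 0 ;;
        esac
    done
    return 1
}
```

On kernels that provide it, /sys/kernel/kexec_crash_size additionally reports
the size of the reserved region in bytes.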
		b9e861
		b9e861	`Now that you've got that reserved memory region set up, you want to enable`
		b9e861	`the kdump service so that it starts at boot:`
		b9e861
		b9e861	`# systemctl enable kdump.service`
		b9e861
		b9e861	`Then, start up kdump as well:`
		b9e861
		b9e861	`# systemctl start kdump.service`
		b9e861
		b9e861	`This should load your kernel-kdump image via kexec, leaving the system ready`
		b9e861	`to capture a vmcore upon crashing. To test this out, you can force-crash`
		b9e861	`your system by echo'ing a c into /proc/sysrq-trigger:`
		b9e861
		b9e861	`# echo c > /proc/sysrq-trigger`
		b9e861
		b9e861	`You should see some panic output, followed by the system restarting into`
		b9e861	`the kdump kernel. When the boot process gets to the point where it starts`
		b9e861	`the kdump service, your vmcore should be copied out to disk (by default,`
		b9e861	`in /var/crash/<YYYY-MM-DD-HH:MM>/vmcore), then the system rebooted back into`
		b9e861	`your normal kernel.`
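
Before forcing a crash as described above, it may be worth confirming that a
capture kernel is actually loaded, since a panic without one will not produce a
vmcore. This sketch reads /sys/kernel/kexec_crash_loaded, which reports 1 when
a crash kernel is loaded; the function name is hypothetical and the file
argument is only there for testing.

```shell
# Succeeds only if the kernel reports a loaded crash (capture) kernel.
crash_kernel_loaded() {
    [ "$(cat "${1:-/sys/kernel/kexec_crash_loaded}" 2>/dev/null)" = "1" ]
}
```

A cautious test run could then be written as
'crash_kernel_loaded && echo c > /proc/sysrq-trigger'.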
		b9e861
		b9e861	`Once back to your normal kernel, you can use the previously installed crash`
		b9e861	`utility in conjunction with the previously installed kernel-debuginfo to`
		b9e861	`perform postmortem analysis:`
		b9e861
		b9e861	`# crash /usr/lib/debug/lib/modules/2.6.17-1.2621.el5/vmlinux \`
		b9e861	`/var/crash/2006-08-23-15:34/vmcore`
		b9e861
		b9e861	`crash> bt`
		b9e861
		b9e861	`and so on...`
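
The two paths in the crash invocation above can be derived rather than typed
by hand. The following is a sketch under two assumptions: the debuginfo vmlinux
lives under /usr/lib/debug/lib/modules/<version>/ as in the example, and the
dump directories under /var/crash sort chronologically by name. Both function
names are hypothetical.

```shell
# Print the debuginfo vmlinux path for a kernel version (default: running kernel).
debug_vmlinux() {
    echo "/usr/lib/debug/lib/modules/${1:-$(uname -r)}/vmlinux"
}

# Print the vmcore in the last (lexically greatest) dump directory under $1.
latest_vmcore() {
    latest=
    for f in "${1:-/var/crash}"/*/vmcore; do
        if [ -f "$f" ]; then latest=$f; fi
    done
    if [ -n "$latest" ]; then echo "$latest"; else return 1; fi
}
```

With these, the analysis step becomes 'crash "$(debug_vmlinux)" "$(latest_vmcore)"',
provided the vmcore was produced by the kernel version you pass to debug_vmlinux.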
		b9e861
		b9e861	`Notes:`
		b9e861
		b9e861	`When kdump starts, the kdump kernel is loaded together with the kdump`
		b9e861	`initramfs. To save memory usage and disk space, the kdump initramfs is`
		b9e861	`generated strictly against the system it will run on, and contains the`
		b9e861	`minimum set of kernel modules and utilities to boot the machine to a stage`
		b9e861	`where the dump target could be mounted.`
		b9e861
		b9e861	`With the kdump service enabled, kdumpctl will try to detect possible system`
		b9e861	`changes and rebuild the kdump initramfs if needed, but it cannot guarantee`
		b9e861	`to cover every possible case. So after a hardware change, disk migration,`
		b9e861	`storage setup update or any similar system level change, it's highly`
		b9e861	`recommended to rebuild the initramfs manually with the following command:`
		b9e861
		b9e861	`# kdumpctl rebuild`
		b9e861
		b9e861	`Saving vmcore-dmesg.txt`
		b9e861	`----------------------`
		b9e861	`The kernel log buffer is one of the most important pieces of information`
		b9e861	`available in a vmcore. Before the vmcore is saved, the kernel log buffer is`
		b9e861	`extracted from /proc/vmcore and saved into a file named vmcore-dmesg.txt.`
		b9e861	`The vmcore is saved after vmcore-dmesg.txt. The destination disk and`
		b9e861	`directory for vmcore-dmesg.txt are the same as for the vmcore. Note that the`
		b9e861	`kernel log buffer will not be available if the dump target is a raw device.`
		b9e861
		b9e861	`Dump Triggering methods:`
		b9e861
		b9e861	`This section talks about the various ways, other than a Kernel Panic, in which`
		b9e861	`Kdump can be triggered. The following methods assume that Kdump is configured`
		b9e861	`on your system, with the scripts enabled as described in the section above.`
		b9e861
		b9e861	`1) AltSysRq C`
		b9e861
		b9e861	`Kdump can be triggered with the combination of the 'Alt','SysRq' and 'C'`
		b9e861	`keyboard keys. Please refer to the following link for more details:`
		b9e861
		b9e861	`https://access.redhat.com/solutions/2023`
		b9e861
		b9e861	`In addition, on PowerPC boxes, Kdump can also be triggered via Hardware`
		b9e861	`Management Console(HMC) using 'Ctrl', 'O' and 'C' keyboard keys.`
		b9e861
		b9e861	`2) NMI_WATCHDOG`
		b9e861
		b9e861	`In case a machine has a hard hang, it is quite possible that it does not`
		b9e861	`respond to keyboard interrupts. As a result, the 'Alt-SysRq' keys will not help`
		b9e861	`trigger a dump. In such scenarios the NMI watchdog feature can prove useful.`
		b9e861	`The following link has more details on configuring the NMI watchdog option.`
		b9e861
		b9e861	`https://access.redhat.com/solutions/125103`
		b9e861
		b9e861	`Once this feature has been enabled in the kernel, any lockup will result in an`
		b9e861	`OOPs message being generated, followed by Kdump being triggered.`
		b9e861
		b9e861	`3) Kernel OOPs`
		b9e861
		b9e861	`If we want to generate a dump every time the kernel oopses, we can achieve this`
		b9e861	`by setting the 'Panic On OOPs' option as follows:`
		b9e861
		b9e861	`# echo 1 > /proc/sys/kernel/panic_on_oops`
		b9e861
		b9e861	`This is enabled by default on RHEL5.`
		b9e861
		b9e861	`4) NMI(Non maskable interrupt) button`
		b9e861
		b9e861	`In cases where the system is in a hung state and is not accepting keyboard`
		b9e861	`interrupts, using the NMI button to trigger Kdump can be very useful. An NMI`
		b9e861	`button is present on most of the newer x86 and x86_64 machines. Please refer`
		b9e861	`to the user guides/manuals to locate the button, though in most cases it`
		b9e861	`is not very well documented. It is usually hidden behind a small hole`
		b9e861	`on the front or back panel of the machine. You could use a toothpick or some`
		b9e861	`other non-conducting probe to press the button.`
		b9e861
		b9e861	`For example, on the IBM X series 366 machine, the NMI button is located behind`
		b9e861	`a small hole on the bottom center of the rear panel.`
		b9e861
		b9e861	`To enable this method of dump triggering using NMI button, you will need to set`
		b9e861	`the 'unknown_nmi_panic' option as follows:`
		b9e861
		b9e861	`# echo 1 > /proc/sys/kernel/unknown_nmi_panic`
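
Values written to /proc/sys as shown above last only until the next reboot. If
you want the panic_on_oops and unknown_nmi_panic settings to persist, one
option is a sysctl configuration file. The sketch below writes one; the
function name and the file name 90-kdump-triggers.conf are hypothetical, and it
assumes your system reads /etc/sysctl.d at boot.

```shell
# Write the two dump-trigger settings to a sysctl configuration file.
# $1 = target file (the default path is an assumption, adjust as needed)
persist_panic_sysctls() {
    conf="${1:-/etc/sysctl.d/90-kdump-triggers.conf}"
    printf '%s\n' \
        'kernel.panic_on_oops = 1' \
        'kernel.unknown_nmi_panic = 1' > "$conf"
}
```

Run it as root, then apply the file without rebooting via 'sysctl -p <file>'.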
		b9e861
		b9e861	`5) PowerPC specific methods:`
		b9e861
		b9e861	`On IBM PowerPC machines, issuing a soft reset invokes the XMON debugger (if`
		b9e861	`XMON is configured). To configure XMON, either compile the kernel with`
		b9e861	`the CONFIG_XMON and CONFIG_XMON_DEFAULT options, or compile it with`
		b9e861	`CONFIG_XMON and boot the kernel with the xmon=on option.`
		b9e861
		b9e861	`The following are ways to remotely issue a soft reset on PowerPC boxes, which`
		b9e861	`drops you into XMON. Pressing 'X' (capital letter X) followed by`
		b9e861	`'Enter' at the XMON prompt will trigger the dump.`
		b9e861
		b9e861	`5.1) HMC`
		b9e861
		b9e861	`The Hardware Management Console (HMC) available on Power4 and Power5 machines`
		b9e861	`allows partitions to be reset remotely. This is especially useful in hang`
		b9e861	`situations where the system is not accepting any keyboard input.`
		b9e861
		b9e861	`Once you have HMC configured, the following steps will enable you to trigger`
		b9e861	`Kdump via a soft reset:`
		b9e861
		b9e861	`On Power4`
		b9e861	`Using GUI`
		b9e861
		b9e861	`* In the right pane, right click on the partition you wish to dump.`
		b9e861	`* Select "Operating System->Reset".`
		b9e861	`* Select "Soft Reset".`
		b9e861	`* Select "Yes".`
		b9e861
		b9e861	`Using HMC Commandline`
		b9e861
		b9e861	`# reset_partition -m <machine> -p <partition> -t soft`
		b9e861
		b9e861	`On Power5`
		b9e861	`Using GUI`
		b9e861
		b9e861	`* In the right pane, right click on the partition you wish to dump.`
		b9e861	`* Select "Restart Partition".`
		b9e861	`* Select "Dump".`
		b9e861	`* Select "OK".`
		b9e861
		b9e861	`Using HMC Commandline`
		b9e861
		b9e861	`# chsysstate -m <managed system name> -n <lpar name> -o dumprestart -r lpar`
		b9e861
		b9e861	`5.2) Blade Management Console for Blade Center`
		b9e861
		b9e861	`To initiate a dump operation, go to the Power/Restart option under "Blade Tasks"`
		b9e861	`in the Blade Management Console. Select the blade for which you want`
		b9e861	`to initiate the dump and then click "Restart blade with NMI". This issues a`
		b9e861	`system reset and invokes the xmon debugger.`
		b9e861
		b9e861
		b9e861	`Advanced Setups:`
		b9e861
		b9e861	`In addition to being able to capture a vmcore to your system's local file`
		b9e861	`system, kdump can be configured to capture a vmcore to a number of other`
		b9e861	`locations, including a raw disk partition, a dedicated file system, an NFS`
		b9e861	`mounted file system, or a remote system via ssh/scp. Additional options`
		b9e861	`exist for specifying the relative path under which the dump is captured,`
		b9e861	`what to do if the capture fails, and for compressing and filtering the dump`
		b9e861	`(so as to produce smaller, more manageable, vmcore files).`
		b9e861
		b9e861	`In theory, dumping to a location other than the local file system should be`
		b9e861	`safer than kdump's default setup, as it's possible the default setup will try`
		b9e861	`dumping to a file system that has become corrupted. The raw disk partition and`
		b9e861	`dedicated file system options allow you to still dump to the local system,`
		b9e861	`but without having to remount your possibly corrupted file system(s),`
		b9e861	`thereby decreasing the chance a vmcore won't be captured. Dumping to an`
		b9e861	`NFS server or remote system via ssh/scp also has this advantage, as well`
		b9e861	`as allowing for the centralization of vmcore files, should you have several`
		b9e861	`systems from which you'd like to obtain vmcore files. Of course, note that`
		b9e861	`these configurations could present problems if your network is unreliable.`
		b9e861
		b9e861	`Advanced setups are configured via modifications to /etc/kdump.conf,`
		b9e861	`which is itself fairly well documented out of the box. Any alterations to`
		b9e861	`/etc/kdump.conf should be followed by a restart of the kdump service, so`
		b9e861	`the changes can be incorporated in the kdump initrd. Restarting the kdump`
		b9e861	`service is as simple as '/sbin/systemctl restart kdump.service'.`
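
Because the shipped kdump.conf consists mostly of comments, it can help to list
only the settings that are actually in effect before restarting the service. A
minimal sketch follows; the function name is hypothetical.

```shell
# Print the non-comment, non-blank lines of kdump.conf (or of the file in $1).
kdump_active_settings() {
    grep -Ev '^[[:space:]]*(#|$)' "${1:-/etc/kdump.conf}"
}
```

If the output is empty, no option is set explicitly and kdump runs with its
defaults.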
		b9e861
		b9e861
		b9e861	`Note that kdump.conf is used as a configuration mechanism for capturing dump`
		b9e861	`files from the initramfs (in the interests of safety). The root file system is`
		b9e861	`mounted, and the init process started, only as a last resort if the`
		b9e861	`initramfs fails to capture the vmcore. As such, configuration made in`
		b9e861	`/etc/kdump.conf only applies to captures performed from the initramfs. If`
		b9e861	`for any reason the init process is started on the root file system, only a`
		b9e861	`simple copy of the vmcore from /proc/vmcore to /var/crash/$DATE/vmcore will`
		b9e861	`be performed.`
		b9e861
		b9e861	`For both local file system and NFS dumps, the dump target must be mounted`
		b9e861	`before the kdump initramfs is built. That means you need to put an entry for`
		b9e861	`the dump file system in /etc/fstab so that, when the kdump service starts after`
		b9e861	`a reboot, it can find the dump target and build the initramfs instead of failing.`
		b9e861	`Usually the dump target should be used only for kdump. If you are worried that`
		b9e861	`someone might use the file system for something other than dumping vmcores,`
		b9e861	`you can mount it read-only. Mkdumprd will still remount it read-write to`
		b9e861	`create the dump directory and will switch it back to read-only afterwards.`
		b9e861
		b9e861	`Raw partition`
		b9e861
		b9e861	`Raw partition dumping requires that a disk partition in the system, at least`
		b9e861	`as large as the amount of memory in the system, be left unformatted. Assuming`
		b9e861	`/dev/vg/lv_kdump is left unformatted, kdump.conf can be configured with`
		b9e861	`'raw /dev/vg/lv_kdump', and the vmcore file will be copied via dd directly`
		b9e861	`onto partition /dev/vg/lv_kdump. Restart the kdump service via`
		b9e861	`'/sbin/systemctl restart kdump.service' to commit this change to your kdump`
		b9e861	`initrd. The dump target should be a persistent device name, such as an LVM or`
		b9e861	`device mapper canonical name.`
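
The size requirement above can be checked with a little arithmetic. The sketch
below reads MemTotal from /proc/meminfo and compares it with a partition size
in bytes; both function names are hypothetical, and obtaining the partition
size (for example with 'blockdev --getsize64 <device>' as root) is left to the
caller.

```shell
# Print total memory in kB as reported by /proc/meminfo (or the file in $1).
mem_total_kb() {
    awk '/^MemTotal:/ { print $2 }' "${1:-/proc/meminfo}"
}

# $1 = partition size in bytes, $2 = memory size in kB.
# Succeeds if the partition is at least as large as memory.
partition_big_enough() {
    [ "$1" -ge $(( $2 * 1024 )) ]
}
```

Note that MemTotal is slightly smaller than installed RAM, so treat a narrow
pass as a warning rather than a guarantee.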
		b9e861
		b9e861	`Dedicated file system`
		b9e861
		b9e861	`Similar to raw partition dumping, you can format a partition with the file`
		b9e861	`system of your choice. Again, it should be at least as large as the amount`
		b9e861	`of memory in the system. Assuming /dev/vg/lv_kdump has been`
		b9e861	`formatted ext4, specify 'ext4 /dev/vg/lv_kdump' in kdump.conf, and a`
		b9e861	`vmcore file will be copied onto the file system after it has been mounted.`
		b9e861	`Dumping to a dedicated partition has the advantage that you can dump multiple`
		b9e861	`vmcores to the file system, space permitting, without overwriting previous ones,`
		b9e861	`as would be the case in a raw partition setup. Restart the kdump service via`
		b9e861	`'/sbin/systemctl restart kdump.service' to commit this change to`
		b9e861	`your kdump initrd. Note that for local file systems ext4 and ext2 are`
		b9e861	`supported as dumpable targets. Kdump will not prevent you from specifying`
		b9e861	`other filesystems, and they will most likely work, but their operation`
		b9e861	`cannot be guaranteed. For instance, specifying a vfat filesystem or msdos`
		b9e861	`filesystem will result in a successful load of the kdump service, but during`
		b9e861	`crash recovery, the dump will fail if the system has more than 2GB of memory`
		b9e861	`(since vfat and msdos filesystems do not support more than 2GB files).`
		b9e861	`Be careful of your filesystem selection when using this target.`
		b9e861
		b9e861	`It is recommended to use persistent device names or UUID/LABEL for file system`
		b9e861	`dumps. One example of persistent device is /dev/vg/<devname>.`
		b9e861
		b9e861	`NFS mount`
		b9e861
		b9e861	`Dumping over NFS requires an NFS server configured to export a file system`
		b9e861	`with full read/write access for the root user. All operations done within`
		b9e861	`the kdump initial ramdisk are done as root, and to write out a vmcore file,`
		b9e861	`we obviously must be able to write to the NFS mount. Configuring an NFS`
		b9e861	`server is outside the scope of this document, but either the no_root_squash`
		b9e861	`or anonuid options on the NFS server side are likely of interest to permit`
		b9e861	`the kdump initrd operations to write to the NFS mount as root.`
		b9e861
		b9e861	`Assuming you're exporting /dump on the machine nfs-server.example.com,`
		b9e861	`once the mount is properly configured, specify it in kdump.conf, via`
		b9e861	`'nfs nfs-server.example.com:/dump'. The server portion can be specified either`
		b9e861	`by host name or IP address. Following a system crash, the kdump initrd will`
		b9e861	`mount the NFS mount and copy out the vmcore to your NFS server. Restart the`
		b9e861	`kdump service via '/sbin/systemctl restart kdump.service' to commit this change`
		b9e861	`to your kdump initrd.`
		b9e861
		b9e861	`Special mount via "dracut_args"`
		b9e861
		b9e861	`You can utilize "dracut_args" to pass "--mount" to kdump; see the dracut manpage`
		b9e861	`for details on the format of "--mount". If any "--mount" is specified`
		b9e861	`via "dracut_args", kdump will build it as the mount target without doing any`
		b9e861	`validation (mounting, or checking mount options, fs size, save path, etc.),`
		b9e861	`so you must test it to make sure it is correct. You cannot use other targets`
		b9e861	`in /etc/kdump.conf if you use "--mount" in "dracut_args". You also cannot specify`
		b9e861	`multiple "--mount" targets via "dracut_args".`
		b9e861
		b9e861	`One use case for "--mount" in "dracut_args" is when you do not want to mount the`
		b9e861	`dump target before the kdump service starts, for example, to reduce the load on`
		b9e861	`a shared NFS server, as in the example below:`
		b9e861	`dracut_args --mount "192.168.1.1:/share /mnt/test nfs4 defaults"`
		b9e861
		b9e861	`NOTE:`
		b9e861	`- <mountpoint> must be specified as an absolute path.`
		b9e861
		b9e861	`Remote system via ssh/scp`
		b9e861
		b9e861	`Dumping over ssh/scp requires setting up passwordless ssh keys for every`
		b9e861	`machine you wish to have dump via this method. First up, configure kdump.conf`
		b9e861	`for ssh/scp dumping, adding a config line of 'ssh user@server', where 'user'`
		b9e861	`can be any user on the target system you choose, and 'server' is the host`
		b9e861	`name or IP address of the target system. Using a dedicated, restricted user`
		b9e861	`account on the target system is recommended, as there will be keyless ssh`
		b9e861	`access to this account.`
		b9e861
		b9e861	`Once kdump.conf is appropriately configured, issue the command`
		b9e861	`'kdumpctl propagate' to automatically set up the ssh host keys and transmit`
		b9e861	`the necessary bits to the target server. You'll have to type in 'yes'`
		b9e861	`to accept the host key for your target server if this is the first time`
		b9e861	`you've connected to it, and then input the target system user's password`
		b9e861	`to send over the necessary ssh key file. Restart the kdump service via`
		b9e861	`'/sbin/systemctl restart kdump.service' to commit this change to your kdump initrd.`
		b9e861
		b9e861	`Path`
		b9e861	`====`
		b9e861	`"path" represents the file system path in which vmcore will be saved. In`
		b9e861	`fact kdump creates a directory $hostip-$date within "path" and saves`
		b9e861	`vmcore there. So in practice the dump is saved in $path/$hostip-$date/. To`
		b9e861	`simplify discussion further, if we say dump will be saved in $path, it`
		b9e861	`is implied that kdump will create another directory inside path and`
		b9e861	`save vmcore there.`
		b9e861
		b9e861	`If a dump target is specified in kdump.conf, then "path" is relative to the`
		b9e861	`specified dump target. For example, if dump target is "ext4 /dev/sda", then`
		b9e861	`dump will be saved in "$path" directory on /dev/sda.`
		b9e861
		b9e861	`The same is true for NFS dumps. If the user specified "nfs foo.com:/export/tmp/"`
		b9e861	`as the dump target and "path" is /var/crash, then the dump will effectively be`
		b9e861	`saved in the "foo.com:/export/tmp/var/crash/" directory.`
		b9e861
		b9e861	`Interpretation of path changes a bit if user has not specified a dump`
		b9e861	`target explicitly in kdump.conf. In this case, "path" represents the`
		b9e861	`absolute path from root. And dump target and adjusted path are arrived`
		b9e861	`at automatically depending on what's mounted in the current system.`
		b9e861
		b9e861	`The following are a few examples.`
		b9e861
		b9e861	`path /var/crash/`
		b9e861	`----------------`
		b9e861	`Assuming there is no disk mounted on /var/ or on /var/crash, dump will`
		b9e861	`be saved on disk backing rootfs in directory /var/crash.`
		b9e861
		b9e861	`path /var/crash/ (A separate disk mounted on /var)`
		b9e861	`--------------------------------------------------`
		b9e861	`Say a disk /dev/sdb is mounted on /var. In this case the dump target will`
		b9e861	`become /dev/sdb, the path will become "/crash", and the dump will be saved`
		b9e861	`in the "sdb:/crash/" directory.`
		b9e861
		b9e861	`path /var/crash/ (NFS mounted on /var)`
		b9e861	`-------------------------------------`
		b9e861	`Say foo.com:/export/tmp is mounted on /var. In this case dump target is`
		b9e861	`nfs server and path will be adjusted to "/crash" and dump will be saved to`
		b9e861	`foo.com:/export/tmp/crash/ directory.`
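
The path adjustment in the examples above amounts to stripping the mount point
from the front of "path". The sketch below reproduces that arithmetic for
illustration only; it is not kdump's own code, and the function name
adjust_path is hypothetical.

```shell
# $1 = configured path, $2 = mount point of the file system holding it.
# Prints the path relative to that file system, or fails if $1 is not under $2.
adjust_path() {
    full=${1%/}
    mnt=${2%/}                         # "/" becomes the empty string
    case "$full/" in
        "$mnt"/*)
            rel=${full#"$mnt"}
            echo "${rel:-/}"
            ;;
        *)
            return 1
            ;;
    esac
}
```

With nothing mounted below the root file system, 'adjust_path /var/crash/ /'
leaves the path unchanged, while a separate file system on /var reduces it to
the "/crash" shown in the examples.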
		b9e861
		b9e861	`Kdump boot directory`
		b9e861	`====================`
		b9e861	`Usually the kdump kernel is the same as the 1st kernel, so kdump will try to find`
		b9e861	`the kdump kernel under /boot according to /proc/cmdline. For example, suppose we`
		b9e861	`execute the command below and get the following output:`
		b9e861	`cat /proc/cmdline`
		b9e861	`BOOT_IMAGE=/xxx/vmlinuz-3.yyy.zzz root=xxxx .....`
		b9e861	`Then the kdump kernel will be /boot/xxx/vmlinuz-3.yyy.zzz.`
		b9e861	`However, a variable KDUMP_BOOTDIR in /etc/sysconfig/kdump is provided for`
		b9e861	`cases where the kdump kernel is placed in a different directory.`
		b9e861
		b9e861	`Kdump Post-Capture Executable`
		b9e861
		b9e861	`It is possible to specify a custom script or binary you wish to run following`
		b9e861	`an attempt to capture a vmcore. The executable is passed an exit code from`
		b9e861	`the capture process, which can be used to trigger different actions from`
		b9e861	`within your post-capture executable.`
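
As an illustration of acting on that exit code, here is a sketch of the logic a
post-capture script might contain. It assumes the exit code arrives as the
first argument to the executable; check the comments in your kdump.conf to
confirm how it is passed. The function name is hypothetical.

```shell
# Report the outcome of the capture attempt.
# $1 = exit code of the capture process (treated as a failure if missing)
report_capture_status() {
    status=${1:-1}
    if [ "$status" -eq 0 ]; then
        echo "vmcore captured successfully"
    else
        echo "vmcore capture failed with status $status"
    fi
}
```

A real script would end with 'report_capture_status "$1"' and could replace the
echo calls with whatever notification or cleanup you need.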
		b9e861
		b9e861	`Kdump Pre-Capture Executable`
		b9e861
		b9e861	`It is possible to specify a custom script or binary you wish to run before`
		b9e861	`capturing a vmcore. The exit status of this binary is interpreted as follows:`
		b9e861	`0 - continue with dump process as usual`
		b9e861	`non 0 - reboot the system`
		b9e861
		b9e861	`Extra Binaries`
		b9e861
		b9e861	`If you have specific binaries or scripts you want to have made available`
		b9e861	`within your kdump initrd, you can specify them by their full path, and they`
		b9e861	`will be included in your kdump initrd, along with all dependent libraries.`
		b9e861	`This may be particularly useful for those running post-capture scripts that`
		b9e861	`rely on other binaries.`
		b9e861
		b9e861	`Extra Modules`
		b9e861
		b9e861	`By default, only the bare minimum of kernel modules will be included in your`
		b9e861	`kdump initrd. Should you wish to capture your vmcore files to a non-boot-path`
		b9e861	`storage device, such as an iscsi target disk or clustered file system, you may`
		b9e861	`need to manually specify additional kernel modules to load into your kdump`
		b9e861	`initrd.`
		b9e861
		b9e861	`Failure action`
		b9e861	`==============`
		b9e861	`The failure action specifies what to do when dumping to the configured dump`
		b9e861	`target fails. By default, the failure action is "reboot": the system reboots`
		b9e861	`if the attempt to save the dump to the dump target fails.`
		b9e861
		b9e861	`There are other failure actions available though.`
		b9e861
		b9e861	`- dump_to_rootfs`
		b9e861	`This option tries to mount root and save dump on root filesystem`
		b9e861	`in a path specified by "path". This option will generally make`
		b9e861	`sense when dump target is not root filesystem. For example, if`
		b9e861	`dump is being saved over network using "ssh" then one can specify`
		b9e861	`failure action to "dump_to_rootfs" to try saving dump to root`
		b9e861	`filesystem if dump over network fails.`
		b9e861
		b9e861	`- shell`
		b9e861	`Drop into a shell session inside initramfs.`
		b9e861	`- halt`
		b9e861	`Halt system after failure`
		b9e861	`- poweroff`
		b9e861	`Poweroff system after failure.`
		b9e861
		b9e861	`Compression and filtering`
		b9e861
		b9e861	`The 'core_collector' parameter in kdump.conf allows you to specify a custom`
		b9e861	`dump capture method. The most common alternate method is makedumpfile, which`
		b9e861	`is a dump filtering and compression utility provided with kexec-tools. On`
		b9e861	`some architectures, it can drastically reduce the size of your vmcore files,`
		b9e861	`which becomes very useful on systems with large amounts of memory.`
		b9e861
		b9e861	`A typical setup is 'core_collector makedumpfile -F -l --message-level 1 -d 31',`
		b9e861	`but check the output of '/sbin/makedumpfile --help' for a list of all available`
		b9e861	`options (-i and -g don't need to be specified, they're automatically taken care`
		b9e861	`of). Note that use of makedumpfile requires that the kernel-debuginfo package`
		b9e861	`corresponding with your running kernel be installed.`
		b9e861
		b9e861	`The core collector command format depends on the dump target type. Typically, for`
		b9e861	`file system targets (local/remote), core_collector should accept two arguments:`
		b9e861	`the first is the source file and the second is the target file. For example:`
		b9e861
		b9e861	`ex1.`
		b9e861	`---`
		b9e861	`core_collector "cp --sparse=always"`
		b9e861
		b9e861	`Above will effectively be translated to:`
		b9e861
		b9e861	`cp --sparse=always /proc/vmcore <dest-path>/vmcore`
		b9e861
		b9e861	`ex2.`
		b9e861	`---`
		b9e861	`core_collector "makedumpfile -l --message-level 1 -d 31"`
		b9e861
		b9e861	`Above will effectively be translated to:`
		b9e861
		b9e861	`makedumpfile -l --message-level 1 -d 31 /proc/vmcore <dest-path>/vmcore`
		b9e861
		b9e861
		b9e861	`For dump targets like raw and ssh, in general, the core collector should expect`
		b9e861	`one argument (the source file) and should write the processed core to standard`
		b9e861	`output (there is one exception, "scp", discussed later). This standard`
		b9e861	`output will be saved to the destination using appropriate commands.`
		b9e861
		b9e861	`raw dumps core_collector examples:`
		b9e861	`---------`
		b9e861	`ex3.`
		b9e861	`---`
		b9e861	`core_collector "cat"`
		b9e861
		b9e861	`Above will effectively be translated to.`
		b9e861
		b9e861	`cat /proc/vmcore | dd of=<target-device>`
		b9e861
		b9e861	`ex4.`
		b9e861	`---`
		b9e861	`core_collector "makedumpfile -F -l --message-level 1 -d 31"`
		b9e861
		b9e861	`Above will effectively be translated to.`
		b9e861
		b9e861	`makedumpfile -F -l --message-level 1 -d 31 /proc/vmcore | dd of=<target-device>`
		b9e861
		b9e861	`ssh dumps core_collector examples:`
		b9e861	`---------`
		b9e861	`ex5.`
		b9e861	`---`
		b9e861	`core_collector "cat"`
		b9e861
		b9e861	`Above will effectively be translated to.`
		b9e861
		b9e861	`cat /proc/vmcore | ssh <options> <remote-location> "dd of=path/vmcore"`
		b9e861
		b9e861	`ex6.`
		b9e861	`---`
		b9e861	`core_collector "makedumpfile -F -l --message-level 1 -d 31"`
		b9e861
		b9e861	`Above will effectively be translated to.`
		b9e861
		b9e861	`makedumpfile -F -l --message-level 1 -d 31 /proc/vmcore | ssh <options> <remote-location> "dd of=path/vmcore"`
		b9e861
		b9e861	`There is one exception to the standard output rule for ssh dumps: scp.`
		b9e861	`As scp can handle ssh destinations for file transfers, one can`
		b9e861	`specify "scp" as the core collector for ssh targets (no output on stdout).`
		b9e861
		b9e861	`ex7.`
		b9e861	`----`
		b9e861	`core_collector "scp"`
		b9e861
		b9e861	`Above will effectively be translated to.`
		b9e861
		b9e861	`scp /proc/vmcore <user@host>:path/vmcore`
		b9e861
		b9e861	`About default core collector`
		b9e861	`----------------------------`
		b9e861	`Default core_collector for ssh/raw dump is:`
		b9e861	`"makedumpfile -F -l --message-level 1 -d 31".`
		b9e861	`Default core_collector for other targets is:`
		b9e861	`"makedumpfile -l --message-level 1 -d 31".`
		b9e861
		b9e861	`Even if core_collector option is commented out in kdump.conf, makedumpfile`
		b9e861	`is default core collector and kdump uses it internally.`
		b9e861
		b9e861	`If one does not want makedumpfile as default core_collector, then they`
		b9e861	`need to specify one using core_collector option to change the behavior.`
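
The defaults listed above boil down to a single rule: ssh and raw targets get
the flattened (-F) variant, everything else gets the plain one. A small sketch
of that rule, with a hypothetical function name:

```shell
# Print the default core_collector for a dump target type
# (ssh, raw, or anything else such as ext4 or nfs).
default_core_collector() {
    case "$1" in
        ssh|raw) echo 'makedumpfile -F -l --message-level 1 -d 31' ;;
        *)       echo 'makedumpfile -l --message-level 1 -d 31' ;;
    esac
}
```

This is only a restatement of the documented defaults, handy when scripting
checks around kdump.conf; kdump applies them internally on its own.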
		b9e861
		b9e861	`Note: If "makedumpfile -F" is used, you will get a flattened format`
		b9e861	`vmcore.flat. You will need to use "makedumpfile -R" to rearrange the`
		b9e861	`dump data from standard input into a normal dumpfile (readable with analysis`
		b9e861	`tools).`
		b9e861	`For example: "makedumpfile -R vmcore < vmcore.flat"`
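
The rearranging step can be wrapped in a small function that checks the input first. This is a minimal sketch: the function name rearrange_vmcore and the MAKEDUMPFILE override variable are assumptions introduced here, and only the "makedumpfile -R" invocation comes from the note above.

```shell
#!/bin/sh
# rearrange_vmcore FLAT OUT
# Converts a flattened vmcore (FLAT) into a normal dumpfile (OUT).
# MAKEDUMPFILE is a hypothetical override for the makedumpfile binary.
rearrange_vmcore() {
    flat=$1
    out=$2
    if [ ! -r "$flat" ]; then
        echo "cannot read flattened vmcore: $flat" >&2
        return 1
    fi
    # -R reads the flattened data from standard input and writes OUT.
    "${MAKEDUMPFILE:-makedumpfile}" -R "$out" < "$flat"
}
```

A call such as "rearrange_vmcore vmcore.flat vmcore" is then equivalent to the example in the note, with an added readability check.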
		b9e861
		b9e861	`Caveats:`
		b9e861
		b9e861	`Console frame-buffers and X are not properly supported. If you typically run`
		b9e861	`with something along the lines of "vga=791" in your kernel config line or`
		b9e861	`have X running, console video will be garbled when a kernel is booted via`
		b9e861	`kexec. Note that the kdump kernel should still be able to create a dump,`
		b9e861	`and when the system reboots, video should be restored to normal.`
		b9e861
		b9e861
		b9e861	`Notes on resetting video:`
		b9e861
		b9e861	`Video is a notoriously difficult issue with kexec. Video cards contain ROM code`
		b9e861	`that controls their initial configuration and setup. This code is nominally`
		b9e861	`accessed and executed from the BIOS, and otherwise not safely executable. Since`
		b9e861	`the purpose of kexec is to reboot the system without re-executing the BIOS, it`
		b9e861	`is rather difficult if not impossible to reset video cards with kexec. The`
		b9e861	`result is that if a system crashes while running in a graphical mode (i.e.`
		b9e861	`running X), the screen may appear to become 'frozen' while the dump capture is`
		b9e861	`taking place. A serial console will of course reveal that the system is`
		b9e861	`operating and capturing a vmcore image, but a casual observer will see the`
		b9e861	`system as hung until the dump completes and a true reboot is executed.`
		b9e861
		b9e861	`There are two possible workarounds for this issue. One is to add`
		b9e861	`--reset-vga to the kexec command line options in /etc/sysconfig/kdump. This`
		b9e861	`tells kdump to write some reasonable default values to the video card register`
		b9e861	`file, in the hope of returning it to a text mode such that boot messages are`
		b9e861	`visible on the screen. It does not work with all video cards, however.`
		b9e861	`The other is to try adding vga15fb.ko to the extra_modules list in`
		b9e861	`/etc/kdump.conf. This will attempt to use the video card in framebuffer mode,`
		b9e861	`which can blank the screen prior to the start of a dump capture.`
		b9e861
		b9e861	`Notes on rootfs mount:`
		b9e861
		b9e861	`Dracut is designed to mount the rootfs by default. If rootfs mounting fails,`
		b9e861	`it refuses to go on. So kdump currently leaves rootfs mounting to dracut.`
		b9e861	`For the time being, we assume that a proper root= cmdline is passed to the`
		b9e861	`dracut initramfs. If you need to modify "KDUMP_COMMANDLINE=" in`
		b9e861	`/etc/sysconfig/kdump, make sure that the appropriate root= options are`
		b9e861	`copied from /proc/cmdline. In general it is best to append command line`
		b9e861	`options using "KDUMP_COMMANDLINE_APPEND=" instead of replacing the original`
		b9e861	`command line completely.`
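
If "KDUMP_COMMANDLINE=" has to be replaced, the root= options can be picked out of the running kernel's command line with a helper along these lines. This is a sketch under stated assumptions: the function name root_args_from is hypothetical, and it only looks for words that start with "root=".

```shell
#!/bin/sh
# root_args_from CMDLINE
# Prints every word of CMDLINE that starts with "root=", space separated.
# The function name is hypothetical and not part of kexec-tools.
root_args_from() {
    found=""
    # $1 is unquoted on purpose so the command line is split into words.
    for word in $1; do
        case "$word" in
            root=*) found="${found:+$found }$word" ;;
        esac
    done
    printf '%s\n' "$found"
}
```

On a running system it could be called as root_args_from "$(cat /proc/cmdline)", and its output placed into the replacement command line.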
		b9e861
		b9e861	`Notes on watchdog module handling:`
		b9e861
		b9e861	`If a watchdog is active in the first kernel, its module must be loaded in`
		b9e861	`the crash kernel, so that the watchdog is either deactivated or kicked in`
		b9e861	`the second kernel. Otherwise, the watchdog might reboot the system while`
		b9e861	`the vmcore is being saved. When the dracut watchdog module is enabled, it`
		b9e861	`installs the kernel watchdog module of the active watchdog device in the`
		b9e861	`initrd. kexec-tools always adds "-a watchdog" to dracut_args if at least`
		b9e861	`one active watchdog exists and the user has not specifically added`
		b9e861	`"-o watchdog" to dracut_args in kdump.conf. If a watchdog module (such as`
		b9e861	`hp_wdt) has not been written in the watchdog-core framework, this option`
		b9e861	`has no effect and the module will not be added. Please note that only the`
		b9e861	`systemd watchdog daemon is supported as the watchdog kick application.`
		b9e861
		b9e861	`Notes for disk images:`
		b9e861
		b9e861	`The kdump initramfs is a critical component for capturing the crash dump.`
		b9e861	`But it is generated strictly for the machine it will run on, and is not`
		b9e861	`portable to other machines. If you install a new machine from a previous`
		b9e861	`disk image (e.g. VMs created from a disk image or snapshot), kdump can`
		b9e861	`easily break due to hardware changes or disk ID changes. So it is strongly`
		b9e861	`recommended not to include the kdump initramfs in the disk image in the`
		b9e861	`first place. This helps to save space, and kdumpctl will build the`
		b9e861	`initramfs automatically if it is missing. If you have already installed`
		b9e861	`a machine from a disk image that has a kdump initramfs embedded, you`
		b9e861	`should rebuild the initramfs manually using the "kdumpctl rebuild" command,`
		b9e861	`or else kdump may not work as expected.`
		b9e861
		b9e861	`Notes on encrypted dump target:`
		b9e861
		b9e861	`Currently, kdump does not work well with an encrypted dump target.`
		b9e861	`First, the user has to enter the password manually in the capture kernel,`
		b9e861	`so a working interactive terminal is required in the capture kernel.`
		b9e861	`Another major issue is that an OOM problem will occur with certain`
		b9e861	`encryption setups. For example, the default setup for LUKS2 uses a`
		b9e861	`memory-hard key derivation function to mitigate brute-force attacks, so`
		b9e861	`it is impossible to reduce the memory usage for mounting the encrypted`
		b9e861	`target. In such a case, you have to either reserve enough memory for the`
		b9e861	`crash kernel accordingly, or update your encryption setup.`
		b9e861	`It is recommended to use a non-encrypted target (e.g. a remote target)`
		b9e861	`instead.`
		b9e861
		b9e861	`Notes on device dump:`
		b9e861
		b9e861	`Device dump allows drivers to append dump data to the vmcore, so you can`
		b9e861	`collect driver-specific debug info. Drivers can append data without any`
		b9e861	`limit, and the data is stored in memory, which may cause significant`
		b9e861	`memory stress. So device dump is disabled by default by passing the`
		b9e861	`"novmcoredd" command line option to the kdump capture kernel.`
		b9e861	`If you want to collect debug data with device dump, you need to modify the`
		b9e861	`"KDUMP_COMMANDLINE_APPEND=" value in /etc/sysconfig/kdump and remove the`
		b9e861	`"novmcoredd" option. You also need to increase the "crashkernel=" value`
		b9e861	`accordingly to avoid OOM issues.`
		b9e861	`In addition, the kdump initramfs won't automatically include the device`
		b9e861	`drivers that support device dump; only device drivers required for the`
		b9e861	`dump target setup are included. To ensure the device dump data is`
		b9e861	`included in the vmcore, you need to force inclusion of the related`
		b9e861	`device drivers using the "extra_modules" option in /etc/kdump.conf.`
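
Removing "novmcoredd" from an existing "KDUMP_COMMANDLINE_APPEND=" value can be sketched as below. The function name strip_novmcoredd is hypothetical, and the option string in the usage note is only an illustrative value, not necessarily what your /etc/sysconfig/kdump contains.

```shell
#!/bin/sh
# strip_novmcoredd OPTIONS
# Prints OPTIONS with every standalone "novmcoredd" word removed.
# The function name is hypothetical and not part of kexec-tools.
strip_novmcoredd() {
    kept=""
    # $1 is unquoted on purpose so the option string is split into words.
    for word in $1; do
        if [ "$word" = "novmcoredd" ]; then
            continue
        fi
        kept="${kept:+$kept }$word"
    done
    printf '%s\n' "$kept"
}
```

For instance, strip_novmcoredd "irqpoll nr_cpus=1 novmcoredd" prints "irqpoll nr_cpus=1", which can then be written back as the new "KDUMP_COMMANDLINE_APPEND=" value.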
		b9e861
		b9e861	`Parallel Dumping Operation`
		b9e861	`==========================`
		b9e861	`Kexec allows kdump to use multiple cpus, so the parallel feature can`
		b9e861	`accelerate dumping substantially, especially when compressing and filtering.`
		b9e861	`For example:`
		b9e861
		b9e861	`1. "makedumpfile -c --num-threads [THREAD_NUM] /proc/vmcore dumpfile"`
		b9e861	`2. "makedumpfile -c /proc/vmcore dumpfile"`
		b9e861
		b9e861	`Command 1 performs better than command 2 if THREAD_NUM is larger than two`
		b9e861	`and the number of usable cpus is larger than THREAD_NUM.`
		b9e861
		b9e861	`Notes on how to use multiple cpus on a capture kernel on x86 system:`
		b9e861
		b9e861	`Make sure that the capture kernel supports the disable_cpu_apicid kernel`
		b9e861	`option, which is needed to avoid an x86-specific hardware issue (*). The`
		b9e861	`disable_cpu_apicid kernel option is automatically appended by the kdumpctl`
		b9e861	`script and is ignored if the kernel doesn't support it.`
		b9e861
		b9e861	`You need to specify how many cpus the capture kernel uses by setting the`
		b9e861	`nr_cpus kernel option in /etc/sysconfig/kdump. nr_cpus is 1 by default.`
		b9e861
		b9e861	`You should use a necessary and sufficient number of cpus on a capture kernel.`
		b9e861	`Warning: Don't use too many cpus on a capture kernel, or the capture kernel`
		b9e861	`may panic due to Out Of Memory.`
		b9e861
		b9e861	`(*) Without the disable_cpu_apicid kernel option, the capture kernel may`
		b9e861	`hang, reset or power off the system at boot, depending on your system and`
		b9e861	`the runtime situation at the time of the crash.`
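
The rule of thumb above (THREAD_NUM larger than two and smaller than the number of usable cpus) can be sketched as a helper that picks between the two makedumpfile commands. The function names and the "cpus minus one" heuristic are assumptions made for illustration; they are not something kdump does on its own.

```shell
#!/bin/sh
# pick_num_threads CPUS
# Prints a THREAD_NUM below CPUS, or 0 if threading is not worthwhile.
# Heuristic (an assumption): use one thread fewer than the usable cpus.
pick_num_threads() {
    threads=$(($1 - 1))
    if [ "$threads" -gt 2 ]; then
        echo "$threads"
    else
        echo 0
    fi
}

# build_dump_cmd CPUS
# Prints the makedumpfile command line suited to the given cpu count.
build_dump_cmd() {
    n=$(pick_num_threads "$1")
    if [ "$n" -gt 0 ]; then
        echo "makedumpfile -c --num-threads $n /proc/vmcore dumpfile"
    else
        echo "makedumpfile -c /proc/vmcore dumpfile"
    fi
}
```

Here CPUS would be the number of cpus usable in the capture kernel, for instance the value given to nr_cpus. With the default of 1 the plain command without --num-threads is printed.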
		b9e861
		b9e861	`Debugging Tips`
		b9e861	`--------------`
		b9e861	`- One can drop into a shell before/after saving the vmcore with the help`
		b9e861	`of the kdump_pre/kdump_post hooks. Use the following in one of the pre/post`
		b9e861	`scripts to drop into a shell.`
		b9e861
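		b9e861	`#!/bin/bash`
		b9e861	`# Terminal to attach the shell to (here the first serial port).`
		b9e861	`_ctty=/dev/ttyS0`
		b9e861	`# Start an interactive login shell in a new session on that terminal.`
		b9e861	`setsid /bin/sh -i -l 0<>$_ctty 1<>$_ctty 2<>$_ctty`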
		b9e861
		b9e861	`One might have to change the terminal depending on what they are using.`
		b9e861
		b9e861	`- Serial console logging for virtual machines`
		b9e861
		b9e861	`I generally use "virsh console <domain-name>" to get to the serial console.`
		b9e861	`I noticed that after the dump is saved and the system reboots, some of the`
		b9e861	`previously logged messages are gone by the time the grub menu shows up.`
		b9e861	`That means any important debugging info at the end will be lost.`
		b9e861
		b9e861	`One can log serial console as follows to make sure messages are not lost.`
		b9e861
		b9e861	`virsh ttyconsole <domain-name>`
		b9e861	`ln -s <name-of-tty> /dev/modem`
		b9e861	`minicom -C /tmp/console-logs`
		b9e861
		b9e861	`Now minicom should be logging the serial console to /tmp/console-logs.`
		b9e861
		b9e861
