Kdump-in-cluster-environment HOWTO

Introduction

Kdump is a kexec-based crash dumping mechanism for Linux. This document
illustrates how to configure kdump in a cluster environment so that the kdump
crash recovery service can complete without being preempted by traditional
power fencing methods.

Overview

Kexec/Kdump

Details about Kexec/Kdump are available in the Kexec-Kdump-howto file and will
not be described here.

fence_kdump

fence_kdump is an I/O fencing agent to be used with the kdump crash recovery
service. When the fence_kdump agent is invoked, it listens for a message from
the failed node acknowledging that the failed node is executing the kdump
crash kernel. Note that fence_kdump is not a replacement for traditional
fencing methods. The fence_kdump agent can only detect that a node has entered
the kdump crash recovery service. This allows the kdump crash recovery service
to complete without being preempted by traditional power fencing methods.

fence_kdump_send

fence_kdump_send is a utility that sends messages acknowledging that the node
itself has entered the kdump crash recovery service. The fence_kdump_send
utility is typically run in the kdump kernel after a cluster node has
encountered a kernel panic. Once the cluster node has entered the kdump crash
recovery service, fence_kdump_send periodically sends messages to all cluster
nodes. When the fence_kdump agent receives a valid message from the failed
node, fencing is complete.

How to configure Pacemaker cluster environment:

To use kdump in a Pacemaker cluster environment, fence-agents-kdump must be
installed on every node in the cluster. You can do this with the following
command:

  # yum install -y fence-agents-kdump

Next, add fence_kdump to the cluster. Assume that the cluster consists of
three nodes (node1, node2 and node3) and uses Pacemaker for resource
management and pcs as the command-line configuration tool.

With pcs it is easy to add a stonith resource to the cluster. For example, add
a stonith resource named mykdumpfence with the fence type fence_kdump using
the following commands:

  # pcs stonith create mykdumpfence fence_kdump \
    pcmk_host_check=static-list pcmk_host_list="node1 node2 node3"
  # pcs stonith update mykdumpfence pcmk_monitor_action=metadata --force
  # pcs stonith update mykdumpfence pcmk_status_action=metadata --force
  # pcs stonith update mykdumpfence pcmk_reboot_action=off --force

Then enable stonith:

  # pcs property set stonith-enabled=true

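The pcs commands above hard-code the stonith resource name and the node list.
As a rough sketch, they could be wrapped in a small shell function that prints
the same commands for an arbitrary set of nodes, so that they can be reviewed
before being run as root. The function name print_fence_kdump_setup is
hypothetical and is not part of pcs or fence-agents-kdump; the pcs invocations
themselves are the ones shown above.

```shell
#!/bin/sh
# Hypothetical helper (not part of pcs): print the pcs commands from this
# HOWTO for a given stonith resource name and node list. Nothing is executed.
print_fence_kdump_setup() {
    name=$1
    shift
    nodes="$*"    # remaining arguments, joined with spaces

    echo "pcs stonith create $name fence_kdump pcmk_host_check=static-list pcmk_host_list=\"$nodes\""
    for action in pcmk_monitor_action=metadata \
                  pcmk_status_action=metadata \
                  pcmk_reboot_action=off; do
        echo "pcs stonith update $name $action --force"
    done
    echo "pcs property set stonith-enabled=true"
}

# Dry run for the three-node example cluster used in this document.
print_fence_kdump_setup mykdumpfence node1 node2 node3
```

Printing instead of executing keeps the sketch safe to run on any machine,
including one where pcs is not installed.
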
How to configure kdump:

There are two ways to configure fence_kdump support:

1) Pacemaker based clusters
   If you have successfully configured fence_kdump in Pacemaker, no special
   configuration is needed in kdump. Please refer to the Kexec-Kdump-howto
   file for more information.

2) Generic clusters
   For other types of clusters there are two configuration options in
   kdump.conf that enable fence_kdump support:

       fence_kdump_nodes <node(s)>
            Contains a space-separated list of cluster node(s) to send
            fence_kdump notifications to (this option is mandatory to enable
            fence_kdump)

       fence_kdump_args <arg(s)>
            Command line arguments for fence_kdump_send (it can contain
            all valid arguments except the hosts to send notifications to)

   These options will most probably be configured by your cluster software,
   so please refer to your cluster documentation for how to enable
   fence_kdump support.

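To make the two kdump.conf options concrete, the following sketch writes a
sample fragment to a temporary file and shows how the options might combine
into a fence_kdump_send command line. The node names and argument values are
assumptions chosen for illustration, and the helper get_opt is hypothetical;
the real kdump scripts may parse the file differently.

```shell
#!/bin/sh
# Sample kdump.conf fragment; node names and argument values are illustrative.
conf=$(mktemp)
cat > "$conf" <<'EOF'
# other kdump.conf settings would go here
fence_kdump_nodes node1 node2
fence_kdump_args -p 7410 -f auto -c 0 -i 10
EOF

# Hypothetical helper: print the value of option $1 from file $2
# (everything after the option name on its last matching line).
get_opt() {
    sed -n "s/^$1[[:space:]][[:space:]]*//p" "$2" | tail -n 1
}

nodes=$(get_opt fence_kdump_nodes "$conf")
args=$(get_opt fence_kdump_args "$conf")

# fence_kdump_nodes is mandatory: without it fence_kdump stays disabled.
if [ -n "$nodes" ]; then
    echo "fence_kdump_send $args $nodes"
else
    echo "fence_kdump not enabled in $conf"
fi
rm -f "$conf"
```

The command is only printed, not run. The hosts appear last because
fence_kdump_args may hold every argument except the hosts themselves.
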
Please be aware that these two ways cannot be combined and that 2) takes
precedence over 1). This means that if fence_kdump is configured using the
fence_kdump_nodes and fence_kdump_args options in kdump.conf, the Pacemaker
configuration is not used even if it exists.
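
The precedence rule can be summarized as a small decision function. This is
only a sketch of the rule as stated above, not the logic kdump actually uses.
The function name fence_kdump_source and its second argument (a yes/no flag
standing in for "a Pacemaker fence_kdump configuration exists") are
hypothetical.

```shell
#!/bin/sh
# Hypothetical sketch of the precedence rule: kdump.conf settings win over
# any Pacemaker configuration.
#   $1: path to a kdump.conf-style file
#   $2: "yes" if a Pacemaker fence_kdump configuration exists (assumed input)
fence_kdump_source() {
    if grep -q '^fence_kdump_nodes[[:space:]]' "$1" 2>/dev/null; then
        echo "kdump.conf"    # 2) generic cluster options take precedence
    elif [ "$2" = "yes" ]; then
        echo "pacemaker"     # 1) used only when kdump.conf names no nodes
    else
        echo "none"
    fi
}

# Example: a file that sets fence_kdump_nodes overrides Pacemaker.
example=$(mktemp)
echo "fence_kdump_nodes node1 node2" > "$example"
fence_kdump_source "$example" yes
rm -f "$example"
```

Only fence_kdump_nodes is checked, because it is the option that is mandatory
to enable fence_kdump through kdump.conf.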