Instructions for how to set up the watchdog daemon to work with IPMI's hardware watchdog
----------------------------------------------------------------------------------------

First, verify that the ipmitool utility is present on the system (for
example, by running "which ipmitool"). It allows the hardware watchdog
timer to be turned off gracefully from the command line should that ever
become necessary. If ipmitool is not present, install it, or download
the latest version from http://ipmitool.sourceforge.net and build and
install it on your system.
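
The presence check can also be scripted. The following is a minimal sketch,
not part of the watchdog package; the helper name have_tool is made up for
this example, and "command -v" is used as the POSIX counterpart of "which".

```shell
#!/bin/sh
# have_tool NAME: succeed if NAME can be found on the PATH.
# (Hypothetical helper for illustration only.)
have_tool() {
    command -v "$1" >/dev/null 2>&1
}

if have_tool ipmitool; then
    echo "ipmitool found: $(command -v ipmitool)"
else
    echo "ipmitool not found; install it before enabling the watchdog" >&2
fi
```

The script only reports the result; it does not install anything.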

Next, before starting the watchdog daemon, the BMC BIOS should be set
to enable the IPMI/BMC hardware watchdog timer, the OpenIPMI watchdog driver
module should be loaded with the desired configuration/startup settings,
and the watchdog daemon's configuration file should be modified to use /dev/watchdog:

  1. To set up the IPMI/BMC BIOS to enable the hardware watchdog
     timer, see the BMC documentation. The main settings in the BMC BIOS
     that must be changed to turn on the IPMI watchdog timer are:

       - Set the BMC POST Watchdog to "ENABLED".
       - Set the BMC POST Watchdog Timeout to "5 Minutes".

  2. To insert the OpenIPMI watchdog driver module with the
     desired configuration settings, two steps are necessary:

     i.) Configure the OpenIPMI watchdog driver by editing the
         /etc/sysconfig/ipmi configuration file:

           - Set "IPMI_WATCHDOG=yes".
           - Set the desired options via the IPMI_WATCHDOG_OPTIONS
             config entry.

             EXAMPLE: 'IPMI_WATCHDOG_OPTIONS="timeout=60 start_now=1 \
                       preop=preop_give_data action=power_cycle pretimeout=1"'

             Execute "modinfo ipmi_watchdog" for more detailed information
             on the available ipmi_watchdog timer options.

           - Execute "service ipmi start" (the watchdog driver starts
             automatically along with the other IPMI drivers).

             IMPORTANT: If "start_now=1" has been set as one of the
             configuration options, be sure to start up the watchdog
             daemon before the BMC timer expires!

     ii.) Set the OpenIPMI service and the watchdog daemon to start during bootup:

           - chkconfig ipmi on
           - chkconfig watchdog on

  3. Configure the watchdog daemon by editing the
     /etc/watchdog.conf configuration file:

       - Uncomment the "watchdog-device = /dev/watchdog" line.
       - Ensure that "realtime = yes" and "priority = 1" are set and not
         commented out.
       - Uncomment the "interval" line, and set the interval to less
         than the timeout option set in the /etc/sysconfig/ipmi
         file (e.g., with "timeout=60" you might set interval to 50).

     In the example described here, the BMC BIOS setting is in
     minutes (5), while the "interval" and ipmi_watchdog "timeout" settings
     are both in seconds (50 and 60 respectively). Therefore, the BMC
     hardware watchdog timer is set to expire and trigger a system power
     cycle unless reset by the watchdog daemon within 5 minutes, and the
     watchdog daemon will reset the timer every 50 seconds, which is inside
     the 60-second timeout given to the ipmi_watchdog driver.

  4. Start the watchdog daemon:

       - Execute "service watchdog start".
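
Before starting the daemon, the relationship required in step 3 (interval
smaller than timeout) can be checked with a small script. This is only a
sketch: the function names are invented for this example, and it assumes
that IPMI_WATCHDOG_OPTIONS is written on a single line (no backslash
continuation) and that "interval" appears as an uncommented
"interval = N" line.

```shell
#!/bin/sh
# ipmi_timeout FILE: print the "timeout=N" value from the
# IPMI_WATCHDOG_OPTIONS line. The [" ] guard avoids matching "pretimeout=".
ipmi_timeout() {
    sed -n 's/^IPMI_WATCHDOG_OPTIONS=.*[" ]timeout=\([0-9][0-9]*\).*/\1/p' "$1" | head -n 1
}

# daemon_interval FILE: print N from an uncommented "interval = N" line.
daemon_interval() {
    sed -n 's/^[[:space:]]*interval[[:space:]]*=[[:space:]]*\([0-9][0-9]*\).*/\1/p' "$1" | head -n 1
}

# check_interval IPMI_FILE WATCHDOG_CONF: succeed only if both values
# were found and interval < timeout.
check_interval() {
    timeout=$(ipmi_timeout "$1")
    interval=$(daemon_interval "$2")
    [ -n "$timeout" ] && [ -n "$interval" ] && [ "$interval" -lt "$timeout" ]
}

# Only run the check when both files from this document exist.
if [ -r /etc/sysconfig/ipmi ] && [ -r /etc/watchdog.conf ]; then
    if check_interval /etc/sysconfig/ipmi /etc/watchdog.conf; then
        echo "OK: interval is smaller than the ipmi_watchdog timeout"
    else
        echo "WARNING: interval is missing or not smaller than timeout" >&2
    fi
fi
```

With the example values from this document (timeout=60, interval = 50)
the check succeeds; with an interval of 70 it fails.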

IMPORTANT: To gracefully stop the watchdog daemon, be sure
to use "service watchdog stop" (which executes "kill -s SIGTERM <pid>")
and do *not* use "kill -9 <pid>". Using "kill -9 <pid>" shuts the
daemon off without stopping the BMC's watchdog timer, so
a system reboot is triggered when the BMC's watchdog timer expires.

Alternatively, or if the watchdog daemon has been killed "ungracefully",
you can stop the BMC timer by executing the following ipmitool
command before the watchdog timer expires:

  # ipmitool -v raw 0x06 0x24 0x04 0x01 0x00 0x10 0x00 0x0a
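
For readers who want to know what the raw bytes mean, the following
reading is based on the "Set Watchdog Timer" command of the IPMI 2.0
specification and should be verified against your BMC's documentation:
0x06 is the App network function, 0x24 the Set Watchdog Timer command,
0x04 the timer use (SMS/OS) with the "don't stop" bit clear so the running
timer is stopped, 0x01 the timeout action (hard reset), 0x00 the
pre-timeout interval, 0x10 clears the SMS/OS expiration flag, and
0x00 0x0a is the initial countdown (least significant byte first, in
units of 100 ms). The sketch below converts the two countdown bytes to
seconds; the function name countdown_seconds is made up for this example.

```shell
#!/bin/sh
# countdown_seconds LSB MSB: convert the two countdown bytes of a
# Set Watchdog Timer request (100 ms per count) into whole seconds.
countdown_seconds() {
    lsb=$(( $1 ))
    msb=$(( $2 ))
    echo $(( (msb * 256 + lsb) / 10 ))
}

# Countdown bytes from the command in this document: 0x00 0x0a
countdown_seconds 0x00 0x0a    # prints 256
```

Many ipmitool builds also offer "ipmitool mc watchdog off" for the same
purpose; check "ipmitool mc help" on your system before relying on it.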

----------------------------------------------------------------------

To test the watchdog after system configuration and setup:

  . Use "kill -9" on the watchdog daemon so that it does not shut down
    gracefully. Verify that the system is reset after the BMC timer expires.

  . Use "service watchdog stop" and verify that the watchdog daemon shuts off
    the BMC watchdog timer gracefully (the system is not reset).

  . Set the timer on the watchdog daemon to be greater than the time set in
    the BMC BIOS for system reset and verify that the system is reset.

  . Set the timer on the daemon to be less than the time set in the
    BMC timer and verify that the BMC watchdog is poked regularly and the
    system is not reset.

  . Test some of the other actions the BMC can take when the watchdog timer
    goes off (see "modinfo ipmi_watchdog" for some other settings to try).