The Linux kernel can reset the system if serious problems are detected. This can be implemented via special watchdog hardware, or via a slightly less reliable software-only watchdog inside the kernel. Either way, there needs to be a daemon that tells the kernel the system is working fine. If the daemon stops doing that, the system is reset.
watchdog is such a daemon. It opens /dev/watchdog and keeps writing to it often enough, at least once per minute, to keep the kernel from resetting the system. Each write postpones the reboot by another minute; if a minute passes without a write, the watchdog hardware resets the machine. With the software watchdog, whether the reboot actually happens depends on the state of the machine and its interrupts.
The watchdog can be stopped without causing a reboot if the device /dev/watchdog is closed correctly, unless your kernel is compiled with the CONFIG_WATCHDOG_NOWAYOUT option enabled.
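The keepalive protocol described above can be illustrated with a few lines of POSIX shell. This is a minimal sketch, not how the watchdog daemon itself is written: the function names and the WATCHDOG_DEVICE variable are hypothetical, and the clean stop relies on the kernel's "magic close" feature, where the driver only disarms the timer if the character 'V' is written just before the device is closed (softdog supports this). Opening the real /dev/watchdog arms the timer, so point WATCHDOG_DEVICE at an ordinary file to experiment safely.

```shell
#!/bin/sh
# Minimal sketch of the /dev/watchdog keepalive protocol.
# WATCHDOG_DEVICE is a hypothetical override; set it to a regular
# file to experiment without arming a real watchdog.
WATCHDOG_DEVICE="${WATCHDOG_DEVICE:-/dev/watchdog}"

# Opening the device starts the timer; keep it open on file descriptor 3.
wd_open() {
    exec 3>"$WATCHDOG_DEVICE"
}

# Any write counts as a keepalive ping. Avoid the character 'V' here,
# because the kernel treats it as the magic-close character.
wd_ping() {
    printf '.' >&3
}

# Magic close: write 'V' immediately before closing so that drivers
# supporting it disarm the timer. This has no effect when the kernel
# is built with CONFIG_WATCHDOG_NOWAYOUT enabled.
wd_close() {
    printf 'V' >&3
    exec 3>&-
}
```

A caller would run wd_open once, call wd_ping in a loop with a sleep well below the device timeout (softdog's default margin is 60 seconds, so every 10 seconds leaves plenty of headroom), and call wd_close only on an orderly shutdown. If the loop dies without wd_close, the timer expires and the machine is reset, which is exactly the behaviour the daemon relies on.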
Tests
The watchdog daemon does several tests to check the system status:
- Is the process table full?
- Is there enough free memory?
- Are some files accessible?
- Have some files changed within a given interval?
- Is the average work load too high?
- Has a file table overflow occurred?
- Is a process still running? The process is specified by a pid file.
- Do certain IP addresses respond to ping?
- Do network interfaces receive traffic?
- Is the temperature too high? (Temperature data not always available.)
- Does a user-defined command, run to perform arbitrary tests, succeed?
If any of these checks fails, watchdog causes a shutdown. If any test other than the user-defined command takes longer than one minute, the machine is rebooted as well.
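As an illustration of the user-defined test, the following is a sketch of a script that could be set as the test binary in /etc/watchdog.conf (the test-binary option described in watchdog.conf(5)). It follows the convention that a non-zero exit status reports a failure. The check_load function, the MAX_LOAD and LOADAVG_FILE variables and the default threshold of 8 are hypothetical; the daemon already has built-in load checks (max-load-1 and related options), so this only demonstrates the mechanism.

```shell
#!/bin/sh
# Hypothetical user-defined test for the watchdog daemon.
# Exit status 0 means "healthy"; anything else reports a failure.
LOADAVG_FILE="${LOADAVG_FILE:-/proc/loadavg}"
MAX_LOAD="${MAX_LOAD:-8}"   # assumed threshold, tune for your machine

# Fail if the 1-minute load average (first field of the loadavg file)
# exceeds the given maximum. awk also returns non-zero if the file
# cannot be read, which is reported as a failure too.
check_load() {
    awk -v max="$2" '{ exit ($1 > max) ? 1 : 0 }' "$1"
}

# The script's exit status is the result of the check.
check_load "$LOADAVG_FILE" "$MAX_LOAD"
```

Because the exit status is all the daemon looks at, any check that can be expressed as "succeed or fail" fits this pattern; keep such scripts short, since they run on every test cycle.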
Installation
watchdog can be installed with the Pakfire web interface or via the console:
pakfire install watchdog
Configuration
- Install the addon via Pakfire. After installation, the addon is disabled by default.
- Create /etc/sysconfig/watchdog with the following content:
# Set run_watchdog to 1 to start watchdog or 0 to disable it.
run_watchdog=1
# Specify additional watchdog options here (see manpage).
# For verbose messages, add -v to the options.
watchdog_options=""
# Specify the kernel module to load: the driver for your watchdog hardware,
# or softdog for the software watchdog.
watchdog_module="geodewdt"
# watchdog_module="it87_wdt"
# watchdog_module="softdog"
- Edit the configuration file /etc/watchdog.conf to enable the tests you need (see watchdog.conf(5)).
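A starting point for /etc/watchdog.conf might look like the sample written by the script below. It is a sketch, not IPFire's shipped default: the option names are those documented in watchdog.conf(5), but the ping address 192.168.1.1, the interface green0 and all numeric limits are example values to adapt. The script writes to a separate file (the default path ./watchdog.conf.sample is an assumption) so the live configuration is never overwritten.

```shell
#!/bin/sh
# Write a sample watchdog.conf to the path given as $1.
# The default path is hypothetical and chosen so that the live
# /etc/watchdog.conf is never overwritten by accident.
out="${1:-./watchdog.conf.sample}"

cat > "$out" <<'EOF'
# Device the daemon keeps open and writes to.
watchdog-device = /dev/watchdog
# Seconds between keepalive writes; keep well below the device timeout.
interval        = 10
# Treat a 1-minute load average above this value as a failure.
max-load-1      = 24
# Minimum free virtual memory, in pages.
min-memory      = 1
# Example: treat it as a failure if this address stops answering ping.
ping            = 192.168.1.1
# Example: treat it as a failure if this interface stops receiving traffic.
interface       = green0
EOF
```

Review the generated file, then copy the options you actually want into /etc/watchdog.conf. Enabling only the checks that matter for the machine keeps false alarms, and therefore unwanted reboots, to a minimum.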
Links
- watchdog(8) - Linux man page on die.net
- The watchdog project home page
- watchdog.conf(5) - Linux man page on die.net
- Linux Watchdog - good resource for watchdog info