5.7.2. ZooKeeper process down alert

This alert is triggered if the ZooKeeper process cannot be confirmed to be up and listening on the network within the configured critical threshold, given in seconds. It uses the Nagios check_tcp plugin.
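As a rough illustration of what a TCP liveness check of this kind does, the following Python sketch tries to open a connection to a host and port within a timeout. It is not the check_tcp plugin itself, and it only approximates the idea: the function name is_listening and the 10-second default timeout are assumptions made for this example.

```python
import socket


def is_listening(host, port, timeout=10.0):
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds.

    Illustrative stand-in for a TCP check; not the Nagios check_tcp plugin.
    """
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False
```

For example, is_listening("zk1.example.com", 2181) would probe a hypothetical host on 2181, the default ZooKeeper client port; substitute the host names and port from your own configuration.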

 5.7.2.1. Potential causes
  • The Nagios server cannot connect to one or more ZooKeeper processes

  • The ZooKeeper hosts are down

  • The ZooKeeper processes are running but are not listening on the correct network port/address

 5.7.2.2. Possible remedies
  • Check for dead ZooKeeper hosts or processes in the Services list.

  • Check for any errors in the ZooKeeper logs (/var/log/hadoop/zookeeper), then restart the affected ZooKeeper hosts/processes.

  • Run the netstat -tuplpn command to check if the ZooKeeper process is bound to the correct network port.
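The port check in the last remedy can also be scripted. The sketch below filters netstat output for TCP sockets in the LISTEN state on a given port. It assumes the usual Linux net-tools column layout (protocol, receive queue, send queue, local address, foreign address, state, PID/program); the helper names run_netstat and listeners_on_port are made up for this example.

```python
import subprocess


def run_netstat():
    """Run netstat with the flags from the remedy above and return its text output."""
    result = subprocess.run(["netstat", "-tuplpn"], capture_output=True, text=True)
    return result.stdout


def listeners_on_port(netstat_output, port):
    """Return the netstat rows for TCP sockets listening on `port`."""
    matches = []
    for line in netstat_output.splitlines():
        fields = line.split()
        # Assumed layout: proto, recv-q, send-q, local addr, foreign addr, state, pid/program
        if len(fields) >= 6 and fields[0].startswith("tcp") and fields[5] == "LISTEN":
            # The port is whatever follows the last colon (works for 0.0.0.0:2181 and :::2181).
            if fields[3].rsplit(":", 1)[-1] == str(port):
                matches.append(line.strip())
    return matches
```

Calling listeners_on_port(run_netstat(), 2181) on a ZooKeeper host returns the matching rows, or an empty list if nothing is listening on that port. 2181 is the default ZooKeeper client port, so use the port from your configuration if it differs. Run the command as root if the PID/program column is needed.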

