5.5.2. RegionServer process down alert

This alert is triggered when a RegionServer process cannot be confirmed to be up and listening on the network within the configured critical threshold, given in seconds. It uses the Nagios check_tcp plugin.
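
To reproduce the check by hand, the sketch below attempts a TCP connection with a timeout. It approximates the connect test that check_tcp performs and is not the plugin's implementation. The function name port_is_listening is hypothetical, and the host, port, and timeout are supplied by the caller.

```python
import socket


def port_is_listening(host: str, port: int, timeout_s: float) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout_s."""
    try:
        # create_connection resolves the host and attempts a TCP connect.
        # Refusal, timeout, and name-resolution failure all raise OSError.
        with socket.create_connection((host, port), timeout=timeout_s):
            return True
    except OSError:
        return False
```

For example, port_is_listening("regionserver1.example.com", 16020, 5.0) returns False if nothing accepts the connection within five seconds. Both the host name and the port are placeholders; use the value of hbase.regionserver.port from your own configuration and the critical threshold set for the alert.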

 5.5.2.1. Potential causes
  • Misconfiguration or less-than-ideal configuration has caused the RegionServers to crash

  • Cascading failures brought on by some workload have caused the RegionServers to crash

  • The RegionServers have shut themselves down because of problems in the dependent services, ZooKeeper or HDFS

  • A garbage collection (GC) pause stalled a RegionServer for too long, and it lost contact with ZooKeeper

 5.5.2.2. Possible remedies
  • Check the dependent services to make sure they are operating correctly

  • Look at the RegionServer log files (usually /var/log/hbase/*.log) for further information

  • Look at the configuration files (/etc/hbase/conf)

  • If the failure was associated with a particular workload, try to understand the workload better

  • Restart the RegionServers
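
As a starting point for the log review step above, the following sketch collects severe lines from a set of log files. It assumes the logs carry log4j-style level tokens (ERROR, FATAL). The function name find_severe_lines and the limit parameter are hypothetical.

```python
import glob
import re

# Assumption: each log line contains a log4j-style level token.
SEVERE = re.compile(r"\b(FATAL|ERROR)\b")


def find_severe_lines(pattern: str, limit: int = 50):
    """Return up to `limit` (path, line_number, text) tuples for severe log lines."""
    hits = []
    for path in sorted(glob.glob(pattern)):
        # errors="replace" keeps the scan going if a file has undecodable bytes.
        with open(path, errors="replace") as fh:
            for lineno, line in enumerate(fh, start=1):
                if SEVERE.search(line):
                    hits.append((path, lineno, line.rstrip("\n")))
                    if len(hits) >= limit:
                        return hits
    return hits
```

Calling find_severe_lines("/var/log/hbase/*.log") and printing each tuple gives a quick view of what was logged around the crash. Adjust the pattern if your logs live elsewhere.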

