Tuning for NameNode Garbage Collection

The NameNode process can appear hung during a garbage collection (GC) event. To prevent such a pause from triggering an immediate failover, the NameNode is given a grace period in which to resume operation. You can configure this grace period using the following property:

<property>
  <name>service.monitor.probe.timeout</name>
  <value>60000</value>
  <description>
    Duration, in milliseconds, that the probe loop may be blocked before it is considered a liveness failure.
  </description>
</property>

A smaller value causes a hung NameNode to be detected, and failover of the VM on which it runs to be triggered, sooner. However, it also increases the risk of incorrectly identifying a long GC pause as a hung process. On larger clusters, where GC pauses tend to be longer, you can increase the value of this property.
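For example, if the NameNode on a large cluster has been observed to pause for close to the 60000 ms window, the timeout might be raised to two minutes as sketched below. The value 120000 is purely illustrative, not a recommended setting; choose the actual value based on the GC pause durations you observe for your own NameNode.

```xml
<property>
  <name>service.monitor.probe.timeout</name>
  <!-- Illustrative value only: 120000 ms = 2 minutes, doubling the 60000 ms shown above -->
  <value>120000</value>
  <description>
    Duration, in milliseconds, that the probe loop may be blocked before it is considered a liveness failure.
  </description>
</property>
```

The trade-off is the one described above: the longer window tolerates longer GC pauses, at the cost of waiting longer before failing over from a NameNode that is genuinely hung.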