The NameNode process can appear hung during a garbage collection (GC) event. To prevent such a pause from triggering an immediate failover, the NameNode is given a grace period in which to resume operation. You can configure this grace period using the following property:
<property>
  <name>service.monitor.probe.timeout</name>
  <value>60000</value>
  <description>
    Duration in milliseconds for the probe loop to be blocked,
    before it is considered a liveness failure
  </description>
</property>
A smaller value triggers failover for the VM (where the hung NameNode process is running) sooner, but it increases the risk of incorrectly identifying a long GC-related pause as a hung process. On larger clusters, where GC pauses tend to be longer, you can increase the value of this property.
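As an illustration of the larger-cluster case, the fragment below raises the timeout to 120000 milliseconds (two minutes). The value is an assumption chosen only for the example, not a recommended setting; a suitable value depends on the GC pause lengths actually observed on your NameNode.

```xml
<property>
  <name>service.monitor.probe.timeout</name>
  <!-- Illustrative value only: 120000 ms = 2 minutes. Tune to observed GC pauses. -->
  <value>120000</value>
  <description>
    Duration in milliseconds for the probe loop to be blocked,
    before it is considered a liveness failure
  </description>
</property>
```

The trade-off is the same one described above: a longer timeout tolerates longer GC pauses, but a NameNode that has really hung takes correspondingly longer to be detected.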