Chapter 1. High Availability for Hadoop

This guide provides an overview of the HDFS High Availability (HA) feature, along with instructions on configuring and managing a highly available HDFS cluster using the Quorum Journal Manager (QJM), which shares edit logs between the Active and Standby NameNodes, and the ZooKeeper Failover Controller (ZKFC), which provides automatic failover.
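As a preview of the configuration covered later, a QJM- and ZKFC-based HA setup is driven by a handful of properties in `hdfs-site.xml` and `core-site.xml`. The sketch below is illustrative only: the nameservice name `mycluster`, the NameNode IDs `nn1` and `nn2`, and the JournalNode and ZooKeeper hostnames are all assumed values that you would replace with those of your own cluster.

```xml
<!-- hdfs-site.xml (illustrative; nameservice, NameNode IDs, and hosts are assumptions) -->
<property>
  <name>dfs.nameservices</name>
  <value>mycluster</value>
</property>
<property>
  <name>dfs.ha.namenodes.mycluster</name>
  <value>nn1,nn2</value>
</property>
<property>
  <!-- JournalNodes the Active NameNode writes edits to and the Standby reads from -->
  <name>dfs.namenode.shared.edits.dir</name>
  <value>qjournal://jn1.example.com:8485;jn2.example.com:8485;jn3.example.com:8485/mycluster</value>
</property>
<property>
  <!-- Enables ZKFC-driven automatic failover -->
  <name>dfs.ha.automatic-failover.enabled</name>
  <value>true</value>
</property>

<!-- core-site.xml -->
<property>
  <name>ha.zookeeper.quorum</name>
  <value>zk1.example.com:2181,zk2.example.com:2181,zk3.example.com:2181</value>
</property>
```

The full set of required properties, including the RPC and HTTP addresses for each NameNode, is covered in the configuration sections that follow.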

Note

This guide assumes that an existing HDP cluster was installed and deployed manually. It provides instructions for enabling HA on top of that existing cluster. If the existing cluster was installed using Ambari, refer to the Ambari documentation for details on configuring NameNode HA using the Ambari wizard.

Prior to the HA feature, the NameNode was a single point of failure (SPOF) in an HDFS cluster. Each cluster had a single NameNode, and if that machine or process became unavailable, the entire cluster would be unavailable until the NameNode was either restarted or brought up on a separate machine. This situation impacted the total availability of the HDFS cluster in two major ways:

  • In the case of an unplanned event such as a machine crash, the cluster would be unavailable until an operator restarted the NameNode.

  • Planned maintenance events such as software or hardware upgrades on the NameNode machine would result in windows of cluster downtime.

The HDFS HA feature addresses these problems by letting you run redundant NameNodes in the same cluster in an Active/Passive configuration with a hot standby. This mechanism enables a fast failover to the standby NameNode in the event of a machine crash, or a graceful administrator-initiated failover during planned maintenance.
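For example, an administrator can inspect the role of each NameNode and perform a graceful failover with the `hdfs haadmin` tool. The NameNode IDs `nn1` and `nn2` below are assumptions for illustration; use the IDs defined by `dfs.ha.namenodes.<nameservice>` in your own configuration.

```shell
# Check which NameNode is currently Active (each command reports "active" or "standby")
hdfs haadmin -getServiceState nn1
hdfs haadmin -getServiceState nn2

# Gracefully transition the Active role from nn1 to nn2
hdfs haadmin -failover nn1 nn2
```

When automatic failover is enabled, the ZKFC performs this transition on its own if the Active NameNode fails its health checks.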

In this document: