Apache Hadoop High Availability


Complete the following steps to configure and verify automatic failover:

  1. Configure automatic failover.

    • Set up your cluster for automatic failover by adding the dfs.ha.automatic-failover.enabled property, with a value of true, to the hdfs-site.xml file on both NameNode machines.

    • List the host-port pairs running the ZooKeeper service in the ha.zookeeper.quorum property of the core-site.xml file on both NameNode machines.


      Suffix the configuration key with the nameservice ID to configure the above settings on a per-nameservice basis. For example, in a cluster with federation enabled, you can explicitly enable automatic failover for only one of the nameservices by setting dfs.ha.automatic-failover.enabled.$my-nameservice-id.
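The two properties from this step can be generated as XML fragments with a small shell sketch. It assumes three hypothetical ZooKeeper hosts (zk1.example.com, zk2.example.com, zk3.example.com) that listen on ZooKeeper's default client port, 2181. The helper names hadoop_property, zk_quorum, and failover_key are also hypothetical. The script only prints the <property> elements, so you still need to paste them inside the <configuration> element of each file.

```shell
# Hypothetical helper: print one Hadoop configuration <property> element.
hadoop_property() {
  printf '<property>\n  <name>%s</name>\n  <value>%s</value>\n</property>\n' "$1" "$2"
}

# Hypothetical helper: join ZooKeeper host names into the comma-separated
# host:port list that ha.zookeeper.quorum expects. Every host is assumed to
# listen on the same client port: 2181 (ZooKeeper's default) unless ZK_PORT is set.
zk_quorum() {
  local port=${ZK_PORT:-2181}
  local quorum="" host
  for host in "$@"; do
    quorum="${quorum:+$quorum,}$host:$port"
  done
  printf '%s\n' "$quorum"
}

# Hypothetical helper: print the automatic-failover key, suffixed with a
# nameservice ID when one is given (per-nameservice configuration).
failover_key() {
  printf 'dfs.ha.automatic-failover.enabled%s\n' "${1:+.$1}"
}

# hdfs-site.xml entry (both NameNode machines)
hadoop_property "$(failover_key)" true
# core-site.xml entry (both NameNode machines); host names are hypothetical
hadoop_property ha.zookeeper.quorum "$(zk_quorum zk1.example.com zk2.example.com zk3.example.com)"
```

For a federated cluster, failover_key mycluster prints dfs.ha.automatic-failover.enabled.mycluster, where mycluster is a hypothetical nameservice ID. After editing the files, you can check the value Hadoop resolves with:

    hdfs getconf -confKey dfs.ha.automatic-failover.enabled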

  2. Initialize HA state in ZooKeeper.

    Execute the following command on NN1. ZooKeeper must be running and reachable for this command to succeed (see step 3):

    hdfs zkfc -formatZK -force

    This command creates a znode in ZooKeeper, which the automatic failover system uses to store its data.
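To confirm that the format step worked, you can look up the znode with the ZooKeeper command-line client. This sketch assumes the default parent znode /hadoop-ha (set by ha.zookeeper.parent-znode), under which the failover system keeps one child znode per nameservice. The ha_znode helper, the nameservice ID mycluster, the host zk1.example.com, and the HDP client path in the usage line are assumptions for illustration.

```shell
# Hypothetical helper: print the znode that "hdfs zkfc -formatZK" creates for
# a nameservice: <parent>/<nameservice ID>. The parent defaults to /hadoop-ha,
# the default value of ha.zookeeper.parent-znode; pass another parent as the
# second argument if you changed that property.
ha_znode() {
  local nameservice=$1
  local parent=${2:-/hadoop-ha}
  printf '%s/%s\n' "${parent%/}" "$nameservice"
}
```

For example, the following should print the znode's metadata rather than reporting that the node does not exist:

    /usr/hdp/current/zookeeper-client/bin/zkCli.sh -server zk1.example.com:2181 stat "$(ha_znode mycluster)"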

  3. Check whether ZooKeeper is running. If it is not, start ZooKeeper by running the following command on each ZooKeeper host machine:

    su - zookeeper -c "export ZOOCFGDIR=/usr/hdp/current/zookeeper-server/conf ; export ZOOCFG=zoo.cfg; source /usr/hdp/current/zookeeper-server/conf/zookeeper-env.sh ; /usr/hdp/current/zookeeper-server/bin/zkServer.sh start"
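
One way to check whether ZooKeeper is running is the status action of the same zkServer.sh script, which prints a "Mode:" line (leader, follower, or standalone) when the server is up. The zk_mode helper below is a hypothetical sketch that extracts that mode and fails when the line is absent, for example when the script reports that it cannot contact the service.

```shell
# Hypothetical helper: read "zkServer.sh status" output on stdin and print
# the server mode (leader, follower, or standalone). Returns non-zero when
# no "Mode:" line is present, which is the case when the server is down.
zk_mode() {
  local mode
  mode=$(sed -n 's/^Mode: *//p')
  [ -n "$mode" ] || return 1
  printf '%s\n' "$mode"
}
```

Usage, assuming the same environment as the start command above:

    su - zookeeper -c "export ZOOCFGDIR=/usr/hdp/current/zookeeper-server/conf ; export ZOOCFG=zoo.cfg; source /usr/hdp/current/zookeeper-server/conf/zookeeper-env.sh ; /usr/hdp/current/zookeeper-server/bin/zkServer.sh status" 2>&1 | zk_mode || echo "ZooKeeper is not running on this host"
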
  4. Start the JournalNodes, NameNodes, and DataNodes by following the instructions in the Controlling HDP Services Manually chapter of the HDP Reference Guide.
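
If the Reference Guide follows the same pattern as the ZKFC command in step 5, each daemon is started by running hadoop-daemon.sh as the hdfs user with a different daemon name (journalnode, namenode, or datanode). The sketch below assumes that pattern and the HDP script path from step 5. The hdfs_daemon_cmd helper is hypothetical, and it prints the commands rather than running them, so you can review each one and run it only on the hosts that carry that daemon.

```shell
# Assumed location of hadoop-daemon.sh, taken from the ZKFC command in step 5.
# Override HADOOP_DAEMON_SH on hosts where this symlink does not exist.
HADOOP_DAEMON_SH=${HADOOP_DAEMON_SH:-/usr/hdp/current/hadoop-hdfs-namenode/../hadoop/sbin/hadoop-daemon.sh}

# Hypothetical helper: print (not run) the command that starts or stops
# one HDFS daemon as the hdfs user.
hdfs_daemon_cmd() {
  local action=$1 daemon=$2
  printf 'su -l hdfs -c "%s %s %s"\n' "$HADOOP_DAEMON_SH" "$action" "$daemon"
}

# Print the start commands in the order this step lists the daemons.
for daemon in journalnode namenode datanode; do
  hdfs_daemon_cmd start "$daemon"
done
```
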

  5. Start the ZooKeeper Failover Controller (ZKFC) by running the following command on each NameNode machine:

    su -l hdfs -c "/usr/hdp/current/hadoop-hdfs-namenode/../hadoop/sbin/hadoop-daemon.sh start zkfc"

    The order in which you start the ZKFCs determines which NameNode becomes Active. For example, if you start ZKFC on NN1 first, NN1 becomes Active.


    When converting a non-HA cluster to an HA cluster, Hortonworks recommends that you run the bootstrapStandby command, which initializes NN2, before you start ZKFC on either NameNode machine.
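
The ordering in this step can be written out as a dry-run plan. In this sketch, nn1.example.com and nn2.example.com are hypothetical host names for NN1 and NN2, zkfc_start_plan is a hypothetical helper, and the script path is the one from the ZKFC command above. The plan uses hdfs namenode -bootstrapStandby, the HDFS command that initializes the standby NameNode by copying the namespace metadata from the other NameNode.

```shell
# Assumed HDP location of hadoop-daemon.sh (same path as the ZKFC command above).
HADOOP_DAEMON_SH=${HADOOP_DAEMON_SH:-/usr/hdp/current/hadoop-hdfs-namenode/../hadoop/sbin/hadoop-daemon.sh}

# Hypothetical helper: print, in order, the host and command for each step.
# The first argument is the NameNode that should become Active.
zkfc_start_plan() {
  local active=$1 standby=$2
  # Initialize the second NameNode before any ZKFC starts.
  printf '%s: su -l hdfs -c "hdfs namenode -bootstrapStandby"\n' "$standby"
  # Per this step, the NameNode whose ZKFC starts first becomes Active.
  printf '%s: su -l hdfs -c "%s start zkfc"\n' "$active" "$HADOOP_DAEMON_SH"
  printf '%s: su -l hdfs -c "%s start zkfc"\n' "$standby" "$HADOOP_DAEMON_SH"
}

zkfc_start_plan nn1.example.com nn2.example.com
```
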

  6. Verify automatic failover.

    1. Locate the Active NameNode.

      Use the NameNode web UI to check the HA state (Active or Standby) of each NameNode host machine.
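
As a command-line alternative to the web UI, hdfs haadmin -getServiceState prints active or standby for a given NameNode ID. The find_active helper below is a hypothetical sketch that asks each NameNode in turn. nn1 and nn2 stand for whatever IDs you listed in dfs.ha.namenodes.<nameservice ID>.

```shell
# Hypothetical helper: print the first NameNode ID (as listed in
# dfs.ha.namenodes.<nameservice ID>) whose HA state is "active".
# Returns non-zero if none of the given NameNodes reports active.
find_active() {
  local id state
  for id in "$@"; do
    state=$(hdfs haadmin -getServiceState "$id" 2>/dev/null)
    if [ "$state" = "active" ]; then
      printf '%s\n' "$id"
      return 0
    fi
  done
  return 1
}
```

Run it as the hdfs user:

    find_active nn1 nn2
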

    2. Cause a failure on the Active NameNode host machine.

      For example, you can use the following command to simulate a JVM crash:

      kill -9 $PID_of_Active_NameNode

      Alternatively, power-cycle the machine or unplug its network interface to simulate an outage.
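
To fill in $PID_of_Active_NameNode on the Active NameNode host, you can match the NameNode's Java main class, org.apache.hadoop.hdfs.server.namenode.NameNode, against running command lines. The namenode_pid helper is a hypothetical sketch built on pgrep -f. The dots in the pattern are escaped so that they match only literal dots.

```shell
# Hypothetical helper: print the PID of the NameNode JVM on this host by
# matching its main class on the process command line. pgrep exits
# non-zero when no NameNode process is found.
namenode_pid() {
  pgrep -f 'org\.apache\.hadoop\.hdfs\.server\.namenode\.NameNode'
}
```

Run as root (or as the hdfs user) on the Active NameNode host:

    pid=$(namenode_pid) && kill -9 "$pid"
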

    3. The Standby NameNode should now automatically become Active within several seconds.


      The time required to detect a failure and trigger a failover depends on the ha.zookeeper.session-timeout.ms property (default: 5000 ms, that is, 5 seconds).
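
To measure how long the failover actually takes, you can poll the former Standby NameNode until it reports active. The wait_for_active helper is a hypothetical sketch that polls hdfs haadmin -getServiceState about once per second, so its timing is only accurate to roughly a second. nn2 stands for the ID of the former Standby NameNode. You can compare the result with the configured timeout, which hdfs getconf -confKey ha.zookeeper.session-timeout.ms prints.

```shell
# Hypothetical helper: poll a NameNode's HA state about once per second until
# it reports "active" or the timeout (in seconds, default 30) expires.
# On success, prints the approximate number of seconds waited.
wait_for_active() {
  local id=$1 timeout=${2:-30} start now
  start=$(date +%s)
  while :; do
    now=$(date +%s)
    if [ "$(hdfs haadmin -getServiceState "$id" 2>/dev/null)" = "active" ]; then
      printf '%s\n' "$((now - start))"
      return 0
    fi
    if [ $((now - start)) -ge "$timeout" ]; then
      return 1
    fi
    sleep 1
  done
}
```

Start it right after causing the failure:

    wait_for_active nn2 60 || echo "nn2 did not become active within 60 seconds"
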

    4. If the test fails, your HA settings might be incorrectly configured.

      Check the ZKFC and NameNode daemon logs to diagnose the issue.
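
hadoop-daemon.sh names each daemon's log file hadoop-<user>-<daemon>-<hostname>.log. The ha_log_files helper below is a hypothetical sketch that lists the ZKFC and NameNode log paths for a host. It assumes the daemons run as the hdfs user and that the log directory is /var/log/hadoop/hdfs when HADOOP_LOG_DIR is not set. Check HADOOP_LOG_DIR in your hadoop-env.sh for the actual directory.

```shell
# Hypothetical helper: print the expected ZKFC and NameNode log files for a
# host (default: this host), following hadoop-daemon.sh's naming scheme
# hadoop-<user>-<daemon>-<hostname>.log with user "hdfs". The default log
# directory is an assumption; HADOOP_LOG_DIR overrides it.
ha_log_files() {
  local host=${1:-$(hostname)}
  local dir=${HADOOP_LOG_DIR:-/var/log/hadoop/hdfs}
  local daemon
  for daemon in zkfc namenode; do
    printf '%s/hadoop-hdfs-%s-%s.log\n' "$dir" "$daemon" "$host"
  done
}
```

For example, to show recent errors from both logs on the current host:

    ha_log_files | while read -r f; do grep -E 'ERROR|FATAL' "$f" | tail -n 20; done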