Configuring Fault Tolerance

Deploying the ResourceManager HA Cluster

Update yarn-site.xml and the other configuration files, and then start ZooKeeper, HDFS, and YARN, in that order.

  1. Copy the etc/hadoop/conf/yarn-site.xml file from the primary ResourceManager host to the standby ResourceManager host.
  2. Make sure that the clientPort value set in etc/zookeeper/conf/zoo.cfg matches the port set in the following yarn-site.xml property:
    <property>
      <name>yarn.resourcemanager.zk-state-store.address</name>
      <value>localhost:2181</value>
    </property>
  3. Start ZooKeeper. Execute this command on each ZooKeeper host:
    su - zookeeper -c "export ZOOCFGDIR=/usr/hdp/current/zookeeper-server/conf; export ZOOCFG=zoo.cfg; source /usr/hdp/current/zookeeper-server/conf/zookeeper-env.sh; /usr/hdp/current/zookeeper-server/bin/zkServer.sh start"
  4. Start HDFS.
  5. Start YARN.
  6. Set the active ResourceManager:

    MANUAL FAILOVER ONLY: If you configured manual ResourceManager failover, you must transition one of the ResourceManagers to Active mode. Execute the following CLI command to transition the ResourceManager with ID "rm1" (one of the IDs set in yarn.resourcemanager.ha.rm-ids) to Active:

    yarn rmadmin -transitionToActive rm1

    You can use the following CLI command to transition ResourceManager "rm1" to Standby mode:

    yarn rmadmin -transitionToStandby rm1

    AUTOMATIC FAILOVER: If you configured automatic ResourceManager failover, no action is required; the active ResourceManager is elected automatically.

  7. Start all remaining cluster services.
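The port check in step 2 can be scripted. The following is a minimal POSIX shell sketch, not part of HDP: the function names (zk_client_port, yarn_site_value, zk_ports_from_address, check_zk_port_match) are hypothetical, and yarn_site_value is a line-based text match rather than a real XML parser, so it assumes each <name> and <value> element is written on a single line, as in the step 2 snippet. It also assumes the address is a plain comma-separated host:port list with no chroot suffix, and that copies of both files are readable on the machine where it runs.

```shell
#!/bin/sh
# Sketch: compare ZooKeeper's clientPort with the port(s) listed in a
# yarn-site.xml ZooKeeper address property. Function names are hypothetical.

# Print the clientPort value from a zoo.cfg file. Commented lines are skipped;
# if the key appears more than once, the last occurrence wins.
zk_client_port() {
  sed -n 's/^[[:space:]]*clientPort[[:space:]]*=[[:space:]]*\([0-9][0-9]*\).*/\1/p' "$1" | tail -n 1
}

# Print the <value> of the named property from a yarn-site.xml file.
# Line-based match, not a real XML parser: assumes each <name> and <value>
# element is written on a single line.
yarn_site_value() {
  awk -v want="$2" '
    /<name>/ {
      n = $0; sub(/.*<name>[ \t]*/, "", n); sub(/[ \t]*<\/name>.*/, "", n)
    }
    /<value>/ && n == want {
      v = $0; sub(/.*<value>[ \t]*/, "", v); sub(/[ \t]*<\/value>.*/, "", v)
      print v; exit
    }
    /<\/property>/ { n = "" }
  ' "$1"
}

# Split a "host1:2181,host2:2181" address into one port per line.
zk_ports_from_address() {
  printf '%s\n' "$1" | tr ',' '\n' | tr -d ' ' | sed -n 's/.*:\([0-9][0-9]*\)$/\1/p'
}

# Return 0 if every port in the property matches clientPort, 1 on a mismatch,
# and 2 if either value could not be read.
check_zk_port_match() {
  zoo_cfg=$1
  yarn_site=$2
  prop=${3:-yarn.resourcemanager.zk-state-store.address}
  client_port=$(zk_client_port "$zoo_cfg")
  address=$(yarn_site_value "$yarn_site" "$prop")
  ports=$(zk_ports_from_address "$address")
  if [ -z "$client_port" ] || [ -z "$ports" ]; then
    echo "could not read clientPort from $zoo_cfg or a port from $prop" >&2
    return 2
  fi
  for port in $ports; do
    if [ "$port" != "$client_port" ]; then
      echo "MISMATCH: $prop lists port $port but clientPort is $client_port"
      return 1
    fi
  done
  echo "OK: $prop ($address) matches clientPort $client_port"
}
```

For example, after copying zoo.cfg from a ZooKeeper host to the ResourceManager host (the paths below are placeholders):

    check_zk_port_match /path/to/zoo.cfg /path/to/yarn-site.xml

A non-zero return status means that a port differs or that a value could not be read.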
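Because the steps must run in order, it can help to confirm that ZooKeeper is actually serving before starting HDFS in step 4. The sketch below assumes a POSIX shell run as root (so that su - zookeeper works). It wraps the step 3 command in a function and polls zkServer.sh status, which prints a "Mode:" line (leader, follower, or standalone) once the server is serving requests. The function names, retry count, and delay are hypothetical choices, not HDP defaults.

```shell
#!/bin/sh
# Sketch: start ZooKeeper with the step 3 command, then wait until it answers.
# Paths come from the step 3 command; function names are hypothetical.
ZK_CONF_DIR=/usr/hdp/current/zookeeper-server/conf
ZK_SERVER_SH=/usr/hdp/current/zookeeper-server/bin/zkServer.sh

# Run zkServer.sh with the given action (start, status, stop) as the zookeeper
# user, using the same environment setup as the step 3 command.
zk_server() {
  su - zookeeper -c "export ZOOCFGDIR=$ZK_CONF_DIR; export ZOOCFG=zoo.cfg; source $ZK_CONF_DIR/zookeeper-env.sh; $ZK_SERVER_SH $1"
}

# Poll "zkServer.sh status" until its output contains a "Mode:" line.
# $1 = number of attempts (default 10), $2 = seconds between attempts (default 3).
wait_for_zk() {
  attempts=${1:-10}
  delay=${2:-3}
  i=0
  while [ "$i" -lt "$attempts" ]; do
    if zk_server status 2>&1 | grep -q 'Mode:'; then
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  echo "ZooKeeper did not report a Mode after $attempts attempts" >&2
  return 1
}
```

On each ZooKeeper host, for example:

    zk_server start && wait_for_zk

On an ensemble member, a Mode line appears only after the quorum has formed, so waiting on one host also gives some indication that the ensemble is up.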
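For step 6, yarn rmadmin -getServiceState <rm-id> reports whether a ResourceManager is active or standby, so you can confirm the outcome of either failover mode. The sketch below is hypothetical helper code, not part of HDP. rm_state wraps that command. find_active_rm prints the first listed ID that reports active. ensure_active_rm, intended for manual failover only, transitions the given ResourceManager to Active only when none of the listed IDs is already active. The IDs rm1 and rm2 are examples; use the IDs from yarn.resourcemanager.ha.rm-ids.

```shell
#!/bin/sh
# Sketch: inspect and (for manual failover) set the active ResourceManager.
# Function names are hypothetical; rm1/rm2 stand for your configured rm-ids.

# Print the HA state ("active" or "standby") reported for one ResourceManager ID.
# Errors (for example, an unreachable ResourceManager) are discarded, so the
# output is empty in that case.
rm_state() {
  yarn rmadmin -getServiceState "$1" 2>/dev/null
}

# Print the first ResourceManager ID whose state is "active"; return 1 if none is.
find_active_rm() {
  for rm_id in "$@"; do
    if [ "$(rm_state "$rm_id")" = "active" ]; then
      echo "$rm_id"
      return 0
    fi
  done
  return 1
}

# Manual failover only: transition $1 to Active unless one of the given
# ResourceManagers ($1 and any remaining arguments) is already active.
ensure_active_rm() {
  target=$1
  shift
  if current=$(find_active_rm "$target" "$@"); then
    echo "ResourceManager $current is already active"
    return 0
  fi
  yarn rmadmin -transitionToActive "$target"
}
```

For example:

    find_active_rm rm1 rm2
    ensure_active_rm rm1 rm2

Checking first avoids issuing a transition when a ResourceManager is already active. With automatic failover, only find_active_rm is needed, to confirm that an active ResourceManager was elected.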