Apache Hadoop High Availability
Also available as:
PDF
loading table of contents...

Configure Manual or Automatic ResourceManager Failover

Prerequisites

Complete the following prerequisites:

  • Make sure that you have a working ZooKeeper service. If you had an Ambari deployed HDP cluster with ZooKeeper, you can use that ZooKeeper service. If not, deploy ZooKeeper using the instructions provided in the Non-Ambari Cluster Installation Guide.

    [Note]Note

    In a typical deployment, ZooKeeper daemons are configured to run on three or five nodes. It is, however, acceptable to co-locate the ZooKeeper nodes on the same hardware as the HDFS NameNode and Standby Node. Many operators choose to deploy the third ZooKeeper process on the same node as the YARN ResourceManager. To achieve performance and improve isolation, Hortonworks recommends configuring the ZooKeeper nodes such that the ZooKeeper data and HDFS metadata is stored on separate disk drives.

  • Shut down the cluster using the instructions provided in the Controlling HDP Services Manually chapter of the HDP Reference Guide.

Set Common ResourceManager HA Properties

The following properties are required for both manual and automatic ResourceManager HA. Add these properties to the etc/hadoop/conf/yarn-site.xml file:

Property NameRecommended ValueDescription

yarn.resourcemanager. ha.enabled

true

Enable RM HA

yarn.resourcemanager. ha.rm-ids

Cluster-specific, e.g., rm1,rm2

A comma-separated list of ResourceManager IDs in the cluster.

yarn.resourcemanager. hostname.<rm-id>

Cluster-specific

The host name of the ResourceManager. Must be set for all RMs.

yarn.resourcemanager. recovery.enabled

true

Enable job recovery on RM restart or failover.

yarn.resourcemanager. store.class

org.apache.hadoop.yarn. server.resourcemanager. recovery.ZKRMStateStore

The RMStateStore implementation to use to store the ResourceManager's internal state. The ZooKeeper- based store supports fencing implicitly, i.e., allows a single ResourceManager to make multiple changes at a time, and hence is recommended.

yarn.resourcemanager .zk-address

Cluster-specific

The ZooKeeper quorum to use to store the ResourceManager's internal state. For multiple ZK servers, use commas to separate multiple ZK servers.

yarn.client.failover-proxy-provider

org.apache.hadoop.yarn. client. ConfiguredRMFailover ProxyProvider

When HA is enabled, the class to be used by Clients, AMs and NMs to failover to the Active RM. It should extend

org.apache.hadoop.yarn. client.RMFailoverProxyProvider

This is an optional configuration. The default value is “org.apache.hadoop.yarn.client. ConfiguredRMFailoverProxyProvider”

[Note]Note

You can also set values for each of the following properties in yarn-site.xml:

yarn.resourcemanager.address.<rm‐id>
yarn.resourcemanager.scheduler.address.<rm‐id>
yarn.resourcemanager.admin.address.<rm‐id>
yarn.resourcemanager.resource‐tracker.address.<rm‐id>
yarn.resourcemanager.webapp.address.<rm‐id>

If these addresses are not explicitly set, each of these properties will use

yarn.resourcemanager.hostname.<rm-id>:default_port

such as DEFAULT_RM_PORT, DEFAULT_RM_SCHEDULER_PORT, etc.

The following is a sample yarn-site.xml file with these common ResourceManager HA properties configured:

<!-- RM HA Configurations-->

<property>
 <name>yarn.resourcemanager.ha.enabled</name>
 <value>true</value>
</property>

<property>
 <name>yarn.resourcemanager.ha.rm-ids</name>
 <value>rm1,rm2</value>
</property>

<property>
 <name>yarn.resourcemanager.hostname.rm1</name>
 <value>${rm1 address}</value>
</property>

<property>
 <name>yarn.resourcemanager.hostname.rm2</name>
 <value>${rm2 address}</value>
</property>

<property>
 <name>yarn.resourcemanager.webapp.address.rm1</name>
 <value>rm1_web_address:port_num</value>
 <description>We can set rm1_web_address separately.
   If not, it will use
   ${yarn.resourcemanager.hostname.rm1}:DEFAULT_RM_WEBAPP_PORT
   </description>
</property>

<property>
 <name>yarn.resourcemanager.webapp.address.rm2</name>
 <value>rm2_web_address:port_num</value>
</property>

<property>
 <name>yarn.resourcemanager.recovery.enabled</name>
 <value>true</value>
</property>

<property>
 <name>yarn.resourcemanager.store.class</name>
 <value>org.apache.hadoop.yarn.server.resourcemanager.recovery.
    ZKRMStateStore</value>
</property>

<property>
 <name>yarn.resourcemanager.zk-address</name>
 <value>${zk1.address,zk2.address}</value>
</property>

<property>
 <name>yarn.client.failover-proxy-provider</name>
 <value>org.apache.hadoop.yarn.client.
    ConfiguredRMFailoverProxyProvider</value>
</property>

Configure Manual ResourceManager Failover

Automatic ResourceManager failover is enabled by default, so it must be disabled for manual failover.

To configure manual failover for ResourceManager HA, add the yarn.resourcemanager.ha.automatic-failover.enabled configuration property to the etc/hadoop/conf/yarn-site.xml file, and set its value to "false":

<property>
 <name>yarn.resourcemanager.ha.automatic-failover.enabled</name>
 <value>false</value>
</property>

Configure Automatic ResourceManager Failover

The preceding section described how to configure manual failover. In that mode, the system will not automatically trigger a failover from the active to the standby ResourceManager, even if the active node has failed. This section describes how to configure automatic failover.

  1. Add the following configuration options to the yarn-site.xml file:

    Property NameRecommended ValueDescription

    yarn.resourcemanager.ha. automatic-failover.zk-base-path

    /yarn-leader-election

    The base znode path to use for storing leader information, when using ZooKeeper-based leader election. This is an optional configuration. The default value is

    /yarn-leader-election

    yarn.resourcemanager. cluster-id

    yarn-cluster

    The name of the cluster. In a HA setting, this is used to ensure the RM participates in leader election for this cluster, and ensures that it does not affect other clusters.

    Example:

    <property>
     <name>yarn.resourcemanager.ha.automatic-failover.zk-base-path</name>
     <value>/yarn-leader-election</value>
     <description>Optional setting. The default value is
       /yarn-leader-election</description>
    </property>
    
    <property>
     <name>yarn.resourcemanager.cluster-id</name>
     <value>yarn-cluster</value>
    </property>
  2. Automatic ResourceManager failover is enabled by default.

    If you previously configured manual ResourceManager failover by setting the value of yarn.resourcemanager.ha.automatic-failover.enabled to "false", you must delete this property to return automatic failover to its default enabled state.