Apache Hadoop High Availability
Also available as:
PDF
loading table of contents...

Use Cases and Failover Scenarios

This section provides information on the use cases and failover scenarios for high availability (HA) in the Hive metastore.

Use Cases

The metastore HA solution is designed to handle metastore service failures. Whenever a deployed metastore service goes down, metastore service can remain unavailable for a considerable time until service is brought back up. To avoid such outages, deploy the metastore service in HA mode.

Deployment Scenarios

Hortonworks recommends deploying the metastore service on multiple boxes concurrently. Each Hive metastore client will read the configuration property hive.metastore.uris to get a list of metastore servers with which it can try to communicate.

<property>
 <name> hive.metastore.uris </name>
 <value> thrift://$Hive_Metastore_Server_Host_Machine_FQDN </value>
 <description> A comma separated list of metastore uris on which metastore service is running </description>
</property>

Note that the relational database that backs the Hive metastore itself should also be made highly available using the best practices defined for the database system in use.

In the case of a secure cluster, add the following configuration property to the hive-site.xml file for each metastore server:

<property>
 <name> hive.cluster.delegation.token.store.class</name>
 <value>org.apache.hadoop.hive.thrift.ZooKeeperTokenStore</value>
</property>

Failover Scenario

A Hive metastore client always uses the first URI to connect with the metastore server. If the metastore server becomes unreachable, the client randomly picks up a URI from the list and attempts to connect with that.