Apache Hadoop High Availability

Use Cases and Failover Scenarios

This section provides information on the use cases and failover scenarios for high availability (HA) in the Hive metastore.

Use Cases

The metastore HA solution is designed to handle metastore service failures. When a deployed metastore service goes down, it can remain unavailable for a considerable time until it is brought back up. To avoid such outages, deploy the metastore service in HA mode.

Deployment Scenarios

Hortonworks recommends deploying the metastore service on multiple hosts concurrently. Each Hive metastore client reads the configuration property hive.metastore.uris to get the list of metastore servers with which it can try to communicate.

<property>
 <name>hive.metastore.uris</name>
 <value>thrift://$Hive_Metastore_Server_Host_Machine_FQDN</value>
 <description>A comma-separated list of metastore URIs on which the metastore service is running</description>
</property>
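
For illustration, a deployment with two metastore servers might list both URIs as in the sketch below. The host names metastore1.example.com and metastore2.example.com are placeholders assumed here, not values from this guide, and 9083 is the default metastore Thrift port; substitute the hosts and port used in your cluster.

<property>
 <!-- Hypothetical host names; replace with the FQDNs of your metastore servers -->
 <name>hive.metastore.uris</name>
 <value>thrift://metastore1.example.com:9083,thrift://metastore2.example.com:9083</value>
</property>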

Note that the relational database that backs the Hive metastore itself should also be made highly available using the best practices defined for the database system in use.

In the case of a secure cluster, add the following configuration property to the hive-site.xml file for each metastore server:

<property>
 <name>hive.cluster.delegation.token.store.class</name>
 <value>org.apache.hadoop.hive.thrift.ZooKeeperTokenStore</value>
</property>
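
The ZooKeeper-backed token store also needs to know which ZooKeeper ensemble to use. As a sketch only, the connection string can be supplied through the hive.cluster.delegation.token.store.zookeeper.connectString property; the host names below are placeholders assumed for illustration, and 2181 is the default ZooKeeper client port. Check the configuration reference for your Hive version before relying on this property.

<property>
 <!-- Hypothetical ZooKeeper hosts; replace with your own ensemble -->
 <name>hive.cluster.delegation.token.store.zookeeper.connectString</name>
 <value>zk1.example.com:2181,zk2.example.com:2181,zk3.example.com:2181</value>
</property>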

Failover Scenario

When multiple metastores are configured, the Hive metastore client chooses one of the metastore URIs at random, which helps balance load across the metastore servers.