Chapter 8. High Availability for Hive Metastore

This document is intended for system administrators who need to configure the Hive Metastore service for High Availability.

 1. Use Cases and Fail Over Scenarios

This section provides information on the use cases and fail over scenarios for high availability (HA) in the Hive metastore.

Use Cases

The metastore HA solution is designed to handle metastore service failures. When a deployed metastore service goes down, it can remain unavailable for a considerable time until the service is brought back up. To avoid such outages, deploy the metastore service in HA mode.

Deployment Scenarios

We recommend deploying the metastore service on multiple hosts concurrently. Each Hive metastore client reads the hive.metastore.uris configuration property to get the list of metastore servers it can try to connect to.

<property>
 <name>hive.metastore.uris</name>
 <value>thrift://$Hive_Metastore_Server_Host_Machine_FQDN</value>
 <description>A comma-separated list of metastore URIs on which the metastore service is running</description>
</property>
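
As an illustration, the comma-separated value for hive.metastore.uris could be assembled from a list of metastore hosts with a small shell helper. This is a minimal sketch, not part of HDP: the function name build_metastore_uris, the METASTORE_PORT variable, and the host names in the usage note are hypothetical, and port 9083 is assumed as the metastore Thrift port.

```shell
# Build a comma-separated hive.metastore.uris value from a list of hosts.
# METASTORE_PORT is a hypothetical override; 9083 is assumed as the
# metastore Thrift port and should be adjusted to match your deployment.
build_metastore_uris() {
    port="${METASTORE_PORT:-9083}"
    uris=""
    for host in "$@"; do
        # Prefix a comma only when the list already has an entry.
        uris="${uris:+$uris,}thrift://${host}:${port}"
    done
    printf '%s\n' "$uris"
}
```

For example, calling build_metastore_uris ms1.example.com ms2.example.com prints a two-entry list that can be pasted into the value element of the property.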
        

These metastore servers store their state in a MySQL HA cluster, which should be set up as recommended in the whitepaper "MySQL Replication for Failover Protection."

In a secure cluster, each metastore server additionally needs the following configuration property in its hive-site.xml file:

<property>
 <name>hive.cluster.delegation.token.store.class</name>
 <value>org.apache.hadoop.hive.thrift.DBTokenStore</value>
</property>
        

Fail Over Scenario

A Hive metastore client always uses the first URI to connect to the metastore server. If that metastore server becomes unreachable, the client randomly picks a URI from the list and tries to connect to it.
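
The selection behavior described above can be sketched in shell. This is an illustrative model only, not the Hive client's actual implementation: the function names pick_metastore_uri and is_reachable and the DOWN_URIS variable are hypothetical, and the sketch assumes the random pick is made among the entries other than the first.

```shell
# Hypothetical reachability probe. Replace with a real check, such as a TCP
# connect to the host and port. Here it treats any URI listed in the
# space-separated DOWN_URIS variable as unreachable, so the selection logic
# can be exercised without a running cluster.
is_reachable() {
    case " ${DOWN_URIS:-} " in
        *" $1 "*) return 1 ;;
        *) return 0 ;;
    esac
}

# Print the URI a client would try, given a comma-separated URI list:
# the first URI if it is reachable, otherwise a randomly chosen other entry.
# The fallback entry is not probed here; the client simply tries it next.
pick_metastore_uri() {
    list="$1"
    first="${list%%,*}"
    if is_reachable "$first"; then
        printf '%s\n' "$first"
        return 0
    fi
    # Remaining candidates, one per line, with the first entry dropped.
    rest=$(printf '%s\n' "$list" | tr ',' '\n' | sed '1d')
    [ -n "$rest" ] || return 1
    printf '%s\n' "$rest" |
        awk 'BEGIN { srand() } { line[NR] = $0 } END { print line[int(rand() * NR) + 1] }'
}
```

With a single-entry list and an unreachable first server, the function returns a non-zero status, which reflects why a single URI provides no failover.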

 2. Software Configuration

Complete the following tasks to configure the Hive HA solution:

 2.1. Install HDP

Use the following instructions to install HDP on your cluster hardware. Ensure that you specify the virtual machine (configured in the previous section) as your NameNode.

  1. Download Apache Ambari using the instructions provided here.

    Note

    Do not start the Ambari server until you have configured the relevant templates as outlined in the following steps.

  2. Edit the <master-install-machine-for-Hive-Metastore>/etc/hive/conf.server/hive-site.xml configuration file to add the following properties:

    1. Provide the URI that clients use to contact the metastore server. This property can hold a comma-separated list when your cluster has multiple Hive metastore servers.

      <property>
       <name>hive.metastore.uris</name>
       <value>thrift://$Hive_Metastore_Server_Host_Machine_FQDN</value>
       <description>URI for client to contact metastore server</description>
      </property>

    2. Configure the Hive cluster delegation token storage class.

      <property>
       <name>hive.cluster.delegation.token.store.class</name>
       <value>org.apache.hadoop.hive.thrift.DBTokenStore</value>
      </property>

  3. Complete HDP installation.

    • Continue the Ambari installation process using the instructions provided here.

    • Complete the Ambari installation. Ensure that the installation was successful.

 2.2. Update the Hive Metastore

HDP components configured for HA must use a NameService rather than a NameNode. Use the following instructions to update the Hive metastore to reference the NameService rather than a NameNode.

Note

Hadoop administrators often use the same procedure to update the Hive metastore with the new URI for a node in a Hadoop cluster. For example, administrators sometimes rename an existing node as their cluster grows.

  1. Open a command prompt on the machine hosting the Hive metastore.

  2. Execute the following command to retrieve a list of URIs for the filesystem roots, including the location of the NameService:

    hive --service metatool -listFSRoot

  3. Execute the following command with the -dryRun option to test your configuration change before implementing it:

    hive --service metatool -updateLocation <nameservice-uri> <namenode-uri> -dryRun

  4. Execute the command again, this time without the -dryRun option:

    hive --service metatool -updateLocation <nameservice-uri> <namenode-uri> 

 2.3. Validate Configuration

Test various fail over scenarios to validate your configuration.
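
One way to support such a test is to check which of the configured metastore servers accept connections before and after stopping one of them. The following is a minimal sketch under stated assumptions: the function names parse_thrift_uri and check_metastores are hypothetical, every URI is assumed to include an explicit port, and nc (netcat) with -z and -w support is assumed to be installed. A successful TCP connect shows only that the port is open, not that the metastore is healthy.

```shell
# Split a thrift://host:port URI into "host port".
# Assumes the URI includes an explicit port.
parse_thrift_uri() {
    hostport="${1#thrift://}"
    printf '%s %s\n' "${hostport%%:*}" "${hostport##*:}"
}

# Report whether each metastore in a comma-separated URI list accepts TCP
# connections. Assumes nc (netcat) with -z and -w support is available.
check_metastores() {
    printf '%s\n' "$1" | tr ',' '\n' | while read -r uri; do
        # Intentionally unquoted so "host port" splits into $1 and $2.
        set -- $(parse_thrift_uri "$uri")
        # Redirect stdin so nc cannot consume the loop's remaining input.
        if nc -z -w 5 "$1" "$2" </dev/null >/dev/null 2>&1; then
            echo "UP   $uri"
        else
            echo "DOWN $uri"
        fi
    done
}
```

A possible test sequence is to stop the metastore service on the first host in the list, run check_metastores with the value of hive.metastore.uris to confirm that host is reported as DOWN, and then run a simple query from a Hive client to confirm that it connects through one of the remaining servers.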

