Hadoop High Availability
Also available as:
PDF
loading table of contents...

Dynamic Service Discovery Through ZooKeeper

The HS2 instances register with ZooKeeper under a namespace. When a HiveServer2 instance comes up, it registers itself with ZooKeeper by adding a znode in ZooKeeper. The znode name has the format:

/<hiveserver2_namespace>/serverUri=<host:port>;version=<versionInfo>; sequence=<sequence_number>,

The znode stores the server host:port as its data.

The server instance sets a watch on the znode; when the znode is modified, that watch sends a notification to the server. This notification helps the server instance keep track of whether or not it is on the list of servers available for new client connections.

When a HiveServer2 instance is de-registered from ZooKeeper, it is removed from the list of servers available for new client connections. (Client sessions on the server are not affected.) When the last client session on a server is closed, the server is closed.

To de-register a single HiveServer2, enter hive --service hiveserver2 --deregister <package ID>

Query Execution Path Without ZooKeeper

As shown in the illustration below, query execution without ZooKeeper happens in the traditional client/server model used by most databases:

  1. The JDBC / ODBC driver is given a host:port to an existing HS2 instance.

    This establishes a session where multiple queries can be executed.

    For each query...

  2. Client submits a query to HS2 which in turn submits it for execution to Hadoop.

  3. The results of query are written to a temporary file.

  4. The client driver retrieves the records from HS2 which returns them from the temporary file.

Query Execution Path With ZooKeeper

Query execution with ZooKeeper takes advantage of dynamic discovery. Thus, the client driver needs to know how to use this capability, which is available in HDP 2.2 and later with the JDBC driver and ODBC driver 2.0.0.

Dynamic discovery is implemented by including an additional indirection step through ZooKeeper. As shown in the figure below...

  1. Multiple HiveServer2 instances are registered with ZooKeeper

  2. The client driver connects to the ZooKeeper ensemble:

    jdbc:hive2://<zookeeper_ensemble>;serviceDiscoveryMode=zooKeeper; zooKeeperNamespace=<hiveserver2_namespace

    In the figure below, <zookeeper_ensemble> is Host1:Port1, Host2:Port2, Host3:Port3; <hiveserver_namespace) is hiveServer2.

  3. ZooKeeper randomly returns <host>:<port> for one of the registered HiveServer2 instances.

  4. The client driver can not connect to the returned HiveServer instance and proceed as shown in the previous section (as if ZooKeeper was not present).